\section{Pre-challenge Monte Carlo Production}

The Monte Carlo production for CSA06 started in mid-July. The original aim was
to produce 50 million events in total to be used as input for prompt
reconstruction.

Four teams, from CIEMAT, DESY/RWTH, INFN/Bari, and the University of
Wisconsin, Madison, volunteered to run production using the Production
Agent (ProdAgent) for the first time at large scale. All production-related
job submissions were Grid-based; we refrained from using local
submissions entirely. After a short ramp-up during which sites prepared for
production (most sites received the CMSSW software via
a centrally managed installation mechanism, while a few managed the
installation manually), a total of 28 sites offered resources for the
pre-production step.


Table~\ref{tab:prechallenge} lists the datasets by event category, together with
the number of events requested and the number actually produced. All four teams
started production with the simulation of minimum bias events.


\begin{table}[htb]
\centering
\caption{CSA06 pre-challenge production by event category. Event counts are given in millions.}
\label{tab:prechallenge}
\vspace{3mm}
\begin{tabular}{|l|l|r|r|}
\hline
CMSSW version & Event category & Events produced & Events requested \\
\hline
0\_8\_1 & minbias & 39.8 & 25.0 \\
0\_8\_2 & TTbar & 5.8 & 5.0 \\
0\_8\_2 & Zmumu & 2.2 & 2.0 \\
0\_8\_3 & Wenu & 4.6 & 4.0 \\
0\_8\_3 & SoftMuon & 2.0 & 2.0 \\
0\_8\_3 & EWK Soup & 5.6 & 5.0 \\
0\_8\_3 & Jets & 1.2 & 1.2 \\
0\_8\_3 & Exo Soup & 1.0 & 1.0 \\
0\_8\_4 & HLT Soup & 5.0 & 5.0 \\
\hline
 & Total & 67.2 & 50.2 \\
\hline
\end{tabular}
\end{table}


The average event processing time observed on a 3.6\,GHz Xeon processor and the
event size for minimum bias events are shown in
Figure~\ref{fig:minbias-processing-performance}.


\begin{figure}[htp]
\begin{center}
$\begin{array}{c@{\hspace{1in}}c}
\includegraphics[width=0.4\linewidth]{figs/Pre-prod-minbias.pdf} &
\includegraphics[width=0.4\linewidth]{figs/Pre-prod-minbias-size.pdf} \\ [-0.53cm]
\end{array}$
\end{center}
\caption{Minimum bias event processing time and event size. Some jobs
processed 1000 events/job and some fewer than 500 events/job.}
\label{fig:minbias-processing-performance}
\end{figure}


More complex signal events, such as those in the TTbar data sample, take
significantly more time to simulate. In the teams' experience, each such event
required about 4 minutes to complete.

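
As a rough illustration of what this rate implies, assume (hypothetically) that
the figure of about 4 minutes per event applies uniformly to the whole TTbar
sample, and take the 5.8 listed for TTbar in Table~\ref{tab:prechallenge} to mean
5.8 million events. The total CPU time for the sample would then be of order
\[
5.8\times10^{6}\ \mathrm{events} \times 4\ \mathrm{min/event}
= 2.32\times10^{7}\ \mathrm{min}
\approx 3.9\times10^{5}\ \mathrm{CPU\ hours}
\approx 1.6\times10^{4}\ \mathrm{CPU\ days}.
\]
This is an order-of-magnitude estimate only; it ignores failed jobs, variations
between processors, and any overhead outside the event loop.
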

Regarding the job submission strategy, the teams found that the resource usage
information published by the Grid Information System is not useful for ranking
sites, since it lacks the job information associated
with the particular VO role. A static ranking was therefore used, built
according to the available resources as discovered by the ProdAgent's
JobTracking component.


With an average of up to 100 jobs/hour per agent, the job submission performance
of the ProdAgent was rather low (Figure~\ref{fig:submission-rate}). With the
level of resources available to the
teams, it took a day or more until all CPUs could be utilized.
Given the anticipated scale of production for CSA06, and the fact that
four teams were each running two instances of ProdAgent, this was not a problem
for the CSA06 pre-production. It does, however, need to be taken into account for
future production activities and is an area that needs improvement. Moving to
the new gLite Resource Broker with its bulk submission feature may help to some
extent, but this is certainly not the only area that needs to be looked at.

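
The submission rate can be turned into a simple upper bound on how quickly
resources can be filled. As a sketch, let $R$ denote the submission rate per
agent and $t$ the elapsed time (both symbols are introduced here for
illustration only). The number of jobs a single agent can have submitted after
time $t$ is then at most
\[
N_{\mathrm{sub}}(t) \le R\,t, \qquad
N_{\mathrm{sub}}(24\ \mathrm{h}) \le 100\ \mathrm{jobs/h} \times 24\ \mathrm{h}
= 2400\ \mathrm{jobs}.
\]
With four teams running two ProdAgent instances each, i.e.\ eight agents, the
corresponding aggregate bound is $8 \times 2400 = 19\,200$ jobs per day. This
assumes that every agent sustains the peak rate continuously and that no job
fails and has to be resubmitted, so the rate at which CPUs were actually filled
would have been lower.
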

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.6\linewidth]{figs/Pre-prod-submission-rate.pdf}
\end{center}
\caption{ProdAgent job submission rate}
\label{fig:submission-rate}
\end{figure}


Rather than using the output sandbox for the produced data, files are staged out
to the local Storage Element (SE). The performance of copying files
from the Worker Node disk to the SE, illustrated in
Figure~\ref{fig:stage-out-performance}, is very good for all the prominent SEs
that CMS uses at sites (Castor, dCache and DPM).


\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.6\linewidth]{figs/Pre-prod-stage-out.pdf}
\end{center}
\caption{Local stage-out performance}
\label{fig:stage-out-performance}
\end{figure}


As reported by the production teams, early production was affected by
instabilities in the JobTracking and MergeSensor components of ProdAgent, which
required continuous attention from the operators. These problems
were solved in late August/early September.


One operational problem that turned out to be common to almost
all participating sites concerned data access between the farm of Worker Nodes
and the local SE, i.e.\ for stage-out and, in particular, for the merge process.
With many processes running in parallel, the merge process stressed
some of the deployed SEs to their limit. It is therefore important to
maintain a suitable ratio of CPU capacity to storage access bandwidth.

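
One way to make this ratio concrete is sketched here. The symbols are
hypothetical and are not taken from the production setup: $N_{\mathrm{proc}}$
is the number of processes at a site accessing the SE concurrently, $\bar{r}$
the average data rate each of them needs, and $B_{\mathrm{SE}}$ the aggregate
bandwidth the SE can sustain. Under this simple model the SE is not the
bottleneck as long as
\[
N_{\mathrm{proc}}\,\bar{r} \le B_{\mathrm{SE}},
\qquad \mbox{i.e.} \qquad
\frac{B_{\mathrm{SE}}}{N_{\mathrm{proc}}} \ge \bar{r}.
\]
Adding CPUs at a site without a proportional increase in $B_{\mathrm{SE}}$
lowers the bandwidth available per process. Merge jobs, which presumably spend
most of their time reading and writing files, would be expected to reach this
limit first.
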

To help operate the ProdAgent more efficiently, members of INFN Bari
developed a monitoring tool that provides a comprehensive overview
of the current state, together with access to log files,
from a single web page. Screenshots are shown in Figures~\ref{fig:pa-monitoring} and~\ref{fig:pa-monitoring-1}.


\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.95\linewidth]{figs/Pre-prod-PA-mon.pdf}
\end{center}
\caption{ProdAgent monitoring tool developed by INFN/Bari}
\label{fig:pa-monitoring}
\end{figure}


\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.95\linewidth]{figs/Pre-prod-PA-mon-1.pdf}
\end{center}
\caption{Bari ProdAgent monitoring summarizing important production job information}
\label{fig:pa-monitoring-1}
\end{figure}