\section{Tier-1 and Tier-2 Operations}

\subsection{Data Transfers}

The Tier-1 centers were expected to receive data from CERN at a rate
corresponding to 25\% of the 2008 pledge rate and to serve the data to
Tier-2 centers. The expected rate into the Tier-1 centers is shown in
Table~\ref{tab:tier01pledge}. Note that while the listed rates are
significantly less than the bandwidth to the WAN (see
Table~\ref{tab:tier1resources}), they fit within the storage
capability available for a 30-day challenge.

\begin{table}[htb]
\centering
\caption{Expected transfer rates from CERN to Tier-1 centers based on the MOU pledges.}
\label{tab:tier01pledge}
\begin{tabular}{|l|r|r|}
\hline
Site & Goal Rate (MB/s) & Threshold Rate (MB/s) \\
\hline
ASGC & 15 & 7.5 \\
CNAF & 25 & 12.5 \\
FNAL & 50 & 25 \\
GridKa & 25 & 12.5 \\
IN2P3 & 25 & 12.5 \\
PIC & 10 & 5 \\
RAL & 10 & 5 \\
\hline
\end{tabular}
\end{table}
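
As a rough cross-check of the storage statement above, the volume
implied by running at the goal rates for the full 30 days can be
estimated in a few lines of Python (a back-of-the-envelope sketch
using the goal rates from Table~\ref{tab:tier01pledge} and assuming
decimal units):

\begin{verbatim}
# Storage volume implied by 30 days at the CSA06 goal rates.
goal_rates_mb_s = {"ASGC": 15, "CNAF": 25, "FNAL": 50,
                   "GridKa": 25, "IN2P3": 25, "PIC": 10, "RAL": 10}
seconds = 30 * 24 * 3600  # length of the challenge
for site, rate in goal_rates_mb_s.items():
    print(f"{site}: {rate * seconds / 1e6:.0f} TB over 30 days")
print(f"Total: {sum(goal_rates_mb_s.values()) * seconds / 1e6:.0f} TB")
# Aggregate 160 MB/s -> ~415 TB, consistent with fitting in the
# storage available for the challenge.
\end{verbatim}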

The Tier-2 centers are expected in the computing model \cite{model, ctdr} to transfer
data from the Tier-1 centers in bursts. The goal rate in CSA06 was 20MB/s,
with a threshold for success of 5MB/s. Achieving these metrics was
defined as sustaining the transfer rate for a 24-hour period. At the
beginning of CSA06 CMS concentrated primarily on moving data from the
``associated'' Tier-1 centers to the Tier-2s. By the end of the
challenge most of the Tier-1 to Tier-2 permutations had been attempted.

The total data transferred between sites in CSA06 is shown in
Figure~\ref{fig:totaltran}. This plot includes only wide-area data
transfers; in addition, data was moved onto tape at the majority of
Tier-1 centers. Over the 45 days of the challenge CMS was able to
move more than 1 petabyte of data over the wide area.
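
For scale, the average wide-area rate implied by that total can be
checked in one line (a sketch, assuming decimal units, 1~PB $= 10^9$~MB):

\begin{verbatim}
# Average rate implied by ~1 PB moved over the 45 challenge days.
print(1e9 / (45 * 24 * 3600), "MB/s")  # ~257 MB/s sustained
\end{verbatim}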

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/CSA06_CumTran}
\end{center}
\caption{The cumulative data volume transferred during CSA06 in TB.}
\label{fig:totaltran}
\end{figure}

Timeline:
\begin{itemize}

\item October 2, 2006: The Tier-0 to Tier-1 transfers began on the
first day of the challenge. In the first few hours 6 of 7 Tier-1
centers successfully received data. During the first week only
minimum bias was reconstructed, and at 40Hz the total rate out of the
CERN site did not meet the 150MB/s target rate.

\item October 3, 2006: All 7 Tier-1 sites were able to successfully
receive data and 8 Tier-2 centers were subscribed to data samples:
Belgium IIHE, UC San Diego, Wisconsin, Nebraska, DESY, Aachen, and
Estonia. There were successful transfers to 6 Tier-2 sites.

\item October 4, 2006: An additional 11 Tier-2 sites were subscribed
to data samples: Pisa, Purdue, CIEMAT, Caltech, Florida, Rome, Bari,
CSCS, IHEP, Belgium UCL, and Imperial College. Of the 19 registered
Tier-2 sites, 12 were able to receive data. Of those, 5 exceeded the
goal transfer rates for over an hour, and an additional 3 were over
the threshold rate.

\item October 5, 2006: Three additional Tier-2s were added, increasing
the number of participating sites above the goal of 20 Tier-2
centers. New hardware installed at IN2P3 for CSA06 began to exhibit
stability problems, leading to poor transfer efficiency.

\item October 9, 2006: RAL transitioned from a dCache SE to a Castor2
SE. The signal samples began being reconstructed at the Tier-0.

\item October 10-12, 2006: The Tier-1 sites had stable operations
through the week at an aggregate rate of approximately 100MB/s from
CERN. IFCA joined the Tier-1 to Tier-2 transfers, and its average
transfer rate over the day was observed at 14MB/s with a low error
rate.

\item October 13, 2006: Multiple subscriptions of the minimum bias
samples were made to some of the Tier-1 centers to increase the total
rate of data transfer from CERN. The number of participating Tier-2
sites increased to 23.

\item October 18, 2006: The PhEDEx transfer system held a lock in the
Oracle database which blocked other agents from continuing with
transfers. This problem appeared more frequently in the latter half
of the challenge, when the load was higher.

\item October 20, 2006: The reconstruction rate was increased at the
Tier-0 to improve the output from CERN and to better exercise the
prompt reconstruction farm. The data rate from CERN approximately
doubled. An average rate over an hour of 600MB/s from CERN was
achieved.

\item October 25, 2006: The transfer rate from CERN was large, with
daily average rates of 250MB/s-300MB/s. The first transfer backlogs
began to appear.

\item October 30, 2006: Data reconstruction at the Tier-0 stopped.

\item October 31, 2006: PIC and ASGC finished transferring the assigned prompt reconstruction data from CERN.

\item November 2, 2006: FNAL and IN2P3 also completed the transfers.

\item November 3, 2006: RAL completed the transfers. The first of the
Tier-1 to any-Tier-2 transfer validations began. The test involved
sending a small sample from a Tier-1 site to a validated Tier-2 (in
the test case DESY), and then sending a small sample to all Tier-2
sites.

\item November 5, 2006: CNAF completed the Tier-0 transfers.

\item November 6, 2006: The Tier-1 to Tier-2 transfer testing continued.

\item November 9, 2006: GridKa completed the Tier-0 transfers.

\end{itemize}

\subsubsection{Transfers to Tier-1 Centers}

During CSA06 the Tier-1 centers met the transfer rate goals. In the
first week of the challenge, using minimum bias events, the total volume
of data out of CERN did not amount to 150MB/s unless the datasets were
subscribed to multiple sites. After the reconstruction rate was
increased at the Tier-0, the transfer rate easily exceeded the 150MB/s
target. The 30 day and 15 day averages are shown in
Table~\ref{tab:tier01csa06}. For the thirty day average all sites
except two exceeded the goal rate, and for the final 15 days all sites
easily exceeded the goal. Several sites doubled or tripled the goal
rate during the final two weeks of high volume transfers.

The WLCG availability metric for Tier-1 sites this year is 90\%.
Applying this to the Tier-1 centers participating in CSA06
transfers, 6 of 7 Tier-1s reached the availability goal.

\begin{table}[htb]
\centering
\caption{Transfer rates during CSA06 between CERN and Tier-1 centers and the number of outage days during the active challenge activities. In the MSS column, parentheses indicate that the site either had scaling issues keeping up with the total rate to tape or transferred only a portion of the data to tape.}
\label{tab:tier01csa06}
\begin{tabular}{|l|r|r|r|r|c|}
\hline
Site & Anticipated Rate (MB/s) & Last 30 Days (MB/s) & Last 15 Days (MB/s) & Outage (Days) & MSS Used \\
\hline
ASGC & 15 & 17 & 23 & 0 & (Yes) \\
CNAF & 25 & 26 & 37 & 0 & (Yes) \\
FNAL & 50 & 68 & 98 & 0 & Yes \\
GridKa & 25 & 23 & 28 & 3 & No \\
IN2P3 & 25 & 23 & 34 & 1 & Yes \\
PIC & 10 & 22 & 33 & 0 & No \\
RAL & 10 & 23 & 33 & 2 & Yes \\
\hline
\end{tabular}
\end{table}

The rate of data transferred averaged over 24 hours and the volume of
data transferred in 24 hours are shown in Figures~\ref{fig:tier01rate}
and~\ref{fig:tier01vol}. The start of the transfers during the first
week is visible on the left side of the plot, as is the failure to
reach the target rate, shown as a horizontal red bar. The twin
peaks in excess of 300MB/s and 25TB of data moved correspond to the
over-subscription of data. The bottom of the graph has indicators of
the approximate Tier-0 reconstruction rate. Both the rate and the
volume figures clearly show the point when the Tier-0 trigger rate was
doubled to 100Hz. The daily average exceeded 350MB/s with more than
30TB moved. The hourly averages from CERN peaked at more than
650MB/s.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/Tier01rate}
\end{center}
\caption{The rate of data transferred from the Tier-0 to the Tier-1 centers in MB per second.}
\label{fig:tier01rate}
\end{figure}

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/Tier01vol}
\end{center}
\caption{The total volume of data transferred from the Tier-0 to the Tier-1 centers in TB per day.}
\label{fig:tier01vol}
\end{figure}

The transferable volume plot shown in Figure~\ref{fig:tier01queue} is an
indicator of how well the sites kept up with the volume of data
from the Tier-0 reconstruction farm. During the first three weeks of
the challenge almost no backlog of files was accumulated by the Tier-1
centers. A hardware failure at IN2P3 resulted in a small
accumulation. The additional data subscriptions led to a spike in
data to transfer, but it was quickly cleared by the Tier-1 sites. The
most significant volumes of data waiting for transfer came at the end
of the challenge. During this time GridKa performed a dCache
storage upgrade that resulted in a large accumulation of data to
transfer. CNAF suffered a file server problem that reduced the amount
of available hardware. Additionally, RAL turned off the import system
for two days over a weekend to demonstrate the ability to recover from
a service interruption. The Tier-1 issues combined with PhEDEx
database connection interruptions under the heavy load of the final
week of transfers to produce a backlog of approximately 50TB over
the final days of the heavy challenge transfers. During this time
CERN continued to serve data at 350MB/s on average.
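
The backlog dynamics described above follow a simple queueing
relation: the transferable volume grows when the Tier-0 produces
faster than a site can drain, and shrinks otherwise. A minimal sketch
(Python, with made-up daily rates rather than CSA06 measurements):

\begin{verbatim}
# Toy model of the transfer backlog: produced TB/day vs. a site's
# transfer capacity in TB/day. Numbers are illustrative only.
produced = [10, 10, 30, 30, 30, 30, 30]   # Tier-0 output per day
capacity = [12, 12, 25, 25,  0, 25, 25]   # site drain rate (0 = outage)
backlog = 0.0
for day, (p, c) in enumerate(zip(produced, capacity), start=1):
    backlog = max(0.0, backlog + p - c)
    print(f"day {day}: backlog {backlog:.0f} TB")
\end{verbatim}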

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/Tier01queue}
\end{center}
\caption{The total volume of data waiting for transfer from the Tier-0 to the Tier-1 centers in TB per day.}
\label{fig:tier01queue}
\end{figure}

The CERN to Tier-1 transfer quality is shown in
Figure~\ref{fig:tier01qual}. In CMS the transfer quality is defined
by the number of times a transfer has to be attempted before it
successfully completes. A link between two sites with 100\%
transfer quality attempted each transfer only once, while a
10\% transfer quality indicates each transfer had to be attempted
ten times to successfully complete. Although most transfers eventually
complete, low transfer quality uses the transfer resources
inefficiently and usually results in low utilization of the network.
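
Following that definition, the quality of a link can be computed
directly from attempt counts. A minimal sketch (Python; the attempt
log is hypothetical, not CSA06 monitoring data):

\begin{verbatim}
# Transfer quality per link: successful transfers / attempts.
attempts = {("CERN", "FNAL"): (1200, 1250),  # (successes, attempts)
            ("CERN", "RAL"):  (300, 3000)}
for (src, dst), (ok, tried) in attempts.items():
    print(f"{src}->{dst}: quality {ok / tried:.0%}, "
          f"~{tried / ok:.1f} attempts per success")
\end{verbatim}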

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/qualt0t1}
\end{center}
\caption{Transfer quality between CERN and Tier-1 centers over 30 days.}
\label{fig:tier01qual}
\end{figure}

The transfer quality plot compares very favorably to equivalent plots
made during the spring. The CERN Castor2 storage element performed
very stably throughout the challenge. There were two small
configuration issues, which were promptly addressed by the experts.
The Tier-1s also performed well throughout the challenge, with several
24-hour periods in which specific Tier-1s saw no transfer errors. The
stability of the RAL SE before the transition to Castor2 can be seen
at the left side of the plot, as well as the intentional downtime to
demonstrate recovery on the right side of the plot. The IN2P3
hardware problems are visible during the first week, and the GridKa
dCache upgrade is clearly visible during the last week. Most of the
other periods are solidly green. Both FNAL and PIC were above 70\%
efficient for every day of the challenge activities.

Tier-1 to Tier-1 transfers were considered to be beyond the scope of
CSA06, though the dataflow exists in the CMS computing model. During
CSA06 we had an opportunity to test Tier-1 to Tier-1 transfers while
recovering from backlogs of data when the samples were subscribed to
multiple sites. PhEDEx is designed to take the data from whichever
source site it can be efficiently transferred from. Figure~\ref{fig:t1t1}
shows the total Tier-1 to Tier-1 transfers during CSA06. With 7
Tier-1s there are 42 permutations of Tier-1 to Tier-1 transfers,
counting each direction separately. During CSA06 we successfully
exercised about half of them.
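
The pair count is easy to verify with a two-line sketch:

\begin{verbatim}
# Ordered source->destination pairs among the 7 Tier-1 centers.
from itertools import permutations
tier1s = ["ASGC", "CNAF", "FNAL", "GridKa", "IN2P3", "PIC", "RAL"]
print(len(list(permutations(tier1s, 2))))  # 7 * 6 = 42
\end{verbatim}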

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/T1T1Rate}
\end{center}
\caption{Transfer rate between Tier-1 centers during CSA06.}
\label{fig:t1t1}
\end{figure}

\subsubsection{Transfers to Tier-2 Centers}
In the CMS computing model the Tier-2s are expected to be able to
receive data from any Tier-1 site. In order to simplify CSA06
operations we began by concentrating on transfers from the
``associated'' Tier-1 sites, and in the final two weeks of the
challenge began a concerted effort on transfers from any Tier-1. The
associated Tier-1 center is the center operating the File Transfer
Service (FTS) server and hosting the channels for Tier-2 transfers.

The Tier-2 transfer metrics involved both participation and
performance. For CSA06 CMS had 27 sites signed up to participate
in the challenge. Participation was defined as having successful
transfers on 80\% of the days during the challenge. By this metric
21 sites succeeded in participating in the challenge, which
is above the goal of 20.
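
A minimal sketch of the participation metric (Python; the per-site
transfer-day counts are hypothetical):

\begin{verbatim}
# A site "participates" if it has successful transfers on at least
# 80% of the challenge days.
challenge_days = 30
days_with_success = {"SiteA": 28, "SiteB": 21}
for site, good in days_with_success.items():
    met = good / challenge_days >= 0.80
    print(site, "participated" if met else "did not participate")
\end{verbatim}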

The Tier-2 transfer performance goal was 20MB/s and the threshold
was 5MB/s. In the CMS computing model the Tier-2 transfers are
expected to occur in bursts. Data will be transferred to refresh a
Tier-2 cache, and then will be analyzed locally. The Tier-2 sites
were not expected to hit the goal transfer rates continuously
throughout the challenge. There were 12 sites that successfully
averaged above the goal rate for at least one 24-hour period, and an
additional 8 sites that averaged above the threshold rate for at least
one 24-hour period.
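
In cache-refresh terms these rates are easy to place on a scale (a
back-of-the-envelope sketch, assuming decimal units):

\begin{verbatim}
# Volume moved into a Tier-2 cache by one 24-hour burst.
day = 24 * 3600
print(20 * day / 1e6, "TB at the 20 MB/s goal")      # ~1.7 TB
print(5 * day / 1e6, "TB at the 5 MB/s threshold")   # ~0.4 TB
\end{verbatim}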

The transfer rate over the 30 most active transfer days is shown in
Figure~\ref{fig:tier12rate}. The aggregate rate from Tier-1 to
Tier-2 centers was not as high as the total rate from CERN, which is
not an accurate reflection of the transfers expected in the CMS
computing model. In the CMS computing model there is more data
exported from the Tier-1s to the Tier-2s than total raw data coming
from CERN, because data is sent to multiple Tier-2s and the Tier-2s may
flush data from the cache and reload it at a later time. In CSA06 the
Tier-2 centers were subscribed to specific samples at the beginning
and then to specific skims when available.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/tier12rate}
\end{center}
\caption{Transfer rate between Tier-1 and Tier-2 centers during the first 30 days of CSA06.}
\label{fig:tier12rate}
\end{figure}

The ability of the Tier-1 centers to export data was successfully
demonstrated during the challenge, but several sites indicated
interference between receiving and exporting data. The quality of the
Tier-1 to Tier-2 data transfers is shown in Figure~\ref{fig:tier12qual}.
The quality is not nearly as consistently green as the CERN to Tier-1
plots, but the variation has a number of causes. Not all of the
Tier-1 centers are currently exporting data as efficiently as CERN,
especially in the presence of a high load of data ingests; in addition,
most of the Tier-2 sites do not have as much operational experience
receiving data as the Tier-1 sites do.

The Tier-1 to Tier-2 transfer quality looks very similar to the CERN
to Tier-1 transfer quality of 9-12 months ago. With a concerted
effort the Tier-1 to Tier-2 transfers should be able to reach the
quality of the current CERN to Tier-1 transfers before they are needed
to move large quantities of experiment data to users.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/tier12qual}
\end{center}
\caption{Transfer quality between Tier-1 and Tier-2 centers during the first 30 days of CSA06.}
\label{fig:tier12qual}
\end{figure}

There are a number of very positive examples of Tier-1 to Tier-2
transfers. Figure~\ref{fig:picqual} shows the results of the Tier-1
to all Tier-2 tests when PIC was the source of the dataset. A small
skim sample was chosen, and within 24 hours 20 sites had successfully
received the dataset. The transfer quality over the 24-hour period
remained high, with successful transfers to all four continents
participating in CMS.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/PICQual}
\end{center}
\caption{Transfer quality between PIC and Tier-2 sites participating in the dedicated Tier-1 to Tier-2 transfer tests.}
\label{fig:picqual}
\end{figure}

Figure~\ref{fig:fnalrate} is an example of the very high export rates
the Tier-1 centers were able to achieve transferring data to Tier-2
centers. The peak rate on the plot is over 5Gb/s, which was
independently verified by the site network monitoring. This rate is
over 50\% of the anticipated Tier-1 data export rate expected in the
full-sized system.
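
A quick unit check of that figure (a sketch, assuming the network
convention of decimal bits):

\begin{verbatim}
# 5 Gb/s expressed in MB/s, and the full-system rate it implies.
peak_mb_s = 5 * 1e9 / 8 / 1e6
print(peak_mb_s, "MB/s")  # 625 MB/s
# Being "over 50%" of the anticipated export rate puts the full-sized
# system somewhat below 10 Gb/s (~1.25 GB/s) per Tier-1.
\end{verbatim}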

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/FNAL_Rate}
\end{center}
\caption{Transfer performance between FNAL and Tier-2 sites
participating in the Tier-1 to Tier-2 transfer tests.}
\label{fig:fnalrate}
\end{figure}

Figure~\ref{fig:FZK_DESY} is an example of the very high rates achieved for both Tier-1 export and Tier-2 import observed in CSA06. The plot shows both the hourly average and the instantaneous rate. DESY achieved an import rate to disk of more than 400MB/s.

\begin{figure}[ht]
\begin{center}
$\begin{array}{c@{\hspace{1in}}c}
\includegraphics[width=0.4\linewidth]{figs/FZK_DESY_1} &
\includegraphics[width=0.4\linewidth]{figs/FZK_DESY_2} \\
\end{array}$
\end{center}
\caption{The plot on the left is the hourly average transfer rate between GridKa and DESY. The plot on the right is the instantaneous rate between the two sites measured with Ganglia.}
\label{fig:FZK_DESY}
\end{figure}

\subsection{Tier-1 Skim Job Production}
\label{sec:skims}

CSA06 tested the workflow to reduce primary datasets to manageable sizes
for analyses. Four production teams provided centralized skim job workflows at
the Tier-1 centers. The produced secondary datasets are registered
in the Dataset Bookkeeping Service and accessed like any other data.
Common skim job tools were prepared based on Monte Carlo generator
information and reconstruction output, and both types were tested
(see Section~\ref{sec:filtering}). There was
overwhelming response from the analysis demonstrations, and about
25 filters producing nearly 60 datasets were run, as compiled in
Table~\ref{tab:tier1skim}. A
variety of output formats for the secondary datasets were used (FEVT,
RECO, AOD, AlCaReco), and the fraction of selected
events ranged from $<1\%$ to $100\%$. Secondary dataset sizes ranged
from $<1$~GB to 2.5~TB. No requirement was imposed beforehand on the
restrictiveness of the filters for CSA06, hence those with very low
efficiencies are probably tighter than what one would apply in
practice.
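
The spread in secondary dataset sizes follows directly from the
filter efficiency and the per-event size of the chosen output format.
A small illustrative sketch (Python; the event counts and per-event
sizes below are hypothetical, not values from Table~\ref{tab:tier1skim}):

\begin{verbatim}
# Skim output size ~ input events * efficiency * per-event size.
primary_events = 5_000_000
efficiency = 0.14                 # e.g. a 14% filter
event_size_mb = {"FEVT": 1.5, "RECOSim": 0.5, "AODSim": 0.05}
for fmt, size in event_size_mb.items():
    print(f"{fmt}: ~{primary_events * efficiency * size / 1e6:.2f} TB")
\end{verbatim}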

\begin{table}[phtb]
\centering
\caption{List of requested skim filters to run during CSA06 by group,
  filter name, primary input dataset, efficiency, and input/output
  data formats.}
\label{tab:tier1skim}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
Group & Filter & Samples & Efficiency & Input format & Output format \\
\hline
hg & CSA06\_Tau\_Zand1lFilter.cfg & EWK & 14\% & FEVT & RECOSim \\
hg & CSA06\_HiggsTau\_1lFilter.cfg & EWK & 36\% & FEVT & RECOSim \\
hg & CSA06\_HiggsTau\_1lFilter.cfg & T-Tbar & 47\% & FEVT & RECOSim \\
hg & CSA06\_HiggsWW\_WWFilter.cfg (bkgnd) & EWK & 1\% & FEVT & FEVT \\
hg & CSA06\_HiggsWW\_WWFilter.cfg (signal) & EWK & 1\% & FEVT & FEVT \\
hg & CSA06\_HiggsWW\_TTb\_Filter.cfg & T-Tbar & 4\% & FEVT & FEVT \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & EWK & 10\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & Jets & 2\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & HLT(e,mu) & 1\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & EWK & 0 & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & Jets & 34\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & HLT(gam) & 0.4\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & TTbar & 14\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & TTbar & 8\% & FEVT & RECOSim \\
\hline
sm & CSA06\_TTbar\_1lFilters.cfg (skim1efilter) & T-Tbar & 20\% & FEVT & RECOSim \\
sm & CSA06\_TTbar\_1lFilters.cfg (skim1mufilter) & T-Tbar & 20\% & FEVT & RECOSim \\
sm & CSA06\_TTbar\_1lFilters.cfg (skim1taufilter) & T-Tbar & 20\% & FEVT & RECOSim \\
sm & CSA06\_TTbar\_dilepton.cfg & T-Tbar & $\sim$10\% & FEVT & RECOSim \\
sm & CSA06\_MinimumBiasSkim.cfg & minbias & 100\% & FEVT & RECOSim \\
sm & CSA06\_UnderlyingEventJetsSkim.cfg (reco) & Jets & $\sim$100\% & FEVT & RECOSim \\
sm & CSA06\_UnderlyingEventDYSkim.cfg & EWK & $\sim$10\% & FEVT & RECOSim \\
\hline
eg & CSA06\_ZeeFilter.cfg (zeeFilter) & EWK & 3\% & FEVT & RECOSim \\
eg & CSA06\_ZeeFilter.cfg (AlCaReco) & EWK & 3\% & FEVT & AlCaReco \\
eg & CSA06\_AntiZmmFilter.cfg & Jets & 85\% & FEVT & FEVT \\
\hline
mu & CSA06\_JPsi\_mumuFilter.cfg & SoftMuon & 50\% & FEVT & FEVT \\
mu & CSA06\_JPsi\_mumuFilter.cfg & Zmumu & 50\% & FEVT & FEVT \\
mu & CSA06\_JPsi\_mumuFilter.cfg & EWK & 10\% & FEVT & FEVT \\
mu & CSA06\_WmunuFilter.cfg (reco) & EWK & 20\% & FEVT & AODSim \\
mu & CSA06\_WmunuFilter.cfg (reco) & SoftMuon & 60\% & FEVT & AODSim \\
mu & CSA06\_ZmmFilter.cfg & Zmumu & 50\% & FEVT & RECOSim \\
mu & CSA06\_ZmmFilter.cfg & Jets & -- & FEVT & FEVT \\
mu & recoDiMuonExample.cfg (reco) & EWK & 20\% & FEVT & RECOSim \\
mu & recoDiMuonExample.cfg (reco) & Zmumu & 67\% & FEVT & RECOSim \\
\hline
su & CSA06\_Exotics\_LM1Filter.cfg & Exotics & 39\% & FEVT & FEVT \\
su & CSA06\_BSM\_mc2e\_Filter.cfg & Exotics & 2\% & FEVT & FEVT \\
su & CSA06\_BSM\_mc2e\_Filter.cfg & EWK & $\sim$40\% & FEVT & FEVT \\
su & CSA06\_BSM\_mc2e\_Filter.cfg & HLT(e) & -- & FEVT & FEVT \\
su & CSA06\_Exotics\_ZprimeDijetFilter.cfg & Exotics & $\sim$30\% & FEVT & FEVT \\
su & CSA06\_Exotics\_QstarDijetFilter.cfg & Exotics & $\sim$20\% & FEVT & FEVT \\
su & CSA06\_Exotics\_XQFilter.cfg & Exotics & 22\% & FEVT & FEVT \\
su & CSA06\_Exotics\_ZprimeFilter.cfg & Exotics & 39\% & FEVT & FEVT \\
su & CSA06\_Exotics\_LM1\_3IC5Jet30Filter.cfg (reco) & Exotics & 25\% & FEVT & FEVT \\
su & CSA06\_TTbar\_2IC5Jet100ExoFilter.cfg (reco) & T-Tbar & 5\% & FEVT & FEVT \\
\hline
jm & CSA06\_QCD\_Skim.cfg (21 samples) & Jets & 100\% & FEVT & FEVT \\
\hline
\end{tabular}
\end{table}

\subsection{Tier-1 Re-Reconstruction}
\label{sec:rereco}

The goal was to demonstrate re-reconstruction at a Tier-1 center on
files first reconstructed and distributed by the Tier-0 center,
including access and application of new constants from the
offline DB. Four teams were set up to demonstrate re-reconstruction on
at least 100K events at each of the Tier-1 centers.

\subsubsection{Baseline Approach}

Since re-reconstruction had not been tested before the start of CSA06,
a technical problem was encountered with a couple of reconstruction
modules when re-reconstruction was first attempted on November 4. The
issue had to do with multiple reconstruction products stored in the
Event, and the proper mechanism for accessing them. Once the problem
was diagnosed, the Tier-1 re-reconstruction workflow dropped
pixel tracking and vertexing out of about 100 reconstruction modules,
and the processing worked correctly.
Re-reconstruction was demonstrated on $>$100K events at 6 Tier-1 centers.
For the Tracker and ECAL calibration exercises (see
Section~\ref{sec:calib}), new constants inserted
into the offline DB were used for the re-reconstruction, and the
resulting datasets were
published and accessible to CRAB jobs. Thus, CSA06 also demonstrated
the full reprocessing workflow.

\subsubsection{Two-Step Approach}

While the reconstruction issue described above was being diagnosed, a
brute-force two-step procedure was conducted in parallel to ensure
re-reconstruction at a Tier-1 center. The approach consisted of first
skimming off the original Tier-0 reconstruction products, in analogy
with the physics skim job workflow described in
Section~\ref{sec:skims}, and then running reconstruction on the skimmed
events (i.e., two ProdAgent workflows). This approach was also
successfully demonstrated at the FNAL Tier-1 center.

\subsection{Job Execution at Tier-1 and Tier-2}
\subsubsection{Job Robot}
The processing metrics defined for CSA06 foresaw that sites
offering computing capacity to CMS and participating in CSA06 were expected
to complete an aggregate of 50k jobs per day. The goal was to exercise the
job submission infrastructure and to monitor the input/output rate.

\begin{itemize}
\item About 10k per day were intended as skimming and reconstruction jobs
at the Tier-1 centers
\item About 40k per day were expected to be a combination of user-submitted
analysis jobs and robot-submitted analysis-like jobs
\end{itemize}

The job robots are automated expert systems that simulate user analysis tasks
using the CMS Remote Analysis Builder (CRAB). They therefore provide a reasonable
method to generate load on the system by running analysis on all data samples
at all sites individually. They are built on a component/agent-based
structure which enables parallel execution. Job distribution to CMS compute
resources is accomplished by using Condor-G direct submission on the OSG sites
and gLite bulk submission on the EGEE sites.\\

The job preparation phase comprises four distinct steps (sketched in
code after the list):
\begin{itemize}
\item Job creation
\begin{itemize}
\item Data discovery using DBS/DLS
\item Job splitting according to user requirements
\item Preparation of job dependent files (incl.\ the JDL)
\end{itemize}
\item Job submission
\begin{itemize}
\item Check whether there are any compatible resources in the Grid Information System known to the submission system
\item Submit the job to the Grid submission component (Resource Broker or Condor-G) through the CMS bookkeeping component (BOSS)
\end{itemize}
\item Job status check
\item Job output retrieval
\begin{itemize}
\item Retrieve the job output from the sandbox located on the Resource Broker (EGEE sites) or the common filesystem (OSG sites)
\end{itemize}
\end{itemize}
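
A minimal, self-contained sketch of that four-step flow (Python; the
helper functions are toy stand-ins for illustration, not the actual
CRAB/BOSS API):

\begin{verbatim}
def discover_data(dataset):            # 1. data discovery (DBS/DLS)
    return [f"{dataset}#block{i}" for i in range(4)]

def split_jobs(blocks, per_job):       # 1. job splitting
    return [blocks[i:i + per_job]
            for i in range(0, len(blocks), per_job)]

def submit(job):                       # 2. submission (RB or Condor-G)
    return {"job": job, "state": "Done"}   # toy stub: finishes at once

def run_task(dataset):
    handles = [submit(j) for j in split_jobs(discover_data(dataset), 2)]
    assert all(h["state"] == "Done" for h in handles)  # 3. status check
    return [h["job"] for h in handles]                 # 4. output retrieval

print(run_task("/CSA06/minbias/RECO"))
\end{verbatim}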

The job robot executes all four steps of the workflow described above on a large scale.\\

Apart from job submission, the monitoring of the job execution over the
entire chain of all steps involved plays an important role. CMS has
chosen to use a product called Dashboard, a development that is part
of the CMS Integration Program. It is a joint effort of LCG's
ARDA project and the MonAlisa team in close collaboration with the CMS
developers working on job submission tools for production and analysis.
The objective of the Dashboard is to provide a complete view of the CMS
activity independently of the Grid flavour (i.e., OSG vs.\ EGEE). The
Dashboard maintains and displays the quantitative characteristics of the
usage pattern by including CMS-specific information, and it reports problems
of various kinds.\\

The monitoring information used in CSA06 is available via a web interface
and includes the following categories:
\begin{itemize}
\item Quantities: how many jobs are running, pending, successfully
completed, or failed, per user, per site, per input data collection, and
the distribution of these quantities over time
\item Usage of the resources (CPU, memory consumption, I/O rates), and
distribution over time with aggregation on different levels
\item Distribution of resources between different application areas
(i.e., analysis vs.\ production), different analysis groups, and individual
users
\item Grid behaviour: success rate and failure reasons as a function of time,
site, and data collection
\item CMS application behaviour
\item Distribution of data samples over sites and analysis groups
\end{itemize}

Timeline:
\begin{itemize}
\item October 15, 2006: The job robots started analysis submission. 10k
jobs were submitted by two robot instances, with 90\% of them going to OSG sites
using Condor-G direct submission and 10\% going through the traditional LCG
Resource Broker (RB) to EGEE sites. In preparation for moving to the gLite RB,
thereby improving the submission rate to EGEE sites, bulk submission was
integrated into CRAB and was being tested.

\item October 17, 2006: Job robot submissions continued at a larger scale. There
was an issue found with the bulk submission feature used at EGEE sites, leaving
jobs hanging indefinitely. The explanation was that parsing
of file names in the RB input sandbox failed for file name lengths of exactly 110
characters. The problem, located in the gLite User Interface (UI), was solved by
rebuilding the UI code to include a new version of the libtar library. A new
version of the UI was made available to the job robot operations team within a
day.\\
A total of 20k jobs were submitted in the past 24 hours. A large number of jobs
seemed not to report all the site information to the
Dashboard, which resulted in a major fraction being marked as ``unknown'' in the report.
The effect needs to be understood.\\
Apart from the jobs affected by the problem mentioned above, the efficiency
regarding successfully completed jobs was very high.

\item October 19, 2006: Robotic job submission via both Condor-G direct
submission and gLite RB bulk submission was activated. The job completion efficiency
remained very high for some sites. Over the course of the past day nearly 2000
jobs were completed at Caltech with only 5 failures.

\item October 20, 2006: The number of ``unknown'' jobs decreased following
further investigations by the robot operations team. The job completion efficiency
remained high, though the total number of submissions was lower than in the previous
days. A large number of sites running the PBS batch system had taken their
resources off the Grid because of a critical security vulnerability. Sites
applied the respective patch at short notice and were back to normal operation
within a day or two.

\item October 23, 2006: Over the weekend significant scaling issues were
encountered in the robot. Those were mainly associated with the MySQL
server holding the BOSS DB. On the gLite submission side a problem was
found with projects comprising more than 2000 jobs. A limit was
introduced, with the consequence that the same data files were accessed
more often.

\item October 24, 2006: There were again scaling problems observed in the
job robots. Switching to a central MySQL database for both robots
led to the database developing a lock state. Though the locks
automatically cleared within 10 to 30 minutes, the effect had an impact on
the overall job submission rate. To resolve the issue two databases
were created, one for each robot. While the Condor-G side performed well,
the gLite robot continued to develop locks. A memory leak leading to
robot crashes was observed in CRAB/BOSS submission through gLite. The
robot operations team worked with the BOSS developers on a solution.

\item October 25, 2006: The BOSS developers analyzed the problem
reported the previous day as a ``scaling issue'' and found that an SQL statement
issued by CRAB was incomplete, leading to long table rows being accessed and
resulting in a heavy load on the database server. The CRAB developers
made a new release available the same day, and the robot operations
team found that the robots have been running fine since.

\item October 26, 2006: Following the decision to move from analysis
of data produced with CMSSW\_1\_0\_3 to more recent data
produced with CMSSW\_1\_0\_5, many sites were not selected
and therefore not participating, since they still lacked the respective
datasets.

\item November 1, 2006: The submission rate reached by the job robots
was about 25k jobs per day. To improve scaling up to the
desired rate, 11 robots were set up, submitting to OSG
and EGEE sites.

\item November 2, 2006: The total number of jobs was on the order of
21k. With more sites having datasets published in DBS/DLS that were
created with CMSSW\_1\_0\_5, the number of participating sites increased.
The total application and Grid efficiencies were both over 99\%.

\item November 6, 2006: The number of submitted and completed jobs was still increasing:
30k jobs successfully passed all steps in the past 24 hours. 24
Tier-2 sites were now publishing data and accepting jobs from the robot.
The efficiency remained high.

\item November 7, 2006: The combined job robot, production, and analysis submissions
exceeded the goal of 55k per day. The activity breakdown is shown in
Figure~\ref{fig:breakdown}.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.9\linewidth]{figs/jobs-breakdown-1102}
\end{center}
\caption{Dashboard view of the job breakdown by activity (7 Nov - 8 Nov).}
\label{fig:breakdown}
\end{figure}

The job robot submissions by site are shown in Figure~\ref{fig:jobs-per-site}.
Six out of seven Tier-1
centers are included in the job robot. As expected, the Tier-2 centers are
still dominating the submissions. The addition of the Tier-1 centers has
driven the job robot submission rates past the load that can be sustained
by a single MySQL job monitor.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.9\linewidth]{figs/jobs-per-site-1102}
\end{center}
\caption{Dashboard view of the job breakdown by site (7 Nov - 8 Nov).}
\label{fig:jobs-per-site}
\end{figure}
\end{itemize}