\section{Definition}

The combined Computing, Software, and Analysis challenge of 2006 is an
O(50) million event exercise to test the workflow and dataflow
associated with the data handling model of CMS. It is designed to be a
25\% capacity test of what is needed for operations in 2008. The main
components include:

\begin{itemize}
\item Preparation of large simulated datasets (some with High Level
Trigger tags)
\item Prompt reconstruction at Tier-0, including:
\begin{itemize}
\item Reconstruction at 40 Hz using CMSSW software
\item Application of calibration constants from the offline DB
\item Generation of FEVT, AOD, and Alignment/Calibration skim datasets
\item Splitting of an HLT-tagged sample into O(10) streams
\end{itemize}
\item Distribution of all AOD and some FEVT to all participating Tier-1s
\item Calibration jobs on Alignment/Calibration datasets at some Tier-1s and the CAF
\item Re-reconstruction performed at a Tier-1
\item Skim jobs at some Tier-1s with data propagated to Tier-2s
\item Physics jobs at Tier-2s and Tier-1s on AOD and Reco
\end{itemize}
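The 40 Hz reconstruction rate and the O(50) million event target are consistent under a simple back-of-the-envelope estimate. In the sketch below, the four-week window and the 50\% duty-cycle factor are illustrative assumptions, not figures taken from the plan:

```python
# Back-of-the-envelope sizing of the CSA06 event sample.
# The 40 Hz rate is from the text; the 4-week window and the 50%
# duty-cycle factor are illustrative assumptions.
RATE_HZ = 40
SECONDS_PER_WEEK = 7 * 24 * 3600
WEEKS = 4
DUTY_CYCLE = 0.5  # assumed fraction of wall time actually processing

events = RATE_HZ * SECONDS_PER_WEEK * WEEKS * DUTY_CYCLE
print(f"{events / 1e6:.0f} million events")  # prints "48 million events"
```

With these assumptions the exercise processes roughly 48 million events, i.e. the O(50) million scale quoted above.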

While this is an exercise to test the data handling workflow under
conditions as realistic as possible, it is not explicitly required that
the software components be fully validated for physics performance at
the time of the challenge. However, where possible we have tried to
maintain the maximum utility of the simulated, reconstructed, and
selected samples for the analysis component of the exercise. The CMS
Computing Model is described elsewhere. [Need references]

\section{Success Metric}

Success of the CSA06 challenge is pre-defined as meeting a series of
binary (succeed/fail) metrics as well as a list of quantitative
metrics, the latter of which are set at two levels: a minimum
threshold below which we consider the exercise a definite failure, and
a goal which is considered achievable if everything runs well. No
specific goals were placed on the number of, or results from, the
calibration, alignment, and analysis exercises.
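The two-level scheme for the quantitative metrics can be sketched as a small grading helper. This is a hypothetical illustration (the function name and return labels are not from the source):

```python
def grade(value, goal, threshold):
    """Grade a two-level quantitative metric: below the threshold is a
    definite failure; at or above the goal is full success; anything in
    between counts as passing the threshold only."""
    if value >= goal:
        return "goal met"
    if value >= threshold:
        return "above threshold"
    return "failed"

# Example: Tier-0 efficiency with goal 80% and threshold 30%
print(grade(0.65, goal=0.80, threshold=0.30))  # prints "above threshold"
```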

\subsection{Binary metrics}
\begin{itemize}
\item Automatic FEVT+AOD transfer from T0 to T1 via PhEDEx
\item Automatic transfer of part of the FEVT+AOD from T1 to T2 via PhEDEx
\item Offline DB accessible via FroNtier/Squid at participating sites
\item Insertion and use of new constants in the offline DB
\item User submission of analysis/calibration/skim jobs via CRAB, using DBS/DLS
\item Skim job output automatically moved to T2 via PhEDEx
\item Running re-reconstruction-like jobs at T1 that access updated information from the offline DB
\end{itemize}
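Since these metrics are pure pass/fail, the overall outcome on this list is simply their conjunction. A minimal sketch, with abbreviated metric names and purely illustrative status values:

```python
# Pass/fail status per binary metric; the boolean values below are
# illustrative placeholders, not challenge results.
binary_metrics = {
    "T0->T1 FEVT+AOD transfer via PhEDEx": True,
    "T1->T2 partial FEVT+AOD transfer via PhEDEx": True,
    "offline DB via FroNtier/Squid at sites": True,
    "insertion and use of new DB constants": True,
    "CRAB submission via DBS/DLS": True,
    "skim output moved to T2 via PhEDEx": True,
    "re-reconstruction at T1 with updated DB": False,
}

failed = [name for name, ok in binary_metrics.items() if not ok]
print("all passed" if not failed else f"failed: {failed}")
```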

\subsection{Quantitative metrics}

\begin{itemize}
\item Number of participating Tier-1s - Goal: 7 - Threshold: 5
\begin{itemize}
\item Passing requires $<3$ days of downtime during the challenge
\end{itemize}
\item Number of participating Tier-2s - Goal: 20 - Threshold: 15
\item Weeks of running at a sustained rate - Goal: 4 - Threshold: 2
\begin{itemize}
\item This will be the period over which we measure the other metrics
\end{itemize}
\item Tier-0 efficiency - Goal: 80\% - Threshold: 30\%
\begin{itemize}
\item Measured as the unattended uptime fraction over the 2 best weeks of the running period
\end{itemize}
\item Running grid jobs (T1+T2) per day (2-hour jobs typical) - Goal: 50K - Threshold: 30K
\item Grid job efficiency - Goal: 90\% - Threshold: 70\%
\item Data serving capability at each participating site - Goal: 1 MB/s per execution slot - Threshold: 400 MB/s (T1) or 100 MB/s (T2)
\item Data transfer T0 to T1 to tape - Individual goals (threshold at
50\% of goal):
\begin{itemize}
\item ASGC: 10 MB/s
\item CNAF: 25 MB/s
\item FNAL: 50 MB/s
\item GridKa: 20 MB/s
\item IN2P3: 25 MB/s
\item PIC: 10 MB/s
\item RAL: 10 MB/s
\end{itemize}
\item Data transfer T1 to T2 - Goal: 20 MB/s into each T2 - Threshold: 5 MB/s
\begin{itemize}
\item Overall ``success'' is to have 50\% of the participants at or above the goal and 90\% above the threshold
\item Several T2s have better connectivity, and we will set higher targets for those
\item The goal for each T2 is to demonstrate 50\% utilization of the WAN to the best-connected T1
\begin{itemize}
\item List to be defined after SC4
\end{itemize}
\end{itemize}
\end{itemize}
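The transfer metrics above can be checked mechanically. The sketch below encodes the per-site T0-to-T1 goals (with thresholds at 50\% of goal) and the overall T1-to-T2 success criterion; the T2 rates fed to the example are illustrative inputs, not measurements:

```python
# Per-site T0->T1 tape-transfer goals (MB/s) from the list above;
# each site's threshold is 50% of its goal.
t0_t1_goals = {
    "ASGC": 10, "CNAF": 25, "FNAL": 50,
    "GridKa": 20, "IN2P3": 25, "PIC": 10, "RAL": 10,
}
t0_t1_thresholds = {site: 0.5 * g for site, g in t0_t1_goals.items()}

def t1_to_t2_success(rates_mb_s, goal=20.0, threshold=5.0):
    """Overall T1->T2 'success': at least 50% of participating T2s at
    or above the goal AND at least 90% above the threshold."""
    n = len(rates_mb_s)
    at_goal = sum(r >= goal for r in rates_mb_s)
    above_thr = sum(r > threshold for r in rates_mb_s)
    return at_goal / n >= 0.5 and above_thr / n >= 0.9

print("aggregate T0->T1 goal:", sum(t0_t1_goals.values()), "MB/s")  # 150 MB/s
# Example: 10 T2 sites, 6 at goal, 9 above threshold -> overall success
print(t1_to_t2_success([25, 22, 30, 21, 20, 24, 12, 8, 6, 4]))  # True
```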