\section{Definition}

The combined Computing, Software, and Analysis challenge of 2006
(CSA06) is an O(50) million event exercise to test the workflow and
dataflow associated with the data handling model of CMS. It is
designed to be a 25\% capacity test of what is needed for operations
in 2008. The main components include:

\begin{itemize}
\item Preparation of large simulated datasets (some with High Level
Trigger tags)
\item Prompt reconstruction at the Tier-0, including:
\begin{itemize}
\item Reconstruction at 40 Hz using the new CMSSW software framework
\item Application of calibration constants from the offline database (DB)
\item Generation of FEVT\footnote{``Full Event'' data, i.e. RAW+RECO,
which for this challenge was combined into one file stream},
AOD\footnote{``Analysis Object Data'' (AOD) is a reduced collection of
reconstruction products (i.e. DST)}, and Alignment/Calibration skim
datasets known as ``AlCaReco'' skims
\item Splitting of an HLT-tagged sample into O(10) streams
\end{itemize}
\item Distribution of all AOD and some FEVT to all participating
Tier-1s (as well as to some Tier-2s)
\item Calibration jobs on Alignment/Calibration datasets at Tier-1s,
Tier-2s, and the CAF
\item Re-reconstruction performed at a Tier-1
\item Skim jobs at some Tier-1s, with data propagated to Tier-2s
\item Physics jobs at Tier-2s and Tier-1s on AOD and RECO
\end{itemize}
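
As a rough consistency check of these components: reconstructing
O(50) million events at the nominal 40 Hz Tier-0 rate takes
\[
\frac{5\times 10^{7}\ \mathrm{events}}{40\ \mathrm{Hz}}
\approx 1.3\times 10^{6}\ \mathrm{s} \approx 14\ \mathrm{days},
\]
i.e. about two weeks of continuous running, or roughly four weeks at
a 50\% duty cycle, in line with the running-period metrics defined
below.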

While this is an exercise to test the data handling workflow under
conditions as realistic as possible, it is not explicitly required
that the software components be fully validated for physics
performance at the time of the challenge. However, where possible we
tried to maintain the maximum utility of the simulated, reconstructed,
and selected samples for the analysis component of the exercise. The
CMS Computing Model is described elsewhere.
%%\cite{}[Need references]

\section{Success Metric}

Success of the CSA06 challenge was pre-defined (in June 2006) as
meeting a series of binary metrics (succeed/fail) as well as a list
of quantitative targets, based on the performance anticipated half a
year before the CSA06 challenge (and Service Challenge 4) began. The
quantitative metrics are set at two levels: a minimum threshold below
which we consider the result a definite failure, and a goal which is
considered achievable if everything runs well. No specific goals were
placed on the number of, or results from, the calibration, alignment,
and analysis exercises, other than to meet the overall daily job
submission goal and to demonstrate the workflow associated with
prompt calibration.
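
The two-level scheme can be summarized by a minimal sketch (Python,
with hypothetical names) of how a measured value scores against a
threshold/goal pair:

\begin{verbatim}
# Hypothetical helper: score one quantitative CSA06-style metric.
def score_metric(measured, threshold, goal):
    if measured < threshold:
        return "failure"        # below the minimum threshold
    if measured >= goal:
        return "goal met"       # at or above the goal
    return "acceptable"         # between threshold and goal

# Example: Tier-0 efficiency, goal 80% and threshold 30%.
print(score_metric(0.65, threshold=0.30, goal=0.80))  # acceptable
\end{verbatim}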

The metrics were chosen to exercise a variety of important elements
of the CMS computing model, although not all areas were tested, so
that the available effort could concentrate on particular
functionality. They were also chosen to ensure broad participation
of CMS computing facilities, to enable the experiment to demonstrate
functionality critical to early experiment operations, and to
encourage physics analysis.

\subsection{Binary metrics}
\begin{itemize}
\item Automatic FEVT+AOD transfer from the Tier-0 to Tier-1s via
PhEDEx, the data placement tool
\item Automatic transfer of part of the FEVT+AOD from Tier-1s to
Tier-2s via PhEDEx
\item Offline DB accessible via FroNtier/Squid (a caching layer
between the reconstruction jobs and the Oracle DB) at participating
sites; the access pattern is illustrated in the sketch after this list
\item Insertion and use of new constants in the offline DB
\item User submission of analysis/calibration/skim jobs via the grid
job submission tool CRAB, using the newly developed Dataset
Bookkeeping Service (DBS) and Data Location Service (DLS)
\item Skim job output automatically moved to Tier-2s via PhEDEx
\item Running re-reconstruction-like jobs at Tier-1s that access
updated information from the offline DB and perform a new
reconstruction on data distributed from the Tier-0 centre
\end{itemize}
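
The FroNtier/Squid access pattern referenced above can be illustrated
with a short sketch (Python; the proxy port, server URL, and query
string are hypothetical): conditions are fetched over HTTP through a
site-local Squid, so repeated requests for the same payload are
served from the cache rather than from the central Oracle DB behind
the FroNtier server.

\begin{verbatim}
# Sketch of the FroNtier/Squid access pattern (URL, port, and
# query are hypothetical).
import urllib.request

proxy  = urllib.request.ProxyHandler({"http": "http://localhost:3128"})
opener = urllib.request.build_opener(proxy)

# One calibration payload; identical queries from other jobs at the
# site are answered by the Squid cache, not by Oracle.
url = "http://frontier.example.org/Frontier/query?tag=EcalPedestals_v1"
with opener.open(url) as resp:
    payload = resp.read()
\end{verbatim}

The sketch conveys only the caching behaviour; in CMSSW the
conditions access itself is handled transparently by the framework.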

\subsection{Quantitative metrics}

\begin{itemize}
\item Number of participating Tier-1s -- Goal: 7 -- Threshold: 5
\begin{itemize}
\item Passing requires 90\% uptime, i.e. $<3$ days of downtime during the challenge
\end{itemize}
\item Number of participating Tier-2s -- Goal: 20 -- Threshold: 15
\item Weeks of running at a sustained rate -- Goal: 4 -- Threshold: 2
\begin{itemize}
\item This will be the period over which we measure the other metrics
\end{itemize}
\item Tier-0 efficiency -- Goal: 80\% -- Threshold: 30\%
\begin{itemize}
\item Measured as the unattended uptime fraction over the 2 best weeks of the running period
\end{itemize}
\item Running grid jobs (Tier-1 + Tier-2) per day (typically 2-hour jobs) -- Goal: 50K -- Threshold: 30K (the implied slot count is estimated after this list)
\item Grid job efficiency -- Goal: 90\% -- Threshold: 70\%
\item Data serving capability at each participating site -- Goal: 1 MB/s per execution slot -- Threshold: 400 MB/s (Tier-1) or 100 MB/s (Tier-2)
\item Data transfer Tier-0 to Tier-1 to tape -- Individual goals, with the threshold at 50\% of the goal (the implied aggregate rate is summed after this list):
\begin{itemize}
\item ASGC: 10 MB/s
\item CNAF: 25 MB/s
\item FNAL: 50 MB/s
\item GridKa: 20 MB/s
\item IN2P3: 25 MB/s
\item PIC: 10 MB/s
\item RAL: 10 MB/s
\end{itemize}
\item Data transfer Tier-1 to Tier-2 -- Goal: 20 MB/s into each Tier-2 -- Threshold: 5 MB/s
\begin{itemize}
\item Overall ``success'' is to have 50\% of the participants at or above the goal and 90\% above the threshold
\item Several Tier-2s have better connectivity, and we will have higher targets for those
\item The goal for each such Tier-2 is to demonstrate 50\% utilization of the WAN to the best-connected Tier-1
\begin{itemize}
\item The list of such Tier-2s was defined after SC4
\end{itemize}
\end{itemize}
\end{itemize}
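
Two arithmetic corollaries of the goals above, included for
orientation: the individual Tier-0 to Tier-1 transfer goals sum to a
sustained aggregate export rate out of the Tier-0 of
\[
10+25+50+20+25+10+10 = 150\ \mathrm{MB/s},
\]
and the goal of 50K two-hour grid jobs per day corresponds to keeping
roughly
\[
\frac{50000 \times 2\ \mathrm{h}}{24\ \mathrm{h}} \approx 4200
\]
execution slots continuously busy across the Tier-1 and Tier-2 sites.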