\section{Definition}

The combined Computing, Software, and Analysis challenge of 2006
(CSA06) is an O(50) million event exercise to test the workflow and
dataflow associated with the data handling model of CMS. It is
designed to be a 25\% capacity test of what is needed for operations
in 2008. The main components include:

\begin{itemize}
\item Preparation of large simulated datasets (some with High Level
Trigger tags)
\item Prompt reconstruction at the Tier-0, including:
\begin{itemize}
\item Reconstruction at 40 Hz using the new CMSSW software framework
\item Application of calibration constants from the offline database (DB)
\item Generation of FEVT\footnote{``Full Event'' data, i.e. RAW+RECO,
which for this challenge was combined into one file stream},
AOD\footnote{``Analysis Object Data'' (AOD) is a reduced collection of
reconstruction products (i.e. DST)}, and Alignment/Calibration skim
datasets known as ``AlCaReco'' skims
\item Splitting of an HLT-tagged sample into O(10) streams
\end{itemize}
\item Distribution of all AOD and some FEVT to all participating
Tier-1s (as well as to some Tier-2s)
\item Calibration jobs on Alignment/Calibration datasets at Tier-1s,
Tier-2s, and the CAF
\item Re-reconstruction performed at a Tier-1
\item Skim jobs at some Tier-1s, with data propagated to Tier-2s
\item Physics jobs at Tier-2s and Tier-1s on AOD and RECO
\end{itemize}
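
As a rough consistency check of these components: reconstructing
O(50) million events at the nominal 40 Hz Tier-0 rate takes
\[
\frac{5\times 10^{7}\ \mathrm{events}}{40\ \mathrm{Hz}}
\approx 1.3\times 10^{6}\ \mathrm{s} \approx 14\ \mathrm{days},
\]
i.e. about two weeks of continuous running, or roughly four weeks at
a 50\% duty cycle, in line with the running-period metrics defined
below.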

While this is an exercise to test the data handling workflow under
conditions as realistic as possible, it is not explicitly required
that the software components be fully validated for physics
performance at the time of the challenge. However, where possible we
tried to maintain the maximum utility of the simulated, reconstructed,
and selected samples for the analysis component of the exercise. The
CMS Computing Model is described elsewhere.
%%\cite{}[Need references]

\section{Success Metric}

Success of the CSA06 challenge was pre-defined (in June 2006) as
meeting a series of binary metrics (succeed/fail) as well as a list
of quantitative targets, based on the performance anticipated half a
year before the CSA06 challenge (and Service Challenge 4) began. The
quantitative metrics are set at two levels: a minimum threshold below
which we consider the result a definite failure, and a goal which is
considered achievable if everything runs well. No specific goals were
placed on the number of, or results from, the calibration, alignment,
and analysis exercises, other than to meet the overall daily job
submission goal and to demonstrate the workflow associated with
prompt calibration.
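
The two-level scheme can be summarized by a minimal sketch (Python,
with hypothetical names) of how a measured value scores against a
threshold/goal pair:

\begin{verbatim}
# Hypothetical helper: score one quantitative CSA06-style metric.
def score_metric(measured, threshold, goal):
    if measured < threshold:
        return "failure"        # below the minimum threshold
    if measured >= goal:
        return "goal met"       # at or above the goal
    return "acceptable"         # between threshold and goal

# Example: Tier-0 efficiency, goal 80% and threshold 30%.
print(score_metric(0.65, threshold=0.30, goal=0.80))  # acceptable
\end{verbatim}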

The metrics were chosen to exercise a variety of important elements
of the CMS computing model, although not all areas were tested, so
that the available effort could concentrate on particular
functionality. They were also chosen to ensure broad participation
of CMS computing facilities, to enable the experiment to demonstrate
functionality critical to early experiment operations, and to
encourage physics analysis.

\subsection{Binary metrics}
\begin{itemize}
\item Automatic FEVT+AOD transfer from the Tier-0 to Tier-1s via
PhEDEx, the data placement tool
\item Automatic transfer of part of the FEVT+AOD from Tier-1s to
Tier-2s via PhEDEx
\item Offline DB accessible via FroNtier/Squid (a caching layer
between the reconstruction jobs and the Oracle DB) at participating
sites; the access pattern is illustrated in the sketch after this list
\item Insertion and use of new constants in the offline DB
\item User submission of analysis/calibration/skim jobs via the grid
job submission tool CRAB, using the newly developed Dataset
Bookkeeping Service (DBS) and Data Location Service (DLS)
\item Skim job output automatically moved to Tier-2s via PhEDEx
\item Running re-reconstruction-like jobs at Tier-1s that access
updated information from the offline DB and perform a new
reconstruction on data distributed from the Tier-0 centre
\end{itemize}
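
The FroNtier/Squid access pattern referenced above can be illustrated
with a short sketch (Python; the proxy port, server URL, and query
string are hypothetical): conditions are fetched over HTTP through a
site-local Squid, so repeated requests for the same payload are
served from the cache rather than from the central Oracle DB behind
the FroNtier server.

\begin{verbatim}
# Sketch of the FroNtier/Squid access pattern (URL, port, and
# query are hypothetical).
import urllib.request

proxy  = urllib.request.ProxyHandler({"http": "http://localhost:3128"})
opener = urllib.request.build_opener(proxy)

# One calibration payload; identical queries from other jobs at the
# site are answered by the Squid cache, not by Oracle.
url = "http://frontier.example.org/Frontier/query?tag=EcalPedestals_v1"
with opener.open(url) as resp:
    payload = resp.read()
\end{verbatim}

The sketch conveys only the caching behaviour; in CMSSW the
conditions access itself is handled transparently by the framework.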

\subsection{Quantitative metrics}

\begin{itemize}
\item Number of participating Tier-1s -- Goal: 7 -- Threshold: 5
\begin{itemize}
\item Passing requires 90\% uptime, i.e. $<3$ days of downtime during the challenge
\end{itemize}
\item Number of participating Tier-2s -- Goal: 20 -- Threshold: 15
\item Weeks of running at a sustained rate -- Goal: 4 -- Threshold: 2
\begin{itemize}
\item This will be the period over which we measure the other metrics
\end{itemize}
\item Tier-0 efficiency -- Goal: 80\% -- Threshold: 30\%
\begin{itemize}
\item Measured as the unattended uptime fraction over the 2 best weeks of the running period
\end{itemize}
\item Running grid jobs (Tier-1 + Tier-2) per day (typically 2-hour jobs) -- Goal: 50K -- Threshold: 30K (the implied slot count is estimated after this list)
\item Grid job efficiency -- Goal: 90\% -- Threshold: 70\%
\item Data serving capability at each participating site -- Goal: 1 MB/s per execution slot -- Threshold: 400 MB/s (Tier-1) or 100 MB/s (Tier-2)
\item Data transfer Tier-0 to Tier-1 to tape -- Individual goals, with the threshold at 50\% of the goal (the implied aggregate rate is summed after this list):
\begin{itemize}
\item ASGC: 10 MB/s
\item CNAF: 25 MB/s
\item FNAL: 50 MB/s
\item GridKa: 20 MB/s
\item IN2P3: 25 MB/s
\item PIC: 10 MB/s
\item RAL: 10 MB/s
\end{itemize}
\item Data transfer Tier-1 to Tier-2 -- Goal: 20 MB/s into each Tier-2 -- Threshold: 5 MB/s
\begin{itemize}
\item Overall ``success'' is to have 50\% of the participants at or above the goal and 90\% above the threshold
\item Several Tier-2s have better connectivity, and we will have higher targets for those
\item The goal for each such Tier-2 is to demonstrate 50\% utilization of the WAN to the best-connected Tier-1
\begin{itemize}
\item The list of such Tier-2s was defined after SC4
\end{itemize}
\end{itemize}
\end{itemize}
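
Two arithmetic corollaries of the goals above, included for
orientation: the individual Tier-0 to Tier-1 transfer goals sum to a
sustained aggregate export rate out of the Tier-0 of
\[
10+25+50+20+25+10+10 = 150\ \mathrm{MB/s},
\]
and the goal of 50K two-hour grid jobs per day corresponds to keeping
roughly
\[
\frac{50000 \times 2\ \mathrm{h}}{24\ \mathrm{h}} \approx 4200
\]
execution slots continuously busy across the Tier-1 and Tier-2 sites.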