\section{Definition}

The combined Computing, Software, and Analysis challenge of 2006 is an
O(50) million event exercise to test the workflow and dataflow
associated with the data handling model of CMS. It is designed to be a
25\% capacity test of what is needed for operations in 2008. The main
components include:

\begin{itemize}
\item Preparation of large simulated datasets (some with High Level
Trigger tags)
\item Prompt reconstruction at Tier-0, including:
\begin{itemize}
\item Reconstruction at 40 Hz using CMSSW software
\item Application of calibration constants from the offline DB
\item Generation of FEVT, AOD, and Alignment/Calibration skim datasets
\item Splitting of an HLT-tagged sample into O(10) streams (see the
sketch following this list)
\end{itemize}
\item Distribution of all AOD and some FEVT to all participating Tier-1s
\item Calibration jobs on Alignment/Calibration datasets at some Tier-1s and the CAF
\item Re-reconstruction performed at a Tier-1
\item Skim jobs at some Tier-1s with data propagated to Tier-2s
\item Physics jobs at Tier-2s and Tier-1s on AOD and Reco
\end{itemize}
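
The stream splitting mentioned above amounts to routing each event to an
output stream according to its HLT tags. The following is a minimal,
purely illustrative Python sketch of that idea; the tag names, the
tag-to-stream mapping, and the event representation are hypothetical and
are not taken from the CSA06 software.

\begin{verbatim}
# Illustrative only: route HLT-tagged events into ~10 output streams.
# Tag names and the mapping below are hypothetical.
STREAM_OF_TAG = {
    "HLT_Muon":     "muon",
    "HLT_Electron": "electron",
    "HLT_Jet":      "jets",
    "HLT_MET":      "met",
}

def split_into_streams(events):
    """Group events by the stream of their first matching HLT tag."""
    streams = {}
    for event in events:
        for tag in event["hlt_tags"]:
            stream = STREAM_OF_TAG.get(tag)
            if stream is not None:
                streams.setdefault(stream, []).append(event)
                break  # each event goes to exactly one stream here
    return streams

if __name__ == "__main__":
    sample = [{"id": 1, "hlt_tags": ["HLT_Muon"]},
              {"id": 2, "hlt_tags": ["HLT_Jet", "HLT_MET"]}]
    for name, evs in split_into_streams(sample).items():
        print(name, [e["id"] for e in evs])
\end{verbatim}

In CSA06 the splitting itself is part of the Tier-0 prompt-reconstruction
workflow; the sketch only illustrates the routing logic.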

While this is an exercise to test the data handling workflow under as
realistic conditions as possible, it is not explicitly required that
the software components be fully validated for physics performance at
the time of the challenge. However, where possible we tried to
preserve the maximum utility of the simulated, reconstructed, and
selected samples for the analysis component of the exercise. The CMS
Computing Model is described elsewhere. [Need references]
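
As a rough consistency check (simple arithmetic on numbers quoted in this
document, not an official sizing), sustaining the 40 Hz
prompt-reconstruction rate over the two-to-four-week running period
targeted in the success metrics corresponds to
\[
  40\ \mathrm{Hz} \times 86\,400\ \mathrm{s/day} \times (14\mbox{--}28)\ \mathrm{days}
  \approx (48\mbox{--}97) \times 10^{6}\ \mathrm{events},
\]
consistent with the O(50) million event scale of the exercise.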


\section{Success Metric}

Success of the CSA06 challenge was pre-defined (June 2006) as meeting
a series of binary metrics (succeed/fail) as well as a list of
quantitative targets, the latter specified at two levels: a minimum
threshold below which we consider the challenge a definite failure,
and a goal which is considered achievable if everything runs well. No
specific goals were placed on the number of, or results from, the
calibration, alignment, and analysis exercises.

The metrics were chosen to exercise a variety of important elements of
the CMS computing model, although not all areas were tested, so that
the available effort could concentrate on particular functionality.
They were also chosen to ensure broad participation of CMS computing
facilities, to enable the experiment to demonstrate functionality
critical to early experiment operations, and to encourage physics
analysis.

\subsection{Binary metrics}
\begin{itemize}
\item Automatic FEVT+AOD transfer from T0 to T1 via PhEDEx
\item Automatic transfer of part of the FEVT+AOD from T1 to T2 via PhEDEx
\item Offline DB accessible via FroNtier/Squid at participating sites
\item Insertion and use of new constants in the offline DB
\item User submission of analysis/calibration/skim jobs via CRAB, using DBS/DLS
\item Skim job output automatically moved to T2 via PhEDEx
\item Running re-reconstruction-like jobs at T1 that access updated information from the offline DB
\end{itemize}

\subsection{Quantitative metrics}

\begin{itemize}
\item Number of participating Tier-1 sites - Goal: 7 - Threshold: 5
\begin{itemize}
\item Passing requires $<3$ days of downtime during the challenge
\end{itemize}
\item Number of participating Tier-2 sites - Goal: 20 - Threshold: 15
\item Weeks of running at a sustained rate - Goal: 4 - Threshold: 2
\begin{itemize}
\item This will be the period over which we measure the other metrics
\end{itemize}
\item Tier-0 efficiency - Goal: 80\% - Threshold: 30\%
\begin{itemize}
\item Measured as the unattended uptime fraction over the 2 best weeks of the running period
\end{itemize}
\item Running grid jobs (T1+T2) per day (2-hour jobs typical) - Goal: 50K - Threshold: 30K
\item Grid job efficiency - Goal: 90\% - Threshold: 70\%
\item Data serving capability at each participating site - Goal: 1 MB/s per execution slot - Threshold: 400 MB/s (T1) or 100 MB/s (T2)
\item Data transfer T0 to T1 to tape - Individual goals (threshold at
50\% of the goal):
\begin{itemize}
\item ASGC: 10 MB/s
\item CNAF: 25 MB/s
\item FNAL: 50 MB/s
\item GridKa: 20 MB/s
\item IN2P3: 25 MB/s
\item PIC: 10 MB/s
\item RAL: 10 MB/s
\end{itemize}
\item Data transfer T1 to T2 - Goal: 20 MB/s into each T2 - Threshold: 5 MB/s
\begin{itemize}
\item Overall ``success'' is to have 50\% of the participants at or above the goal and 90\% above the threshold
\item Several T2s have better connectivity, and we will set higher targets for those
\item The goal for each T2 is to demonstrate 50\% utilization of the WAN to the best-connected T1
\begin{itemize}
\item The list was defined after SC4
\end{itemize}
\end{itemize}
\end{itemize}
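
To make the goal/threshold structure above concrete, the following Python
sketch tabulates several of the quantitative metrics and rates a set of
hypothetical measured values against them; the measured numbers are
invented for illustration and are not CSA06 results. It also prints two
quantities implicit in the list above: the individual T0-to-T1 tape goals
sum to 150~MB/s in aggregate, and 50K two-hour jobs per day correspond to
roughly 4,200 continuously busy execution slots.

\begin{verbatim}
# Goal/threshold pairs taken from the CSA06 quantitative metrics above.
# The "measured" values are hypothetical, for illustration only.
METRICS = {
    "participating_T1_sites":   {"goal": 7,     "threshold": 5},
    "participating_T2_sites":   {"goal": 20,    "threshold": 15},
    "weeks_at_sustained_rate":  {"goal": 4,     "threshold": 2},
    "tier0_efficiency_pct":     {"goal": 80,    "threshold": 30},
    "grid_jobs_per_day":        {"goal": 50000, "threshold": 30000},
    "grid_job_efficiency_pct":  {"goal": 90,    "threshold": 70},
}

T0_TO_T1_TAPE_GOALS_MBS = {
    "ASGC": 10, "CNAF": 25, "FNAL": 50, "GridKa": 20,
    "IN2P3": 25, "PIC": 10, "RAL": 10,
}

def rate(value, goal, threshold):
    """Classify a measured value against its goal and threshold."""
    if value >= goal:
        return "goal met"
    if value >= threshold:
        return "above threshold"
    return "below threshold"

if __name__ == "__main__":
    measured = {  # hypothetical numbers, not CSA06 results
        "participating_T1_sites": 7,
        "participating_T2_sites": 18,
        "weeks_at_sustained_rate": 3,
        "tier0_efficiency_pct": 75,
        "grid_jobs_per_day": 42000,
        "grid_job_efficiency_pct": 85,
    }
    for name, spec in METRICS.items():
        print(f"{name}: {rate(measured[name], spec['goal'], spec['threshold'])}")

    # Derived numbers implicit in the metrics above:
    aggregate = sum(T0_TO_T1_TAPE_GOALS_MBS.values())  # 150 MB/s in total
    slots = 50000 * 2 / 24                             # about 4200 busy slots
    print(f"Aggregate T0->T1 tape goal: {aggregate} MB/s")
    print(f"Concurrent slots implied by 50K 2h jobs/day: {slots:.0f}")
\end{verbatim}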