\section{Tier-1 and Tier-2 Operations}

\subsection{Data Transfers}

The Tier-1 centers were expected to receive data from CERN at a rate
proportional to 25\% of the 2008 pledge rate and to serve the data to
Tier-2 centers. The expected rate into the Tier-1 centers is shown in
Table~\ref{tab:tier01pledge}. Note that while the listed rates are
significantly less than the bandwidth to the WAN (see
Table~\ref{tab:tier1resources}), they fit within the storage
capability available for a 30 day challenge.

\begin{table}[htb]
\centering
\caption{Expected transfer rates from CERN to Tier-1 centers based on the MOU pledges.}
\label{tab:tier01pledge}
\begin{tabular}{|l|l|l|}
\hline
Site & Goal Rate (MB/s) & Threshold Rate (MB/s) \\
\hline
ASGC & 15 & 7.5 \\
CNAF & 25 & 12.5 \\
FNAL & 50 & 25 \\
GridKa & 25 & 12.5 \\
IN2P3 & 25 & 12.5 \\
PIC & 10 & 5 \\
RAL & 10 & 5 \\
\hline
\end{tabular}
\end{table}
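
As a rough consistency check (simple arithmetic, not a number from the
challenge bookkeeping), the largest pledge-derived rate in
Table~\ref{tab:tier01pledge} corresponds over the challenge to
\[
50~\mathrm{MB/s} \times 86\,400~\mathrm{s/day} \times 30~\mathrm{days} \approx 130~\mathrm{TB},
\]
consistent with the statement that the rates fit within the storage
available for a 30 day challenge.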

In the computing model \cite{model,ctdr} the Tier-2 centers are
expected to transfer data from the Tier-1 centers in bursts. The goal
rate in CSA06 was 20MB/s, with a threshold for success of 5MB/s.
Achieving these metrics was defined as sustaining the transfer rate
over a 24 hour period. At the beginning of CSA06 CMS concentrated
primarily on moving data from the ``associated'' Tier-1 centers to the
Tier-2s. By the end of the challenge most of the Tier-1 to Tier-2
permutations had been attempted.
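
The metric can be illustrated with a short sketch (Python, with an
invented data layout; the real accounting used the PhEDEx transfer
monitoring):

\begin{verbatim}
# Sketch: decide whether a Tier-2 met the CSA06 transfer metric.
# Input: hourly transferred volumes in MB for one day (hypothetical).

GOAL_MBS = 20.0       # goal rate in MB/s
THRESHOLD_MBS = 5.0   # threshold rate in MB/s
SECONDS_PER_DAY = 24 * 3600

def daily_average_rate(hourly_volumes_mb):
    """Average rate in MB/s over a 24 hour period."""
    return sum(hourly_volumes_mb) / SECONDS_PER_DAY

def classify(hourly_volumes_mb):
    rate = daily_average_rate(hourly_volumes_mb)
    if rate >= GOAL_MBS:
        return "goal met"
    if rate >= THRESHOLD_MBS:
        return "threshold met"
    return "below threshold"

# A site moving 100 GB in each of 18 hours and idle otherwise
# averages about 20.8 MB/s over the day and meets the goal.
volumes = [100_000.0] * 18 + [0.0] * 6
print(classify(volumes))  # -> goal met
\end{verbatim}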

The total data transferred between sites in CSA06 is shown in
Figure~\ref{fig:totaltran}. This plot includes only wide area data
transfers; in addition, data was moved onto tape at the majority of
Tier-1 centers. Over the 45 days of the challenge CMS was able to
move more than 1 petabyte of data over the wide area.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/CSA06_CumTran}
\end{center}
\caption{The cumulative data volume transferred during CSA06 in TB.}
\label{fig:totaltran}
\end{figure}

Timeline:
\begin{itemize}

\item October 2, 2006: The Tier-0 to Tier-1 transfers began on the
first day of the challenge. In the first few hours 6 of 7 Tier-1
centers successfully received data. During the first week only
minimum bias was reconstructed, and at 40Hz the total rate out of the
CERN site did not meet the 150MB/s target rate.

\item October 3, 2006: All 7 Tier-1 sites successfully received data,
and 8 Tier-2 centers were subscribed to data samples:
Belgium IIHE, UC San Diego, Wisconsin, Nebraska, DESY, Aachen, and
Estonia. There were successful transfers to 6 Tier-2 sites.

\item October 4, 2006: An additional 11 Tier-2 sites were subscribed
to data samples: Pisa, Purdue, CIEMAT, Caltech, Florida, Rome, Bari,
CSCS, IHEP, Belgium UCL, and Imperial College. Of the 19 registered
Tier-2 sites, 12 were able to receive data. Of those, 5 exceeded the
goal transfer rates for over an hour, and an additional 3 were over
the threshold rate.

\item October 5, 2006: Three additional Tier-2s were added, increasing
the number of participating sites above the goal of 20 Tier-2
centers. New hardware installed at IN2P3 for CSA06 began to exhibit
stability problems, leading to poor transfer efficiency.

\item October 9, 2006: RAL transitioned from a dCache SE to a Castor2
SE. The signal samples began being reconstructed at the Tier-0.

\item October 10-12, 2006: The Tier-1 sites had stable operations
through the week at an aggregate rate of approximately 100MB/s from
CERN. IFCA joined the Tier-1 to Tier-2 transfers, and its average
transfer rate over the day was 14MB/s with a low error rate.

\item October 13, 2006: Multiple subscriptions of the minimum bias
samples were made to some of the Tier-1 centers to increase the total
rate of data transfer from CERN. The number of participating Tier-2
sites increased to 23.

\item October 18, 2006: The PhEDEx transfer system held a lock in the
Oracle database, which blocked other agents from continuing with
transfers. This problem appeared more frequently in the latter half
of the challenge when the load was higher.

\item October 20, 2006: The reconstruction rate was increased at the
Tier-0 to improve the output from CERN and to better exercise the
prompt reconstruction farm. The data rate from CERN approximately
doubled, and an average rate over an hour of 600MB/s from CERN was
achieved.

\item October 25, 2006: The transfer rate from CERN remained large,
with daily average rates of 250--300MB/s. The first transfer
backlogs began to appear.

\item October 30, 2006: Data reconstruction at the Tier-0 stopped.

\item October 31, 2006: PIC and ASGC finished transferring the assigned prompt reconstruction data from CERN.

\item November 2, 2006: FNAL and IN2P3 also completed the transfers.

\item November 3, 2006: RAL completed the transfers. The first
Tier-1 to any-Tier-2 transfer validation began. The test involved
sending a small sample from a Tier-1 site to a validated Tier-2 (DESY
in this test), and then sending a small sample to all Tier-2 sites.

\item November 5, 2006: CNAF completed the Tier-0 transfers.

\item November 6, 2006: The Tier-1 to Tier-2 transfer testing continued.

\item November 9, 2006: GridKa completed the Tier-0 transfers.

\end{itemize}

\subsubsection{Transfers to Tier-1 Centers}

During CSA06 the Tier-1 centers met the transfer rate goals. In the
first week of the challenge, using minimum bias events, the total
volume of data out of CERN did not amount to 150MB/s unless the
datasets were subscribed to multiple sites. After the reconstruction
rate was increased at the Tier-0, the transfer rate easily exceeded
the 150MB/s target. The 30 day and 15 day averages are shown in
Table~\ref{tab:tier01csa06}. For the thirty day average all sites
except two exceeded the goal rate, and for the final 15 days all sites
easily exceeded the goal. Several sites doubled or tripled the goal
rate during the final two weeks of high volume transfers.

The WLCG availability metric for this year is 90\% for the Tier-1
sites. Applying this to the Tier-1s participating in CSA06 transfers,
6 of the 7 reached the availability goal.

\begin{table}[htb]
\caption{Transfer rates during CSA06 between CERN and Tier-1 centers and the number of outage days during the active challenge activities. In the MSS column, parentheses indicate that the site either had scaling issues keeping up with the total rate to tape, or transferred only a portion of the data to tape.}
\label{tab:tier01csa06}
\begin{tabular}{|l|r|r|r|r|c|}
\hline
Site & Anticipated rate (MB/s) & Last 30 day average (MB/s) & Last 15 day average (MB/s) & Outage (days) & MSS used \\
\hline
ASGC & 15 & 17 & 23 & 0 & (Yes) \\
CNAF & 25 & 26 & 37 & 0 & (Yes) \\
FNAL & 50 & 68 & 98 & 0 & Yes \\
GridKa & 25 & 23 & 28 & 3 & No \\
IN2P3 & 25 & 23 & 34 & 1 & Yes \\
PIC & 10 & 22 & 33 & 0 & No \\
RAL & 10 & 23 & 33 & 2 & Yes \\
\hline
\end{tabular}
\end{table}


The rate of data transferred averaged over 24 hours and the volume of
data transferred in 24 hours are shown in Figures~\ref{fig:tier01rate}
and~\ref{fig:tier01vol}. The start of the transfers during the first
week is visible on the left side of the plot, as is the failure to
reach the target rate, shown as a horizontal red bar. The twin
peaks in excess of 300MB/s and 25TB of data moved correspond to the
over-subscription of data. The bottom of the graph has indicators of
the approximate Tier-0 reconstruction rate. Both the rate and the
volume figures clearly show the point when the Tier-0 trigger rate was
doubled to 100Hz. The daily average exceeded 350MB/s with more than
30TB moved. The hourly averages from CERN peaked at more than
650MB/s.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/Tier01rate}
\end{center}
\caption{The rate of data transferred from the Tier-0 to the Tier-1 centers in MB per second.}
\label{fig:tier01rate}
\end{figure}

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/Tier01vol}
\end{center}
\caption{The total volume of data transferred from the Tier-0 to the Tier-1 centers in TB per day.}
\label{fig:tier01vol}
\end{figure}

The transferable volume plot shown in Figure~\ref{fig:tier01queue} is an
indicator of how well the sites kept up with the volume of data
from the Tier-0 reconstruction farm. During the first three weeks of
the challenge almost no backlog of files was accumulated by the Tier-1
centers. A hardware failure at IN2P3 resulted in a small
accumulation. The additional data subscriptions led to a spike in
data to transfer, but it was quickly cleared by the Tier-1 sites. The
most significant volumes of data waiting for transfer came at the end
of the challenge. During this time GridKa performed a dCache
storage upgrade that resulted in a large accumulation of data to
transfer. CNAF suffered a file server problem that reduced the amount
of available hardware. Additionally, RAL turned off its import system
for two days over a weekend to demonstrate the ability to recover from
a service interruption. These Tier-1 issues combined with PhEDEx
database connection interruptions under the heavy load of the final
week of transfers to produce a backlog of approximately 50TB over
the final days of the heavy challenge transfers. During this time
CERN continued to serve data at 350MB/s on average.
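
The size of the backlog follows from simple rate bookkeeping. The
sketch below (Python, with illustrative rates rather than the measured
per-site values) reproduces the order of magnitude:

\begin{verbatim}
# Sketch: a backlog accumulates when the Tier-0 injection rate
# exceeds the aggregate Tier-1 drain rate.  Rates in MB/s,
# numbers illustrative only.

SECONDS_PER_DAY = 86_400

def backlog_tb(injection_mbs, drain_mbs, days):
    """Backlog in TB after `days` at constant rates (never negative)."""
    deficit_mbs = max(0.0, injection_mbs - drain_mbs)
    return deficit_mbs * SECONDS_PER_DAY * days / 1e6  # MB -> TB

# If CERN serves 350 MB/s while upgrades and downtimes reduce the
# aggregate Tier-1 drain rate to ~230 MB/s, roughly 50 TB accumulate
# over five days.
print(round(backlog_tb(350, 230, 5), 1))  # -> 51.8
\end{verbatim}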

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/Tier01queue}
\end{center}
\caption{The total volume of data waiting for transfer from the Tier-0 to the Tier-1 centers in TB per day.}
\label{fig:tier01queue}
\end{figure}

The CERN to Tier-1 transfer quality is shown in
Figure~\ref{fig:tier01qual}. In CMS the transfer quality reflects
the number of times a transfer has to be attempted before it
completes successfully: a link with 100\% transfer quality needed
only one attempt per transfer, while a 10\% transfer quality
indicates each transfer had to be attempted ten times on average.
Most transfers eventually complete, but low transfer quality uses the
transfer resources inefficiently and usually results in a low
utilization of the network.
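
Equivalently, the quality of a link is the ratio of successful
transfers to attempts; a minimal sketch (not the actual PhEDEx
monitoring code):

\begin{verbatim}
# Sketch: transfer quality as defined above, successes per attempt.

def transfer_quality(successes, attempts):
    """1.0 means every transfer succeeded on the first try; 0.1 means
    each transfer needed ten attempts on average."""
    if attempts == 0:
        return None  # no activity on the link
    return successes / attempts

print(transfer_quality(100, 100))   # -> 1.0  (100% quality)
print(transfer_quality(100, 1000))  # -> 0.1  (10% quality)
\end{verbatim}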

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/qualt0t1}
\end{center}
\caption{Transfer quality between CERN and Tier-1 centers over 30 days.}
\label{fig:tier01qual}
\end{figure}


The transfer quality plot compares very favorably to equivalent plots
made during the spring. The CERN Castor2 storage element performed
very stably throughout the challenge. There were two small
configuration issues that were promptly addressed by the experts.
The Tier-1s also performed well throughout the challenge, with several
24 hour periods in which specific Tier-1s saw no transfer errors. The
stability of the RAL SE before the transition to Castor2 can be seen
at the left side of the plot, as well as the intentional downtime to
demonstrate recovery on the right side of the plot. The IN2P3
hardware problems are visible during the first week, and the GridKa
dCache upgrade is clearly visible during the last week. Most of the
other periods are solidly green. Both FNAL and PIC were above 70\%
efficiency for every day of the challenge activities.

Tier-1 to Tier-1 transfers were considered to be beyond the scope of
CSA06, though the dataflow exists in the CMS computing model. During
CSA06 we had an opportunity to test Tier-1 to Tier-1 transfers while
recovering from backlogs of data when the samples were subscribed to
multiple sites. PhEDEx is designed to take the data from whichever
source site can transfer it most efficiently. Figure~\ref{fig:t1t1}
shows the total Tier-1 to Tier-1 transfers during CSA06. With 7
Tier-1s there are 42 permutations of Tier-1 to Tier-1 transfers,
counting each direction separately. During CSA06 we successfully
exercised about half of them.

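The counting of directed links is straightforward (illustrative
Python):

\begin{verbatim}
# The ordered pairs of the 7 Tier-1 sites give the 42 directed links.
import itertools

tier1 = ["ASGC", "CNAF", "FNAL", "GridKa", "IN2P3", "PIC", "RAL"]
links = list(itertools.permutations(tier1, 2))  # (source, destination)
print(len(links))  # -> 42
\end{verbatim}
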
\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/T1T1Rate}
\end{center}
\caption{Transfer rate between Tier-1 centers during CSA06.}
\label{fig:t1t1}
\end{figure}

\subsubsection{Transfers to Tier-2 Centers}
In the CMS computing model the Tier-2s are expected to be able to
receive data from any Tier-1 site. In order to simplify CSA06
operations we began by concentrating on transfers from the
``associated'' Tier-1 sites, and in the final two weeks of the
challenge began a concerted effort on transfers from any Tier-1. The
associated Tier-1 center is the center operating the File Transfer
Service (FTS) server and hosting the channels for Tier-2 transfers.

The Tier-2 transfer metrics involved both participation and
performance. For CSA06, 27 sites signed up to participate
in the challenge. Participation was defined as having successful
transfers on 80\% of the days during the challenge. By this metric
21 sites succeeded in participating in the challenge, which
is above the goal of 20.

The Tier-2 transfer performance goal was 20MB/s, and the threshold
was 5MB/s. In the CMS computing model the Tier-2 transfers are
expected to occur in bursts: data will be transferred to refresh a
Tier-2 cache and then analyzed locally. The Tier-2 sites
were therefore not expected to hit the goal transfer rates continuously
throughout the challenge. There were 12 sites that successfully
averaged above the goal rate for at least one 24 hour period, and an
additional 8 sites that averaged above the threshold rate for at least
one 24 hour period.

The transfer rate over the 30 most active transfer days is shown in
Figure~\ref{fig:tier12rate}. The aggregate rate from Tier-1 to
Tier-2 centers was not as high as the total rate from CERN, which does
not accurately reflect the transfers expected in the CMS
computing model. In the CMS computing model there is more data
exported from the Tier-1s to the Tier-2s than total raw data coming
from CERN, because data is sent to multiple Tier-2s and the Tier-2s may
flush data from the cache and reload it at a later time. In CSA06 the
Tier-2 centers were subscribed to specific samples at the beginning,
and then to specific skims when available.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/tier12rate}
\end{center}
\caption{Transfer rate between Tier-1 and Tier-2 centers during the first 30 days of CSA06.}
\label{fig:tier12rate}
\end{figure}

The ability of the Tier-1 centers to export data was successfully
demonstrated during the challenge, but several sites observed
interference between receiving and exporting data. The quality of the
Tier-1 to Tier-2 data transfers is shown in Figure~\ref{fig:tier12qual}.
The quality is not nearly as consistently green as the CERN to Tier-1
plots, but the variation has a number of causes. Not all of the
Tier-1 centers currently export data as efficiently as CERN,
especially in the presence of a high load of data ingests; in addition,
most of the Tier-2 sites do not have as much operational experience
receiving data as the Tier-1 sites.

The Tier-1 to Tier-2 transfer quality looks very similar to the CERN
to Tier-1 transfer quality of 9-12 months ago. With a concerted
effort the Tier-1 to Tier-2 transfers should be able to reach the
quality of the current CERN to Tier-1 transfers before they are needed
to move large quantities of experiment data to users.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/tier12qual}
\end{center}
\caption{Transfer quality between Tier-1 and Tier-2 centers during the first 30 days of CSA06.}
\label{fig:tier12qual}
\end{figure}

There are a number of very positive examples of Tier-1 to Tier-2
transfers. Figure~\ref{fig:picqual} shows the results of the Tier-1
to all-Tier-2 tests when PIC was the source of the dataset. A small
skim sample was chosen, and within 24 hours 20 sites had successfully
received the dataset. The transfer quality over the 24 hour period
remained high, with successful transfers to all four continents
participating in CMS.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/PICQual}
\end{center}
\caption{Transfer quality between PIC and Tier-2 sites participating in the dedicated Tier-1 to Tier-2 transfer tests.}
\label{fig:picqual}
\end{figure}

Figure~\ref{fig:fnalrate} is an example of the very high export rates
the Tier-1 centers were able to achieve transferring data to Tier-2
centers. The peak rate on the plot is over 5Gb/s, which was
independently verified by the site network monitoring. This rate is
over 50\% of the anticipated Tier-1 data export rate expected in the
full-sized system.
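
For orientation, the unit conversion is
\[
5~\mathrm{Gb/s} = \frac{5000}{8}~\mathrm{MB/s} \approx 625~\mathrm{MB/s},
\]
so the anticipated full-system Tier-1 export rate referred to above is
of order 1~GB/s.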

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.7\linewidth]{figs/FNAL_Rate}
\end{center}
\caption{Transfer performance between FNAL and Tier-2 sites participating in the Tier-1 to Tier-2 transfer tests.}
\label{fig:fnalrate}
\end{figure}

Figure~\ref{fig:FZK_DESY} is an example of the very high rates achieved at both Tier-1 export and Tier-2 import in CSA06. The plots show both the hourly average and the instantaneous rate. DESY achieved an import rate to disk higher than 400MB/s.

\begin{figure}[ht]
\begin{center}
$\begin{array}{c@{\hspace{1in}}c}
\includegraphics[width=0.4\linewidth]{figs/FZK_DESY_1} &
\includegraphics[width=0.4\linewidth]{figs/FZK_DESY_2} \\
\end{array}$
\end{center}
\caption{The plot on the left is the hourly average transfer rate between GridKa and DESY. The plot on the right is the instantaneous rate between the two sites measured with Ganglia.}
\label{fig:FZK_DESY}
\end{figure}

\subsection{Tier-1 Skim Job Production}
\label{sec:skims}

CSA06 tested the workflow to reduce primary datasets to manageable sizes
for analyses. Four production teams provided centralized skim job workflow at
the Tier-1 centers. The produced secondary datasets are registered
in the Dataset Bookkeeping Service (DBS) and accessed like any other data.
Common skim job tools were prepared based on Monte Carlo generator
information and reconstruction output, and both types were tested
(see Section~\ref{sec:filtering}). There was an
overwhelming response from the analysis demonstrations, and about
25 filters producing nearly 60 datasets were run, as compiled in
Table~\ref{tab:tier1skim}. A
variety of output formats for the secondary datasets were used (FEVT,
RECO, AOD, AlCaReco), and the selected fraction of
events ranged from $<1\%$ to $100\%$. Secondary dataset sizes ranged
from $<1$~GB to 2.5~TB. No requirement was imposed beforehand on the
restrictiveness of the filters for CSA06, hence those with very low
efficiencies are probably tighter than what one would apply in
practice.
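
As a back-of-the-envelope sketch (Python, with illustrative numbers
not taken from the challenge bookkeeping), the secondary dataset size
follows from the input event count, the filter efficiency, and the
per-event size of the output format:

\begin{verbatim}
# Sketch: estimate a skim output size from the filter efficiency and
# an assumed per-event output size.  All numbers illustrative.

def skim_output_gb(input_events, efficiency, event_size_mb):
    """Secondary dataset size in GB."""
    return input_events * efficiency * event_size_mb / 1e3

# A 14% filter over 5M input events writing ~1.5 MB/event of RECOSim
# output yields a secondary dataset of roughly 1 TB.
print(round(skim_output_gb(5_000_000, 0.14, 1.5)))  # -> 1050
\end{verbatim}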

\begin{table}[phtb]
\centering
\caption{List of requested skim filters to run during CSA06 by group,
filter name, primary input dataset, efficiency, and input/output
data formats.}
\label{tab:tier1skim}
\begin{tabular}{|l|l|l|l|l|l|}
\hline
Group & Filter & Samples & Efficiency & Input format & Output format \\
\hline
hg & CSA06\_Tau\_Zand1lFilter.cfg & EWK & 14\% & FEVT & RECOSim \\
hg & CSA06\_HiggsTau\_1lFilter.cfg & EWK & 36\% & FEVT & RECOSim \\
hg & CSA06\_HiggsTau\_1lFilter.cfg & T-Tbar & 47\% & FEVT & RECOSim \\
hg & CSA06\_HiggsWW\_WWFilter.cfg (bkgnd) & EWK & 1\% & FEVT & FEVT \\
hg & CSA06\_HiggsWW\_WWFilter.cfg (signal) & EWK & 1\% & FEVT & FEVT \\
hg & CSA06\_HiggsWW\_TTb\_Filter.cfg & T-Tbar & 4\% & FEVT & FEVT \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & EWK & 10\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & Jets & 2\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & HLT(e,mu) & 1\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & EWK & 0 & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & Jets & 34\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & HLT(gam) & 0.4\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2l\_Filter.cfg & T-Tbar & 14\% & FEVT & RECOSim \\
hg & CSA06\_Higgs\_mc2gamma\_Filter.cfg & T-Tbar & 8\% & FEVT & RECOSim \\ \hline
sm & CSA06\_TTbar\_1lFilters.cfg (skim1efilter) & T-Tbar & 20\% & FEVT & RECOSim \\
sm & CSA06\_TTbar\_1lFilters.cfg (skim1mufilter) & T-Tbar & 20\% & FEVT & RECOSim \\
sm & CSA06\_TTbar\_1lFilters.cfg (skim1taufilter) & T-Tbar & 20\% & FEVT & RECOSim \\
sm & CSA06\_TTbar\_dilepton.cfg & T-Tbar & $\sim$10\% & FEVT & RECOSim \\
sm & CSA06\_MinimumBiasSkim.cfg & minbias & 100\% & FEVT & RECOSim \\
sm & CSA06\_UnderlyingEventJetsSkim.cfg (reco) & Jets & $\sim$100\% & FEVT & RECOSim \\
sm & CSA06\_UnderlyingEventDYSkim.cfg & EWK & $\sim$10\% & FEVT & RECOSim \\ \hline
eg & CSA06\_ZeeFilter.cfg (zeeFilter) & EWK & 3\% & FEVT & RECOSim \\
eg & CSA06\_ZeeFilter.cfg (AlCaReco) & EWK & 3\% & FEVT & AlCaReco \\
eg & CSA06\_AntiZmmFilter.cfg & Jets & 85\% & FEVT & FEVT \\ \hline
mu & CSA06\_JPsi\_mumuFilter.cfg & SoftMuon & 50\% & FEVT & FEVT \\
mu & CSA06\_JPsi\_mumuFilter.cfg & Zmumu & 50\% & FEVT & FEVT \\
mu & CSA06\_JPsi\_mumuFilter.cfg & EWK & 10\% & FEVT & FEVT \\
mu & CSA06\_WmunuFilter.cfg (reco) & EWK & 20\% & FEVT & AODSim \\
mu & CSA06\_WmunuFilter.cfg (reco) & SoftMuon & 60\% & FEVT & AODSim \\
mu & CSA06\_ZmmFilter.cfg & Zmumu & 50\% & FEVT & RECOSim \\
mu & CSA06\_ZmmFilter.cfg & Jets & -- & FEVT & FEVT \\
mu & recoDiMuonExample.cfg (reco) & EWK & 20\% & FEVT & RECOSim \\
mu & recoDiMuonExample.cfg (reco) & Zmumu & 67\% & FEVT & RECOSim \\ \hline
su & CSA06\_Exotics\_LM1Filter.cfg & Exotics & 39\% & FEVT & FEVT \\
su & CSA06\_BSM\_mc2e\_Filter.cfg & Exotics & 2\% & FEVT & FEVT \\
su & CSA06\_BSM\_mc2e\_Filter.cfg & EWK & $\sim$40\% & FEVT & FEVT \\
su & CSA06\_BSM\_mc2e\_Filter.cfg & HLT(e) & -- & FEVT & FEVT \\
su & CSA06\_Exotics\_ZprimeDijetFilter.cfg & Exotics & $\sim$30\% & FEVT & FEVT \\
su & CSA06\_Exotics\_QstarDijetFilter.cfg & Exotics & $\sim$20\% & FEVT & FEVT \\
su & CSA06\_Exotics\_XQFilter.cfg & Exotics & 22\% & FEVT & FEVT \\
su & CSA06\_Exotics\_ZprimeFilter.cfg & Exotics & 39\% & FEVT & FEVT \\
su & CSA06\_Exotics\_LM1\_3IC5Jet30Filter.cfg (reco) & Exotics & 25\% & FEVT & FEVT \\
su & CSA06\_TTbar\_2IC5Jet100ExoFilter.cfg (reco) & T-Tbar & 5\% & FEVT & FEVT \\ \hline
jm & CSA06\_QCD\_Skim.cfg (21 samples) & Jets & 100\% & FEVT & FEVT \\ \hline
\end{tabular}
\end{table}

\subsection{Tier-1 Re-Reconstruction}
\label{sec:rereco}

The goal was to demonstrate re-reconstruction at a Tier-1 centre on
files first reconstructed and distributed by the Tier-0 centre,
including access and application of new constants from the
offline DB. Four teams were set up to demonstrate re-reconstruction on
at least 100K events at each of the Tier-1 centres.

\subsubsection{Baseline Approach}

Since re-reconstruction had not been tested before the start of CSA06,
a technical problem was encountered with a couple of reconstruction
modules when re-reconstruction was first attempted on November 4. The
issue had to do with multiple reconstruction products stored in the
Event and the proper mechanism for accessing them. Once the problem
was diagnosed, the Tier-1 re-reconstruction workflow dropped
pixel tracking and vertexing out of about 100 reconstruction modules,
and the processing worked correctly.
Re-reconstruction was demonstrated on $>$100K events at 6 Tier-1 centres.
For the Tracker and ECAL calibration exercises (see
Section~\ref{sec:calib}), new constants inserted
into the offline DB were used for the re-reconstruction, and the
resulting datasets were
published and accessible to CRAB jobs. Thus, CSA06 also demonstrated
the full reprocessing workflow.

\subsubsection{Two-Step Approach}

While the reconstruction issue described above was being diagnosed, a
brute-force two-step procedure was conducted in parallel to ensure
re-reconstruction at a Tier-1 centre. The approach consisted of first
skimming off the original Tier-0 reconstruction products, in analogy
with the physics skim job workflow described in
Section~\ref{sec:skims}, and then running reconstruction on the skimmed
events (i.e.\ two ProdAgent workflows). This approach was also
successfully demonstrated at the FNAL Tier-1 centre.

\subsection{Job Execution at Tier-1 and Tier-2}
\subsubsection{Job Robot}
The CSA06 processing metrics foresaw that sites offering computing
capacity to CMS and participating in the challenge would complete an
aggregate of 50k jobs per day. The goal was to exercise the
job submission infrastructure and to monitor the input/output rate.

\begin{itemize}
\item About 10k per day were intended as skimming and reconstruction jobs
at the Tier-1 centers
\item About 40k per day were expected to be a combination of user-submitted
analysis jobs and robot-submitted analysis-like jobs
\end{itemize}

The job robots are automated expert systems that simulate user analysis tasks
using the CMS Remote Analysis Builder (CRAB). They therefore provide a reasonable
method to generate load on the system by running analysis on all data samples
at all sites individually. They are built on a component/agent-based
structure that enables parallel execution. Job distribution to CMS compute
resources is accomplished by using Condor-G direct submission on the OSG sites
and gLite bulk submission on the EGEE sites.\\

The job preparation phase comprises four distinct steps:
\begin{itemize}
\item Job creation
\begin{itemize}
\item Data discovery using DBS/DLS
\item Job splitting according to user requirements
\item Preparation of job dependent files (incl.\ the jdl)
\end{itemize}
\item Job submission
\begin{itemize}
\item Check if there are any compatible resources in the Grid Information System known to the submission system
\item Submit the job to the Grid submission component (Resource Broker or Condor-G) through the CMS bookkeeping component (BOSS)
\end{itemize}
\item Job status check
\item Job output retrieval
\begin{itemize}
\item Retrieve the job output from the sandbox located on the Resource Broker (EGEE sites) or the common filesystem (OSG sites)
\end{itemize}
\end{itemize}

The job robot executes all four steps of this workflow on a large scale; a schematic sketch follows.\\
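
The sketch below is Python; every helper function is a hypothetical
stand-in for a CRAB, BOSS, or Grid component, whose real interfaces
are not reproduced here:

\begin{verbatim}
# Sketch of the four-step job robot cycle described above.
# Every helper is a stub standing in for a CRAB/BOSS/Grid component.

def create_jobs(dataset, events_per_job=1000, total_events=5000):
    """Step 1: data discovery and job splitting (stubbed)."""
    n_jobs = total_events // events_per_job
    return [{"dataset": dataset, "skip": i * events_per_job}
            for i in range(n_jobs)]

def submit(job, site):
    """Step 2: hand the job to the submission component (stubbed)."""
    return {"job": job, "site": site, "state": "done"}  # pretend it ran

def check_status(handle):
    """Step 3: query the bookkeeping for the job state (stubbed)."""
    return handle["state"]

def retrieve_output(handle):
    """Step 4: fetch the sandbox output (stubbed)."""
    return "output starting at event %d" % handle["job"]["skip"]

jobs = create_jobs("/CSA06/minbias/RECO")       # dataset name invented
handles = [submit(j, "T2_Example") for j in jobs]
outputs = [retrieve_output(h) for h in handles
           if check_status(h) == "done"]
print(len(outputs))  # -> 5
\end{verbatim}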

Apart from job submission, the monitoring of the job execution over the
entire chain of all steps involved plays an important role. CMS has
chosen to use a product called Dashboard, a development that is part
of the CMS Integration Program. It is a joint effort of LCG's
ARDA project and the MonAlisa team in close collaboration with the CMS
developers working on job submission tools for production and analysis.
The objective of the Dashboard is to provide a complete view of the CMS
activity independently of the Grid flavour (i.e.\ OSG vs.\ EGEE). The
Dashboard maintains and displays the quantitative characteristics of the
usage pattern by including CMS-specific information, and it reports problems
of various kinds.\\

The monitoring information used in CSA06 is available via a web interface
and includes the following categories (a toy aggregation example is
given after the list):
\begin{itemize}
\item Quantities - how many jobs are running, pending, successfully
completed, or failed, per user, per site, per input data collection, and
the distribution of these quantities over time
\item Usage of the resources (CPU, memory consumption, I/O rates), and
distribution over time with aggregation on different levels
\item Distribution of resources between different application areas
(i.e.\ analysis vs.\ production), different analysis groups, and individual
users
\item Grid behaviour - success rate, failure reasons as a function of time,
site, and data collection
\item CMS application behaviour
\item Distribution of data samples over sites and analysis groups
\end{itemize}

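As a toy illustration of the first category (the record layout is
invented; the real Dashboard schema is far richer), job counts can be
aggregated per state and per site as follows:

\begin{verbatim}
# Toy aggregation of job monitoring records by state and by site.
# The record layout is invented for illustration only.
from collections import Counter

records = [
    {"site": "T2_DESY", "state": "success"},
    {"site": "T2_DESY", "state": "success"},
    {"site": "T2_Pisa", "state": "failed"},
    {"site": "T2_Pisa", "state": "running"},
]

by_state = Counter(r["state"] for r in records)
by_site_state = Counter((r["site"], r["state"]) for r in records)
print(by_state["success"])                   # -> 2
print(by_site_state[("T2_Pisa", "failed")])  # -> 1
\end{verbatim}
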
Timeline:
\begin{itemize}
\item October 15, 2006: The job robots started analysis submission. 10k
jobs were submitted by two robot instances, with 90\% of them going to OSG sites
using Condor-G direct submission and 10\% going through the traditional LCG
Resource Broker (RB) to EGEE sites. In preparation for moving to the gLite RB,
thereby improving the submission rate to EGEE sites, bulk submission was
integrated into CRAB and was being tested.

\item October 17, 2006: Job robot submissions continued at a larger scale. An
issue was found with the bulk submission feature used at EGEE sites that left
jobs hanging indefinitely. The explanation was that parsing
of file names in the RB input sandbox failed for file name lengths of exactly 110
characters. The problem, located in the gLite User Interface (UI), was solved by
rebuilding the UI code to include a new version of the libtar library. A new
version of the UI was made available to the job robot operations team within a
day.\\
A total of 20k jobs were submitted in the past 24 hours. A large number of jobs
seemed not to report all the site information to the
Dashboard, which resulted in a major fraction being marked as ``unknown'' in the report.
The effect needs to be understood.\\
Apart from the jobs affected by the problem mentioned above, the efficiency
regarding successfully completed jobs was very high.

\item October 19, 2006: Robotic job submission via both the Condor-G direct
submission and the gLite RB bulk submission was activated. The job completion efficiency
remained very high for some sites. Over the course of the past day nearly 2000
jobs were completed at Caltech with only 5 failures.

\item October 20, 2006: The number of ``unknown'' jobs is decreasing following
further investigations by the robot operations team. The job completion efficiency
remains high, though the total number of submissions is lower than in the previous
days. A large number of sites running the PBS batch system have taken their
resources off the Grid because of a critical security vulnerability. Sites
applied the respective patch at short notice and were back to normal operation
within a day or two.

664     encountered in the robot. Those were mainly associated with the mySQL
665     server holding the BOSS DB. On the gLite submission side a problem was
666     found with projects comprising more than 2000 jobs. A limit was
667     introduced with the consequence that the same data files are more often
668     accessed.
669    
670     \item October 24, 2006: There were again scaling problems observed in the
671     job robots. Switching to a central mySQL data base for both the robots
672     has lead to the databases developing a lock state. Though the locks
673     automatically clear within 10 to 30 minutes the effect has an impact on
674     the overall job submissions rate. To resolve the issue two data bases
675     were created, one for each robot. While the Condor-G side performs well
676     the gLite robot continues to develop locking. A memory leak leading to
677     robot crashes was observed in CRAB/BOSS submission through gLite. The
678     robot operations team is working with the BOSS developers on a solution.
679    
\item October 25, 2006: The BOSS developers analyzed the problem
reported the previous day as a ``scaling issue'' and found that an SQL statement
issued by CRAB was incomplete, leading to long table rows being accessed
and a heavy load on the database server. The CRAB developers
made a new release available the same day, and the robot operations
team found that the robots have run fine since.

\item October 26, 2006: Following the decision to move from analysis
of data that had been produced with CMSSW\_1\_0\_3 to more recent data
that was produced with CMSSW\_1\_0\_5, many sites were no longer selected,
and therefore not participating, since they still lacked the corresponding
datasets.

\item November 1, 2006: The submission rate reached by the job robots
is currently about 25k jobs per day. To scale up to the
desired rate, 11 robots were set up and are currently submitting to OSG
and EGEE sites.

\item November 2, 2006: The total number of jobs was on the order of
21k. With more sites having datasets published in DBS/DLS that were
created with CMSSW\_1\_0\_5, the number of participating sites increased.
Both the total application and Grid efficiencies were over 99\%.

\item November 6, 2006: The number of submitted and completed jobs is still increasing.
30k jobs have successfully passed all steps in the past 24 hours. 24
Tier-2 sites are now publishing data and are accepting jobs from the robot.
The efficiency remains high.

\item November 7, 2006: The combined job robot, production, and analysis submissions
exceeded the goal of 55k per day. The activity breakdown is shown in
Figure~\ref{fig:breakdown}.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.9\linewidth]{figs/jobs-breakdown-1102}
\end{center}
\caption{Dashboard view of job breakdown by activity (7--8 Nov).}
\label{fig:breakdown}
\end{figure}

The job robot submissions by site are shown in Figure~\ref{fig:jobs-per-site}.
Six out of seven Tier-1
centers are included in the job robot. As expected, the Tier-2 centers are
still dominating the submissions. The addition of the Tier-1 centers has
driven the job robot submission rates past the load that can be sustained
by a single mySQL job monitor.

\begin{figure}[htp]
\begin{center}
\includegraphics[width=0.9\linewidth]{figs/jobs-per-site-1102}
\end{center}
\caption{Dashboard view of job breakdown by site (7--8 Nov).}
\label{fig:jobs-per-site}
\end{figure}
\end{itemize}