\section{Data Management}
\subsection{Dataset Bookkeeping System}
The Dataset Bookkeeping System (DBS) for CSA06 included the
functionality needed for cataloging Monte Carlo data and tracking some
of the processing history. Included were the data-related concepts of
Dataset, File, File Block and Data Tier. The processing-related
concepts of Application and Application Configuration were provided to
track the actual operations that were performed to produce the data.
In addition, data parentage relationships were provided. A
client-level API enabled the creation of each of the entities
described above. File information, including size, number of events,
status and Logical File Name (LFN), is stored as attributes of each
file. A discovery service was developed that enabled users to find
data of interest for further processing and analysis.

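To make the bookkeeping concepts concrete, the sketch below models the
entities described above as plain Python classes. The field names and
relationships are illustrative assumptions based only on the description
in this section, not the actual CSA06 DBS schema.
\begin{verbatim}
# Minimal sketch of the DBS concepts (Dataset, File, File Block,
# Data Tier, Application, Application Configuration, parentage).
# Field names are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Application:
    name: str          # e.g. a production or reconstruction executable
    version: str

@dataclass
class ApplicationConfiguration:
    application: Application
    parameters: str    # configuration used for the processing step

@dataclass
class DBSFile:
    lfn: str           # Logical File Name
    size_bytes: int
    n_events: int
    status: str
    parents: List["DBSFile"] = field(default_factory=list)  # parentage

@dataclass
class FileBlock:
    name: str
    files: List[DBSFile] = field(default_factory=list)

@dataclass
class Dataset:
    name: str
    data_tier: str     # e.g. "RECO"
    produced_by: Optional[ApplicationConfiguration] = None
    blocks: List[FileBlock] = field(default_factory=list)
\end{verbatim}
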
\subsubsection{Deployment and Operation}
The architecture of the DBS service included a middle-tier server
running a CGI script under Apache. All client access to the server was
through an HTTP API. The CGI script was written in Perl and accessed
the database via the Perl DBI module. All activity for CSA06 was
established on the CMSR production Oracle database server, with one
Global DBS account and a half dozen so-called Local DBS accounts. The
procedure was to produce Monte Carlo data under the control of four
``Prod Agents'', each with access to its own Local DBS instance. When the
data was appropriately merged and validated, its catalog entries were
migrated to the global catalog for use by CMS at large. This migration
task allowed block-by-block transfer of datasets to be done through a
simple API.

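Since all client access went through the HTTP API, the sketch below shows
roughly what a client-side query and a block-by-block migration request
could look like. The host, endpoint actions and parameter names are
invented placeholders, not the actual CSA06 DBS API.
\begin{verbatim}
# Hypothetical illustration of HTTP-API access to a DBS server.
# The host, paths and parameters below are invented placeholders.
import urllib.parse
import urllib.request

DBS_URL = "http://dbs.example.cern.ch/cgi-bin/dbs"   # placeholder

def dbs_call(action, **params):
    """Issue a single HTTP request to the DBS CGI endpoint."""
    query = urllib.parse.urlencode(dict(action=action, **params))
    with urllib.request.urlopen(f"{DBS_URL}?{query}") as response:
        return response.read().decode()

# List the file blocks of a dataset in a Local DBS instance (placeholder
# names), then request their migration to the Global instance.
blocks = dbs_call("listBlocks", dataset="/ExampleDataset/CSA06/RECO")
for block_name in blocks.splitlines():
    dbs_call("migrateBlock", block=block_name, target="Global")
\end{verbatim}
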
There were two servers provided for CSA06, a ``test'' and a
``production'' machine. Both servers were dual 2.8 GHz Pentium machines
with 2 GB of memory. The test server was heavily used by remote sites,
including the initial data production, CMS Job Robot submissions, and
final skimming operations. The production server was used by the Tier-0
reconstruction farm, and some skimming operations near the end of
CSA06. There were also ongoing CMS activities included in the loads for
the test server that were not related to CSA06. Access statistics for
the service were obtained by mining the Apache access log files for
each of the servers. The activity for the production server during the
month of November is shown in Fig.~\ref{fig:dbs-prod-stats-chart} and
Fig.~\ref{fig:dbs-prod-stats-table}. The important features to
observe in this data are the number of pages served (Pages) and the
total amount of data transferred (Bandwidth). As an example of a
particularly busy day, November 27 showed 220k pages (query requests)
and over 10 GB of data. This is a request rate of over 2.5 Hz, and the
server CPU was around 50\% loaded. Demand on the test server was
heavy in October and the first part of November, with peak rates of
around 3 Hz in mid-October and again in early November.
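
The statistics quoted above come from mining the Apache access logs, and
the busiest-day figure corresponds to an average rate of
$220\,000 / 86\,400 \approx 2.5$ Hz. The fragment below is a minimal
sketch of such log mining; the log file name and the assumption of the
Apache combined log format are placeholders.
\begin{verbatim}
# Minimal sketch: mine an Apache access log (combined format assumed)
# for daily request counts and bytes served.  Path is a placeholder.
import re
from collections import defaultdict

LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):.*?\] ".*?" \d{3} (\d+|-)')

pages = defaultdict(int)       # requests per day
bandwidth = defaultdict(int)   # bytes served per day

with open("access_log") as log:           # placeholder file name
    for line in log:
        match = LINE.search(line)
        if not match:
            continue
        day, size = match.groups()
        pages[day] += 1
        if size != "-":
            bandwidth[day] += int(size)

for day in sorted(pages):
    rate_hz = pages[day] / 86400.0        # average request rate
    print(day, pages[day], bandwidth[day], f"{rate_hz:.2f} Hz")
\end{verbatim}
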
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/dbs-prod-server-stats-nov-chart}}
\caption{DBS production server November daily statistics. The bars on the chart for each day represent the number/amount of ``Visits'', ``Pages'', ``Hits'', and ``Bandwidth'' served. The scale and legend for the bar chart can be determined from the data in the table in Fig.~\ref{fig:dbs-prod-stats-table}.}
\label{fig:dbs-prod-stats-chart}
\end{center}
\end{figure}

\begin{figure}[hbtp]
\begin{center}
\resizebox{12cm}{!}{\includegraphics{figs/dbs-prod-server-stats-nov-table}}
\caption{DBS production server November daily statistics, in tabular form.}
\label{fig:dbs-prod-stats-table}
\end{center}
\end{figure}
\subsubsection{Experience from CSA06}
The overall operation and performance of the system was very good
throughout the course of the CSA06 exercise. The limited functionality
provided by the schema and API was sufficient for the test, although
many additional features are needed for the ultimate system. The
dataset propagation from Local-scope to Global-scope worked
seamlessly. Maintenance of the server code was simple and
straightforward. The clients were easily integrated with DLS, CRAB and
the Prod Agent. The support needed for the DBS production server was
minimal.

Clients occasionally reported slow response during peak periods, but
the servers held up well. The service can be easily scaled by adding
additional machines and a load-balancing mechanism, such as round-robin
DNS, and this will be examined for the final system. The loads of the
CSA06 operation were artificially inflated because many Local DBS
instances were being managed centrally at CERN, in addition to the
Global instance. In the final system the local instances will not be
operated at CERN. There were two incidents that resulted in service
interruption, both caused by problems with the central CMSR database
system. Ultimately, the required DBS service rates will be reduced by
the fact that only Global-instance traffic will go through the central
CERN service.

There were several specific problems and concerns that can be noted
based on the CSA06 experience:
\begin{enumerate}
\item Lack of proper communication between FNAL and the Tier-0 about DBS needs resulted in some concepts missing from the schema and API functionality.
\item Parameter Set information was not properly stored, and there were not enough APIs to relate datasets with these parameter sets.
\item Provenance information for a dataset and a file was not properly stored in DBS.
\item Block management was not automated and was done externally on an irregular basis. This led to the transfer of both open and closed blocks. Also, blocks that were transferred could not initially be uniquely and universally identified, but this was fixed.
\item The merge remapping APIs were not used, and thus not tested, because the needed functionality was not available in the Framework Job Report and Prod Agent.
\item Dataset migration from Local-scope to Global-scope was found to be slower than desired. It performed at a rate of 1000 files per minute, which caused problems for the ``test'' server behind the cmsdoc proxy server that timed out after 15 minutes (see the worked example below).
\end{enumerate}
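
As a worked example of the last limitation: at the observed migration
rate of 1000 files per minute, the 15-minute proxy timeout corresponds
to roughly $1000 \times 15 = 15\,000$ files per migration request, so
migrating a larger dataset through the cmsdoc proxy would exceed the
timeout.
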
These will be addressed in the next-generation DBS being implemented after CSA06.


\subsection{Data Location Services}

The data location service (DLS) operated in CMS was based on the
Local File Catalog (LFC) infrastructure provided by the EGEE grid
project. Data in CMS is divided into blocks, which are logical
groupings of data files. The block-to-file mapping is maintained in
the DBS. The advantage of file blocks is that they reduce the number
of entries that need to be tracked in the DLS catalog. Instead of an
entry for each file, there is an entry for every block, which in CSA06
typically contained a few hundred files.

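The split of responsibilities, with DBS holding the block-to-file
mapping and DLS holding the block-to-location mapping, is illustrated by
the minimal sketch below. The dictionaries stand in for catalog queries,
and every name in the sketch is hypothetical.
\begin{verbatim}
# Minimal sketch of block-level location tracking: DLS maps blocks to
# sites, DBS maps blocks to files.  All names are hypothetical.
dls = {  # block name -> sites hosting a complete replica
    "/ExampleDataset/CSA06/RECO#block-0001": ["T1_Example_A", "T2_Example_B"],
}
dbs = {  # block name -> logical file names in the block
    "/ExampleDataset/CSA06/RECO#block-0001": [
        f"/store/example/reco_{i:04d}.root" for i in range(300)
    ],
}

def locate_files(block):
    """Resolve the sites and files for one block with two lookups."""
    return dls[block], dbs[block]

sites, files = locate_files("/ExampleDataset/CSA06/RECO#block-0001")
# One DLS entry covers all 300 files: the location catalog holds one
# entry per block instead of one entry per file.
print(len(files), "files located at", sites)
\end{verbatim}
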
\subsubsection{Deployment and Operation}

The LFC was deployed as a service provided by the WLCG, and the DLS
tools developed by CMS were deployed at locations that needed to query
or update DLS entries. The tools were compiled from CVS and deployed
on a standard WLCG User Interface (UI) machine.

\subsubsection{CSA06 Experience}
The DLS performed stably throughout the challenge. New data blocks were
created once per day for each dataset. This created a maximum of
about twenty new entries per day, so the load from creating production
datasets was small. The user analysis jobs and the load-generating Job
Robot jobs queried the DLS to determine the location of data blocks,
but only when creating new workflows. The query rate was larger than
the new-entry rate, but the DLS performed well.

The largest load on the system came from PhEDEx agents updating the
DLS entries with data block locations at sites. The PhEDEx agents
update the DLS with the status of all the complete blocks for a site
at ten-minute intervals. This kept the latency for publishing
complete blocks low, and was manageable with the small number of
blocks used in CSA06. As the number of blocks grows, CMS may need to
investigate local site caching of the DLS information and only update
the DLS with the changes since the previous block publication.

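A possible shape for that optimization, caching the previously published
blocks at the site and registering only the difference on each cycle, is
sketched below. The helper functions are hypothetical stand-ins, not the
PhEDEx or DLS interfaces.
\begin{verbatim}
# Rough sketch of delta publication: keep a local cache of the blocks
# already published for this site and register only new complete
# blocks each cycle.  The helpers are hypothetical stand-ins.
import json
import pathlib

CACHE = pathlib.Path("published_blocks.json")   # local site cache

def complete_blocks_at_site():
    """Stand-in: return the set of complete blocks at this site."""
    return {"/ExampleDataset/CSA06/RECO#block-0001"}

def register_in_dls(block):
    """Stand-in: add one block-location entry for this site in DLS."""
    print("registering", block)

def publish_cycle():
    published = set(json.loads(CACHE.read_text())) if CACHE.exists() else set()
    current = complete_blocks_at_site()
    for block in current - published:           # only the changes
        register_in_dls(block)
    CACHE.write_text(json.dumps(sorted(current)))

publish_cycle()   # would be run every ten minutes by the site agent
\end{verbatim}
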
\subsection{Data File Catalogs}

CMS utilized a technique called the trivial file catalog (TFC) to
provide the data catalogs for the sites. The TFC utilizes a consistent
namespace on each site to provide the catalog functionality that maps
a logical file name to a physical file name in the storage system.
There is a local site configuration file that points the applications
to the common namespace.

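A minimal sketch of this kind of logical-to-physical name mapping is
shown below. The mapping rule, protocol and site paths are invented
examples of what a local site configuration might supply, not the CSA06
configuration format.
\begin{verbatim}
# Minimal sketch of trivial-file-catalog style mapping: a logical file
# name (LFN) is turned into a physical file name (PFN) by prefix rules.
# The rule and paths below are invented examples, not a real site
# configuration.
import re

# Ordered (pattern, replacement) rules, as a site configuration might
# provide for its local storage access protocol.
RULES = [
    (r"^/store/(.*)", r"rfio:///castor/example.site/cms/store/\1"),
]

def lfn_to_pfn(lfn):
    """Map an LFN to a PFN using the first matching rule."""
    for pattern, replacement in RULES:
        if re.match(pattern, lfn):
            return re.sub(pattern, replacement, lfn)
    raise ValueError(f"no TFC rule matches {lfn}")

print(lfn_to_pfn("/store/example/reco_0001.root"))
# -> rfio:///castor/example.site/cms/store/example/reco_0001.root
\end{verbatim}
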
\subsubsection{Deployment and Operation}
The TFC was deployed during the site validation phase of Service
Challenge 4. The local site configurations were entered into the
common CMS CVS repository, which provided tracking and aided remote
experts with debugging. The TFC was successfully deployed at
all sites participating in the challenge activities, and the feedback
on deployment and operations was generally positive. All underlying
storage systems could be accommodated, and the site instructions were
detailed. The local configuration file also provides the location of
the local database cache and the local storage element to the
application, and could be used for other site-specific elements.

\subsubsection{CSA06 Experience}
The TFC scaled well during the challenge. Even on sites with a high
load of applications, the logical-to-physical file name mappings were
reliably resolved. The TFC did not place an excessive load on the
namespaces of the underlying storage systems. An additional factor
of four in the TFC rate should be possible with all the currently used
storage systems.

\subsection{Data Transfer Mechanism}

Data in CMS was transferred between sites using the Physics Experiment
Data Export (PhEDEx) system. The PhEDEx system relies on underlying grid
file transfer protocols to physically move the files. While PhEDEx is
capable of using bare gridFTP to replicate files, only File Transfer
Service (FTS) driven transfers and Storage Resource Manager (SRM)
transfers were operated during CSA06.

\subsubsection{Deployment and Operation}

CMS deployed an architecture where the FTS servers were located at
each Tier-1 and supported channels for groups of ``associated'' Tier-2
centers. The association between Tier-1 and Tier-2 centers was
intended for channel hosting, as the data can be sourced from any
Tier-1 center in the CMS computing model \cite{model, ctdr}. The FTS
channels relied on SRM transfers, and the FNAL Tier-1 center also
supported SRM transfers driven directly by srmcp. The stability of the
SRM service at the sites varied, but the fraction of transfers that
succeeded on the first attempt improved compared with similar tests
during Service Challenge 4, indicating that the services are maturing.

\subsubsection{CSA06 Experience}

The architecture deployed in CMS for FTS transfers, with channels
hosted at the associated Tier-1 centers for the supported Tier-2
centers, led to a large number of FTS channels. The number of FTS
channels supported at Tier-1 centers was larger than the number of FTS
channels supported at CERN. The deployed FTS architecture will be
re-examined for scalability and supportability.

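As a purely illustrative count (the topology and symbols here are
assumptions, not the CSA06 configuration): if each of $n_1$ Tier-1
centers hosts incoming and outgoing channels to the other Tier-1
centers and to each of its $k$ associated Tier-2 centers, a single
Tier-1 FTS server manages on the order of $2(n_1 - 1) + 2k$ channels,
whereas the CERN server needs channels only to the Tier-1 centers. This
is the scaling behind the channel counts noted above.
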
\subsection{Data Access}

The CMS application was able to successfully read from the local
storage element using RFIO, RFIO2, and dCache during CSA06. During
the challenge the local file access was largely sequential, and all
protocols were able to meet the application input and output needs.
The initial goal of the challenge was to reach 1 MB/s per batch slot
for Tier-1 and Tier-2 centers. On average CMS was able to reach
approximately half the anticipated rate, which was improved after the
end of the challenge with protocol-specific tuning of the
application.

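To put the per-slot target in perspective (the slot count here is a
hypothetical example, not a CSA06 number): a site running 500
simultaneous batch slots at the 1 MB/s target would need to sustain
roughly $500 \times 1~\mathrm{MB/s} = 500~\mathrm{MB/s}$ of aggregate
read traffic from its local storage, and about half of that at the
rates actually observed during the challenge.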