\section{Data Management}
\subsection{Dataset Bookkeeping System}
The Dataset Bookkeeping System (DBS) for CSA06 included the
functionality needed for cataloging Monte Carlo data and tracking some
of the processing history. Included were the data-related concepts of
Dataset, File, File Block and Data Tier. The processing-related
concepts of Application and Application Configuration were provided to
track the actual operations that were performed to produce the data.
In addition, data parentage relationships were provided. A
client-level API enabled the creation of each of the entities
described above. File information, including size, number of events,
status and Logical File Name (LFN), is stored as attributes of each
file. A discovery service was developed that enabled users to find
data of interest for further processing and analysis.

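To make the bookkeeping concepts concrete, the sketch below models the
entities described above as plain Python classes. The field names and
relationships are illustrative assumptions based only on the description
in this section, not the actual CSA06 DBS schema.
\begin{verbatim}
# Minimal sketch of the DBS concepts (Dataset, File, File Block,
# Data Tier, Application, Application Configuration, parentage).
# Field names are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Application:
    name: str          # e.g. a production or reconstruction executable
    version: str

@dataclass
class ApplicationConfiguration:
    application: Application
    parameters: str    # configuration used for the processing step

@dataclass
class DBSFile:
    lfn: str           # Logical File Name
    size_bytes: int
    n_events: int
    status: str
    parents: List["DBSFile"] = field(default_factory=list)  # parentage

@dataclass
class FileBlock:
    name: str
    files: List[DBSFile] = field(default_factory=list)

@dataclass
class Dataset:
    name: str
    data_tier: str     # e.g. "RECO"
    produced_by: Optional[ApplicationConfiguration] = None
    blocks: List[FileBlock] = field(default_factory=list)
\end{verbatim}
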
\subsubsection{Deployment and Operation}
The architecture of the DBS service included a middle-tier server
running a CGI script under Apache. All client access to the server was
through an HTTP API. The CGI script was written in Perl and accessed
the database via the Perl DBI module. All activity for CSA06 was
established on the CMSR production Oracle database server, with one
Global DBS account and a half dozen so-called Local DBS accounts. The
procedure was to produce Monte Carlo data under the control of four
``Prod Agents'', each with access to its own Local DBS instance. When the
data was appropriately merged and validated, its catalog entries were
migrated to the global catalog for use by CMS at large. This migration
task allowed block-by-block transfer of datasets to be done through a
simple API.

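Since all client access went through the HTTP API, the sketch below shows
roughly what a client-side query and a block-by-block migration request
could look like. The host, endpoint actions and parameter names are
invented placeholders, not the actual CSA06 DBS API.
\begin{verbatim}
# Hypothetical illustration of HTTP-API access to a DBS server.
# The host, paths and parameters below are invented placeholders.
import urllib.parse
import urllib.request

DBS_URL = "http://dbs.example.cern.ch/cgi-bin/dbs"   # placeholder

def dbs_call(action, **params):
    """Issue a single HTTP request to the DBS CGI endpoint."""
    query = urllib.parse.urlencode(dict(action=action, **params))
    with urllib.request.urlopen(f"{DBS_URL}?{query}") as response:
        return response.read().decode()

# List the file blocks of a dataset in a Local DBS instance (placeholder
# names), then request their migration to the Global instance.
blocks = dbs_call("listBlocks", dataset="/ExampleDataset/CSA06/RECO")
for block_name in blocks.splitlines():
    dbs_call("migrateBlock", block=block_name, target="Global")
\end{verbatim}
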
There were two servers provided for CSA06, a ``test'' and a
``production'' machine. Both servers were dual 2.8 GHz Pentium machines
with 2 GB of memory. The test server was heavily used by remote sites,
including the initial data production, CMS Job Robot submissions, and
final skimming operations. The production server was used by the Tier-0
reconstruction farm, and some skimming operations near the end of
CSA06. There were also ongoing CMS activities included in the loads for
the test server that were not related to CSA06. Access statistics for
the service were obtained by mining the Apache access log files for
each of the servers. The activity for the production server during the
month of November is shown in Fig.~\ref{fig:dbs-prod-stats-chart} and
Fig.~\ref{fig:dbs-prod-stats-table}. The important features to
observe in this data are the number of pages served (Pages) and the
total amount of data transferred (Bandwidth). As an example of a
particularly busy day, November 27 showed 220k pages (query requests)
and over 10 GB of data. This is a request rate of over 2.5 Hz, and the
server CPU was around 50\% loaded. Demand on the test server was
heavy in October and the first part of November, with peak rates of
around 3 Hz in mid-October and again in early November.
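
The statistics quoted above come from mining the Apache access logs, and
the busiest-day figure corresponds to an average rate of
$220\,000 / 86\,400 \approx 2.5$ Hz. The fragment below is a minimal
sketch of such log mining; the log file name and the assumption of the
Apache combined log format are placeholders.
\begin{verbatim}
# Minimal sketch: mine an Apache access log (combined format assumed)
# for daily request counts and bytes served.  Path is a placeholder.
import re
from collections import defaultdict

LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):.*?\] ".*?" \d{3} (\d+|-)')

pages = defaultdict(int)       # requests per day
bandwidth = defaultdict(int)   # bytes served per day

with open("access_log") as log:           # placeholder file name
    for line in log:
        match = LINE.search(line)
        if not match:
            continue
        day, size = match.groups()
        pages[day] += 1
        if size != "-":
            bandwidth[day] += int(size)

for day in sorted(pages):
    rate_hz = pages[day] / 86400.0        # average request rate
    print(day, pages[day], bandwidth[day], f"{rate_hz:.2f} Hz")
\end{verbatim}
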
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/dbs-prod-server-stats-nov-chart}}
\caption{DBS production server November daily statistics. The bars on the chart for each day represent the number/amount of ``Visits'', ``Pages'', ``Hits'', and ``Bandwidth'' served. The scale and legend for the bar chart can be determined from the data in the table in Fig.~\ref{fig:dbs-prod-stats-table}.}
\label{fig:dbs-prod-stats-chart}
\end{center}
\end{figure}

\begin{figure}[hbtp]
\begin{center}
\resizebox{12cm}{!}{\includegraphics{figs/dbs-prod-server-stats-nov-table}}
\caption{DBS production server November daily statistics, in tabular form.}
\label{fig:dbs-prod-stats-table}
\end{center}
\end{figure}
\subsubsection{Experience from CSA06}
The overall operation and performance of the system was very good
throughout the course of the CSA06 exercise. The limited functionality
provided by the schema and API was sufficient for the test, although
many additional features are needed for the ultimate system. The
dataset propagation from Local-scope to Global-scope worked
seamlessly. Maintenance of the server code was simple and
straightforward. The clients were easily integrated with DLS, CRAB and
the Prod Agent. The support needed for the DBS production server was
minimal.

Clients occasionally reported slow response during peak periods, but
the servers held up well. The service can be easily scaled by adding
additional machines and a load-balancing mechanism, such as round-robin
DNS, and this will be examined for the final system. The loads of the
CSA06 operation were artificially inflated because many Local DBS
instances were being managed centrally at CERN, in addition to the
Global instance. In the final system the local instances will not be
operated at CERN. There were two incidents that resulted in service
interruption, both caused by problems with the central CMSR database
system. Ultimately, the required DBS service rates will be reduced by
the fact that only Global-instance traffic will go through the central
CERN service.

There were several specific problems and concerns that can be noted
based on the CSA06 experience:
\begin{enumerate}
\item Lack of proper communication between FNAL and the Tier-0 about DBS needs resulted in some concepts missing from the schema and API functionality.
\item Parameter Set information was not properly stored, and there were not enough APIs to relate datasets with these parameter sets.
\item Provenance information for a dataset and a file was not properly stored in DBS.
\item Block management was not automated and was done externally on an irregular basis. This led to the transfer of both open and closed blocks. Also, blocks that were transferred could not initially be uniquely and universally identified, but this was fixed.
\item The merge remapping APIs were not used, and thus not tested, because the needed functionality was not available in the Framework Job Report and Prod Agent.
\item Dataset migration from Local-scope to Global-scope was found to be slower than desired. It performed at a rate of 1000 files per minute, which caused problems for the ``test'' server behind the cmsdoc proxy server that timed out after 15 minutes (see the worked example below).
\end{enumerate}
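
As a worked example of the last limitation: at the observed migration
rate of 1000 files per minute, the 15-minute proxy timeout corresponds
to roughly $1000 \times 15 = 15\,000$ files per migration request, so
migrating a larger dataset through the cmsdoc proxy would exceed the
timeout.
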
These will be addressed in the next-generation DBS being implemented after CSA06.


\subsection{Data Location Services}

The data location service (DLS) operated in CMS was based on the
Local File Catalog (LFC) infrastructure provided by the EGEE grid
project. Data in CMS is divided into blocks, which are logical
groupings of data files. The block-to-file mapping is maintained in
the DBS. The advantage of file blocks is that they reduce the number
of entries that need to be tracked in the DLS catalog. Instead of an
entry for each file, there is an entry for every block, which in CSA06
typically contained a few hundred files.

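The split of responsibilities, with DBS holding the block-to-file
mapping and DLS holding the block-to-location mapping, is illustrated by
the minimal sketch below. The dictionaries stand in for catalog queries,
and every name in the sketch is hypothetical.
\begin{verbatim}
# Minimal sketch of block-level location tracking: DLS maps blocks to
# sites, DBS maps blocks to files.  All names are hypothetical.
dls = {  # block name -> sites hosting a complete replica
    "/ExampleDataset/CSA06/RECO#block-0001": ["T1_Example_A", "T2_Example_B"],
}
dbs = {  # block name -> logical file names in the block
    "/ExampleDataset/CSA06/RECO#block-0001": [
        f"/store/example/reco_{i:04d}.root" for i in range(300)
    ],
}

def locate_files(block):
    """Resolve the sites and files for one block with two lookups."""
    return dls[block], dbs[block]

sites, files = locate_files("/ExampleDataset/CSA06/RECO#block-0001")
# One DLS entry covers all 300 files: the location catalog holds one
# entry per block instead of one entry per file.
print(len(files), "files located at", sites)
\end{verbatim}
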
\subsubsection{Deployment and Operation}

The LFC was deployed as a service provided by the WLCG, and the DLS
tools developed by CMS were deployed at locations that needed to query
or update DLS entries. The tools were compiled from CVS and deployed
on a standard WLCG User Interface (UI) machine.

\subsubsection{CSA06 Experience}
The DLS performed stably throughout the challenge. New data blocks were
created once per day for each dataset. This created a maximum of
about twenty new entries per day, so the load from creating production
datasets was small. The user analysis jobs and the load-generating Job
Robot jobs queried the DLS to determine the location of data blocks,
but only when creating new workflows. The query rate was larger than
the new-entry rate, but the DLS performed well.

The largest load on the system came from PhEDEx agents updating the
DLS entries with data block locations at sites. The PhEDEx agents
update the DLS with the status of all the complete blocks for a site
at ten-minute intervals. This kept the latency for publishing
complete blocks low, and was manageable with the small number of
blocks used in CSA06. As the number of blocks grows, CMS may need to
investigate local site caching of the DLS information and only update
the DLS with the changes since the previous block publication.

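A possible shape for that optimization, caching the previously published
blocks at the site and registering only the difference on each cycle, is
sketched below. The helper functions are hypothetical stand-ins, not the
PhEDEx or DLS interfaces.
\begin{verbatim}
# Rough sketch of delta publication: keep a local cache of the blocks
# already published for this site and register only new complete
# blocks each cycle.  The helpers are hypothetical stand-ins.
import json
import pathlib

CACHE = pathlib.Path("published_blocks.json")   # local site cache

def complete_blocks_at_site():
    """Stand-in: return the set of complete blocks at this site."""
    return {"/ExampleDataset/CSA06/RECO#block-0001"}

def register_in_dls(block):
    """Stand-in: add one block-location entry for this site in DLS."""
    print("registering", block)

def publish_cycle():
    published = set(json.loads(CACHE.read_text())) if CACHE.exists() else set()
    current = complete_blocks_at_site()
    for block in current - published:           # only the changes
        register_in_dls(block)
    CACHE.write_text(json.dumps(sorted(current)))

publish_cycle()   # would be run every ten minutes by the site agent
\end{verbatim}
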
\subsection{Data File Catalogs}

CMS utilized a technique called the trivial file catalog (TFC) to
provide the data catalogs for the sites. The TFC utilizes a consistent
namespace on each site to provide the catalog functionality that maps
a logical file name to a physical file name in the storage system.
There is a local site configuration file that points the applications
to the common namespace.

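A minimal sketch of this kind of logical-to-physical name mapping is
shown below. The mapping rule, protocol and site paths are invented
examples of what a local site configuration might supply, not the CSA06
configuration format.
\begin{verbatim}
# Minimal sketch of trivial-file-catalog style mapping: a logical file
# name (LFN) is turned into a physical file name (PFN) by prefix rules.
# The rule and paths below are invented examples, not a real site
# configuration.
import re

# Ordered (pattern, replacement) rules, as a site configuration might
# provide for its local storage access protocol.
RULES = [
    (r"^/store/(.*)", r"rfio:///castor/example.site/cms/store/\1"),
]

def lfn_to_pfn(lfn):
    """Map an LFN to a PFN using the first matching rule."""
    for pattern, replacement in RULES:
        if re.match(pattern, lfn):
            return re.sub(pattern, replacement, lfn)
    raise ValueError(f"no TFC rule matches {lfn}")

print(lfn_to_pfn("/store/example/reco_0001.root"))
# -> rfio:///castor/example.site/cms/store/example/reco_0001.root
\end{verbatim}
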
\subsubsection{Deployment and Operation}
The TFC was deployed during the site validation phase of Service
Challenge 4. The local site configurations were entered into the
common CMS CVS repository, which provided tracking and aided remote
experts with debugging. The TFC was successfully deployed at
all sites participating in the challenge activities, and the feedback
on deployment and operations was generally positive. All underlying
storage systems could be accommodated, and the site instructions were
detailed. The local configuration file also provides the location of
the local database cache and the local storage element to the
application, and could be used for other site-specific elements.

\subsubsection{CSA06 Experience}
The TFC scaled well during the challenge. Even on sites with a high
load of applications, the logical-to-physical file name mappings were
reliably resolved. The TFC did not place an excessive load on the
namespaces of the underlying storage systems. An additional factor
of four in the TFC rate should be possible with all the currently used
storage systems.

\subsection{Data Transfer Mechanism}

Data in CMS was transferred between sites using the Physics Experiment
Data Export (PhEDEx) system. The PhEDEx system relies on underlying grid
file transfer protocols to physically move the files. While PhEDEx is
capable of using bare gridFTP to replicate files, only File Transfer
Service (FTS) driven transfers and Storage Resource Manager (SRM)
transfers were operated during CSA06.

\subsubsection{Deployment and Operation}

CMS deployed an architecture where the FTS servers were located at
each Tier-1 and supported channels for groups of ``associated'' Tier-2
centers. The association between Tier-1 and Tier-2 centers was
intended for channel hosting, as the data can be sourced from any
Tier-1 center in the CMS computing model \cite{model, ctdr}. The FTS
channels relied on SRM transfers, and the FNAL Tier-1 center also
supported SRM transfers driven directly by srmcp. The stability of the
SRM service at the sites varied, but the fraction of transfers that
succeeded on the first attempt improved compared with similar tests
during Service Challenge 4, indicating that the services are maturing.

\subsubsection{CSA06 Experience}

The architecture deployed in CMS for FTS transfers, with channels
hosted at the associated Tier-1 centers for the supported Tier-2
centers, led to a large number of FTS channels. The number of FTS
channels supported at Tier-1 centers was larger than the number of FTS
channels supported at CERN. The deployed FTS architecture will be
re-examined for scalability and supportability.

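As a purely illustrative count (the topology and symbols here are
assumptions, not the CSA06 configuration): if each of $n_1$ Tier-1
centers hosts incoming and outgoing channels to the other Tier-1
centers and to each of its $k$ associated Tier-2 centers, a single
Tier-1 FTS server manages on the order of $2(n_1 - 1) + 2k$ channels,
whereas the CERN server needs channels only to the Tier-1 centers. This
is the scaling behind the channel counts noted above.
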
\subsection{Data Access}

The CMS application was able to successfully read from the local
storage element using RFIO, RFIO2, and dCache during CSA06. During
the challenge the local file access was largely sequential, and all
protocols were able to meet the application input and output needs.
The initial goal of the challenge was to reach 1 MB/s per batch slot
for Tier-1 and Tier-2 centers. On average CMS was able to reach
approximately half the anticipated rate, which was improved after the
end of the challenge with protocol-specific tuning of the
application.

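To put the per-slot target in perspective (the slot count here is a
hypothetical example, not a CSA06 number): a site running 500
simultaneous batch slots at the 1 MB/s target would need to sustain
roughly $500 \times 1~\mathrm{MB/s} = 500~\mathrm{MB/s}$ of aggregate
read traffic from its local storage, and about half of that at the
rates actually observed during the challenge.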