\section{Data Management}
\subsection{Dataset Bookkeeping System}
The Dataset Bookkeeping System (DBS) for CSA06 included the
functionality needed for cataloging Monte Carlo data and tracking some
of the processing history. It included the data-related concepts of
Dataset, File, File Block and Data Tier. The processing-related
concepts of Application and Application Configuration were provided to
track the actual operations that were performed to produce the
data. In addition, data parentage relationships were provided. A
client-level API enabled the creation of each of the entities
described above. File information, including size, number of events,
status and Logical File Name (LFN), is included as attributes of each
file. A discovery service was developed that enabled users to find
data of interest for further processing and analysis.
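
As an illustration of how these entities fit together, the following
Python sketch models the main DBS concepts in simplified form; the
class and field names are assumptions chosen for clarity and are not
the actual DBS schema or client API.
\begin{verbatim}
# Illustrative sketch only: simplified stand-ins for the DBS concepts
# (Dataset, File Block, File, Data Tier, Application); not the real schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ApplicationConfig:          # tracks how the data was produced
    application: str              # e.g. a reconstruction executable
    version: str
    parameter_set_hash: str

@dataclass
class DBSFile:
    lfn: str                      # Logical File Name
    size_bytes: int
    n_events: int
    status: str                   # e.g. "VALID"

@dataclass
class FileBlock:
    name: str
    is_open: bool
    files: List[DBSFile] = field(default_factory=list)

@dataclass
class Dataset:
    name: str
    data_tier: str                # e.g. "RECO"
    app_config: ApplicationConfig
    parents: List["Dataset"] = field(default_factory=list)
    blocks: List[FileBlock] = field(default_factory=list)
\end{verbatim}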
\subsubsection{Deployment and Operation}
The architecture of the DBS service included a middle-tier server
running a CGI script under Apache. All client access to the server was
through an HTTP API. The CGI script was written in Perl and accessed
the database via the Perl DBI module. All activity for CSA06 was
hosted on the CMSR production Oracle database server, with one
Global DBS account and a half dozen so-called Local DBS accounts. The
procedure was to produce Monte Carlo data under the control of four
``Prod Agents'', each with access to its own Local DBS instance. When the
data was appropriately merged and validated, its catalog entries were
migrated to the global catalog for use by CMS at large. This migration
allowed block-by-block transfer of datasets through a simple API.
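
As an illustration of client access through such an HTTP API, the
Python sketch below issues a block-migration request to a hypothetical
endpoint; the host name, URL path and parameter names are invented for
the example and are not the real DBS interface.
\begin{verbatim}
# Hypothetical example of driving an HTTP-based bookkeeping API;
# the endpoint and parameters are placeholders, not the real DBS API.
import urllib.parse
import urllib.request

def migrate_block(block_name, source_instance, target_instance):
    """Ask a (hypothetical) server to copy one block's catalog
    entries from a Local DBS instance to the Global instance."""
    params = urllib.parse.urlencode({
        "api": "migrateBlock",          # invented method name
        "block": block_name,
        "from": source_instance,
        "to": target_instance,
    })
    url = "https://dbs.example.org/cgi-bin/DBS?" + params  # placeholder host
    with urllib.request.urlopen(url) as response:
        return response.read().decode()

# Example call with placeholder names:
# migrate_block("/CSA06-Zmumu/RECO#block-001", "local_01", "global")
\end{verbatim}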

There were two servers provided for CSA06, a ``test'' and a
``production'' machine. Both servers were dual 2.8~GHz Pentium
machines with 2~GB of memory. The test server was heavily used
by remote sites, including the initial data production, CMS Job Robot
submissions, and final skimming operations. The production server was
used by the Tier-0 reconstruction farm, and by some skimming operations
near the end of CSA06. There were also ongoing CMS activities
included in the load on the test server that were not related to
CSA06. Access statistics for the service were obtained by mining the
Apache access log files for each of the servers. The activity of the
production server during the month of November is shown in
Fig.~\ref{fig:dbs-prod-stats-chart} and
Fig.~\ref{fig:dbs-prod-stats-table}. The important features to
observe in these data are the number of pages served (Pages) and the
total amount of data transferred (Bandwidth). As an example of a
particularly busy day, November 27 showed 220k pages (query requests)
and over 10 GB of data. This corresponds to an average request rate of
over 2.5 Hz, and the server CPU was around 50\% loaded. Demand on the
test server was heavy in October and the first part of November, with
peak rates of around 3 Hz in mid-October and again in early November.
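
As a rough sketch of how such rates can be extracted from the Apache
access logs, the Python fragment below counts requests per day and
converts the totals to an average rate; the log path and the
assumption of the common log format are illustrative only.
\begin{verbatim}
# Rough sketch of mining an Apache access log for daily request rates.
# The log location and the common log format are assumptions.
from collections import Counter

SECONDS_PER_DAY = 86400.0

def daily_request_rates(log_path="access_log"):
    """Count requests per day and convert to an average rate in Hz."""
    requests_per_day = Counter()
    with open(log_path) as log:
        for line in log:
            # Common log format: ... [27/Nov/2006:14:05:12 +0100] "GET ..."
            try:
                day = line.split("[", 1)[1].split(":", 1)[0]
            except IndexError:
                continue  # skip malformed lines
            requests_per_day[day] += 1
    return {day: n / SECONDS_PER_DAY for day, n in requests_per_day.items()}

# For example, 220000 requests in one day correspond to
# 220000 / 86400, i.e. about 2.5 requests per second on average.
\end{verbatim}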
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/dbs-prod-server-stats-nov-chart}}
\caption{DBS production server November daily statistics. The bars on the chart for each day represent the number/amount of ``Visits'', ``Pages'', ``Hits'', and ``Bandwidth'' served. The scale and legend for the bar chart can be determined from the data in the table in Fig.~\ref{fig:dbs-prod-stats-table}.}
\label{fig:dbs-prod-stats-chart}
\end{center}
\end{figure}

\begin{figure}[hbtp]
\begin{center}
\resizebox{12cm}{!}{\includegraphics{figs/dbs-prod-server-stats-nov-table}}
\caption{DBS production server November daily statistics.}
\label{fig:dbs-prod-stats-table}
\end{center}
\end{figure}
\subsubsection{Experience from CSA06}
The overall operation and performance of the system was very good
throughout the course of the CSA06 exercise. The limited functionality
provided by the schema and API was sufficient for the test, although
many additional features are needed for the ultimate system. The
dataset propagation from Local-scope to Global-scope worked
seamlessly. Maintenance of the server code was simple and
straightforward. The clients were easily integrated with DLS, CRAB and
Prod Agent. The support needed for the DBS production server was
minimal.

Clients occasionally reported slow response during peak periods, but
the servers held up well. The service can easily be scaled by adding
machines behind a load-balancing mechanism such as round-robin DNS,
and this will be examined for the final system. The loads of the CSA06
operation were artificially inflated because many Local DBS instances
were being managed centrally at CERN, in addition to the Global
instance. In the final system the local instances will not be operated
at CERN. There were two incidents which resulted in service
interruption, both caused by problems with the central CMSR database
system. Ultimately, the DBS service rates needed will be reduced by
the fact that only Global-instance traffic will go through the central
CERN service.

Several specific problems and concerns can be noted based on the
CSA06 experience:
\begin{enumerate}
\item Lack of proper communication between FNAL and the Tier-0 regarding DBS needs resulted in some concepts missing from the schema and API functionality.
\item Parameter Set information was not properly stored, and there were not enough APIs to relate datasets with these parameter sets.
\item Provenance information for a dataset and a file was not properly stored in DBS.
\item Block management was not automated and was done externally on an irregular basis. This led to the transfer of both open and closed blocks. Also, blocks that were transferred could not initially be uniquely and universally identified, but this was fixed.
\item The merge remapping APIs were not used, and thus not tested, because the needed functionality was not available in the Framework Job Report and Prod Agent.
\item Dataset migration from Local-scope to Global-scope was found to be slower than desired. It performed at a rate of 1000 files per minute, which caused problems for the ``test'' server behind the cmsdoc proxy server, which timed out after 15 minutes (at that rate, a migration of more than about 15,000 files could not complete before the timeout).
\end{enumerate}
These will be addressed in the next-generation DBS being implemented after CSA06.
\subsection{Data Location Services}
The Data Location Service (DLS) operated in CMS was based on the
LCG File Catalog (LFC) infrastructure provided by the EGEE grid
project. Data in CMS is divided into blocks, which are logical
groupings of data files. The block-to-file mapping is maintained in
the DBS. The advantage of file blocks is that they reduce the number
of entries that need to be tracked in the DLS catalog. Instead of an
entry for each file there is an entry for every block, which in CSA06
typically contained a few hundred files.
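
The division of labor between the two catalogs can be pictured with
the simple Python sketch below, in which plain dictionaries stand in
for the DBS block-to-file mapping and the DLS block-to-site mapping;
the block and site names are purely illustrative.
\begin{verbatim}
# Illustrative only: the DBS maps blocks to files, while the DLS maps
# blocks to the sites that host complete replicas of them.
dbs_block_to_files = {
    "/CSA06-Zmumu/RECO#block-001": [f"file_{i}.root" for i in range(300)],
    "/CSA06-Zmumu/RECO#block-002": [f"file_{i}.root" for i in range(300, 550)],
}

dls_block_to_sites = {
    "/CSA06-Zmumu/RECO#block-001": {"T1_FNAL", "T2_DESY"},
    "/CSA06-Zmumu/RECO#block-002": {"T1_CNAF"},
}

def sites_with_blocks(block_names):
    """Resolve where the given blocks can be read.  Only one DLS entry
    per block is needed, not one per file."""
    return {block: dls_block_to_sites.get(block, set())
            for block in block_names}

print(sites_with_blocks(dbs_block_to_files))
\end{verbatim}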
\subsubsection{Deployment and Operation}
The LFC was deployed as a service provided by the WLCG, and the DLS
tools developed by CMS were deployed at locations that needed to query
or update DLS entries. The tools were checked out and compiled from
CVS and installed on a standard WLCG User Interface (UI) machine.
\subsubsection{CSA06 Experience}
The DLS performed stably over the challenge. New data blocks were
created once per day for each dataset. This created a maximum of
about twenty new entries per day, so the load from creating production
datasets was small. The user analysis jobs and the load-generating Job
Robot jobs queried the DLS to determine the location of data blocks,
but only when creating new workflows. The query rate was larger than
the new-entry rate, but the DLS performed well.

The largest load in the system came from PhEDEx agents updating the
DLS entries with data block locations at sites. The PhEDEx agents
updated the DLS with the status of all the complete blocks for a site
at ten-minute intervals. This kept the latency for publishing
complete blocks low, and was manageable with the small number of
blocks used in CSA06. As the number of blocks grows, CMS may need to
investigate local site caching of the DLS information and only update
the DLS with changes since the previous block publication.
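
The delta-update strategy mentioned above could look roughly like the
Python sketch below, which keeps a record of what was last published
and sends only the difference; it is a hypothetical illustration, not
the PhEDEx or DLS implementation.
\begin{verbatim}
# Hypothetical sketch of delta publication: publish only the complete
# blocks that changed since the previous cycle, not the full list.
def publish_delta(complete_blocks_now, previously_published, publish):
    """Both block arguments are sets of block names; publish() is
    whatever call registers a block location in the catalog."""
    new_blocks = complete_blocks_now - previously_published
    for block in sorted(new_blocks):
        publish(block)
    # Return the new reference point for the next ten-minute cycle.
    return previously_published | new_blocks

# Example usage with a stand-in publish function:
published = set()
published = publish_delta({"blockA", "blockB"}, published, print)
published = publish_delta({"blockA", "blockB", "blockC"}, published, print)
\end{verbatim}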
\subsection{Data File Catalogs}
CMS used a technique called the trivial file catalog (TFC) to
provide the data catalogs for each site. The TFC relies on a consistent
namespace at each site to provide the catalog functionality that maps
a logical file name to a physical file name in the storage system.
A local site configuration file points the applications to the common
namespace.
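
The mapping itself can be thought of as a short list of site-local
rewrite rules, as in the Python sketch below; the rule pattern and the
resulting path are invented for illustration and do not correspond to
any particular site's configuration.
\begin{verbatim}
# Illustrative LFN-to-PFN mapping in the spirit of a trivial file
# catalog: a site-local list of rewrite rules, no per-file database.
import re

# Hypothetical rule for one site: an LFN pattern and a PFN template.
LFN_TO_PFN_RULES = [
    (re.compile(r"^/store/(.*)$"),
     r"dcap://dcache.example.edu/pnfs/cms/store/\1"),
]

def lfn_to_pfn(lfn):
    """Resolve a logical file name to a physical file name for this site."""
    for pattern, template in LFN_TO_PFN_RULES:
        if pattern.match(lfn):
            return pattern.sub(template, lfn)
    raise ValueError(f"no TFC rule matches {lfn}")

print(lfn_to_pfn("/store/CSA06/Zmumu/RECO/file_0.root"))
\end{verbatim}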
\subsubsection{Deployment and Operation}
The TFC was deployed during the site validation phase of Service
Challenge 4. The local site configurations were entered into the
common CMS CVS repository, which provided tracking and aided
debugging by remote experts. The TFC was successfully deployed at
all sites participating in the challenge activities, and the feedback
on deployment and operations was generally positive. All underlying
storage systems could be accommodated and the site instructions were
detailed. The local configuration file also provides the location of
the local database cache and the local storage element to the
application, and could be used for other site-specific elements.
\subsubsection{CSA06 Experience}
The TFC scaled well during the challenge. Even at sites with a high
load of applications, the logical-to-physical file name mappings were
reliably resolved. The TFC did not place an excessive load on the
namespaces of the underlying storage systems. An additional factor
of four in the TFC rate should be possible with all the currently used
storage systems.
\subsection{Data Transfer Mechanism}
Data in CMS was transferred between sites using the Physics Experiment
Data Export (PhEDEx) system. PhEDEx relies on underlying grid
file transfer protocols to physically move the files. While PhEDEx is
capable of using bare GridFTP to replicate files, only File Transfer
Service (FTS) driven transfers and Storage Resource Manager (SRM)
transfers were operated during CSA06.
\subsubsection{Deployment and Operation}
CMS deployed an architecture in which the FTS servers were located at
each Tier-1 and supported channels for groups of ``associated'' Tier-2
centers. The association between Tier-1 and Tier-2 centers was
intended only for channel hosting, as data can be sourced from any
Tier-1 center in the CMS computing model \cite{model, ctdr}. The FTS channels relied on
SRM transfers, and the FNAL Tier-1 center also supported SRM transfers
driven directly from srmcp. The stability of the SRM service at the
sites varied, but the fraction of transfers that succeeded on the
first attempt was improved over similar tests during Service Challenge
4, indicating that the services are maturing.
\subsubsection{CSA06 Experience}
The architecture deployed in CMS, with FTS channels hosted at the
associated Tier-1 centers for the supported Tier-2 centers, leads to a
large number of FTS channels. The number of FTS channels supported at
the Tier-1 centers was larger than the number supported at CERN. The
deployed FTS architecture will be re-examined for scalability and
supportability.
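
A back-of-the-envelope sketch of why the channel count grows quickly
in this layout is given below; the site counts and the assumption of
one channel per transfer direction are placeholders, not the actual
CSA06 numbers.
\begin{verbatim}
# Rough count of FTS channels in the deployed layout.
# All numbers below are placeholders for illustration only.
n_tier1 = 7                 # Tier-1 centres hosting FTS servers
n_tier2_per_tier1 = 4       # associated Tier-2s per Tier-1

# Assume each Tier-1 hosts one channel per direction for each of its
# associated Tier-2s, while CERN mainly needs channels to the Tier-1s.
channels_at_tier1s = n_tier1 * n_tier2_per_tier1 * 2
channels_at_cern = n_tier1 * 2

print(channels_at_tier1s, channels_at_cern)   # e.g. 56 versus 14
\end{verbatim}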
\subsection{Data Access}
The CMS application was able to read successfully from the local
storage element using RFIO, RFIO2, and dCache during CSA06. During
the challenge the local file access was largely sequential, and all
protocols were able to meet the application input and output needs.
The initial goal of the challenge was to reach 1~MB/s per batch slot
at Tier-1 and Tier-2 centers. On average CMS was able to reach
approximately half the anticipated rate, which was improved after the
end of the challenge with protocol-specific tuning for the
application.
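
The per-slot goal translates into an aggregate throughput requirement
on each site's storage system, as the short calculation below
illustrates; the number of batch slots is a placeholder, not a
measured CSA06 figure.
\begin{verbatim}
# Translate the per-batch-slot I/O goal into an aggregate site figure.
# The number of batch slots is a placeholder for illustration.
target_per_slot_mb_s = 1.0      # CSA06 goal: 1 MB/s per batch slot
achieved_per_slot_mb_s = 0.5    # roughly half the goal on average

batch_slots = 400               # placeholder site size

print("target aggregate:   %.0f MB/s" % (batch_slots * target_per_slot_mb_s))
print("achieved aggregate: %.0f MB/s" % (batch_slots * achieved_per_slot_mb_s))
# => target aggregate: 400 MB/s, achieved aggregate: 200 MB/s
\end{verbatim}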