\section{Offline Database and Frontier}
\subsection{Frontier}
The Frontier infrastructure was installed and tested prior to CSA06 and
was used for the T0 operation and at T1 and T2 centers. The goal was
to observe the behavior of the central Frontier servers at CERN,
referred to as the ``launchpad'', and to monitor the Squids deployed at
each participating site. The setup at CERN is shown in
Fig.~\ref{fig:frontier-setup}. There were three production servers,
each running a Tomcat server and a Squid server in
accelerator mode. Load balancing and failover among the three servers
are done via DNS round robin, and this worked flawlessly. The Squids
were configured in cache-peer sharing mode, which reduces traffic to
the database for non-cached objects.
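
As an illustration of the round-robin failover described above, the sketch below resolves a DNS alias to its list of addresses and tries each server in turn until one answers. The alias, port, and URL path are placeholders and do not reflect the actual launchpad configuration.

\begin{verbatim}
# Minimal sketch of DNS round-robin failover, assuming a placeholder
# alias "cmsfrontier.example.ch" that resolves to the three launchpad
# machines; the real alias, port, and URL path are not shown here.
import socket
import urllib.request

def fetch_with_failover(alias, port, path):
    # getaddrinfo returns every address behind the round-robin alias
    addresses = {info[4][0] for info in socket.getaddrinfo(alias, port)}
    for ip in addresses:
        try:
            url = "http://%s:%d%s" % (ip, port, path)
            with urllib.request.urlopen(url, timeout=10) as reply:
                return reply.read()          # first server that answers wins
        except OSError:
            continue                         # try the next server in the list
    raise RuntimeError("all launchpad servers failed")
\end{verbatim}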

\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-setup}}
\caption{Frontier overview of launchpad and connection to WAN and T0 Farm.}
\label{fig:frontier-setup}
\end{center}
\end{figure}

Monitoring was in place to observe the activity of each Squid through its SNMP interface, and plots of 1) request rate, 2) data throughput, and 3) number of cached objects were available for each installed Squid. Lemon was used to monitor CPU, network I/O, and other important machine operating parameters on the servers at CERN.
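
As a rough sketch of the kind of per-Squid SNMP query behind these plots, the snippet below polls one counter through the net-snmp command-line tools; the host name, community string, and OID shown are placeholders rather than the values used in the actual monitoring.

\begin{verbatim}
# Minimal sketch: poll one Squid SNMP counter with the net-snmp CLI.
# "squid.example-site.org", "public", and the OID are placeholders;
# 3401 is Squid's default SNMP port.
import subprocess

def read_squid_counter(host, community, oid, port=3401):
    # snmpget -Ovq prints only the value of the requested object
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Ovq",
         "%s:%d" % (host, port), oid],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()

if __name__ == "__main__":
    # Example OID under the Squid MIB tree (illustrative only).
    print(read_squid_counter("squid.example-site.org", "public",
                             ".1.3.6.1.4.1.3495.1.3.2.1.1"))
\end{verbatim}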

Initial tests with 200 T0 clients running CMSSW\_0\_8\_1 were
successful for the calibrations available at the time, ECAL and
HCAL. However, when CMSSW\_1\_0\_3 was tried, a significant fraction
($\sim 5\%$) of jobs was observed to end with segmentation
faults. Several additional problems associated with the Si
alignment emerged when the software was run for the first time on the T0
system for the CSA exercise.
The Si alignment C++ object comprises a vector of vectors,
which is translated by POOL-ORA into a very large number of tiny
queries to the database. This makes loading the object quite slow, and
Frontier somewhat slower than direct Oracle access. Due to the large
number of calls to Frontier, the Squid access logs filled more quickly
than we had observed in our previous testing, and we were forced to
turn them off temporarily.
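
To make the effect concrete, the sketch below contrasts fetching a nested structure element by element with fetching it in a single bulk query; it uses an in-memory SQLite table as a stand-in for the conditions database, so the schema and numbers are purely illustrative.

\begin{verbatim}
# Illustrative only: per-element lookups versus one bulk query against
# an in-memory SQLite table standing in for the conditions database.
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE align (det INTEGER, idx INTEGER, value REAL)")
db.executemany("INSERT INTO align VALUES (?, ?, ?)",
               [(d, i, 0.0) for d in range(100) for i in range(270)])

t0 = time.time()
slow = [db.execute("SELECT value FROM align WHERE det=? AND idx=?",
                   (d, i)).fetchone()[0]
        for d in range(100) for i in range(270)]   # ~27k tiny queries
t1 = time.time()
fast = db.execute("SELECT det, idx, value FROM align "
                  "ORDER BY det, idx").fetchall()   # one bulk query
t2 = time.time()
print("per-element: %.3fs   bulk: %.3fs" % (t1 - t0, t2 - t1))
\end{verbatim}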

A patch was found for the segmentation fault problem and was implemented and released in CMSSW\_1\_0\_6. The root cause of the segmentation faults was non-thread-safe code in the SEAL library. By commenting out logging in the CORAL Frontier access libraries, it was found that the failure rate could be reduced to a few per mil. Additional work is underway to solve this problem.

\subsubsection{Performance under T0 processing load}
After the problems were resolved, there was a week of extensive operation during which the T0 farm was ramped up to 1000 nodes. The number of requests and the data throughput are shown in Fig.~\ref{fig:frontier-t0-requests} and Fig.~\ref{fig:frontier-t0-throughput}. These show how the system behaved under loads ranging from 200 (Sunday) to 1000 (Wednesday) concurrent clients. The figures are for one of the three Frontier server machines, although the other two servers looked very similar, indicating that the load balancing was working as expected. The blue line in these plots indicates the requests that were not in the Squid cache and had to be retrieved from the central database. The observed throughput for each of the three servers peaked at around 660~kB/s, which indicates that the 100~Mbps network was not a bottleneck. The total throughput for the three servers was 1.8~MB/s.
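
For reference, the per-server peak corresponds to only a few percent of the available link capacity:
\[
660~\mathrm{kB/s} \times 8~\mathrm{bits/byte} \approx 5.3~\mathrm{Mbps} \ll 100~\mathrm{Mbps}.
\]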

Fig.~\ref{fig:launchpad-t0-cpu} shows the CPU usage for one of the servers during this same time period. Spikes are observed when new objects are brought into the cache, but no severe loads are observed. Fig.~\ref{fig:launchpad-t0-1000node-cpu} shows the CPU load for the same server under steady load during the 1000-client T0 test; it remains below 10\% for the duration. The I/O during this same time is shown in Fig.~\ref{fig:launchpad-t0-1000node-io}. The fact that the input to the server is almost two-thirds of the output was somewhat surprising, but it is the result of HTTP and TCP overhead, and it is significant because this overhead is of the same order as the payload itself for the small objects.

\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-t0-requests}}
\caption{Requests to one of the three Frontier servers from the T0 processing farm. The number of T0 nodes is ramped up from 200 to 1000 nodes during the time shown on the chart.}
\label{fig:frontier-t0-requests}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-t0-throughput}}
\caption{Data throughput for one of the three Frontier servers from the T0 processing farm. The number of T0 nodes is ramped up from 200 to 1000 nodes during the time shown on the chart.}
\label{fig:frontier-t0-throughput}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/launchpad-t0-cpu}}
\caption{Frontier server CPU usage during the ramp-up of T0 activity.}
\label{fig:launchpad-t0-cpu}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/launchpad-t0-1000node-cpu}}
\caption{Steady-state CPU usage on a Frontier server node during 1000-node T0 operation.}
\label{fig:launchpad-t0-1000node-cpu}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/launchpad-t0-1000node-io}}
\caption{Steady-state I/O on a Frontier server node during 1000-node T0 operation.}
\label{fig:launchpad-t0-1000node-io}
\end{center}
\end{figure}

\subsubsection{Tier N operation}

In addition to the launchpad at CERN, there were 28 Tier-1 and Tier-2 sites where Squid was installed and properly configured. Each of these Squids is monitored through the SNMP interface, and activity and history are available at the web site http://cdfdbfrontier4.fnal.gov:8888/indexcms.html. No remarkable issues were observed during this testing; however, the problem of the large number of tiny objects makes a typical client startup take 15 minutes or more. For data not in the local Squid caches, the startup was observed to take as long as 40 minutes.

\subsubsection{Si Alignment Object Characteristics}
To understand the characteristics of the Si alignment object better, we looked at the size and number of objects that were being requested.
A single run of RECO081\_onlyCkf.cfg with the patched FrontierAccess had the following Frontier statistics:
\begin{verbatim}
28116 queries
138 no-cache queries
342 queries of the database version
27502 unique queries

These are the largest payloads (full size = uncompressed):

1369 byte (full size 12630), 25033 byte (full size 152389)
20221 byte (full size 157849), 54251 byte (full size 575482)
57316 byte (full size 597911), 109821 byte (full size 843757)
392046 byte (full size 2948642), 419859 byte (full size 3250885)
411531 byte (full size 3555809), 431981 byte (full size 6728489)

\end{verbatim}

Everything else is under 4000 bytes full size, and 99\% of the total
queries are 317 bytes full size or smaller. The data was compressed by
Frontier for the network transfer; the ``full size'' numbers refer to
the uncompressed size.

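A quick calculation over the largest payloads listed above (a minimal sketch, using only the numbers quoted) shows compression ratios ranging from roughly 6 to 16:

\begin{verbatim}
# Compression ratios for the largest payloads quoted above
# (compressed bytes on the wire, full size = uncompressed).
pairs = [(1369, 12630), (25033, 152389), (20221, 157849),
         (54251, 575482), (57316, 597911), (109821, 843757),
         (392046, 2948642), (419859, 3250885),
         (411531, 3555809), (431981, 6728489)]
for wire, full in pairs:
    print("%8d -> %8d  ratio %.1f" % (wire, full, full / wire))
# Ratios range from about 6 to 16 for these objects.
\end{verbatim}
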

The performance effect of this very large number (27k+) of small
requests per job is being investigated. The job startup time with
Frontier is about 25\% longer than with direct Oracle access at CERN (when
running one job at a time and when the Squid cache has been
preloaded). We have prototyped reusing a single persistent TCP
connection for all of the Frontier queries, but it only appears to
account for about half of that difference in job startup time. Even
with the persistent TCP connection, the small packets keep the
maximum network throughput with many parallel jobs down to around
1~MB/s. By contrast, we have seen throughput as high as 35~MB/s
with larger queries over Gigabit Ethernet (at Fermilab). The large
number of requests is also responsible for producing Squid access log
entries at a rate of about 2~GB/hour when 400 jobs run in parallel.
The bottom line is that a very large number of tiny objects is bad
for overall performance and must be avoided.

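A minimal sketch of the persistent-connection idea is shown below, assuming a placeholder server name and query path (the real Frontier URL and query encoding are not reproduced here): all requests are sent over one kept-alive TCP connection instead of opening a new connection per query.

\begin{verbatim}
# Minimal sketch of reusing one persistent TCP connection for many
# small HTTP queries; "launchpad.example.ch" and the query path are
# placeholders, not the real Frontier server or URL encoding.
import http.client

def fetch_all(queries, host="launchpad.example.ch", port=8000):
    conn = http.client.HTTPConnection(host, port)   # one TCP connection
    results = []
    for q in queries:
        conn.request("GET", "/Frontier?encoded_query=%s" % q)
        results.append(conn.getresponse().read())   # keep-alive reuse
    conn.close()
    return results
\end{verbatim}
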
\subsubsection{Site Configuration}
Administrators at each site are responsible for configuring their Squid(s) to match the hardware being used. One important question was whether the instrumentation we have is sufficient to diagnose specific site problems and help the site administrators fix them. One example we encountered during the CSA06 tests was an improperly configured cache at one of the sites.
We noticed that the cached-object count (\# in cache) had ``hair'', as seen in Fig.~\ref{fig:frontier-bari-cache-problem}. The requests-per-minute chart, Fig.~\ref{fig:frontier-bari-cache-problem-requests}, showed a correlation with these unusual features. The precise problem was that the site's Squid configuration had a very small disk cache, causing objects to be ``thrashed'' out of the cache quickly.

The other important part of the site configuration is the so-called site-local-config file. This file is a bootstrap for jobs running at the site, and contains the Frontier server URL and the local Squid proxy URLs. Many site-local-config files have been debugged and fixed over the course of CSA06. The CMS job robot started submitting jobs that include Frontier access to an ever-increasing number of Tier-1 and Tier-2 sites.
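
As a rough illustration of the bootstrap role of this file, the sketch below parses a made-up fragment containing a Frontier server URL and a local proxy URL; the XML layout and URLs here are placeholders and are not the actual CMS site-local-config schema.

\begin{verbatim}
# Illustrative only: extract server and proxy URLs from a made-up
# site-local-config fragment.  The element names and URLs below are
# placeholders, not the actual CMS site-local-config schema.
import xml.etree.ElementTree as ET

EXAMPLE = """
<site-local-config>
  <frontier-connect>
    <server url="http://launchpad.example.ch:8000/Frontier"/>
    <proxy  url="http://squid.example-site.org:3128"/>
  </frontier-connect>
</site-local-config>
"""

root = ET.fromstring(EXAMPLE)
servers = [e.get("url") for e in root.iter("server")]
proxies = [e.get("url") for e in root.iter("proxy")]
print("servers:", servers)
print("proxies:", proxies)
\end{verbatim}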

\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-bari-cache-problem}}
\caption{A Squid configuration problem caused cache thrashing, as indicated by the ``hair'' on the chart of the number of cached objects.}
\label{fig:frontier-bari-cache-problem}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-bari-cache-problem-requests}}
\caption{Request rate during the cache configuration problem.}
\label{fig:frontier-bari-cache-problem-requests}
\end{center}
\end{figure}

\subsubsection{Cache Coherency}
One issue of concern for the objects in the Squid caches is cache coherency with the objects stored in the central database. CMS has agreed to a policy of never changing objects that are stored in the central database, and ultimately this and other cache-refresh options will be implemented. During the startup period, however, it was desirable to have a mechanism that would provide a periodic cache refresh in case an object was changed. This mechanism is implemented as an expiration time included in the HTTP header of each object, which causes it to expire at 5 AM CERN time (3:00 AM UTC) the next day. The effect of this can be seen in Fig.~\ref{fig:frontier-cach-expire-objects} and Fig.~\ref{fig:frontier-cach-expire-requests}. At 22:00 UTC the cache was dumped by hand, and the servlet that writes the expiration time into the header was installed. Subsequently, it is observed that the objects expire and are refreshed between 3:00 and 4:00 UTC.
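
A minimal sketch of how such an expiration header can be produced is shown below, assuming the 3:00 UTC expiry quoted above; the helper name is hypothetical, and the way the real servlet builds its headers is not taken from the actual code.

\begin{verbatim}
# Minimal sketch: build an HTTP "Expires" header for the next 03:00 UTC.
# The helper name is hypothetical; only the 03:00 UTC expiry time is
# taken from the text above.
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def next_expiration_header(now=None):
    now = now or datetime.now(timezone.utc)
    expire = now.replace(hour=3, minute=0, second=0, microsecond=0)
    if expire <= now:                 # 03:00 already passed today
        expire += timedelta(days=1)   # use 03:00 UTC tomorrow
    return "Expires: " + format_datetime(expire, usegmt=True)

print(next_expiration_header())
\end{verbatim}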

\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-cach-expire-objects}}
\caption{Object count on a launchpad server when the refresh is done by hand, and through the expiration at 3 AM UTC.}
\label{fig:frontier-cach-expire-objects}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-cach-expire-requests}}
\caption{Requests on a launchpad server during cache refresh and expiration.}
\label{fig:frontier-cach-expire-requests}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-cach-expire-cpu}}
\caption{CPU usage on a launchpad server during cache refresh and expiration.}
\label{fig:frontier-cach-expire-cpu}
\end{center}
\end{figure}
\begin{figure}[hbtp]
\begin{center}
\resizebox{15cm}{!}{\includegraphics{figs/frontier-cach-expire-io}}
\caption{I/O on a launchpad server during cache refresh and expiration.}
\label{fig:frontier-cach-expire-io}
\end{center}
\end{figure}

This is an adequate solution for the short term and solves the cache coherency problem to within a few hours. However, reloading every cached object at every site all over the world will have significant performance implications, and we must have a better solution for the final system. The impact on the launchpad servers is apparent in Fig.~\ref{fig:frontier-cach-expire-cpu} and Fig.~\ref{fig:frontier-cach-expire-io}, which show spikes in the CPU and I/O for one of the Frontier server machines as the caches are reloaded.

\subsubsection{Conclusion}

CSA06 calibration and alignment database access via the Frontier infrastructure has been successfully exercised for up to 1000 clients on the T0 farm, and at T1 and T2 sites. The monitoring we have in place is extremely useful for observing the activity and understanding performance at several levels of the system.
The activity helped to uncover several issues that need additional work, including:
\begin{itemize}
\item a threading problem that causes segmentation faults;
\item the very large number of tiny objects in the Si alignment, which need to be consolidated;
\item the copious logging of Squid access information;
\item TCP connection overhead, which should be reduced;
\item Squid configuration and site-local-config problems at individual sites;
\item cache coherency, which has a temporary solution but needs more work.
\end{itemize}
These areas and others will be addressed in the future. The configuration of the service at CERN is not final, and work is needed to provide a dedicated Squid for the T0 farm. More work at Tier-1 centers will be done to provide failover solutions that will make the service more reliable at that layer, although no problems were encountered there over the course of the CSA06 tests.