root/cvsroot/COMP/CRAB/python/crab_help.py
Revision: 1.72
Committed: Wed Oct 15 13:55:42 2008 UTC (16 years, 6 months ago) by fanzago
Content type: text/x-python
Branch: MAIN
CVS Tags: CRAB_2_4_2_pre2, CRAB_2_4_2_pre1, CRAB_2_4_1, CRAB_2_4_1_pre4, CRAB_2_4_1_pre3, CRAB_2_4_1_pre2, CRAB_2_4_1_pre1, CRAB_2_4_0_Tutorial, CRAB_2_4_0_Tutorial_pre1, CRAB_2_4_0
Changes since 1.71: +5 -0 lines
Log Message:
added storage_pool

File Contents

# User Rev Content
1 nsmirnov 1.1
2     ###########################################################################
3     #
4     # H E L P F U N C T I O N S
5     #
6     ###########################################################################
7    
8     import common
9    
10     import sys, os, string
11 spiga 1.34
12 nsmirnov 1.1 import tempfile
13    
14     ###########################################################################
15     def usage():
16 slacapra 1.43 print 'in usage()'
17 nsmirnov 1.1 usa_string = common.prog_name + """ [options]
18 slacapra 1.3
19     The most useful general options (use '-h' to get complete help):
20    
21 slacapra 1.26 -create -- Create all the jobs.
22     -submit n -- Submit the first n available jobs. Default is all.
23 slacapra 1.46 -status [range] -- check status of all jobs.
24 spiga 1.60 -getoutput|-get [range] -- get back the output of all jobs: if range is defined, only of selected jobs.
25 ewv 1.64 -extend -- Extend an existing task to run on new fileblocks, if any are available.
26 spiga 1.60 -publish [dbs_url] -- after getoutput, publish the user data in a local DBS instance.
27     -kill [range] -- kill submitted jobs.
28     -resubmit [range] -- resubmit killed/aborted/retrieved jobs.
29 slacapra 1.62 -copyLocal [range] -- copy locally the output stored on remote SE.
30 spiga 1.60 -renewProxy -- renew the proxy on the server.
31     -clean -- gracefully cleanup the directory of a task.
32     -testJdl [range] -- check if resources exist which are compatible with jdl.
33     -list [range] -- show technical job details.
34     -postMortem [range] -- provide a file with information useful for post-mortem analysis of the jobs.
35     -printId [range] -- print the job SID or Task Unique ID while using the server.
36 ewv 1.64 -createJdl [range] -- provide files with a complete Job Description (JDL).
37 slacapra 1.20 -continue|-c [dir] -- Apply command to task stored in [dir].
38 spiga 1.60 -h [format] -- Detailed help. Formats: man (default), tex, html, txt.
39     -cfg fname -- Configuration file name. Default is 'crab.cfg'.
40     -debug N -- set the verbosity level to N.
41     -v -- Print version and exit.
42 nsmirnov 1.1
43 slacapra 1.4 "range" has syntax "n,m,l-p", which corresponds to [n,m,l,l+1,...,p-1,p], and all possible combinations
44    
45 nsmirnov 1.1 Example:
46 slacapra 1.26 crab -create -submit 1
47 nsmirnov 1.1 """
48 slacapra 1.43 print usa_string
49 nsmirnov 1.1 sys.exit(2)
50    
51     ###########################################################################
52     def help(option='man'):
53     help_string = """
54     =pod
55    
56     =head1 NAME
57    
58     B<CRAB>: B<C>ms B<R>emote B<A>nalysis B<B>uilder
59    
60 slacapra 1.3 """+common.prog_name+""" version: """+common.prog_version_str+"""
61 nsmirnov 1.1
62 slacapra 1.19 This tool B<must> be used from a User Interface and the user is supposed to
63 fanzago 1.37 have a valid Grid certificate.
64 nsmirnov 1.1
65     =head1 SYNOPSIS
66    
67 slacapra 1.13 B<"""+common.prog_name+"""> [I<options>] [I<command>]
68 nsmirnov 1.1
69     =head1 DESCRIPTION
70    
71 ewv 1.52 CRAB is a Python program intended to simplify the creation and submission of CMS analysis jobs to the Grid environment.
72 nsmirnov 1.1
73 slacapra 1.3 Parameters for CRAB usage and configuration are provided by the user by editing the configuration file B<crab.cfg>.
74 nsmirnov 1.1
75 spiga 1.48 CRAB generates scripts and additional data files for each job. The produced scripts are submitted directly to the Grid. CRAB makes use of BossLite to interface to the Grid scheduler, as well as for logging and bookkeeping.
76 nsmirnov 1.1
77 ewv 1.52 CRAB supports any CMSSW based executable, with any modules/libraries, including user provided ones, and deals with the output produced by the executable. CRAB provides an interface to CMS data discovery services (DBS and DLS), which are completely hidden from the end user. It also splits a task (such as analyzing a whole dataset) into smaller jobs, according to user requirements.
78 nsmirnov 1.1
79 slacapra 1.46 CRAB can be used in two ways: StandAlone and with a Server.
80     The StandAlone mode is suited for small tasks, of the order of 100 jobs: it submits the jobs directly to the scheduler, and these jobs are the user's responsibility.
81 ewv 1.52 In the Server mode, suited for larger tasks, the jobs are prepared locally and then passed to a dedicated CRAB server, which then interacts with the scheduler on behalf of the user, including additional services, such as automatic resubmission, status caching, output retrieval, and more.
82 slacapra 1.46 The CRAB commands are exactly the same in both cases.
83    
84 slacapra 1.13 The CRAB web page is available at
85    
86     I<http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/>
87 slacapra 1.6
88 slacapra 1.19 =head1 HOW TO RUN CRAB FOR THE IMPATIENT USER
89    
90 ewv 1.52 Please, read all the way through in any case!
91 slacapra 1.19
92     Source B<crab.(c)sh> from the CRAB installation area, which has been set up either by you or by someone else for you.
93    
94 ewv 1.52 Modify the CRAB configuration file B<crab.cfg> according to your needs: see below for a complete list. A commented template B<crab.cfg> can be found at B<$CRABDIR/python/crab.cfg>
95 slacapra 1.19
96 ewv 1.44 ~>crab -create
97 slacapra 1.19 create all jobs (no submission!)
98    
99 spiga 1.25 ~>crab -submit 2 -continue [ui_working_dir]
100 slacapra 1.19 submit 2 jobs, the ones already created (-continue)
101    
102 slacapra 1.26 ~>crab -create -submit 2
103 slacapra 1.19 create _and_ submit 2 jobs
104    
105 spiga 1.25 ~>crab -status
106 slacapra 1.19 check the status of all jobs
107    
108 spiga 1.25 ~>crab -getoutput
109 slacapra 1.19 get back the output of all jobs
110    
111 ewv 1.44 ~>crab -publish
112     publish all user outputs in the DBS specified in crab.cfg (dbs_url_for_publication) or given as the argument of this option
113 fanzago 1.42
114 slacapra 1.20 =head1 RUNNING CMSSW WITH CRAB
115 nsmirnov 1.1
116 slacapra 1.3 =over 4
117    
118     =item B<A)>
119    
120 ewv 1.52 Develop your code in your CMSSW working area. Do whatever is needed to run your executable interactively, including the setup of the run time environment (I<eval `scramv1 runtime -sh|csh`>), a suitable I<ParameterSet>, etc. It seems silly, but B<be extra sure that you actually compiled your code> (I<scramv1 b>).
121 slacapra 1.3
122 ewv 1.44 =item B<B)>
123 slacapra 1.3
124 slacapra 1.20 Source B<crab.(c)sh> from the CRAB installation area, which has been set up either by you or by someone else for you. Modify the CRAB configuration file B<crab.cfg> according to your needs: see below for a complete list.
125    
126     The most important parameters are the following (see below for complete description of each parameter):
127    
128     =item B<Mandatory!>
129    
130     =over 6
131    
132     =item B<[CMSSW]> section: datasetpath, pset, splitting parameters, output_file
133    
134     =item B<[USER]> section: output handling parameters, such as return_data, copy_data etc...
135    
136     =back
137    
138     =item B<Run it!>
139    
140 fanzago 1.37 You must have a valid voms-enabled Grid proxy. See CRAB web page for details.
141 slacapra 1.20
142     =back
143    
144 slacapra 1.19 =head1 HOW TO RUN ON CONDOR-G
145    
146     The B<Condor-G> mode for B<CRAB> is a special submission mode next to the standard Resource Broker submission. It is designed to submit jobs directly to a site and not using the Resource Broker.
147    
148 ewv 1.52 Due to the nature of B<Condor-G> submission, the B<Condor-G> mode is restricted to OSG sites within the CMS Grid, currently the 7 US T2: Florida(ufl.edu), Nebraska(unl.edu), San Diego(ucsd.edu), Purdue(purdue.edu), Wisconsin(wisc.edu), Caltech(ultralight.org), MIT(mit.edu).
149 slacapra 1.19
150     =head2 B<Requirements:>
151    
152     =over 2
153    
154     =item installed and running local Condor scheduler
155    
156     (either installed by the local Sysadmin or self-installed using the VDT user interface: http://www.uscms.org/SoftwareComputing/UserComputing/Tutorials/vdt.html)
157    
158     =item locally available LCG or OSG UI installation
159    
160 ewv 1.44 for authentication via Grid certificate proxies ("voms-proxy-init -voms cms" should result in valid proxy)
161 slacapra 1.19
162 ewv 1.52 =item set the environment variable EDG_WL_LOCATION to the edg directory of the local LCG or OSG UI installation
163 slacapra 1.19
164     =back
165    
166     =head2 B<What the Condor-G mode can do:>
167    
168     =over 2
169    
170 ewv 1.52 =item submission directly to multiple OSG sites,
171 slacapra 1.19
172 ewv 1.52 the requested dataset must be published correctly by the site in the local and global services.
173     Previous restrictions on submitting only to a single site have been removed. SE and CE whitelisting
174     and blacklisting work as in the other modes.
175 slacapra 1.19
176     =back
177    
178     =head2 B<What the Condor-G mode cannot do:>
179    
180     =over 2
181    
182     =item submit jobs if no condor scheduler is running on the submission machine
183    
184     =item submit jobs if the local condor installation does not provide Condor-G capabilities
185    
186 ewv 1.52 =item submit jobs to an LCG site
187 slacapra 1.19
188 fanzago 1.37 =item support Grid certificate proxy renewal via the myproxy service
189 slacapra 1.19
190     =back
191    
192     =head2 B<CRAB configuration for Condor-G mode:>
193    
194 ewv 1.52 The CRAB configuration for the Condor-G mode only requires one change in crab.cfg:
195 nsmirnov 1.1
196 slacapra 1.19 =over 2
197 slacapra 1.3
198 slacapra 1.19 =item select condor_g Scheduler:
199 slacapra 1.4
200 slacapra 1.19 scheduler = condor_g
201 slacapra 1.4
202 slacapra 1.19 =back
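
The single change described above can be sketched in Python as follows. This is only an illustration, not part of CRAB: it assumes that crab.cfg can be read by the standard configparser module and that the scheduler key lives in the [CRAB] section, as listed under CONFIGURATION PARAMETERS. The starting configuration shown here is hypothetical.

```python
import configparser
import io

# Hypothetical starting configuration; a real crab.cfg has more sections and keys.
original = """[CRAB]
jobtype = cmssw
scheduler = glitecoll
"""

# interpolation=None leaves any literal '%' in values untouched.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(original)

# The one change needed for Condor-G mode.
cfg["CRAB"]["scheduler"] = "condor_g"

# Serialize the modified configuration back to text.
buffer = io.StringIO()
cfg.write(buffer)
updated = buffer.getvalue()
print(updated)
```

In practice the same effect is obtained by editing the scheduler line of crab.cfg by hand before running crab -create.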
203 slacapra 1.4
204 ewv 1.52 =head1 COMMANDS
205 slacapra 1.4
206     =over 4
207    
208 slacapra 1.26 =item B<-create>
209 slacapra 1.4
210 slacapra 1.26 Create the jobs: from version 1_3_0 it is only possible to create all jobs.
211 ewv 1.52 The maximum number of jobs depends on dataset and splitting directives. This set of identical jobs accessing the same dataset is defined as a task.
212 slacapra 1.4 This command creates a directory whose default name is I<crab_0_date_time> (it can be changed via the ui_working_dir parameter, see below). Everything needed to submit your jobs is placed inside this directory. The output of your jobs (once finished) will also be placed there (see below). Do not delete this directory by hand: use -clean instead (see below).
213     See also I<-continue>.
214    
215 slacapra 1.46 =item B<-submit [range]>
216 slacapra 1.4
217 slacapra 1.46 Submit n jobs: 'n' is either a positive integer or 'all' or a [range]. Default is all.
218 ewv 1.52 If 'n' is passed as argument, the first 'n' suitable jobs will be submitted. Please note that this behaviour is different from other commands, where -command N means apply the command to job N, and not to the first N jobs. If a [range] is passed, the selected jobs will be submitted.
219 slacapra 1.46 This option must be used in conjunction with -create (to create and submit immediately) or with -continue (which is assumed by default), to submit previously created jobs. Failure to do so will stop CRAB and generate an error message. See also I<-continue>.
220 slacapra 1.4
221     =item B<-continue [dir] | -c [dir]>
222    
223     Apply the action to the task stored in directory [dir]. If the task directory is the standard one (crab_0_date_time), the most recent one is taken. Any other directory must be specified.
224 slacapra 1.46 Basically all commands (but -create) need -continue, so it is automatically assumed. Of course, the standard task directory is used in this case.
225 slacapra 1.4
226 slacapra 1.26 =item B<-status>
227 nsmirnov 1.1
228 spiga 1.48 Check the status of the jobs, in all states. All the info (e.g. application and wrapper exit codes) will be available only after the output retrieval.
229 nsmirnov 1.1
230 slacapra 1.20 =item B<-getoutput|-get [range]>
231 nsmirnov 1.1
232 slacapra 1.20 Retrieve the output declared by the user via the output sandbox. By default the output will be put in task working dir under I<res> subdirectory. This can be changed via config parameters. B<Be extra sure that you have enough free space>. See I<range> below for syntax.
233 nsmirnov 1.1
234 fanzago 1.42 =item B<-publish [dbs_url]>
235    
236     Publish user output in a local DBS instance after the output has been retrieved. By default, publication uses the dbs_url_for_publication specified in the crab.cfg file; otherwise you can pass it as the argument of this option.
237    
238 slacapra 1.4 =item B<-resubmit [range]>
239 nsmirnov 1.1
240 fanzago 1.37 Resubmit jobs which have been previously submitted and have been either I<killed> or I<aborted>. See I<range> below for syntax.
241     The resubmit option can be used only with CRAB without server. For the server this option will be implemented as soon as possible.
242 nsmirnov 1.1
243 spiga 1.60 =item B<-extend>
244    
245 ewv 1.64 Create new jobs for an existing task, checking if new blocks are available for the given dataset.
246 spiga 1.60
247 slacapra 1.4 =item B<-kill [range]>
248 nsmirnov 1.1
249 slacapra 1.4 Kill (cancel) jobs which have been submitted to the scheduler. A range B<must> be used in all cases, no default value is set.
250 nsmirnov 1.1
251 slacapra 1.58 =item B<-copyLocal [range]>
252    
253     Copy locally (to the current working directory) the output previously stored on a remote SE by the jobs. Of course, this works only if the copy_data option has been set. Uses I<lcg-cp>.
254    
255 mcinquil 1.59 =item B<-renewProxy>
256    
257     If using the server modality, this command delegates a valid long proxy to the server associated with the task.
258    
259 slacapra 1.4 =item B<-testJdl [range]>
260 nsmirnov 1.1
261 fanzago 1.71 Check if the job can find compatible resources. It is equivalent to running I<edg-job-list-match> on edg.
262 nsmirnov 1.1
263 slacapra 1.20 =item B<-printId [range]>
264    
265 slacapra 1.46 Just print the job identifier, which can be the SID (Grid job identifier) of the job(s) or the taskId if you are using CRAB with the server or local scheduler Id.
266 slacapra 1.20
267 spiga 1.53 =item B<-printJdl [range]>
268    
269 ewv 1.64 Collect the full Job Description in a file located under share directory. The file base name is File- .
270 spiga 1.53
271 slacapra 1.4 =item B<-postMortem [range]>
272 nsmirnov 1.1
273 slacapra 1.46 Try to collect more information about the job from the scheduler's point of view.
274 nsmirnov 1.1
275 slacapra 1.13 =item B<-list [range]>
276    
277 ewv 1.52 Dump technical information about jobs: for developers only.
278 slacapra 1.13
279 slacapra 1.4 =item B<-clean [dir]>
280 nsmirnov 1.1
281 slacapra 1.26 Clean up (i.e. erase) the task working directory after checking whether there are still running jobs. If there are, you are notified and asked to kill them or retrieve their output. B<Warning>: this may also delete the output produced by the task (if any)!
282 nsmirnov 1.1
283 slacapra 1.4 =item B<-help [format] | -h [format]>
284 nsmirnov 1.1
285 slacapra 1.4 This help. It can be produced in four different I<format>s: I<man> (default), I<tex>, I<html> and I<txt>.
286 nsmirnov 1.1
287 slacapra 1.4 =item B<-v>
288 nsmirnov 1.1
289 slacapra 1.4 Print the version and exit.
290 nsmirnov 1.1
291 slacapra 1.4 =item B<range>
292 nsmirnov 1.1
293 slacapra 1.13 The range to be used in many of the above commands has the following syntax. It is a comma separated list of job ranges, each of which may be a job number, or a job range of the form first-last.
294 slacapra 1.4 Example: 1,3-5,8 = {1,3,4,5,8}
295 nsmirnov 1.1
296 ewv 1.44 =back
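
The range syntax described above can be illustrated with a short Python sketch. It is not CRAB's own parser: the function name parse_range is hypothetical, and the handling of stray commas and reversed ranges is an assumption made for this example.

```python
def parse_range(spec):
    """Expand a range string such as '1,3-5,8' into a sorted list of job numbers."""
    jobs = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas (assumption, not documented behaviour)
        if "-" in part:
            # A job range of the form first-last, both ends included.
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("invalid range: %r" % part)
            jobs.update(range(first, last + 1))
        else:
            # A single job number.
            jobs.add(int(part))
    return sorted(jobs)


print(parse_range("1,3-5,8"))  # [1, 3, 4, 5, 8]
```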
297 slacapra 1.6
298 slacapra 1.4 =head1 OPTION
299 nsmirnov 1.1
300 slacapra 1.6 =over 4
301    
302 slacapra 1.4 =item B<-cfg [file]>
303 nsmirnov 1.1
304 slacapra 1.4 Configuration file name. Default is B<crab.cfg>.
305 nsmirnov 1.1
306 slacapra 1.4 =item B<-debug [level]>
307 nsmirnov 1.1
308 slacapra 1.13 Set the debug level: high number for high verbosity.
309 nsmirnov 1.1
310 ewv 1.44 =back
311 slacapra 1.6
312 slacapra 1.5 =head1 CONFIGURATION PARAMETERS
313    
314 spiga 1.25 All the parameters described in this section can be defined in the CRAB configuration file. The configuration file has different sections: [CRAB], [USER], etc. Each parameter must be defined in its proper section. An alternative way to pass a config parameter to CRAB is via the command line interface; the syntax is: crab -SECTION.key value . For example I<crab -USER.outputdir MyDirWithFullPath> .
315 slacapra 1.5 The parameters passed to CRAB at the creation step are stored, so they cannot be changed by changing the original crab.cfg . On the other hand the task is protected from any accidental change. If you want to change any parameter, this requires the creation of a new task.
316 slacapra 1.6 Mandatory parameters are flagged with a *.
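
The command line syntax crab -SECTION.key value can be pictured with the following Python sketch. It is an illustration only and does not reproduce CRAB's actual option handling: the function name parse_overrides is hypothetical, and treating every token of the form -NAME.name as an override is an assumption.

```python
def parse_overrides(argv):
    """Collect '-SECTION.key value' pairs from a list of command line tokens."""
    overrides = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        # An override looks like '-USER.outputdir' and is followed by its value.
        if token.startswith("-") and "." in token and i + 1 < len(argv):
            section, key = token[1:].split(".", 1)
            overrides.setdefault(section, {})[key] = argv[i + 1]
            i += 2
        else:
            i += 1  # not an override: skip (e.g. -create, -submit)
    return overrides


print(parse_overrides(["-create", "-USER.outputdir", "/full/path/to/MyDir"]))
```

The resulting dictionary groups keys by section, mirroring the [CRAB], [USER], ... layout of crab.cfg.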
317 slacapra 1.5
318     B<[CRAB]>
319 slacapra 1.6
320 slacapra 1.13 =over 4
321 slacapra 1.5
322 slacapra 1.6 =item B<jobtype *>
323 slacapra 1.5
324 slacapra 1.26 The type of the job to be executed: I<cmssw> jobtypes are supported
325 slacapra 1.6
326     =item B<scheduler *>
327    
328 ewv 1.52 The scheduler to be used: I<glitecoll> is the most efficient Grid scheduler and should be used. Other choices are I<glite>, the same as I<glitecoll> but without bulk submission (and so slower), I<condor_g> (see the specific paragraph), or I<edg>, the former Grid scheduler, which will be discontinued in the future.
329     From version 210, local schedulers are also supported, for the time being only at CERN: I<LSF>, the standard CERN local scheduler, and I<CAF>, which is LSF dedicated to CERN Analysis Facilities.
330 slacapra 1.5
331 mcinquil 1.35 =item B<server_name>
332    
333 spiga 1.49 To use the CRAB server, set this key to the server name as <Server_DOMAIN> (e.g. cnaf, fnal). If I<server_name=None>, CRAB works in standalone mode.
334 spiga 1.48 The servers available to users are listed on the CRAB web page.
335 mcinquil 1.35
336 slacapra 1.5 =back
337    
338 slacapra 1.20 B<[CMSSW]>
339    
340     =over 4
341    
342 slacapra 1.22 =item B<datasetpath *>
343 slacapra 1.20
344 slacapra 1.22 the path of processed dataset as defined on the DBS. It comes with the format I</PrimaryDataset/DataTier/Process> . In case no input is needed I<None> must be specified.
345 slacapra 1.20
346 afanfani 1.50 =item B<runselection *>
347 ewv 1.52
348 afanfani 1.50 within a dataset you can restrict to run on a specific run number or run number range. For example runselection=XYZ or runselection=XYZ1-XYZ2 .
349    
350 spiga 1.57 =item B<use_parent *>
351    
352 ewv 1.65 within a dataset you can ask to run over the related parent files too. E.g., this will give you access to the RAW data while running over a RECO sample. With use_parent=True, CRAB determines the parent files from DBS and adds secondaryFileNames = cms.untracked.vstring( <LIST of parent files> ) to the pool source section of your parameter set.
353 spiga 1.57
354 slacapra 1.22 =item B<pset *>
355 slacapra 1.20
356 ewv 1.64 the ParameterSet to be used. Both .cfg and .py parameter sets are supported for the relevant versions of CMSSW.
357 slacapra 1.20
358 slacapra 1.26 =item I<Of the following three parameters exactly two must be used, otherwise CRAB will complain.>
359 slacapra 1.20
360 slacapra 1.22 =item B<total_number_of_events *>
361    
362 slacapra 1.26 the number of events to be processed. To access all available events, use I<-1>. Of course, the latter option is not viable in case of no input. In this case, the total number of events will be used to split the task into jobs, together with I<events_per_job>.
363 slacapra 1.22
364 slacapra 1.26 =item B<events_per_job *>
365 slacapra 1.22
366 slacapra 1.26 number of events to be accessed by each job. Since a job cannot cross the boundary of a fileblock it might be that the actual number of events per job is not exactly what you asked for. It can be used also with No input.
367 slacapra 1.22
368     =item B<number_of_jobs *>
369    
370     Define the number of jobs to be run for the task. The number of events for each job is computed taking into account the total number of events required as well as the granularity of EventCollections. Can be used also with no input.
371    
372     =item B<output_file *>
373    
374 slacapra 1.63 the output files produced by your application (comma separated list). From CRAB 2_2_2 onward, if TFileService is defined in user Pset, the corresponding output file is automatically added to the list of output files. User can avoid this by setting B<skip_TFileService_output> = 1 (default is 0 == file included). The Edm output produced via PoolOutputModule can be automatically added by setting B<get_edm_output> = 1 (default is 0 == no)
375 slacapra 1.61
376     =item B<skip_TFileService_output>
377    
378     Force CRAB to skip the inclusion of the file produced by TFileService in the list of output files. Default is I<0>, namely the file is included.
379 slacapra 1.20
380 slacapra 1.63 =item B<get_edm_output>
381    
382     Force CRAB to add the EDM output file, as defined in the PSET in PoolOutputModule (if any), to the list of output files. Default is 0 (== no inclusion).
383    
384 ewv 1.47 =item B<increment_seeds>
385    
386     Specifies a comma separated list of seeds to increment from job to job. The initial value is taken
387     from the CMSSW config file. I<increment_seeds=sourceSeed,g4SimHits> will set sourceSeed=11,12,13 and g4SimHits=21,22,23 on
388     subsequent jobs if the values of the two seeds are 10 and 20 in the CMSSW config file.
389    
390     See also I<preserve_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
391    
392     =item B<preserve_seeds>
393    
394 fanzago 1.71 Specifies a comma separated list of seeds which CRAB will not change from their values in the user
395 ewv 1.47 CMSSW config file. I<preserve_seeds=sourceSeed,g4SimHits> will leave the Pythia and GEANT seeds the same for every job.
396    
397     See also I<increment_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
398    
399 slacapra 1.30 =item B<first_run>
400    
401     First run to be generated in a generation job. Relevant only for no-input workflow.
402    
403 slacapra 1.31 =item B<executable>
404 slacapra 1.30
405 slacapra 1.31 The name of the executable to be run on the remote WN. The default is cmsrun. The executable is either to be found in the release area of the WN, or has been built in the user working area on the UI and is (automatically) shipped to the WN. If you want to run a script (which might internally call I<cmsrun>), use B<USER.script_exe> instead.
406 slacapra 1.30
407     =item I<DBS and DLS parameters:>
408    
409 slacapra 1.26 =item B<dbs_url>
410 slacapra 1.6
411 slacapra 1.40 The URL of the DBS query page. For experts only.
412 slacapra 1.13
413     =back
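
The rule that exactly two of total_number_of_events, events_per_job and number_of_jobs must be set can be sketched in Python as below. This is a simplified illustration, not CRAB's splitting code: the function name resolve_splitting is hypothetical, it assumes a positive total (the special value -1 is not handled), and it ignores fileblock boundaries, so the numbers CRAB actually uses may differ.

```python
import math


def resolve_splitting(total_number_of_events=None, events_per_job=None,
                      number_of_jobs=None):
    """Given exactly two splitting parameters, derive the third one."""
    values = (total_number_of_events, events_per_job, number_of_jobs)
    if sum(v is not None for v in values) != 2:
        raise ValueError("exactly two of the three splitting parameters must be set")
    if number_of_jobs is None:
        # Enough jobs to cover all requested events.
        number_of_jobs = math.ceil(total_number_of_events / events_per_job)
    elif events_per_job is None:
        # Spread the requested events over the requested jobs.
        events_per_job = math.ceil(total_number_of_events / number_of_jobs)
    else:
        total_number_of_events = events_per_job * number_of_jobs
    return total_number_of_events, events_per_job, number_of_jobs


print(resolve_splitting(total_number_of_events=1000, events_per_job=300))
# (1000, 300, 4)
```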
414    
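
The effect of increment_seeds on successive jobs can be reproduced with a few lines of Python. The sketch follows the example given for that parameter (initial seeds 10 and 20 giving 11,12,13 and 21,22,23); the function name seeds_for_job is hypothetical, and the assumption that jobs are numbered from 1 and that the increment equals the job number is inferred from that example only.

```python
def seeds_for_job(initial_seeds, increment_seeds, job_number):
    """Return the incremented seed values for one job (job_number starts at 1)."""
    return {name: initial_seeds[name] + job_number for name in increment_seeds}


# Seed values as they would appear in the CMSSW config file of the example.
initial = {"sourceSeed": 10, "g4SimHits": 20}

for job in (1, 2, 3):
    print(job, seeds_for_job(initial, ["sourceSeed", "g4SimHits"], job))
```

Seeds listed in preserve_seeds would simply keep their initial value for every job, and seeds in neither list are set randomly by CRAB.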
415     B<[USER]>
416    
417     =over 4
418    
419 slacapra 1.6 =item B<additional_input_files>
420    
421 spiga 1.67 Any additional input file you want to ship to WN: comma separated list. IMPORTANT NOTE: they will be placed in the WN working dir, and not in ${CMS_SEARCH_PATH}. Specific files required by CMSSW application must be placed in the local data directory, which will be automatically shipped by CRAB itself. You do not need to specify the I<ParameterSet> you are using, which will be included automatically. Wildcards are allowed.
422 slacapra 1.6
423 slacapra 1.31 =item B<script_exe>
424    
425 spiga 1.67 A user script that will be run on the WN (instead of the default cmsrun). It is up to the user to set up the script properly to run in the WN environment. CRAB guarantees that the CMSSW environment is set up (e.g. scram is in the path) and that the modified pset.cfg will be placed in the working directory, with name CMSSW.cfg . The user must ensure that a job report named crab_fjr.xml will be written. This can be guaranteed by passing the arguments "-j crab_fjr.xml" to cmsRun in the script. The script itself will be added automatically to the input sandbox, so the user MUST NOT add it to B<USER.additional_input_files>.
426 slacapra 1.31
427 slacapra 1.6 =item B<ui_working_dir>
428    
429 ewv 1.52 Name of the working directory for the current task. By default, a name I<crab_0_(date)_(time)> will be used. If this card is set, any CRAB command which requires I<-continue> also needs to specify the name of the working directory. A special syntax is also possible, to reuse the name of the dataset provided before: I<ui_working_dir : %(dataset)s> . In this case, if e.g. the dataset is SingleMuon, the ui_working_dir will be set to SingleMuon as well.
430 slacapra 1.6
431 mcinquil 1.35 =item B<thresholdLevel>
432    
433     This has to be a value between 0 and 100 that indicates the percentage of task completeness (jobs in an ended state are complete, even if failed). The server will notify the user by e-mail (see the field B<eMail>) when the task reaches the specified threshold. Works only with server_mode = 1.
434    
435     =item B<eMail>
436    
437 ewv 1.52 The server will notify the specified e-mail address when the task reaches the specified B<thresholdLevel>. A notification is also sent when the task reaches 100\% completeness. This field can also be a list of e-mail addresses: "B<eMail = user1@cern.ch, user2@cern.ch>". Works only with server_mode = 1.
438 mcinquil 1.35
439 slacapra 1.6 =item B<return_data *>
440    
441 ewv 1.52 The output produced by the executable on the WN is returned (via output sandbox) to the UI, by issuing the I<-getoutput> command. B<Warning>: this option should be used only for I<small> output, say less than 10MB, since the sandbox cannot accommodate big files. Depending on the Resource Broker used, a size limit on the output sandbox can be applied: bigger files will be truncated. To be used as an alternative to I<copy_data>.
442 slacapra 1.6
443     =item B<outputdir>
444    
445 ewv 1.52 To be used together with I<return_data>. Directory on user interface where to store the output. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
446 slacapra 1.6
447     =item B<logdir>
448    
449 ewv 1.52 To be used together with I<return_data>. Directory on user interface where to store the standard output and error. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
450 slacapra 1.6
451     =item B<copy_data *>
452    
453 ewv 1.52 The output (only that produced by the executable, not the std-out and err) is copied to a Storage Element of your choice (see below). To be used as an alternative to I<return_data> and recommended in case of large output.
454 slacapra 1.6
455     =item B<storage_element>
456    
457 fanzago 1.71 To be used with <copy_data>=1.
458     If you want to copy the output of your analysis to an official CMS Tier2 or Tier3, you have to write the CMS Site Name of the site, as written in the SiteDB https://cmsweb.cern.ch/sitedb/reports/showReport?reportid=se_cmsname_map.ini (e.g. T2_IT_legnaro). You also have to specify the <user_remote_dir> (see below).
459    
460     If you want to copy the output to a remote site that is not an official CMS site, you have to specify the complete storage element name (e.g. se.xxx.infn.it). You also have to specify the <storage_path>, and the <storage_port> if you do not use the default one (see below).
461    
462     =item B<user_remote_dir>
463    
464     To be used with <copy_data>=1 and a <storage_element> that is an official CMS site.
465     This is the directory where your output will be stored. This directory will be created under the mountpoint of the official CMS Storage Element. The mountpoint is discovered by CRAB.
466 slacapra 1.6
467     =item B<storage_path>
468    
469 fanzago 1.71 To be used with <copy_data>=1 and a <storage_element> that is not an official CMS site.
470     This is the full path of the Storage Element writeable by all, i.e. the mountpoint of the SE (e.g. /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/).
471    
472     =item B<lfn>
473    
474     To be used with <copy_data>=1 and a <storage_element> that is not an official CMS site.
475     This is the directory or tree of directories that CRAB will create under the storage path of the SE. Your produced output will be stored here. This part of the path will be used as the logical file name of your files in case of publication.
476 slacapra 1.6
477 fanzago 1.72 =item B<storage_pool>
478    
479     If you are using the CAF scheduler, you can specify the storage pool where your output is written.
480     The default is cmscafuser. If you do not want to use the default, you can override it by specifying None.
481    
482 spiga 1.70 =item B<storage_port>
483    
484     To choose the storage port specify I<storage_port> = N (default is 8443) .
485    
486 fanzago 1.71 =item B<publish_data*>
487    
488     To be used with <copy_data>=1.
489     To publish your produced output in a local instance of DBS, set publish_data = 1.
490     All the details about how to use this functionality are written in https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabHowTo.
491     N.B. 1) if you are using an official CMS site to store data, the remote dir will not be considered. The directory where data will be stored is decided by CRAB, following the CMS policy, in order to be able to re-read published data.
492     2) if you are using a site that is not an official CMS site to store data, you have to check the <lfn>, which will be part of the logical file name of your published files, in order to be able to re-read the data.
493    
494     =item B<publish_data_name>
495    
496     Your produced output will be published in your local DBS with dataset name <primarydataset>/<publish_data_name>/USER
497    
498     =item B<dbs_url_for_publication>
499    
500     Specify the URL of your local DBS instance where CRAB has to publish the output files.
501    
502 spiga 1.55 =item B<srm_version>
503 slacapra 1.46
504 spiga 1.69 To choose the srm version specify I<srm_version> = (srmv1 or srmv2).
505 slacapra 1.46
506 spiga 1.51 =item B<xml_report>
507    
508     To be used to switch off the screen report during the status query, enabling the db serialization in a file. Specifying I<xml_report> = FileName CRAB will serialize the DB into CRAB_WORKING_DIR/share/FileName.
509 slacapra 1.6
510 spiga 1.55 =item B<usenamespace>
511    
512 ewv 1.64 To use the automatic namespace definition (performed by CRAB), set I<usenamespace>=1. The same policy used for the stage out in case of data publication will be applied.
513 spiga 1.54
514 spiga 1.55 =item B<debug_wrapper>
515    
516 ewv 1.64 To enable the higher verbosity level in the wrapper, specify I<debug_wrapper> = True. The Pset contents before and after the CRAB manipulation will be written together with other useful information.
517 spiga 1.54
518 slacapra 1.68 =item B<dontCheckSpaceLeft>
519    
520     Set it to 1 to skip the check of free space left on your working directory before attempting to get the output back. Default is 0 (=False)
521    
522 slacapra 1.6 =back
523    
524 slacapra 1.5 B<[EDG]>
525 nsmirnov 1.1
526 slacapra 1.13 =over 4
527 slacapra 1.6
528 slacapra 1.13 =item B<RB>
529 slacapra 1.6
530 ewv 1.52 Which RB you want to use instead of the default one, as defined in the configuration of your UI. The ones available for CMS are I<CERN> and I<CNAF>. They are actually identical, being a collection of all RB/WMS available for CMS: the configuration files needed to change the broker will be automatically downloaded from the CRAB web page and used.
531     You can use any other RB which is available, if you provide the proper configuration files. E.g., for RB XYZ, you should provide I<edg_wl_ui.conf.CMS_XYZ> and I<edg_wl_ui_cmd_var.conf.CMS_XYZ> for an EDG RB, or I<glite.conf.CMS_XYZ> for a gLite WMS. These files are searched for in the current working directory and, if not found, on the CRAB web page. So, if you put your private configuration files in the working directory, they will be used, even if they are not available on the CRAB web page.
532 slacapra 1.29 Please get in contact with the CRAB team if you wish to provide your RB or WMS as a service to the CMS community.
533 slacapra 1.6
534 slacapra 1.14 =item B<proxy_server>
535    
536     The proxy server to which you delegate the responsibility to renew your proxy once expired. The default is I<myproxy.cern.ch> : change only if you B<really> know what you are doing.
537    
538 slacapra 1.26 =item B<role>
539    
540     The role to be set in the VOMS. See VOMS documentation for more info.
541    
542 slacapra 1.27 =item B<group>
543    
544     The group to be set in the VOMS. See VOMS documentation for more info.
545    
546 slacapra 1.28 =item B<dont_check_proxy>
547    
548 ewv 1.52 Set this if you do not want CRAB to check your proxy. The creation of the proxy (with proper length) and its delegation to a myproxy server are then your responsibility.
549 slacapra 1.28
550 slacapra 1.6 =item B<requirements>
551    
552     Any other requirements to be added to the JDL. They must be written in compliance with JDL syntax (see the LCG user manual for further info). No requirement on the Computing Element must be set.
553    
554 slacapra 1.27 =item B<additional_jdl_parameters>
555    
556 spiga 1.48 Any other parameters you want to add to the JDL file: a semicolon-separated list, where each
557 ewv 1.44 item B<must> be complete, including the closing ";".
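
As an illustration only (the JDL attribute shown is an assumption chosen for the example, not a recommended setting), note the closing ";" on the item:

    [EDG]
    additional_jdl_parameters = AllowZippedISB = false;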
558 spiga 1.48
559     =item B<wms_service>
560    
561 fanzago 1.71 With this field you can also specify which WMS you want to use (https://hostname:port/pathcode), where "hostname" is the WMS name, the "port" is generally 7443, and the "pathcode" should be something like "glite_wms_wmproxy_server".
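
A sketch of the expected format, where the host name is a hypothetical placeholder:

    [EDG]
    wms_service = https://wms.example.org:7443/glite_wms_wmproxy_server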
562 slacapra 1.27
563 slacapra 1.6 =item B<max_cpu_time>
564    
565     Maximum CPU time needed to finish one job. It will be used to select a suitable queue on the CE. Time in minutes.
566    
567     =item B<max_wall_clock_time>
568    
569     Same as the previous one, but for real (wall-clock) time rather than CPU time.
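
For example, for a job needing at most 10 hours of CPU and 12 hours of real time. The values are chosen only for illustration, and the sketch assumes that I<max_wall_clock_time> is expressed in minutes like I<max_cpu_time>:

    [EDG]
    max_cpu_time = 600
    max_wall_clock_time = 720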
570    
571     =item B<CE_black_list>
572    
573 ewv 1.66 All the CEs (Computing Elements) whose names contain the following strings (comma-separated list) will not be considered for submission. Use the DNS domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl, ...). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
574 slacapra 1.6
575     =item B<CE_white_list>
576    
577 ewv 1.66 Only the CEs (Computing Elements) whose names contain the following strings (comma-separated list) will be considered for submission. Use the DNS domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl, ...). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings. Please note that if the selected CE(s) do not contain the data you want to access, no submission can take place.
578 slacapra 1.27
579     =item B<SE_black_list>
580    
581 ewv 1.66 All the SEs (Storage Elements) whose names contain the following strings (comma-separated list) will not be considered for submission. It works only if a datasetpath is specified. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
582 slacapra 1.27
583     =item B<SE_white_list>
584    
585 ewv 1.66 Only the SEs (Storage Elements) whose names contain the following strings (comma-separated list) will be considered for submission. It works only if a datasetpath is specified. Please note that if the selected SE(s) do not contain the data you want to access, no submission can take place. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
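
For instance, a sketch combining a black list and a white list (the chosen sites are arbitrary examples):

    [EDG]
    CE_black_list = fnal, T2_DE_DESY
    SE_white_list = cnaf

Here any CE whose name contains "fnal" or "T2_DE_DESY" is excluded, and only SEs whose name contains "cnaf" are considered.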
586 slacapra 1.6
587     =item B<virtual_organization>
588    
589 mcinquil 1.35 You don\'t want to change this: it\'s cms!
590 slacapra 1.6
591     =item B<retry_count>
592    
593 fanzago 1.37 Number of times the Grid will try to resubmit your job in case of a Grid-related problem.
594 slacapra 1.6
595 slacapra 1.27 =item B<shallow_retry_count>
596    
597 fanzago 1.37 Number of shallow resubmissions the Grid will try: resubmissions are tried B<only> if the job aborted B<before> starting, so you are guaranteed that your jobs run no more than once.
598 slacapra 1.27
599 slacapra 1.30 =item B<maxtarballsize>
600    
601     Maximum size of the tar-ball in MB. If it is bigger, an error will be generated. The actual limit is that on the RB input sandbox. Default is 9.5 MB (sandbox limit is 10 MB).
602    
603 spiga 1.55 =item B<skipwmsauth>
604    
605 ewv 1.64 Temporary but useful parameter to allow the WMSAuthorisation handling. If you specify I<skipwmsauth> = 1, the pyopenssl problems will disappear. It is needed when working on a gLite UI outside of CERN.
606 spiga 1.55
607 slacapra 1.6 =back
608    
609 spiga 1.55 B<[LSF]> or B<[CAF]>
610 slacapra 1.46
611     =over 4
612    
613     =item B<queue>
614    
615 ewv 1.52 The LSF queue you want to use: if none, the default one will be used. For CAF, the proper queue will be automatically selected.
616 slacapra 1.46
617     =item B<resource>
618    
619     The resources to be used within an LSF queue. Again, for CAF, the right one is selected automatically.
620    
621 spiga 1.55 =item B<copyCommand>
622    
623     The command to be used to copy both the input and output sandboxes to their final location. Default is cp.
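
A minimal sketch of an [LSF] section. The queue name is a hypothetical placeholder, and I<copyCommand> is shown with its default value:

    [LSF]
    queue = my_queue
    copyCommand = cp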
624 slacapra 1.46
625     =back
626    
627 nsmirnov 1.1 =head1 FILES
628    
629 slacapra 1.6 I<crab> uses a configuration file I<crab.cfg> which contains configuration parameters. This file is written in INI style. The default filename can be changed with the I<-cfg> option.
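
For example, assuming a configuration file named I<my_crab.cfg> (a hypothetical name) in the current directory:

    crab -cfg my_crab.cfg -create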
630 nsmirnov 1.1
631 slacapra 1.6 I<crab> creates by default a working directory 'crab_0_E<lt>dateE<gt>_E<lt>timeE<gt>'
632 nsmirnov 1.1
633     I<crab> saves all command lines in the file I<crab.history>.
634    
635     =head1 HISTORY
636    
637 ewv 1.52 B<CRAB> is a tool for CMS analysis in the Grid environment. It is based on ideas from CMSprod, a production tool originally implemented by Nikolai Smirnov.
638 nsmirnov 1.1
639     =head1 AUTHORS
640    
641     """
642     author_string = '\n'
643     for auth in common.prog_authors:
644     #author = auth[0] + ' (' + auth[2] + ')' + ' E<lt>'+auth[1]+'E<gt>,\n'
645     author = auth[0] + ' E<lt>' + auth[1] +'E<gt>,\n'
646     author_string = author_string + author
647     pass
648     help_string = help_string + author_string[:-2] + '.'\
649     """
650    
651     =cut
652 slacapra 1.19 """
653 nsmirnov 1.1
654     pod = tempfile.mktemp()+'.pod'
655     pod_file = open(pod, 'w')
656     pod_file.write(help_string)
657     pod_file.close()
658    
659     if option == 'man':
660     man = tempfile.mktemp()
661     pod2man = 'pod2man --center=" " --release=" " '+pod+' >'+man
662     os.system(pod2man)
663     os.system('man '+man)
664     pass
665     elif option == 'tex':
666     fname = common.prog_name+'-v'+common.prog_version_str
667     tex0 = tempfile.mktemp()+'.tex'
668     pod2tex = 'pod2latex -full -out '+tex0+' '+pod
669     os.system(pod2tex)
670     tex = fname+'.tex'
671     tex_old = open(tex0, 'r')
672     tex_new = open(tex, 'w')
673     for s in tex_old.readlines():
674     if string.find(s, '\\begin{document}') >= 0:
675     tex_new.write('\\title{'+common.prog_name+'\\\\'+
676     '(Version '+common.prog_version_str+')}\n')
677     tex_new.write('\\author{\n')
678     for auth in common.prog_authors:
679     tex_new.write(' '+auth[0]+
680     '\\thanks{'+auth[1]+'} \\\\\n')
681     tex_new.write('}\n')
682     tex_new.write('\\date{}\n')
683     elif string.find(s, '\\tableofcontents') >= 0:
684     tex_new.write('\\maketitle\n')
685     continue
686     elif string.find(s, '\\clearpage') >= 0:
687     continue
688     tex_new.write(s)
689     tex_old.close()
690     tex_new.close()
691     print 'See '+tex
692     pass
693     elif option == 'html':
694     fname = common.prog_name+'-v'+common.prog_version_str+'.html'
695     pod2html = 'pod2html --title='+common.prog_name+\
696     ' --infile='+pod+' --outfile='+fname
697     os.system(pod2html)
698     print 'See '+fname
699     pass
700 slacapra 1.33 elif option == 'txt':
701     fname = common.prog_name+'-v'+common.prog_version_str+'.txt'
702     pod2text = 'pod2text '+pod+' '+fname
703     os.system(pod2text)
704     print 'See '+fname
705     pass
706 nsmirnov 1.1
707     sys.exit(0)