root/cvsroot/COMP/CRAB/python/crab_help.py
Revision: 1.182
Committed: Wed Sep 11 12:52:35 2013 UTC (11 years, 7 months ago) by belforte
Content type: text/x-python
Branch: MAIN
CVS Tags: CRAB_2_9_1, CRAB_2_9_1_pre2, HEAD
Changes since 1.181: +1 -1 lines
Log Message:
fix twiki reference, see: https://savannah.cern.ch/bugs/?102552

File Contents

2 ###########################################################################
3 #
4 # H E L P F U N C T I O N S
5 #
6 ###########################################################################
7
8 import common
9
10 import sys, os, string
11
12 import tempfile
13
14 ###########################################################################
15 def usage():
16 print 'in usage()'
17 usa_string = common.prog_name + """ [options]
18
19 The most useful general options (use '-h' to get complete help):
20
21 -create -- Create all the jobs.
22 -submit n -- Submit the first n available jobs. Default is all.
23 -status -- check status of all jobs.
24 -getoutput|-get [range] -- get back the output of all jobs: if range is defined, only of selected jobs.
25 -publish                 -- after the getoutput, publish the user data in a local DBS instance.
26 -publishNoInp            -- after the getoutput, publish the user data in the local DBS instance, removing input data files
27 -checkPublication [dbs_url datasetpath] -- checks if a dataset is published in a DBS.
28 -kill [range] -- kill submitted jobs.
29 -resubmit range or all -- resubmit killed/aborted/retrieved jobs.
30 -forceResubmit range or all -- resubmit jobs regardless of their status.
31 -copyData [range [dest_se or dest_endpoint]] -- copy locally (in crab_working_dir/res dir) or on a remote SE your produced output,
32 already stored on remote SE.
33 -renewCredential -- renew credential on the server.
34 -clean -- gracefully cleanup the directory of a task.
35 -match|-testJdl [range] -- check if resources exist which are compatible with jdl.
36 -report -- print a short report about the task
37 -list [range] -- show technical job details.
38 -postMortem [range] -- provide a file with information useful for post-mortem analysis of the jobs.
39 -printId -- print the SID for all jobs in task
40 -createJdl [range] -- provide files with a complete Job Description (JDL).
41 -validateCfg [fname] -- parse the ParameterSet using the framework's Python API.
42 -cleanCache -- clean SiteDB and CRAB caches.
43 -uploadLog [jobid] -- upload main log files to a central repository
44 -continue|-c [dir] -- Apply command to task stored in [dir].
45 -h [format] -- Detailed help. Formats: man (default), tex, html, txt.
46 -cfg fname -- Configuration file name. Default is 'crab.cfg'.
47 -debug N -- set the verbosity level to N.
48 -v -- Print version and exit.
49
50 "range" has syntax "n,m,l-p" which corresponds to [n,m,l,l+1,...,p-1,p] and all possible combinations
51
52 Example:
53 crab -create -submit 1
54 """
55 print usa_string
56 sys.exit(2)
57
58 ###########################################################################
59 def help(option='man'):
60 help_string = """
61 =pod
62
63 =head1 NAME
64
65 B<CRAB>: B<C>ms B<R>emote B<A>nalysis B<B>uilder
66
67 """+common.prog_name+""" version: """+common.prog_version_str+"""
68
69 This tool B<must> be used from a User Interface and the user is supposed to
70 have a valid Grid certificate.
71
72 =head1 SYNOPSIS
73
74 B<"""+common.prog_name+"""> [I<options>] [I<command>]
75
76 =head1 DESCRIPTION
77
78 CRAB is a Python program intended to simplify the creation and submission of CMS analysis jobs to the Grid environment.
79
80 Parameters for CRAB usage and configuration are provided by the user in the configuration file B<crab.cfg>.
81
82 CRAB generates scripts and additional data files for each job. The produced scripts are submitted directly to the Grid. CRAB makes use of BossLite to interface to the Grid scheduler, as well as for logging and bookkeeping.
83
84 CRAB supports any CMSSW based executable, with any modules/libraries, including user provided ones, and deals with the output produced by the executable. CRAB provides an interface to CMS data discovery services (DBS and DLS), which are completely hidden to the final user. It also splits a task (such as analyzing a whole dataset) into smaller jobs, according to user requirements.
85
86 CRAB can be used in two ways: StandAlone and with a Server.
87 The StandAlone mode is suited for small tasks, of the order of 100 jobs: it submits the jobs directly to the scheduler, and these jobs are under user responsibility.
88 In the Server mode, suited for larger tasks, the jobs are prepared locally and then passed to a dedicated CRAB server, which then interacts with the scheduler on behalf of the user and provides additional services, such as automatic resubmission, status caching, output retrieval, and more.
89 The CRAB commands are exactly the same in both cases.
90
91 CRAB web page is available at
92
93 I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab>
94
95 =head1 HOW TO RUN CRAB FOR THE IMPATIENT USER
96
97 Please read all the way through in any case!
98
99 Source B<crab.(c)sh> from the CRAB installation area, which has been set up either by you or by someone else for you.
100
101 Modify the CRAB configuration file B<crab.cfg> according to your needs: see below for a complete list. A commented template B<crab.cfg> can be found in B<$CRABDIR/python/full_crab.cfg> (detailed cfg) and B<$CRABDIR/python/minimal_crab.cfg> (only basic parameters).
102
103 ~>crab -create
104 create all jobs (no submission!)
105
106 ~>crab -submit 2 -continue [ui_working_dir]
107 submit 2 jobs, the ones already created (-continue)
108
109 ~>crab -create -submit 2
110 create _and_ submit 2 jobs
111
112 ~>crab -status
113 check the status of all jobs
114
115 ~>crab -getoutput
116 get back the output of all jobs
117
118 ~>crab -publish
119 publish all user outputs in the DBS specified in the crab.cfg (dbs_url_for_publication) or passed as an argument of this option
120
121 =head1 RUNNING CMSSW WITH CRAB
122
123 =over 4
124
125 =item B<A)>
126
127 Develop your code in your CMSSW working area. Do anything which is needed to run your executable interactively, including the setup of the run time environment (I<cmsenv>), a suitable I<ParameterSet>, etc. It seems silly, but B<be extra sure that you actually did compile your code> with I<scram b>.
128
129 =item B<B)>
130
131 Source B<crab.(c)sh> from the CRAB installation area, which has been set up either by you or by someone else for you. Modify the CRAB configuration file B<crab.cfg> according to your needs: see below for a complete list.
132
133 The most important parameters are the following (see below for complete description of each parameter):
134
135 =item B<Mandatory!>
136
137 =over 6
138
139 =item B<[CMSSW]> section: datasetpath, pset, splitting parameters, output_file
140
141 =item B<[USER]> section: output handling parameters, such as return_data, copy_data etc...
142
143 =back
144
145 =item B<Run it!>
146
147 You must have a valid voms-enabled Grid proxy. See CRAB web page for details.
148
149 =back
150
151 =head1 RUNNING MULTICRAB
152
153 MultiCRAB is a CRAB extension to submit the same job to multiple datasets in one go.
154
155 The use case for multicrab is when you have analysis code that you want to run on several datasets, typically some signals plus some backgrounds (for MC studies)
156 or on different streams/configurations/runs for real data taking. You want to run exactly the same code, and the crab.cfg files differ only in a few keys:
157 certainly datasetpath, but also other keys, such as e.g. total_number_of_events, in case you want to run on all signals but only a fraction of the background, or anything else.
158 So far, you would have to create a set of crab.cfg files, one for each dataset you want to access, and submit several instances of CRAB, saving the output to different locations.
159 Multicrab is meant to automate this procedure.
160 In addition to the usual crab.cfg, there is a new configuration file called multicrab.cfg. The syntax is very similar to that of crab.cfg, namely
161 [SECTION] <crab.cfg Section>.Key=Value
162
163 Please note that it is mandatory to add explicitly the crab.cfg [SECTION] in front of [KEY].
164 The role of multicrab.cfg is to apply modifications to the template crab.cfg, some of which are common to all tasks and some of which are task specific.
165
166 =head2 So there are two sections:
167
168 =over 2
169
170 =item B<[COMMON]>
171
172 section: which applies to all tasks, and which is fully equivalent to modifying the template crab.cfg directly
173
174 =item B<[DATASET]>
175
176 section: there can be an arbitrary number of such sections, one for each dataset you want to run on. The names are free (except COMMON and MULTICRAB), and they will be used as the ui_working_dir for the task as well as appended to user_remote_dir in case of output copy to a remote SE. So the task corresponding to a section, say [SIGNAL], will be placed in directory SIGNAL, and SIGNAL will be added as the last subdirectory in the user_remote_dir.
177
178 =back
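To make the syntax above concrete, here is a minimal multicrab.cfg sketch; the dataset names, paths and values are invented for illustration and are not taken from any CRAB distribution.

```ini
; Hypothetical multicrab.cfg sketch: every key follows the
; <crab.cfg Section>.Key=Value rule described above.
[COMMON]
CMSSW.pset = myAnalysis_cfg.py
USER.return_data = 1

[SIGNAL]
CMSSW.datasetpath = /MySignal/SomeEra-v1/AOD
CMSSW.total_number_of_events = -1

[BACKGROUND]
CMSSW.datasetpath = /MyBackground/SomeEra-v1/AOD
CMSSW.total_number_of_events = 100000
```

The [SIGNAL] and [DATASET]-style section names here would become the ui_working_dir of each task, as described above.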
179
180 For further details please visit
181
182 I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideMultiCrab>
183
184 =head1 HOW TO RUN ON CONDOR-G
185
186 The B<Condor-G> mode for B<CRAB> is a special submission mode next to the standard Resource Broker submission. It is designed to submit jobs directly to a site, not through the Resource Broker.
187
188 Due to the nature of B<Condor-G> submission, the B<Condor-G> mode is restricted to OSG sites within the CMS Grid, currently the seven US T2 sites: Florida(ufl.edu), Nebraska(unl.edu), San Diego(ucsd.edu), Purdue(purdue.edu), Wisconsin(wisc.edu), Caltech(ultralight.org), MIT(mit.edu).
189
190 =head2 B<Requirements:>
191
192 =over 2
193
194 =item installed and running local Condor scheduler
195
196 (either installed by the local Sysadmin or self-installed using the VDT user interface: http://www.uscms.org/SoftwareComputing/UserComputing/Tutorials/vdt.html)
197
198 =item locally available LCG or OSG UI installation
199
200 for authentication via Grid certificate proxies ("voms-proxy-init -voms cms" should result in valid proxy)
201
202 =item set the environment variable GRID_WL_LOCATION to the edg directory of the local LCG or OSG UI installation
203
204 =back
205
206 =head2 B<What the Condor-G mode can do:>
207
208 =over 2
209
210 =item submission directly to multiple OSG sites,
211
212 the requested dataset must be published correctly by the site in the local and global services.
213 Previous restrictions on submitting only to a single site have been removed. SE and CE whitelisting
214 and blacklisting work as in the other modes.
215
216 =back
217
218 =head2 B<What the Condor-G mode cannot do:>
219
220 =over 2
221
222 =item submit jobs if no condor scheduler is running on the submission machine
223
224 =item submit jobs if the local condor installation does not provide Condor-G capabilities
225
226 =item submit jobs to an LCG site
227
228 =item support Grid certificate proxy renewal via the myproxy service
229
230 =back
231
232 =head2 B<CRAB configuration for Condor-G mode:>
233
234 The CRAB configuration for the Condor-G mode only requires one change in crab.cfg:
235
236 =over 2
237
238 =item select condor_g Scheduler:
239
240 scheduler = condor_g
241
242 =back
243
244
245 =head1 HOW TO RUN ON NORDUGRID ARC
246
247 The ARC scheduler can be used to submit jobs to sites running the NorduGrid
248 ARC grid middleware. To use it you need to have the ARC client
249 installed.
250
251 =head2 B<CRAB configuration for ARC mode:>
252
253 The ARC scheduler requires some changes to crab.cfg:
254
255 =over 2
256
257 =item B<scheduler:>
258
259 Select the ARC scheduler:
260 scheduler = arc
261
262 =item B<requirements>, B<additional_jdl_parameters:>
263
264 Use xrsl code instead of jdl for these parameters.
265
266 =item B<max_cpu_time>, B<max_wall_clock_time:>
267
268 When using the ARC scheduler, for the parameters max_cpu_time and max_wall_clock_time,
269 you can use units, e.g. "72 hours" or "3 days", just like with the xrsl attributes
270 cpuTime and wallTime. If no unit is given, minutes is assumed by default.
271
272 =back
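The unit handling described above can be sketched in Python as follows; this is an editor's illustration of the stated rule (a bare number is taken as minutes), not CRAB's actual code.

```python
# Sketch of the stated rule: "72 hours", "3 days" or a bare number
# (interpreted as minutes by default) are all converted to minutes.
# Illustration only, not CRAB's actual parser.

UNIT_MINUTES = {"minute": 1, "hour": 60, "day": 1440}

def to_minutes(value):
    parts = value.split()
    if len(parts) == 1:
        return int(parts[0])          # no unit given: minutes by default
    amount, unit = parts
    unit = unit.rstrip("s").lower()   # accept "hours" as well as "hour"
    return int(amount) * UNIT_MINUTES[unit]
```

So "72 hours" and "3 days" come out as the same CPU-time limit of 4320 minutes.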
273
274 =head2 B<CRAB Commands:>
275
276 Most CRAB commands behave approximately the same with the ARC scheduler, with only some minor differences:
277
278 =over 2
279
280 =item B<*> B<-printJdl|-createJdl> will print xrsl code instead of jdl.
281
282 =back
283
284
285
286
287 =head1 COMMANDS
288
289 =head2 B<-create>
290
291 Create the jobs: from version 1_3_0 it is only possible to create all jobs.
292 The maximum number of jobs depends on the dataset and splitting directives. This set of identical jobs accessing the same dataset is defined as a task.
293 This command creates a directory whose default name is I<crab_0_date_time> (it can be changed via the ui_working_dir parameter, see below). Everything needed to submit your jobs is placed inside this directory. The output of your jobs (once finished) will also be placed there (see below). Do not delete this directory by hand: rather use -clean (see below).
294 See also I<-continue>.
295
296 =head2 B<-submit [range]>
297
298 Submit n jobs: 'n' is either a positive integer or 'all' or a [range]. The default is all.
299 If 'n' is passed as an argument, the first 'n' suitable jobs will be submitted. Please note that this behaviour is different from other commands, where -command N means apply the command to job N, not to the first N jobs. If a [range] is passed, the selected jobs will be submitted. In order to submit only job number M use this syntax (note the trailing comma): I<crab -submit M,>
300
301 This option may be used in conjunction with -create (to create and submit immediately) or with -continue (which is assumed by default) to submit previously created jobs. Submitting jobs that have not yet been created will stop CRAB and generate an error message. See also I<-continue>.
302
303 =head2 B<-continue [dir] | -c [dir]>
304
305 Apply the action to the task stored in directory [dir]. If the task directory is the standard one (crab_0_date_time), the most recent one is assumed. Any other directory must be specified explicitly.
306 Basically all commands (except -create) need -continue, so it is automatically assumed. Of course, the standard task directory is used in this case.
307
308 =head2 B<-status [options]>
309
310 Check the status of all jobs. With the server, the full status, including application and wrapper exit codes, is available as soon as a job ends. In StandAlone mode it is necessary to retrieve (crab -get) the job output first to obtain the exit codes. The status is printed on the console as a table with 7 columns: ID (identifier in the task), END (whether the job is completed or not; the CRAB server resubmits failed jobs, therefore N=the server is still working on this job, Y=the server is done and the status will not change anymore), STATUS (the job status), ACTION (some additional status info useful for experts), ExeExitCode (exit code from cmsRun; if not zero it means cmsRun failed), JobExitCode (the exit code assigned by CRAB and reported to the dashboard), E_HOST (the CE where the job executed). A comma separated list of options can be passed to -status (which does not accept a range). The options implemented are: I<-status short>, which skips the detailed job-per-job status, printing only the summary; I<-status color>, which adds some coloring to the summary status. The color code is the following: Green for successfully finished jobs, Red for jobs which ended unsuccessfully, Blue for jobs done but not retrieved, Yellow for jobs still to be submitted, and the default color for all other jobs, namely those running or pending on the grid. Colors will be used only if the output stream is capable of accepting them. The two options can coexist: I<-status short,color>.
311
312 =head2 B<-getoutput|-get [range]>
313
314 Retrieve the output declared by the user via the output sandbox. By default the output will be put in task working dir under I<res> subdirectory. This can be changed via config parameters. B<Be extra sure that you have enough free space>. From version 2_3_x, the available free space is checked in advance. See I<range> below for syntax.
315
316 =head2 B<-publish>
317
318 Publish user output in a local DBS instance after the retrieval of output. By default publish uses the dbs_url_for_publication specified in the crab.cfg file, otherwise you can supply it as an argument of this option.
319 Warnings about publication:
320
321 CRAB publishes only EDM files (in the FJR they are written in the tag <File>)
322
323 CRAB publishes in the same USER dataset multiple EDM files if they are produced by a job and written in the tag <File> of the FJR.
324
325 It is not possible for the user to select only one file to publish, nor to publish two files in two different USER datasets.
326
327
328 =head2 B<-checkPublication [-USER.dbs_url_for_publication=dbs_url -USER.dataset_to_check=datasetpath -debug]>
329
330 Check if a dataset is published in a DBS. This option is automatically called at the end of the publication step, but it can also be used as a standalone command. By default it reads the parameters (USER.dbs_url_for_publication and USER.dataset_to_check) from your crab.cfg. You can overwrite the defaults in crab.cfg by passing these parameters as options. Using the -debug option, you will get detailed info about the files of published blocks.
331
332 =head2 B<-publishNoInp>
333
334 To be used only if you know why and you are sure of what you are doing, or if CRAB support persons told you to use it. It is meant for situations where crab -publish fails because the framework job report xml file contains input files not present in DBS. It will publish the dataset anyhow, while marking it as Unknown Provenance to indicate that the parentage information is partial. Those datasets will not be accepted for promotion to the Global Scope DBS. In all other respects this works as crab -publish.
335
336 =head2 B<-resubmit range or all>
337
338 Resubmit jobs which have been previously submitted and have been either I<killed> or I<aborted>. See I<range> below for syntax. Also possible with the key I<bad>, which will resubmit all jobs that are I<killed>, I<aborted>, in I<failed submission>, or I<retrieved> but with a non-zero exit status (with the exception of wrapper exit status 60307).
339
340 =head2 B<-forceResubmit range or all>
341
342 Same as -resubmit but without any check of the actual status of the job: please use with caution, you can have problems if both the original job and the resubmitted one actually run and try to write the output on a SE. This command is meant to be used if killing is not possible or not working but you know that the job failed or will fail. See I<range> below for syntax.
343
344 =head2 B<-kill [range]>
345
346 Kill (cancel) jobs which have been submitted to the scheduler. A range B<must> be used in all cases, no default value is set.
347
348 =head2 B<-copyData [range -dest_se=the official SE name or -dest_endpoint=the complete endpoint of the remote SE]>
349
350 Option that can be used only if your output has been previously copied by CRAB to a remote SE.
351 By default copyData copies your output from the remote SE locally to the current CRAB working directory (under res). Otherwise you can copy the output from the remote SE to another one, specifying either -dest_se=<the remote SE official name> or -dest_endpoint=<the complete endpoint of the remote SE>. If dest_se is used, CRAB finds the correct path where the output can be stored.
352
353 Example: crab -copyData --> output copied to crab_working_dir/res directory
354 crab -copyData -dest_se=T2_IT_Legnaro --> output copied to the legnaro SE, directory discovered by CRAB
355 crab -copyData -dest_endpoint=srm://<se_name>:8443/xxx/yyyy/zzzz --> output copied to the se <se_name> under
356 /xxx/yyyy/zzzz directory.
357
358 =head2 B<-renewCredential >
359
360 If using the server modality, this command allows you to delegate a valid credential (proxy/token) to the server associated with the task.
361
362 =head2 B<-match|-testJdl [range]>
363
364 Check if the job can find compatible resources. It is the equivalent of doing I<glite-wms-job-list-match> on edg.
365
366 =head2 B<-printId>
367
368 Just print the Scheduler Job Identifier (e.g. the Grid job identifier) of the jobs in the task.
369
370 =head2 B<-createJdl [range]>
371
372 Collect the full Job Description in a file located under share directory. The file base name is File- .
373
374 =head2 B<-postMortem [range]>
375
376 Try to collect more information about the job from the scheduler point of view.
377 This is the only way to obtain info about the failure reason of aborted jobs.
378
379 =head2 B<-list [range]>
380
381 Dump technical information about jobs: for developers only.
382
383 =head2 B<-report>
384
385 Print a short report about the task, namely the total number of events and files processed/requested/available, the name of the dataset path, a summary of the status of the jobs, and so on. A summary file of the runs and luminosity sections processed is written to the res subdirectory as lumiSummary.json and can be used as input to tools that compute the luminosity, like lumiCalc.py. In the same subdirectory a file containing all the input runs and lumis, called InputLumiSummaryOfTask.json, and a file containing the runs and lumis missing due to failed jobs, called missingLumiSummary.json, are also produced. The missingLumiSummary.json can be used as a lumi_mask file to create a new task in order to analyse the missing data (instead of resubmitting failed jobs).
386
387 =head2 B<-clean [dir]>
388
389 Clean up (i.e. erase) the task working directory after a check whether there are still running jobs. If there are, you are notified and asked to kill them or retrieve their output. B<Warning>: this will possibly delete also the output produced by the task (if any)!
390
391 =head2 B<-cleanCache>
392
393 Clean up (i.e. erase) the SiteDB and CRAB cache content.
394
395 =head2 B<-uploadLog [jobid]>
396
397 Upload the main log files to a central repository. It prints a link to be forwarded to supporting people (eg: crab feedback hypernews).
398
399 It can optionally take a job id as input. It does not allow job ranges/lists.
400
401 Uploaded files are: crab.log, crab.cfg, job logging info, summary file and a metadata file.
402 If you specify the jobid, the job standard output and fjr will also be uploaded. Warning: in this case you need to run getoutput first!
403 In the case of aborted jobs you have to upload the postMortem file too, creating it with crab -postMortem jobid and then uploading files specifying the jobid number.
404
405 =head2 B<-validateCfg [fname]>
406
407 Parse the ParameterSet using the framework\'s Python API in order to perform a sanity check of the CMSSW configuration file.
408 You have to create your task with crab -create and then to validate the config file with crab -validateCfg.
409
410 =head2 B<-help [format] | -h [format]>
411
412 This help. It can be produced in three different I<formats>: I<man> (default), I<tex> and I<html>.
413
414 =head2 B<-v>
415
416 Print the version and exit.
417
418 =head2 B<range>
419
420 The range to be used in many of the above commands has the following syntax: a comma separated list of job ranges, each of which may be a single job number or a range of the form first-last.
421 Example: 1,3-5,8 = {1,3,4,5,8}
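The range syntax can be sketched as a small parser; this is an editor's illustration of the rule above (including the trailing-comma form used by I<crab -submit M,>), not the parser CRAB actually uses.

```python
# Sketch of the range syntax: a comma separated list of job numbers
# or first-last ranges, e.g. "1,3-5,8" -> [1, 3, 4, 5, 8].
# Illustration only, not CRAB's code.

def parse_range(spec):
    jobs = []
    for piece in spec.split(','):
        piece = piece.strip()
        if not piece:
            continue                  # tolerate a trailing comma, as in "crab -submit M,"
        if '-' in piece:
            first, last = piece.split('-')
            jobs.extend(range(int(first), int(last) + 1))
        else:
            jobs.append(int(piece))
    return jobs
```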
422
423 =head1 OPTIONS
424
425 =head2 B<-cfg [file]>
426
427 Configuration file name. Default is B<crab.cfg>.
428
429 =head2 B<-debug [level]>
430
431 Set the debug level: high number for high verbosity.
432
433 =head1 CONFIGURATION PARAMETERS
434
435 All the parameters described in this section can be defined in the CRAB configuration file. The configuration file has different sections: [CRAB], [USER], etc. Each parameter must be defined in its proper section. An alternative way to pass a config parameter to CRAB is via the command line interface; the syntax is: crab -SECTION.key value . For example I<crab -USER.outputdir MyDirWithFullPath> .
436 The parameters passed to CRAB at the creation step are stored, so they cannot be changed by changing the original crab.cfg. On the other hand, the task is protected from any accidental change. If you want to change any parameter, this requires the creation of a new task.
437 Mandatory parameters are flagged with a *.
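As an illustration of the crab -SECTION.key value syntax above, here is a sketch of how such command-line overrides could be collected; parse_overrides is a hypothetical helper, not CRAB's actual option parser.

```python
# Sketch of the "crab -SECTION.key value" override syntax: any argument
# of the form -SECTION.key consumes the next argument as its value.
# Illustration only, not CRAB's code.

def parse_overrides(argv):
    overrides = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith('-') and '.' in arg:
            section, key = arg[1:].split('.', 1)
            overrides[(section, key)] = argv[i + 1]
            i += 2
        else:
            i += 1                    # a plain command such as -create
    return overrides
```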
438
439 =head2 B<[CRAB]>
440
441 =head3 B<jobtype *>
442
443 The type of the job to be executed: I<cmssw> jobtypes are supported. No default value.
444
445 =head3 B<scheduler *>
446 The scheduler to be used: I<glite> or I<condor_g> (see the specific paragraph) are Grid schedulers to be used with the gLite or OSG middleware. In addition, there is an I<arc> scheduler to be used with the NorduGrid ARC middleware.
447 From version 210, local schedulers are also supported, for the time being only at CERN. I<LSF> is the standard CERN local scheduler, and I<CAF> is LSF dedicated to the CERN Analysis Facility. I<condor> is the scheduler to submit jobs to the US LPC CAF. No default value.
448
449 =head3 B<use_server>
450
451 To use the server for job handling (recommended): 0=no (default), 1=yes. The server to be used will be found automatically from a list of available ones; it can also be specified explicitly by using I<server_name> (see below). Server usage is compulsory for tasks with more than 500 created jobs. Default value = 0.
452
453 =head3 B<server_name>
454
455 To use the CRAB-server support you need to fill this key with the server name as <Server_DOMAIN> (e.g. cnaf, fnal). If this is set, I<use_server> is set to true automatically.
456 If I<server_name=None> crab works in standalone mode, the same as using I<use_server=0> and no I<server_name>.
457 The servers available to users can be found on the CRAB web page. No default value.
458
459 =head2 B<[CMSSW]>
460
461 =head3 B<datasetpath *>
462
463 The path of the processed or analysis dataset as defined in DBS. It comes with the format I</PrimaryDataset/DataTier/Process[/OptionalADS]>. If no input is needed I<None> must be specified. When running on an analysis dataset, the job splitting must be specified by luminosity block rather than event. Analysis datasets are only treated accurately on a lumi-by-lumi level with CMSSW 3_1_x and later. No default value.
464
465 =head3 B<runselection *>
466
467 Within a dataset you can restrict running to a specific run number or run number range, for example runselection=XYZ or runselection=XYZ1-XYZ2. A run number range will include both run XYZ1 and run XYZ2. Combining runselection with a lumi_mask runs on the intersection of the two lists. No default value.
468
469 =head3 B<use_parent>
470
471 Within a dataset you can ask to run over the related parent files too. E.g., this will give you access to the RAW data while running over a RECO sample. With use_parent=1, CRAB determines the parent files from DBS and adds secondaryFileNames = cms.untracked.vstring( <LIST of parent Files> ) to the pool source section of your parameter set.
472 This setting is supposed to work both with splitting by lumis and splitting by events. Default value = 0.
473
474 =head3 B<pset *>
475
476 The python ParameterSet to be used. No default value.
477
478 =head3 B<pycfg_params *>
479
480 These parameters are passed to the python config file, as explained in https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideAboutPythonConfigFile#Passing_Command_Line_Arguments_T
481
482 =head3 B<lumi_mask>
483
484 The filename of a JSON file that describes which runs and lumis to process. CRAB will skip luminosity blocks not listed in the file. When using this setting, you must also use the split by lumi settings rather than split by event as described below. Combining runselection with a lumi_mask runs on the intersection of the two lists. No default value.
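A sketch of how a lumi_mask combines with runselection (CRAB runs on the intersection of the two lists). The {"run": [[first_lumi, last_lumi], ...]} layout follows the usual CMS lumi-mask JSON convention; the helper name and run numbers below are invented for illustration.

```python
# Sketch: keep from a lumi-mask JSON only the runs that are also in
# the runselection. Illustration only, not CRAB's implementation.
import json

def select_runs(lumi_mask_json, run_selection):
    mask = json.loads(lumi_mask_json)
    return {run: ranges for run, ranges in mask.items()
            if int(run) in run_selection}
```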
485
486 =head3 B<Splitting jobs by Lumi>
487
488 =over 4
489
490 =item B<NOTE: Exactly two of these three parameters must be used: total_number_of_lumis, lumis_per_job, number_of_jobs.> Splitting by lumi (or by run, explained below) is required for real data. Because jobs in split-by-lumi mode process entire files rather than partial files, you will often end up with fewer jobs, each processing more lumis than you expect. Additionally, a single job cannot analyze files from multiple blocks in DBS. All job splitting parameters in split-by-lumi mode are "advice" to CRAB rather than determinative.
491
492 =back
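The "exactly two of three" rule above can be sketched as a simple check; this is an editor's illustration, not CRAB's actual validation code.

```python
# Sketch of the stated rule: exactly two of the three lumi-splitting
# parameters must be given. Illustration only, not CRAB's code.

def check_lumi_splitting(total_number_of_lumis=None,
                         lumis_per_job=None,
                         number_of_jobs=None):
    given = [p for p in (total_number_of_lumis, lumis_per_job, number_of_jobs)
             if p is not None]
    if len(given) != 2:
        raise ValueError("exactly two of total_number_of_lumis, "
                         "lumis_per_job, number_of_jobs must be set")
    return True
```

For example, setting total_number_of_lumis = -1 together with number_of_jobs passes the check, while setting all three (or only one) does not.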
493
494 =head4 B<total_number_of_lumis *>
495
496 The number of luminosity blocks to be processed. Use -1 to process the whole dataset. Your task will process this many lumis regardless of how the jobs are actually split up. If you do not specify this, the total number of lumis processed will be number_of_jobs x lumis_per_job. No default value.
497
498 =head4 B<lumis_per_job *>
499
500 The number of luminosity blocks to be accessed by each job. Since a job cannot access less than a whole file, it may be that the actual number of lumis per job is more than you asked for. No default value.
501
502 =head4 B<number_of_jobs *>
503
504 Define the number of jobs to be run for the task. This parameter is common between split by lumi and split by event modes. In split by lumi mode, the number of jobs will only approximately match this value. No default value
505
506 =head3 B<Splitting jobs by Event>
507
508 =over 4
509
510 =item B<NOTE: Exactly two of these three parameters must be used: total_number_of_events, events_per_job, number_of_jobs.> Otherwise CRAB will complain. Only MC data can be split by event. No default value
511
512 =back
513
514 =head4 B<total_number_of_events *>
515
516 The number of events to be processed. To access all available events, use I<-1>. Of course, the latter option is not viable in case of no input. In this case, the total number of events will be used to split the task in jobs, together with I<events_per_job>. No default value.
517
518 =head4 B<events_per_job *>
519
520 The number of events to be accessed by each job. Since a job cannot cross the boundary of a fileblock it might be that the actual number of events per job is not exactly what you asked for. It can be used also with no input. No default value.
521
522 =head4 B<number_of_jobs *>
523
524 Define the number of jobs to be run for the task. The number of events for each job is computed taking into account the total number of events required as well as the granularity of EventCollections. Can also be used with no input. No default value.
525
526 =head4 B<split_by_event *>
527
528 This setting is for experts only. If you don't know why you want to use it, you don't want to use it. Set the value to 1 to enable split by event on data. CRAB then behaves like old versions of CRAB which did not enforce split by lumi for data. Default value = 0.
529
530 =head3 B<split_by_run>
531
532 To activate run-based splitting (each job will access a different run) use I<split_by_run>=1. You can also define I<number_of_jobs> and/or I<runselection>. NOTE: run-based splitting combined with event-based splitting is not available. Default value = 0.
533
534 =head3 B<output_file *>
535
536 The output files produced by your application (comma separated list). From CRAB 2_2_2 onward, if TFileService is defined in the user PSet, the corresponding output file is automatically added to the list of output files. The user can avoid this by setting B<skip_TFileService_output> = 1 (default is 0 == file included). The EDM output produced via PoolOutputModule can be automatically added by setting B<get_edm_output> = 1 (default is 0 == no). B<Warning>: it is not allowed to have a PoolOutputModule and not save its output somewhere, since it is a waste of resources on the WN. In case you really want to do that, and if you really know what you are doing (hint: you don't!), you can use I<ignore_edm_output=1>. No default value.
537
538 =head3 B<skip_TFileService_output>
539
540 Force CRAB to skip the inclusion of file produced by TFileService to list of output files. Default value = 0, namely the file is included.
541
542 =head3 B<get_edm_output>
543
544 Force CRAB to add the EDM output file, as defined in PSET in PoolOutputModule (if any) to be added to the list of output files. Default value = 0 (== no inclusion)
545
=head3 B<increment_seeds>

Specifies a comma separated list of seeds to increment from job to job. The initial value is taken
from the CMSSW config file. I<increment_seeds=sourceSeed,g4SimHits> will set sourceSeed=11,12,13 and g4SimHits=21,22,23 on
subsequent jobs if the values of the two seeds are 10 and 20 in the CMSSW config file.

See also I<preserve_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.

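A sketch in I<crab.cfg> (the seed names must match those actually used in your CMSSW config; the ones below are illustrative):

    [CMSSW]
    increment_seeds = sourceSeed,g4SimHits
    preserve_seeds = VtxSmeared
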
=head3 B<preserve_seeds>

Specifies a comma separated list of seeds whose values CRAB will not change from those in the user
CMSSW config file. I<preserve_seeds=sourceSeed,g4SimHits> will leave the Pythia and GEANT seeds the same for every job.

See also I<increment_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.

=head3 B<first_lumi>

Relevant only for Monte Carlo production, for which it defaults to 1. The first job will generate events with this lumi block number; subsequent jobs will
increment the lumi block number. Setting this number to 0 (not recommended) means CMSSW will not be able to read multiple such files, as they
will all have the same run, lumi and event numbers. This check in CMSSW can be bypassed by setting
I<process.source.duplicateCheckMode = cms.untracked.string('noDuplicateCheck')> in the input source, should you need to
read files produced without setting first_run (in old versions of CRAB) or first_lumi. Default value = 1.

=head3 B<generator>

Name of the generator your MC job is using. Some generators require CRAB to skip events, others do not.
Possible values are pythia (default), comphep, lhe, and madgraph. This will skip events in your generator input file accordingly.

=head3 B<executable>

The name of the executable to be run on the remote WN. The executable is either found in the release area of the WN, or built in the user working area on the UI and (automatically) shipped to the WN. If you want to run a script (which might internally call I<cmsRun>), use B<USER.script_exe> instead. Default value = cmsRun.

=head3 I<DBS and DLS parameters:>

=head3 B<dbs_url>

The URL of the DBS query page. For experts only. Default value is the global DBS http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet

=head3 B<show_prod>

To enable CRAB to show data hosted on Tier1 sites, specify I<show_prod> = 1. By default those data are masked.

=head3 B<subscribed>

By setting the flag I<subscribed> = 1, only the replicas that are subscribed to their site are considered. The default is to return all replicas. The intended use of this flag is to avoid sending jobs to sites based on data that is being moved or deleted (and thus not subscribed).

=head3 B<no_block_boundary>

To remove fileblock boundaries in job splitting, specify I<no_block_boundary> = 1. Default value = 0.

=head2 B<[USER]>

=head3 B<additional_input_files>

Any additional input files you want to ship to the WN: comma separated list. IMPORTANT NOTE: they will be placed in the WN working dir, and not in ${CMS_SEARCH_PATH}. Specific files required by the CMSSW application must be placed in the local data directory ($CMSSW_BASE/src/data), which will be automatically shipped by CRAB itself, without specifying them as additional_input_files. You do not need to specify the I<ParameterSet> you are using, which will be included automatically. Wildcards are allowed. No default value.

=head3 B<script_exe>

A user script that will be run on the WN (instead of the default cmsRun). It is up to the user to set up the script itself properly to run in the WN environment. CRAB guarantees that the CMSSW environment is set up (e.g. scram is in the path) and that the modified pset.py will be placed in the working directory, with name pset.py. The user must ensure that a properly named job report is written; this can be done e.g. by calling cmsRun within the script as "cmsRun -j $RUNTIME_AREA/crab_fjr_$NJob.xml -p pset.py". The script itself will be added automatically to the input sandbox, so the user MUST NOT add it to B<USER.additional_input_files>.
Arguments: CRAB automatically passes the job index as the first argument of script_exe.
The MaxEvents number is set by CRAB in the environment variable "$MaxEvents", so the script can read this value directly from there. No default value.

=head3 B<script_arguments>

Any arguments you want to pass to B<USER.script_exe>: comma separated list.
CRAB automatically passes the job index as the first argument of script_exe.
The MaxEvents number is set by CRAB in the environment variable "$MaxEvents", so the script can read this value directly from there. No default value.

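A sketch of the corresponding I<crab.cfg> entries (the script name and arguments below are illustrative; the script must write the job report as described above):

    [USER]
    script_exe = myScript.sh
    script_arguments = arg1,arg2
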
=head3 B<ui_working_dir>

Name of the working directory for the current task. By default, a name I<crab_0_(date)_(time)> will be used. If this card is set, any CRAB command which requires I<-continue> needs to specify also the name of the working directory. A special syntax is also possible, to reuse the name of the dataset provided before: I<ui_working_dir : %(dataset)s>. In this case, if e.g. the dataset is SingleMuon, the ui_working_dir will be set to SingleMuon as well. Default value = crab_0_(date)_(time).

=head3 B<thresholdLevel>

This has to be a value between 0 and 100, indicating the percentage of task completeness (jobs in an ended state are complete, even if failed). The server will notify the user by e-mail (see the field B<eMail>) when the task reaches the specified threshold. Works only when using the server. Default value = 100.

=head3 B<eMail>

The server will notify the specified e-mail address when the task reaches the specified B<thresholdLevel>. A notification is also sent when the task reaches 100% completeness. This field can also be a list of e-mail addresses: "B<eMail = user1@cern.ch, user2@cern.ch>". Works only when using the server. No default value.

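For example (addresses and threshold are illustrative), in I<crab.cfg>:

    [USER]
    thresholdLevel = 80
    eMail = user1@cern.ch, user2@cern.ch
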
=head3 B<client>

Specify the client storage protocol that can be used to interact with the server in B<CRAB.server_name>. The default is the value in the server configuration.

=head3 B<return_data *>

The output produced by the executable on the WN is returned (via output sandbox) to the UI by issuing the I<-getoutput> command. B<Warning>: this option should be used only for I<small> outputs, say less than 10MB, since the sandbox cannot accommodate big files. Depending on the Resource Broker used, a size limit on the output sandbox may be applied: bigger files will be truncated. To be used as an alternative to I<copy_data>. Default value = 0.

=head3 B<outputdir>

To be used together with I<return_data>. Directory on the user interface where to store the output. A full path is mandatory, "~/" is not allowed; the default location of returned output is ui_working_dir/res. BEWARE: does not work with scheduler=CAF.

=head3 B<logdir>

To be used together with I<return_data>. Directory on the user interface where to store the standard output and error. A full path is mandatory, "~/" is not allowed; the default location of returned output is ui_working_dir/res.

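A sketch in I<crab.cfg> (the paths below are illustrative):

    [USER]
    return_data = 1
    outputdir = /afs/cern.ch/user/x/xyz/myres
    logdir = /afs/cern.ch/user/x/xyz/mylog
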
=head3 B<copy_data *>

The output (only the files produced by the analysis executable, not the std-out and err) is copied to a Storage Element of your choice (see below). To be used as an alternative to I<return_data> and recommended in case of large output. Default value = 0.

=head3 B<storage_element>

To be used with <copy_data>=1.
If you want to copy the output of your analysis to an official CMS Tier2 or Tier3, you have to write the CMS Site Name of the site, as written in SiteDB at https://cmsweb.cern.ch/sitedb/prod/sites (e.g. T2_IT_Legnaro). You also have to specify the <user_remote_dir> (see below).

If you want to copy the output to a non-official CMS remote site, you have to specify the complete storage element name (e.g. se.xxx.infn.it). You also have to specify the <storage_path>, and the <storage_port> if you do not use the default one (see below). No default value.

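A sketch of the two stage-out variants in I<crab.cfg> (site name, host and paths are taken from the examples above and are illustrative):

    [USER]
    copy_data = 1
    # official CMS site:
    storage_element = T2_IT_Legnaro
    user_remote_dir = /store/user/xyz/myanalysis
    # or, non-official site:
    # storage_element = se.xxx.infn.it
    # storage_path = /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/
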
=head3 B<user_remote_dir>

To be used with <copy_data>=1 and <storage_element> set to an official CMS site.
This is the directory or tree of directories where your output will be stored. This directory will be created under the mountpoint (which will be discovered by CRAB if an official CMS Storage Element has been used, or taken from the crab.cfg as specified by the user). B<NOTE>: this part of the path will be used as the logical file name of your files in the case of publication without using an official CMS Storage Element. Generally it should start with "/store".

=head3 B<storage_path>

To be used with <copy_data>=1 and <storage_element> set to a non-official CMS site.
This is the full path of the Storage Element writeable by all, i.e. the mountpoint of the SE (e.g. /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/).
No default value.

=head3 B<storage_port>

To choose the storage port specify I<storage_port> = N. Default value = 8443.

=head3 B<caf_lfn>

Running at CAF, you can decide in which mountpoint to copy your output by selecting the first part of the LFN.
The default value is /store/caf/user.
To test the EOS area you can use caf_lfn = /store/eos/user

=head3 B<local_stage_out *>

This option enables the local stage out of produced output to the "close storage element" where the job is running, in case of failure of the remote copy to the Storage Element chosen by the user in the crab.cfg. It has to be used with the copy_data option. In the case of a backup copy, the publication of data is forbidden. Set I<local_stage_out> = 1. Default value = 0.

=head3 B<publish_data*>

To be used with <copy_data>=1.
To publish your produced output in a local instance of DBS, set publish_data = 1.
All the details about how to use this functionality are written at https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabForPublication
N.B. 1) if you are using an official CMS site to store data, the remote dir will not be considered: the directory where data will be stored is decided by CRAB, following the CMS policy, in order to be able to re-read published data.
2) if you are using a non-official CMS site to store data, you have to check the <lfn>, which will be part of the logical file name of your published files, in order to be able to re-read the data.
Default value = 0.

=head3 B<publish_data_name>

Your produced output will be published in your local DBS with dataset name <primarydataset>/<publish_data_name>/USER. No default value.

=head3 B<dbs_url_for_publication>

Specify the URL of your local DBS instance where CRAB has to publish the output files. No default value.

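A sketch of a publication setup in I<crab.cfg> (the dataset name is illustrative; the DBS writer URL must be that of your own local instance):

    [USER]
    copy_data = 1
    publish_data = 1
    publish_data_name = MyAnalysis_v1
    # dbs_url_for_publication = <URL of your local DBS writer instance>
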

=head3 B<xml_report>

Used to switch off the screen report during the status query, enabling the DB serialization in a file. Specifying I<xml_report> = FileName, CRAB will serialize the DB into CRAB_WORKING_DIR/share/FileName. No default value.

=head3 B<usenamespace>

To use the automatic namespace definition (performed by CRAB), set I<usenamespace>=1. The same policy used for the stage out in case of data publication will be applied. Default value = 0.

=head3 B<debug_wrapper>

To enable the higher verbosity level of the wrapper, specify I<debug_wrapper> = 1. The Pset contents before and after the CRAB manipulation will be written, together with other useful info. Default value = 0.

=head3 B<deep_debug>

To be used in case of unexpected job crashes, when the stdout and stderr files are lost. By submitting the same jobs again with I<deep_debug> = 1, these files will be reported back. NOTE: it works only in standalone mode, for debugging purposes.

=head3 B<dontCheckSpaceLeft>

Set it to 1 to skip the check of free space left in your working directory before attempting to get the output back. Default is 0 (=False).

=head3 B<check_user_remote_dir>

To avoid stage out failures, CRAB checks the remote location content at creation time. By setting I<check_user_remote_dir>=0, CRAB will skip the check. Default value = 0.

=head3 B<tasktype>

Expert only parameter. Not to be used. Default value = analysis.

=head3 B<ssh_control_persist>

Expert only parameter. Not to be used. Default value = 3600. Behaves like ControlPersist in ssh_config, but the time is only supported in seconds.

=head2 B<[GRID]>

In square brackets is the name of the schedulers a parameter applies to, in case it does not apply to all.

=head3 B<RB [glite]>

Which WMS you want to use instead of the default one, as defined in the configuration file automatically downloaded by CRAB from the CMSDOC web page. You can use any other WMS which is available, if you provide the proper configuration files. E.g., for gLite WMS XYZ, you should provide I<0_GET_glite_wms_XYZ.conf> where XYZ is the RB value. These files are searched for in the cache dir (~/.cms_crab) and, if not found, on the cmsdoc web page. So, if you put your private configuration files in the cache dir, they will be used, even if they are not available on the crab web page.
Please get in contact with the crab team if you wish to provide your WMS as a service to the CMS community.

=head3 B<role [glite]>

The role to be set in the VOMS. Beware that simultaneous use of I<role> and I<group> is not supported. See the VOMS documentation for more info. No default value.

=head3 B<group [glite]>

The group to be set in the VOMS. Beware that simultaneous use of I<role> and I<group> is not supported. See the VOMS documentation for more info. No default value.

=head3 B<dont_check_proxy>

Set to 1 if you do not want CRAB to check your proxy. The creation of the proxy (with proper length) and its delegation to a myproxy server are your responsibility.


=head3 B<dont_check_myproxy>

If you want to switch off only the proxy renewal, set I<dont_check_myproxy>=1. The proxy delegation to a myproxy server is your responsibility. Default value = 0.

=head3 B<requirements [glite]>

Any other requirements to be added to the JDL. Must be written in compliance with JDL syntax (see the LCG user manual for further info). No requirement on the Computing Element must be set here. No default value.

=head3 B<additional_jdl_parameters [glite, remoteGlidein]>

Any other parameters you want to add to the jdl file: semicolon separated list; each
item in the list must end with the closing ";". No default value.
Works both for gLite and remoteGlidein.

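For instance (the attribute below is illustrative, not a recommendation; note the trailing ";"):

    [GRID]
    additional_jdl_parameters = AllowZippedISB = false;
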
=head3 B<wms_service [glite]>

With this field it is also possible to specify which WMS you want to use (https://hostname:port/pathcode) where "hostname" is the WMS name, the "port" generally is 7443 and the "pathcode" should be something like "glite_wms_wmproxy_server". No default value.

=head3 B<max_cpu_time>

Maximum CPU time needed to finish one job. It will be used to select a suitable queue on the CE. Time in minutes. Default value = 130.

=head3 B<max_wall_clock_time>

Same as previous, but with wall clock time rather than CPU time. No default value.

=head3 B<max_rss [remoteGlidein]>

Maximum Resident Set Size (memory) needed for one job. It will be used to select a suitable queue on the CE and to adjust the crab watchdog. Memory needed, in MB. Default value = 2300.

=head3 B<ce_black_list [glite]>

All the CEs (Computing Elements) whose name contains the following strings (comma separated list) will not be considered for submission. Use the DNS domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl, ...). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
By default T0 and T1 sites are blacklisted.

=head3 B<ce_white_list [glite]>

Only the CEs (Computing Elements) whose name contains the following strings (comma separated list) will be considered for submission. Use the DNS domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl, ...). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings. Please note that if the selected CE(s) do not contain the data you want to access, no submission can take place.

=head3 B<se_black_list [glite,glidein,remoteGlidein]>

All the SEs (Storage Elements) whose name contains the following strings (comma separated list) will not be considered for submission. It works only if a datasetpath is specified. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
By default T0 and T1 sites are blacklisted.

=head3 B<se_white_list [glite,glidein,remoteGlidein]>

Only the SEs (Storage Elements) whose name contains the following strings (comma separated list) will be considered for submission. It works only if a datasetpath is specified. Please note that if the selected SE(s) do not host the data you want to access, no submission can take place. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.

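For example, in I<crab.cfg> (the site names below are illustrative):

    [GRID]
    se_black_list = T1
    se_white_list = T2_DE_DESY, T2_IT_Legnaro
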
=head3 B<remove_default_blacklist [glite]>

CRAB enforces the T1 Computing Elements black list. By default it is appended to the user defined I<CE_black_list>. To remove the enforced T1 black list, set I<remove_default_blacklist>=1. Default value = 0.

=head3 B<data_location_override [remoteGlidein]>

Overrides the data location list obtained from DLS/PhEDEx with the list of sites indicated. Same syntax as se_white_list. It is up to the user to make sure that the needed data can be read nevertheless. Note: ONLY WORKS INSIDE crab.cfg at crab -create time, not when issued on the command line as crab -submit -GRID.data_location_override=...

=head3 B<allow_overflow [remoteGlidein]>

Tells glidein whether it can overflow this job, i.e. run it at another site and access data via xrootd, if the sites where the data are located are full. Set to 0 to disallow overflow. Default value = 1.

=head2 B<[LSF]> or B<[CAF]> or B<[PBS]> or B<[SGE]>

=head3 B<queue>

The LSF/PBS queue you want to use: if none, the default one will be used. For CAF, the proper queue will be automatically selected.

=head3 B<resource>

The resources to be used within a LSF/PBS queue. Again, for CAF, the right one is selected.

=head3 B<group>

The physics GROUP the user belongs to (for example PHYS_SUSY etc.). By specifying it, the LSF accounting and fair share per sub-group is done properly.

=head1 FILES

I<crab> uses a configuration file I<crab.cfg> which contains configuration parameters. This file is written in INI style. The default filename can be changed by the I<-cfg> option.

I<crab> creates by default a working directory 'crab_0_E<lt>dateE<gt>_E<lt>timeE<gt>'

I<crab> saves all command lines in the file I<crab.history>.

I<crab> downloads some configuration files from the internet and keeps cached copies in the ~/.cms_crab and ~/.cms_sitedbcache directories. The location of those caches can be redirected using the environment variables CMS_SITEDB_CACHE_DIR and CMS_CRAB_CACHE_DIR.

=head1 HISTORY

B<CRAB> is a tool for CMS analysis in the Grid environment. It is based on ideas from CMSprod, a production tool originally implemented by Nikolai Smirnov.

=head1 AUTHORS

"""
    author_string = '\n'
    for auth in common.prog_authors:
        #author = auth[0] + ' (' + auth[2] + ')' + ' E<lt>'+auth[1]+'E<gt>,\n'
        author = auth[0] + ' E<lt>' + auth[1] +'E<gt>,\n'
        author_string = author_string + author
        pass
    help_string = help_string + author_string[:-2] + '.'\
    """

=cut
"""

    pod = tempfile.mktemp()+'.pod'
    pod_file = open(pod, 'w')
    pod_file.write(help_string)
    pod_file.close()

    if option == 'man':
        man = tempfile.mktemp()
        pod2man = 'pod2man --center=" " --release=" " '+pod+' >'+man
        os.system(pod2man)
        os.system('man '+man)
        pass
    elif option == 'tex':
        fname = common.prog_name+'-v'+common.prog_version_str
        tex0 = tempfile.mktemp()+'.tex'
        pod2tex = 'pod2latex -full -out '+tex0+' '+pod
        os.system(pod2tex)
        tex = fname+'.tex'
        tex_old = open(tex0, 'r')
        tex_new = open(tex, 'w')
        for s in tex_old.readlines():
            if string.find(s, '\\begin{document}') >= 0:
                tex_new.write('\\title{'+common.prog_name+'\\\\'+
                              '(Version '+common.prog_version_str+')}\n')
                tex_new.write('\\author{\n')
                for auth in common.prog_authors:
                    tex_new.write('  '+auth[0]+
                                  '\\thanks{'+auth[1]+'} \\\\\n')
                tex_new.write('}\n')
                tex_new.write('\\date{}\n')
            elif string.find(s, '\\tableofcontents') >= 0:
                tex_new.write('\\maketitle\n')
                continue
            elif string.find(s, '\\clearpage') >= 0:
                continue
            tex_new.write(s)
        tex_old.close()
        tex_new.close()
        print 'See '+tex
        pass
    elif option == 'html':
        fname = common.prog_name+'-v'+common.prog_version_str+'.html'
        pod2html = 'pod2html --title='+common.prog_name+\
                   ' --infile='+pod+' --outfile='+fname
        os.system(pod2html)
        print 'See '+fname
        pass
    elif option == 'txt':
        fname = common.prog_name+'-v'+common.prog_version_str+'.txt'
        pod2text = 'pod2text '+pod+' '+fname
        os.system(pod2text)
        print 'See '+fname
        pass

    sys.exit(0)