ViewVC Help
View File | Revision Log | Show Annotations | Root Listing
root/cvsroot/COMP/CRAB/python/crab_help.py
Revision: 1.133
Committed: Mon Dec 14 22:33:54 2009 UTC (15 years, 4 months ago) by ewv
Content type: text/x-python
Branch: MAIN
Changes since 1.132: +9 -9 lines
Log Message:
Lumi related changes for 2.7: first_run replaced by first_lumi, defaults to 1 and -report write lumiSummary.json to res/ directory

File Contents

# Content
1
2 ###########################################################################
3 #
4 # H E L P F U N C T I O N S
5 #
6 ###########################################################################
7
8 import common
9
10 import sys, os, string
11
12 import tempfile
13
14 ###########################################################################
15 def usage():
16 print 'in usage()'
17 usa_string = common.prog_name + """ [options]
18
19 The most useful general options (use '-h' to get complete help):
20
21 -create -- Create all the jobs.
22 -submit n -- Submit the first n available jobs. Default is all.
23 -status -- check status of all jobs.
24 -getoutput|-get [range] -- get back the output of all jobs: if range is defined, only of selected jobs.
25 -extend -- Extend an existing task to run on new fileblocks if there.
26 -publish -- after the getouput, publish the data user in a local DBS instance.
27 -checkPublication [dbs_url datasetpath] -- checks if a dataset is published in a DBS.
28 -kill [range] -- kill submitted jobs.
29 -resubmit [range] -- resubmit killed/aborted/retrieved jobs.
30 -forceResubmit [range] -- resubmit jobs regardless to their status.
31 -copyData [range [dest_se or dest_endpoint]] -- copy locally (in crab_working_dir/res dir) or on a remote SE your produced output,
32 already stored on remote SE.
33 -renewCredential -- renew credential on the server.
34 -clean -- gracefully cleanup the directory of a task.
35 -match|-testJdl [range] -- check if resources exist which are compatible with jdl.
36 -report -- print a short report about the task
37 -list [range] -- show technical job details.
38 -postMortem [range] -- provide a file with information useful for post-mortem analysis of the jobs.
39 -printId [range] -- print the job SID or Task Unique ID while using the server.
40 -createJdl [range] -- provide files with a complete Job Description (JDL).
41 -validateCfg [fname] -- parse the ParameterSet using the framework's Python API.
42 -continue|-c [dir] -- Apply command to task stored in [dir].
43 -h [format] -- Detailed help. Formats: man (default), tex, html, txt.
44 -cfg fname -- Configuration file name. Default is 'crab.cfg'.
45 -debug N -- set the verbosity level to N.
46 -v -- Print version and exit.
47
48 "range" has syntax "n,m,l-p" which correspond to [n,m,l,l+1,...,p-1,p] and all possible combination
49
50 Example:
51 crab -create -submit 1
52 """
53 print usa_string
54 sys.exit(2)
55
56 ###########################################################################
57 def help(option='man'):
58 help_string = """
59 =pod
60
61 =head1 NAME
62
63 B<CRAB>: B<C>ms B<R>emote B<A>nalysis B<B>uilder
64
65 """+common.prog_name+""" version: """+common.prog_version_str+"""
66
67 This tool B<must> be used from an User Interface and the user is supposed to
68 have a valid Grid certificate.
69
70 =head1 SYNOPSIS
71
72 B<"""+common.prog_name+"""> [I<options>] [I<command>]
73
74 =head1 DESCRIPTION
75
76 CRAB is a Python program intended to simplify the process of creation and submission of CMS analysis jobs to the Grid environment .
77
78 Parameters for CRAB usage and configuration are provided by the user changing the configuration file B<crab.cfg>.
79
80 CRAB generates scripts and additional data files for each job. The produced scripts are submitted directly to the Grid. CRAB makes use of BossLite to interface to the Grid scheduler, as well as for logging and bookkeeping.
81
82 CRAB supports any CMSSW based executable, with any modules/libraries, including user provided ones, and deals with the output produced by the executable. CRAB provides an interface to CMS data discovery services (DBS and DLS), which are completely hidden to the final user. It also splits a task (such as analyzing a whole dataset) into smaller jobs, according to user requirements.
83
84 CRAB can be used in two ways: StandAlone and with a Server.
85 The StandAlone mode is suited for small task, of the order of O(100) jobs: it submits the jobs directly to the scheduler, and these jobs are under user responsibility.
86 In the Server mode, suited for larger tasks, the jobs are prepared locally and then passed to a dedicated CRAB server, which then interacts with the scheduler on behalf of the user, including additional services, such as automatic resubmission, status caching, output retrieval, and more.
87 The CRAB commands are exactly the same in both cases.
88
89 CRAB web page is available at
90
91 I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab>
92
93 =head1 HOW TO RUN CRAB FOR THE IMPATIENT USER
94
95 Please, read all the way through in any case!
96
97 Source B<crab.(c)sh> from the CRAB installation area, which have been setup either by you or by someone else for you.
98
99 Modify the CRAB configuration file B<crab.cfg> according to your need: see below for a complete list. A template and commented B<crab.cfg> can be found on B<$CRABDIR/python/crab.cfg>
100
101 ~>crab -create
102 create all jobs (no submission!)
103
104 ~>crab -submit 2 -continue [ui_working_dir]
105 submit 2 jobs, the ones already created (-continue)
106
107 ~>crab -create -submit 2
108 create _and_ submit 2 jobs
109
110 ~>crab -status
111 check the status of all jobs
112
113 ~>crab -getoutput
114 get back the output of all jobs
115
116 ~>crab -publish
117 publish all user outputs in the DBS specified in the crab.cfg (dbs_url_for_publication) or written as argument of this option
118
119 =head1 RUNNING CMSSW WITH CRAB
120
121 =over 4
122
123 =item B<A)>
124
125 Develop your code in your CMSSW working area. Do anything which is needed to run interactively your executable, including the setup of run time environment (I<eval `scramv1 runtime -sh|csh`>), a suitable I<ParameterSet>, etc. It seems silly, but B<be extra sure that you actually did compile your code> I<scramv1 b>.
126
127 =item B<B)>
128
129 Source B<crab.(c)sh> from the CRAB installation area, which have been setup either by you or by someone else for you. Modify the CRAB configuration file B<crab.cfg> according to your need: see below for a complete list.
130
131 The most important parameters are the following (see below for complete description of each parameter):
132
133 =item B<Mandatory!>
134
135 =over 6
136
137 =item B<[CMSSW]> section: datasetpath, pset, splitting parameters, output_file
138
139 =item B<[USER]> section: output handling parameters, such as return_data, copy_data etc...
140
141 =back
142
143 =item B<Run it!>
144
145 You must have a valid voms-enabled Grid proxy. See CRAB web page for details.
146
147 =back
148
149 =head1 RUNNING MULTICRAB
150
151 MultiCRAB is a CRAB extension to submit the same job to multiple datasets in one go.
152
153 The use case for multicrab is when you have your analysis code that you want to run on several datasets, typically some signals plus some backgrounds (for MC studies)
154 or on different streams/configuration/runs for real data taking. You want to run exactly the same code, and also the crab.cfg are different only for few keys:
155 for sure datasetpath but also other keys, such as eg total_number_of_events, in case you want to run on all signals but only a fraction of background, or anything else.
156 So far, you would have to create a set of crab.cfg, one for each dataset you want to access, and submit several instances of CRAB, saving the output to different locations.
157 Multicrab is meant to automatize this procedure.
158 In addition to the usual crab.cfg, there is a new configuration file called multicrab.cfg. The syntax is very similar to that of crab.cfg, namely
159 [SECTION] <crab.cfg Section>.Key=Value
160
161 Please note that it is mandatory to add explicitly the crab.cfg [SECTION] in front of [KEY].
162 The role of multicrab.cfg is to apply modification to the template crab.cfg, some which are common to all tasks, and some which are task specific.
163
164 =head2 So there are two sections:
165
166 =over 2
167
168 =item B<[COMMON]>
169
170 section: which applies to all task, and which is fully equivalent to modify directly the template crab.cfg
171
172 =item B<[DATASET]>
173
174 section: there could be an arbitrary number of sections, one for each dataset you want to run. The names are free (but COMMON and MULTICRAB), and they will be used as ui_working_dir for the task as well as an appendix to the user_remote_dir in case of output copy to remote SE. So, the task corresponding to section, say [SIGNAL] will be placed in directory SIGNAL, and the output will be put on /SIGNAL/, so SIGNAL will be added as last subdir in the user_remote_dir.
175
176 =back
177
178 For further details please visit
179
180 I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideMultiCrab>
181
182 =head1 HOW TO RUN ON CONDOR-G
183
184 The B<Condor-G> mode for B<CRAB> is a special submission mode next to the standard Resource Broker submission. It is designed to submit jobs directly to a site and not using the Resource Broker.
185
186 Due to the nature of B<Condor-G> submission, the B<Condor-G> mode is restricted to OSG sites within the CMS Grid, currently the 7 US T2: Florida(ufl.edu), Nebraska(unl.edu), San Diego(ucsd.edu), Purdue(purdue.edu), Wisconsin(wisc.edu), Caltech(ultralight.org), MIT(mit.edu).
187
188 =head2 B<Requirements:>
189
190 =over 2
191
192 =item installed and running local Condor scheduler
193
194 (either installed by the local Sysadmin or self-installed using the VDT user interface: http://www.uscms.org/SoftwareComputing/UserComputing/Tutorials/vdt.html)
195
196 =item locally available LCG or OSG UI installation
197
198 for authentication via Grid certificate proxies ("voms-proxy-init -voms cms" should result in valid proxy)
199
200 =item set the environment variable GRID_WL_LOCATION to the edg directory of the local LCG or OSG UI installation
201
202 =back
203
204 =head2 B<What the Condor-G mode can do:>
205
206 =over 2
207
208 =item submission directly to multiple OSG sites,
209
210 the requested dataset must be published correctly by the site in the local and global services.
211 Previous restrictions on submitting only to a single site have been removed. SE and CE whitelisting
212 and blacklisting work as in the other modes.
213
214 =back
215
216 =head2 B<What the Condor-G mode cannot do:>
217
218 =over 2
219
220 =item submit jobs if no condor scheduler is running on the submission machine
221
222 =item submit jobs if the local condor installation does not provide Condor-G capabilities
223
224 =item submit jobs to an LCG site
225
226 =item support Grid certificate proxy renewal via the myproxy service
227
228 =back
229
230 =head2 B<CRAB configuration for Condor-G mode:>
231
232 The CRAB configuration for the Condor-G mode only requires one change in crab.cfg:
233
234 =over 2
235
236 =item select condor_g Scheduler:
237
238 scheduler = condor_g
239
240 =back
241
242
243 =head1 HOW TO RUN ON NORDUGRID ARC
244
245 The ARC scheduler can be used to submit jobs to sites running the NorduGrid
246 ARC grid middleware. To use it you'll need to have the ARC client
247 installed.
248
249 =head2 B<CRAB configuration for ARC mode:>
250
251 The ARC scheduler requires some changes to crab.cfg:
252
253 =over 2
254
255 =item B<scheduler:>
256
257 Select the ARC scheduler:
258 scheduler = arc
259
260 =item B<requirements>, B<additional_jdl_parameters:>
261
262 Use xrsl code instead of jdl for these parameters.
263
264 =item B<max_cpu_time>, B<max_wall_clock_time:>
265
266 For parameters max_cpu_time and max_wall_clock_time, you can use
267 units, e.g. "72 hours" or "3 days", just like with the xrsl attributes
268 cpuTime and wallTime. If no unit is given, minutes is assumed by default.
269
270 =back
271
272 =head2 B<CRAB Commands:>
273
274 Most CRAB commands behave approximately the same with the ARC scheduler, with only some minor differences:
275
276 =over 2
277
278 =item B<*> B<-printJdl|-createJdl> will print xrsl code instead of jdl.
279
280 =back
281
282
283
284
285 =head1 COMMANDS
286
287 =over 4
288
289 =item B<-create>
290
291 Create the jobs: from version 1_3_0 it is only possible to create all jobs.
292 The maximum number of jobs depends on dataset and splitting directives. This set of identical jobs accessing the same dataset are defined as a task.
293 This command create a directory with default name is I<crab_0_date_time> (can be changed via ui_working_dir parameter, see below). Inside this directory it is placed whatever is needed to submit your jobs. Also the output of your jobs (once finished) will be place there (see after). Do not cancel by hand this directory: rather use -clean (see).
294 See also I<-continue>.
295
296 =item B<-submit [range]>
297
298 Submit n jobs: 'n' is either a positive integer or 'all' or a [range]. The default is all.
299 If 'n' is passed as an argument, the first 'n' suitable jobs will be submitted. Please note that this is behaviour is different from other commands, where -command N means act the command to the job N, and not to the first N jobs. If a [range] is passed, the selected jobs will be submitted.
300 This option may be used in conjunction with -create (to create and submit immediately) or with -continue (which is assumed by default) to submit previously created jobs. Failure to do so will stop CRAB and generate an error message. See also I<-continue>.
301
302 =item B<-continue [dir] | -c [dir]>
303
304 Apply the action on the task stored in directory [dir]. If the task directory is the standard one (crab_0_date_time), the most recent in time is assumed. Any other directory must be specified.
305 Basically all commands (except -create) need -continue, so it is automatically assumed. Of course, the standard task directory is used in this case.
306
307 =item B<-status [v|verbose]>
308
309 Check the status of the jobs, in all states. With the server, the full status, including application and wrapper exit codes, is available as soon as the jobs end. In StandAlone mode it is necessary to retrieve (-get) the job output first. With B<v|verbose> some more information is displayed.
310
311 =item B<-getoutput|-get [range]>
312
313 Retrieve the output declared by the user via the output sandbox. By default the output will be put in task working dir under I<res> subdirectory. This can be changed via config parameters. B<Be extra sure that you have enough free space>. From version 2_3_x, the available free space is checked in advance. See I<range> below for syntax.
314
315 =item B<-publish>
316
317 Publish user output in a local DBS instance after the retrieval of output. By default publish uses the dbs_url_for_publication specified in the crab.cfg file, otherwise you can supply it as an argument of this option.
318 Warnings about publication:
319
320 CRAB publishes only EDM files (in the FJR they are written in the tag <File>)
321
322 By default the publication of files containing 0 events is desabled. If you want to enable it you have to set the parameter [USER].publish_zero_event=1 in crab.cfg.
323
324 CRAB publishes in the same USER dataset more EDM files if they are produced by a job and written in the tag <File> of FJR.
325
326 It is not possible for the user to select only one file to publish, nor to publish two files in two different USER datasets.
327
328
329 =item B<-checkPublication [-USER.dbs_url_for_publication=dbs_url -USER.dataset_to_check=datasetpath -debug]>
330
331 Check if a dataset is published in a DBS. This option is automaticaly called at the end of the publication step, but it can be also used as a standalone command. By default it reads the parameters (USER.dbs_url_for_publication and USER.dataset_to_check) in your crab.cfg. You can overwrite the defaults in crab.cfg by passing these parameters as option. Using the -debug option, you will get detailed info about the files of published blocks.
332
333 =item B<-resubmit [range]>
334
335 Resubmit jobs which have been previously submitted and have been either I<killed> or are I<aborted>. See I<range> below for syntax.
336
337 =item B<-forceResubmit [range]>
338
339 iSame as -resubmit but without any check about the actual status of the job: please use with caution, you can have problem if both the original job and the resubmitted ones actually run and tries to write the output ona a SE. This command is meant to be used if the killing is not possible or not working but you know that the job failed or will. See I<range> below for syntax.
340
341 =item B<-extend>
342
343 Create new jobs for an existing task, checking if new blocks are published for the given dataset.
344
345 =item B<-kill [range]>
346
347 Kill (cancel) jobs which have been submitted to the scheduler. A range B<must> be used in all cases, no default value is set.
348
349 =item B<-copyData [range -dest_se=the official SE name or -dest_endpoint=the complete endpoint of the remote SE]>
350
351 Option that can be used only if your output have been previously copied by CRAB on a remote SE.
352 By default the copyData copies your output from the remote SE locally on the current CRAB working directory (under res). Otherwise you can copy the output from the remote SE to another one, specifying either -dest_se=<the remote SE official name> or -dest_endpoint=<the complete endpoint of remote SE>. If dest_se is used, CRAB finds the correct path where the output can be stored.
353
354 Example: crab -copyData --> output copied to crab_working_dir/res directory
355 crab -copyData -dest_se=T2_IT_Legnaro --> output copied to the legnaro SE, directory discovered by CRAB
356 crab -copyData -dest_endpoint=srm://<se_name>:8443/xxx/yyyy/zzzz --> output copied to the se <se_name> under
357 /xxx/yyyy/zzzz directory.
358
359 =item B<-renewCredential >
360
361 If using the server modality, this command allows to delegate a valid credential (proxy/token) to the server associated with the task.
362
363 =item B<-match|-testJdl [range]>
364
365 Check if the job can find compatible resources. It is equivalent of doing I<edg-job-list-match> on edg.
366
367 =item B<-printId [range]>
368
369 Just print the job identifier, which can be the SID (Grid job identifier) of the job(s) or the taskId if you are using CRAB with the server or local scheduler Id. If [range] is "full", the the SID of all the jobs are printed, also in the case of submission with server.
370
371 =item B<-createJdl [range]>
372
373 Collect the full Job Description in a file located under share directory. The file base name is File- .
374
375 =item B<-postMortem [range]>
376
377 Try to collect more information of the job from the scheduler point of view.
378
379 =item B<-list [range]>
380
381 Dump technical information about jobs: for developers only.
382
383 =item B<-report>
384
385 Print a short report about the task, namely the total number of events and files processed/requested/available, the name of the datasetpath, a summary of the status of the jobs, the list of runs and lumi sections, and so on. In principle it should contain all the info needed for analysis. Work in progress.
386
387 =item B<-clean [dir]>
388
389 Clean up (i.e. erase) the task working directory after a check whether there are still running jobs. In case, you are notified and asked to kill them or retrieve their output. B<Warning> this will possibly delete also the output produced by the task (if any)!
390
391 =item B<-cleanCache>
392
393 Clean up (i.e. erase) the SiteDb, WMS and CrabServer caches in your submitting directory
394
395 =item B<-help [format] | -h [format]>
396
397 This help. It can be produced in three different I<format>: I<man> (default), I<tex> and I<html>.
398
399 =item B<-v>
400
401 Print the version and exit.
402
403 =item B<range>
404
405 The range to be used in many of the above commands has the following syntax. It is a comma separated list of jobs ranges, each of which may be a job number, or a job range of the form first-last.
406 Example: 1,3-5,8 = {1,3,4,5,8}
407
408 =back
409
410 =head1 OPTION
411
412 =over 4
413
414 =item B<-cfg [file]>
415
416 Configuration file name. Default is B<crab.cfg>.
417
418 =item B<-debug [level]>
419
420 Set the debug level: high number for high verbosity.
421
422 =back
423
424 =head1 CONFIGURATION PARAMETERS
425
426 All the parameter describe in this section can be defined in the CRAB configuration file. The configuration file has different sections: [CRAB], [USER], etc. Each parameter must be defined in its proper section. An alternative way to pass a config parameter to CRAB is via command line interface; the syntax is: crab -SECTION.key value . For example I<crab -USER.outputdir MyDirWithFullPath> .
427 The parameters passed to CRAB at the creation step are stored, so they cannot be changed by changing the original crab.cfg . On the other hand the task is protected from any accidental change. If you want to change any parameters, this require the creation of a new task.
428 Mandatory parameters are flagged with a *.
429
430 B<[CRAB]>
431
432 =over 4
433
434 =item B<jobtype *>
435
436 The type of the job to be executed: I<cmssw> jobtypes are supported
437
438 =item B<scheduler *>
439
440 The scheduler to be used: I<glitecoll> is the more efficient grid scheduler and should be used. Other choice are I<glite>, same as I<glitecoll> but without bulk submission (and so slower) or I<condor_g> (see specific paragraph) or I<edg> which is the former Grid scheduler, which will be dismissed in some future. In addition, there's an I<arc> scheduler to be used with the NorduGrid ARC middleware.
441 From version 210, also local scheduler are supported, for the time being only at CERN. I<LSF> is the standard CERN local scheduler or I<CAF> which is LSF dedicated to CERN Analysis Facilities.
442
443 =item B<use_server>
444
445 To use the server for job handling (recommended) 0=no (default), 1=true. The server to be used will be found automatically from a list of available ones: it can also be specified explicitly by using I<server_name> (see below)
446
447 =item B<server_name>
448
449 To use the CRAB-server support it is needed to fill this key with server name as <Server_DOMAIN> (e.g. cnaf,fnal). If this is set, I<use_server> is set to true automatically.
450 If I<server_name=None> crab works in standalone way, same as using I<use_server=0> and no I<server_name>.
451 The server available to users can be found from CRAB web page.
452
453 =back
454
455 B<[CMSSW]>
456
457 =over 4
458
459 =item B<datasetpath *>
460
461 The path of the processed or analysis dataset as defined in DBS. It comes with the format I</PrimaryDataset/DataTier/Process[/OptionalADS]>. If no input is needed I<None> must be specified. When running on an analysis dataset, the job splitting must be specified by luminosity block rather than event. Analysis datasets are only treated accurately on a lumi-by-lumi level with CMSSW 3_1_x and later.
462
463 =item B<runselection *>
464
465 Within a dataset you can restrict to run on a specific run number or run number range. For example runselection=XYZ or runselection=XYZ1-XYZ2 .
466
467 =item B<use_parent *>
468
469 Within a dataset you can ask to run over the related parent files too. E.g., this will give you access to the RAW data while running over a RECO sample. Setting use_parent=1 CRAB determines the parent files from DBS and will add secondaryFileNames = cms.untracked.vstring( <LIST of parent FIles> ) to the pool source section of your parameter set.
470
471 =item B<pset *>
472
473 The python ParameterSet to be used.
474
475 =item B<pycfg_params *>
476
477 These parameters are passed to the python config file, as explained in https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideAboutPythonConfigFile#Passing_Command_Line_Arguments_T
478
479 =item I<Of the following three parameter exactly two must be used, otherwise CRAB will complain.>
480
481 =item B<total_number_of_events *>
482
483 The number of events to be processed. To access all available events, use I<-1>. Of course, the latter option is not viable in case of no input. In this case, the total number of events will be used to split the task in jobs, together with I<events_per_job>.
484
485 =item B<events_per_job*>
486
487 The number of events to be accessed by each job. Since a job cannot cross the boundary of a fileblock it might be that the actual number of events per job is not exactly what you asked for. It can be used also with no input.
488
489 =item B<total_number_of_lumis *>
490
491 The number of luminosity blocks to be processed. This option is only valid when using analysis datasets. Since a job cannot access less than a whole file, it may be that the actual number of lumis per job is more than you asked for. Two of I<total_number_of_lumis>, I<lumis_per_job>, and I<number_of_jobs> must be supplied to run on an analysis dataset.
492
493 =item B<lumis_per_job*>
494
495 The number of luminosity blocks to be accessed by each job. This option is only valid when using analysis datasets. Since a job cannot access less than a whole file, it may be that the actual number of lumis per job is more than you asked for.
496
497 =item B<number_of_jobs *>
498
499 Define the number of jobs to be run for the task. The number of event for each job is computed taking into account the total number of events required as well as the granularity of EventCollections. Can be used also with No input.
500
501 =item B<split_by_run *>
502
503 To activate the split run based (each job will access a different run) use I<split_by_run>=1. You can also define I<number_of_jobs> and/or I<runselection>. NOTE: the Run Based combined with Event Based split is not yet available.
504
505 =item B<output_file *>
506
507 The output files produced by your application (comma separated list). From CRAB 2_2_2 onward, if TFileService is defined in user Pset, the corresponding output file is automatically added to the list of output files. User can avoid this by setting B<skip_TFileService_output> = 1 (default is 0 == file included). The Edm output produced via PoolOutputModule can be automatically added by setting B<get_edm_output> = 1 (default is 0 == no). B<warning> it is not allowed to have a PoolOutputSource and not save it somewhere, since it is a waste of resource on the WN. In case you really want to do that, and if you really know what you are doing (hint: you dont!) you can user I<ignore_edm_output=1>.
508
509 =item B<skip_TFileService_output>
510
511 Force CRAB to skip the inclusion of file produced by TFileService to list of output files. Default is I<0>, namely the file is included.
512
513 =item B<get_edm_output>
514
515 Force CRAB to add the EDM output file, as defined in PSET in PoolOutputModule (if any) to be added to the list of output files. Default is 0 (== no inclusion)
516
517 =item B<increment_seeds>
518
519 Specifies a comma separated list of seeds to increment from job to job. The initial value is taken
520 from the CMSSW config file. I<increment_seeds=sourceSeed,g4SimHits> will set sourceSeed=11,12,13 and g4SimHits=21,22,23 on
521 subsequent jobs if the values of the two seeds are 10 and 20 in the CMSSW config file.
522
523 See also I<preserve_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
524
525 =item B<preserve_seeds>
526
527 Specifies a comma separated list of seeds to which CRAB will not change from their values in the user
528 CMSSW config file. I<preserve_seeds=sourceSeed,g4SimHits> will leave the Pythia and GEANT seeds the same for every job.
529
530 See also I<increment_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
531
532 =item B<first_lumi>
533
534 Relevant only for Monte Carlo production for which it defaults to 1. The first job will generate events with this lumi block number, subsequent jobs will
535 increment the lumi block number. Setting this number to 0 (not recommend) means CMSSW will not be able to read multiple such files as they
536 will all have the same run, lumi and event numbers. This check in CMSSW can be bypassed by setting
537 I<process.source.duplicateCheckMode = cms.untracked.string('noDuplicateCheck')> in the input source, should you need to
538 read files produced without setting first_run (in old versions of CRAB) or first_lumi.
539
540 =item B<generator>
541
542 Name of the generator your MC job is using. Some generators require CRAB to skip events, others do not.
543 Possible values are pythia (default), comphep, lhe, and madgraph. This will skip events in your generator input file.
544
545 =item B<executable>
546
547 The name of the executable to be run on remote WN. The default is cmsrun. The executable is either to be found on the release area of the WN, or has been built on user working area on the UI and is (automatically) shipped to WN. If you want to run a script (which might internally call I<cmsrun>, use B<USER.script_exe> instead.
548
549 =item I<DBS and DLS parameters:>
550
551 =item B<dbs_url>
552
553 The URL of the DBS query page. For expert only.
554
555 =item B<show_prod>
556
557 To enable CRAB to show data hosted on Tier1s sites specify I<show_prod> = 1. By default those data are masked.
558
559 =item B<subscribed>
560
561 By setting the flag I<subscribed> = 1 only the replicas that are subscribed to its site are considered.The default is to return all replicas. The intended use of this flag is to avoid sending jobs to sites based on data that is being moved or deleted (and thus not subscribed).
562
563 =item B<no_block_boundary>
564
565 To remove fileblock boundaries in job splitting specify I<no_block_boundary> = 1.
566
567 =back
568
569 B<[USER]>
570
571 =over 4
572
573 =item B<additional_input_files>
574
575 Any additional input file you want to ship to WN: comma separated list. IMPORTANT NOTE: they will be placed in the WN working dir, and not in ${CMS_SEARCH_PATH}. Specific files required by CMSSW application must be placed in the local data directory, which will be automatically shipped by CRAB itself. You do not need to specify the I<ParameterSet> you are using, which will be included automatically. Wildcards are allowed.
576
577 =item B<script_exe>
578
579 A user script that will be run on WN (instead of default cmsrun). It is up to the user to setup properly the script itself to run on WN enviroment. CRAB guarantees that the CMSSW environment is setup (e.g. scram is in the path) and that the modified pset.py will be placed in the working directory, with name CMSSW.py . The user must ensure that a job report named crab_fjr.xml will be written. This can be guaranteed by passing the arguments "-j crab_fjr.xml" to cmsRun in the script. The script itself will be added automatically to the input sandbox so user MUST NOT add it within the B<USER.additional_input_files>.
580
581 =item B<script_arguments>
582
583 Any arguments you want to pass to the B<USER.script_exe>: comma separated list.
584
585 =item B<ui_working_dir>
586
587 Name of the working directory for the current task. By default, a name I<crab_0_(date)_(time)> will be used. If this card is set, any CRAB command which require I<-continue> need to specify also the name of the working directory. A special syntax is also possible, to reuse the name of the dataset provided before: I<ui_working_dir : %(dataset)s> . In this case, if e.g. the dataset is SingleMuon, the ui_working_dir will be set to SingleMuon as well.
588
589 =item B<thresholdLevel>
590
591 This has to be a value between 0 and 100, that indicates the percentage of task completeness (jobs in a ended state are complete, even if failed). The server will notify the user by e-mail (look at the field: B<eMail>) when the task will reach the specified threshold. Works just when using the server.
592
593 =item B<eMail>
594
595 The server will notify the specified e-mail when the task will reaches the specified B<thresholdLevel>. A notification is also sent when the task will reach the 100\% of completeness. This field can also be a list of e-mail: "B<eMail = user1@cern.ch, user2@cern.ch>". Works just when using the server.
596
597 =item B<client>
598
599 Specify the client that can be used to interact with the server in B<CRAB.server_name>. The default is the value in the server configuration.
600
601 =item B<return_data *>
602
603 The output produced by the executable on WN is returned (via output sandbox) to the UI, by issuing the I<-getoutput> command. B<Warning>: this option should be used only for I<small> output, say less than 10MB, since the sandbox cannot accommodate big files. Depending on Resource Broker used, a size limit on output sandbox can be applied: bigger files will be truncated. To be used in alternative to I<copy_data>.
604
605 =item B<outputdir>
606
607 To be used together with I<return_data>. Directory on user interface where to store the output. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
608
609 =item B<logdir>
610
611 To be used together with I<return_data>. Directory on user interface where to store the standard output and error. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
612
613 =item B<copy_data *>
614
615 The output (only that produced by the executable, not the std-out and err) is copied to a Storage Element of your choice (see below). To be used as an alternative to I<return_data> and recommended in case of large output.
616
617 =item B<storage_element>
618
619 To be used with <copy_data>=1
620 If you want to copy the output of your analysis in a official CMS Tier2 or Tier3, you have to write the CMS Site Name of the site, as written in the SiteDB https://cmsweb.cern.ch/sitedb/reports/showReport?reportid=se_cmsname_map.ini (i.e T2_IT_legnaro). You have also to specify the <remote_dir>(see below)
621
622 If you want to copy the output in a not_official_CMS remote site you have to specify the complete storage element name (i.e se.xxx.infn.it).You have also to specify the <storage_path> and the <storage_port> if you do not use the default one(see below).
623
624 =item B<user_remote_dir>
625
626 To be used with <copy_data>=1 and <storage_element> official CMS sites.
627 This is the directory or tree of directories where your output will be stored. This directory will be created under the mountpoint ( which will be discover by CRAB if an official CMS storage Element has been used, or taken from the crab.cfg as specified by the user). B<NOTE> This part of the path will be used as logical file name of your files in the case of publication without using an official CMS storage Element. Generally it should start with "/store".
628
629 =item B<storage_path>
630
631 To be used with <copy_data>=1 and <storage_element> not official CMS sites.
632 This is the full path of the Storage Element writeable by all, the mountpoint of SE (i.e /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/)
633
634
635 =item B<storage_pool>
636
637 If you are using CAF scheduler, you can specify the storage pool where to write your output.
638 The default is cmscafuser. If you do not want to use the default, you can overwrite it specifing None
639
640 =item B<storage_port>
641
642 To choose the storage port specify I<storage_port> = N (default is 8443) .
643
644 =item B<local_stage_out *>
645
646 This option enables the local stage out of produced output to the "close storage element" where the job is running, in case of failure of the remote copy to the Storage element decided by the user in che crab.cfg. It has to be used with the copy_data option. In the case of backup copy, the publication of data is forbidden. Set I<local_stage_out> = 1
647
648 =item B<publish_data*>
649
650 To be used with <copy_data>=1
651 To publish your produced output in a local istance of DBS set publish_data = 1
652 All the details about how to use this functionality are written in https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabForPublication
653 N.B 1) if you are using an official CMS site to stored data, the remote dir will be not considered. The directory where data will be stored is decided by CRAB, following the CMS policy in order to be able to re-read published data.
654 2) if you are using a not official CMS site to store data, you have to check the <lfn>, that will be part of the logical file name of you published files, in order to be able to re-read the data.
655
656 =item B<publish_data_name>
657
658 You produced output will be published in your local DBS with dataset name <primarydataset>/<publish_data_name>/USER
659
660 =item B<dbs_url_for_publication>
661
662 Specify the URL of your local DBS istance where CRAB has to publish the output files
663
664 =item B<publish_zero_event>
665
666 T0 force zero event files publication specify I<publish_zero_event> = 1
667
668 =item B<srm_version>
669
670 To choose the srm version specify I<srm_version> = (srmv1 or srmv2).
671
672 =item B<xml_report>
673
674 To be used to switch off the screen report during the status query, enabling the db serialization in a file. Specifying I<xml_report> = FileName CRAB will serialize the DB into CRAB_WORKING_DIR/share/FileName.
675
676 =item B<usenamespace>
677
678 To use the automate namespace definition (perfomed by CRAB) it is possible to set I<usenamespace>=1. The same policy used for the stage out in case of data publication will be applied.
679
680 =item B<debug_wrapper>
681
682 To enable the higer verbose level on wrapper specify I<debug_wrapper> = 1. The Pset contents before and after the CRAB maipulation will be written together with other useful infos.
683
684 =item B<deep_debug>
685
686 To be used in case of unexpected job crash when the sdtout and stderr files are lost. Submitting again the same jobs specifying I<deep_debug> = 1 these files will be reported back. NOTE: it works only on standalone mode for debugging purpose.
687
688 =item B<dontCheckSpaceLeft>
689
690 Set it to 1 to skip the check of free space left on your working directory before attempting to get the output back. Default is 0 (=False)
691
692 =item B<check_user_remote_dir>
693
694 To avoid stage out failures CRAB checks the remote location content at the creation time. By setting I<check_user_remote_dir>=0 crab will skip the check.
695
696 =back
697
698 B<[GRID]>
699
700 =over 4
701
702 =item B<RB>
703
704 Which RB you want to use instead of the default one, as defined in the configuration of your UI. The ones available for CMS are I<CERN> and I<CNAF>. They are actually identical, being a collection of all WMSes available for CMS: the configuration files needed to change the broker will be automatically downloaded from CRAB web page and used.
705 You can use any other RB which is available, if you provide the proper configuration files. E.g., for gLite WMS XYZ, you should provide I<glite.conf.CMS_XYZ>. These files are searched for in the current working directory, and, if not found, on crab web page. So, if you put your private configuration files in the working directory, they will be used, even if they are not available on crab web page.
706 Please get in contact with crab team if you wish to provide your RB or WMS as a service to the CMS community.
707
708 =item B<proxy_server>
709
710 The proxy server to which you delegate the responsibility to renew your proxy once expired. The default is I<myproxy.cern.ch> : change only if you B<really> know what you are doing.
711
712 =item B<role>
713
714 The role to be set in the VOMS. See VOMS documentation for more info.
715
716 =item B<group>
717
718 The group to be set in the VOMS, See VOMS documentation for more info.
719
720 =item B<dont_check_proxy>
721
722 If you do not want CRAB to check your proxy. The creation of the proxy (with proper length), its delegation to a myproxyserver is your responsibility.
723
724 =item B<dont_check_myproxy>
725
726 If you want to to switch off only the proxy renewal set I<dont_check_myproxy>=1. The proxy delegation to a myproxyserver is your responsibility.
727
728 =item B<requirements>
729
730 Any other requirements to be add to JDL. Must be written in compliance with JDL syntax (see LCG user manual for further info). No requirement on Computing element must be set.
731
732 =item B<additional_jdl_parameters:>
733
734 Any other parameters you want to add to jdl file:semicolon separated list, each
735 item B<must> be complete, including the closing ";".
736
737 =item B<wms_service>
738
739 With this field it is also possible to specify which WMS you want to use (https://hostname:port/pathcode) where "hostname" is WMS name, the "port" generally is 7443 and the "pathcode" should be something like "glite_wms_wmproxy_server".
740
741 =item B<max_cpu_time>
742
743 Maximum CPU time needed to finish one job. It will be used to select a suitable queue on the CE. Time in minutes.
744
745 =item B<max_wall_clock_time>
746
747 Same as previous, but with real time, and not CPU one.
748
749 =item B<ce_black_list>
750
751 All the CE (Computing Element) whose name contains the following strings (comma separated list) will not be considered for submission. Use the dns domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl,....). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
752
753 =item B<ce_white_list>
754
755 Only the CE (Computing Element) whose name contains the following strings (comma separated list) will be considered for submission. Use the dns domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl,....). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings. Please note that if the selected CE(s) does not contain the data you want to access, no submission can take place.
756
757 =item B<se_black_list>
758
759 All the SE (Storage Element) whose name contains the following strings (comma separated list) will not be considered for submission.It works only if a datasetpath is specified. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
760
761 =item B<se_white_list>
762
763 Only the SE (Storage Element) whose name contains the following strings (comma separated list) will be considered for submission.It works only if a datasetpath is specified. Please note that if the selected CE(s) does not contain the data you want to access, no submission can take place. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
764
765 =item B<remove_default_blacklist>
766
767 CRAB enforce the T1s Computing Eelements Black List. By default it is appended to the user defined I<CE_black_list>. To remove the enforced T1 black lists set I<remove_default_blacklist>=1.
768
769 =item B<virtual_organization>
770
771 You do not want to change this: it is cms!
772
773 =item B<retry_count>
774
775 Number of time the Grid will try to resubmit your job in case of Grid related problem.
776
777 =item B<shallow_retry_count>
778
779 Number of time shallow resubmission the Grid will try: resubmissions are tried B<only> if the job aborted B<before> start. So you are guaranteed that your jobs run strictly once.
780
781 =item B<maxtarballsize>
782
783 Maximum size of tar-ball in Mb. If bigger, an error will be generated. The actual limit is that on the RB input sandbox. Default is 9.5 Mb (sandbox limit is 10 Mb)
784
785 =item B<skipwmsauth>
786
787 Temporary useful parameter to allow the WMSAuthorisation handling. Specifying I<skipwmsauth> = 1 the pyopenssl problmes will disappear. It is needed working on gLite UI outside of CERN.
788
789 =back
790
791 B<[LSF]> or B<[CAF]> or B<[PBS]>
792
793 =over 4
794
795 =item B<queue>
796
797 The LSF/PBS queue you want to use: if none, the default one will be used. For CAF, the proper queue will be automatically selected.
798
799 =item B<resource>
800
801 The resources to be used within a LSF/PBS queue. Again, for CAF, the right one is selected.
802
803 =back
804
805 =head1 FILES
806
807 I<crab> uses a configuration file I<crab.cfg> which contains configuration parameters. This file is written in the INI-style. The default filename can be changed by the I<-cfg> option.
808
809 I<crab> creates by default a working directory 'crab_0_E<lt>dateE<gt>_E<lt>timeE<gt>'
810
811 I<crab> saves all command lines in the file I<crab.history>.
812
813 =head1 HISTORY
814
815 B<CRAB> is a tool for the CMS analysis on the Grid environment. It is based on the ideas from CMSprod, a production tool originally implemented by Nikolai Smirnov.
816
817 =head1 AUTHORS
818
819 """
820 author_string = '\n'
821 for auth in common.prog_authors:
822 #author = auth[0] + ' (' + auth[2] + ')' + ' E<lt>'+auth[1]+'E<gt>,\n'
823 author = auth[0] + ' E<lt>' + auth[1] +'E<gt>,\n'
824 author_string = author_string + author
825 pass
826 help_string = help_string + author_string[:-2] + '.'\
827 """
828
829 =cut
830 """
831
832 pod = tempfile.mktemp()+'.pod'
833 pod_file = open(pod, 'w')
834 pod_file.write(help_string)
835 pod_file.close()
836
837 if option == 'man':
838 man = tempfile.mktemp()
839 pod2man = 'pod2man --center=" " --release=" " '+pod+' >'+man
840 os.system(pod2man)
841 os.system('man '+man)
842 pass
843 elif option == 'tex':
844 fname = common.prog_name+'-v'+common.prog_version_str
845 tex0 = tempfile.mktemp()+'.tex'
846 pod2tex = 'pod2latex -full -out '+tex0+' '+pod
847 os.system(pod2tex)
848 tex = fname+'.tex'
849 tex_old = open(tex0, 'r')
850 tex_new = open(tex, 'w')
851 for s in tex_old.readlines():
852 if string.find(s, '\\begin{document}') >= 0:
853 tex_new.write('\\title{'+common.prog_name+'\\\\'+
854 '(Version '+common.prog_version_str+')}\n')
855 tex_new.write('\\author{\n')
856 for auth in common.prog_authors:
857 tex_new.write(' '+auth[0]+
858 '\\thanks{'+auth[1]+'} \\\\\n')
859 tex_new.write('}\n')
860 tex_new.write('\\date{}\n')
861 elif string.find(s, '\\tableofcontents') >= 0:
862 tex_new.write('\\maketitle\n')
863 continue
864 elif string.find(s, '\\clearpage') >= 0:
865 continue
866 tex_new.write(s)
867 tex_old.close()
868 tex_new.close()
869 print 'See '+tex
870 pass
871 elif option == 'html':
872 fname = common.prog_name+'-v'+common.prog_version_str+'.html'
873 pod2html = 'pod2html --title='+common.prog_name+\
874 ' --infile='+pod+' --outfile='+fname
875 os.system(pod2html)
876 print 'See '+fname
877 pass
878 elif option == 'txt':
879 fname = common.prog_name+'-v'+common.prog_version_str+'.txt'
880 pod2text = 'pod2text '+pod+' '+fname
881 os.system(pod2text)
882 print 'See '+fname
883 pass
884
885 sys.exit(0)