root/cvsroot/COMP/CRAB/python/crab_help.py
Revision: 1.94
Committed: Tue Apr 7 10:14:29 2009 UTC (16 years ago) by spiga
Content type: text/x-python
Branch: MAIN
CVS Tags: CRAB_2_6_0_pre2, CRAB_2_6_0_pre1, CRAB_2_5_1, CRAB_2_5_1_pre4, CRAB_2_5_1_pre3, CRAB_2_5_1_pre2
Changes since 1.93: +35 -2 lines
Log Message:
update documentation. added MultiCrab docs

File Contents

# User Rev Content
1 nsmirnov 1.1
2     ###########################################################################
3     #
4     # H E L P F U N C T I O N S
5     #
6     ###########################################################################
7    
8     import common
9    
10     import sys, os, string
11 spiga 1.34
12 nsmirnov 1.1 import tempfile
13    
14     ###########################################################################
15     def usage():
17 nsmirnov 1.1 usa_string = common.prog_name + """ [options]
18 slacapra 1.3
19     The most useful general options (use '-h' to get complete help):
20    
21 spiga 1.85 -create -- Create all the jobs.
22     -submit n -- Submit the first n available jobs. Default is all.
23     -status [range] -- check status of all jobs.
24     -getoutput|-get [range] -- get back the output of all jobs: if range is defined, only of selected jobs.
25     -extend -- Extend an existing task to run on new fileblocks if any.
26     -publish [dbs_url] -- after the getoutput, publish the user data in a local DBS instance.
27     -kill [range] -- kill submitted jobs.
28     -resubmit [range] -- resubmit killed/aborted/retrieved jobs.
29     -copyData [range] -- copy locally the output stored on remote SE.
30     -renewCredential -- renew credential on the server.
31     -clean -- gracefully cleanup the directory of a task.
32     -match|-testJdl [range] -- check if resources exist which are compatible with jdl.
33 slacapra 1.89 -report -- print a short report about the task
34 spiga 1.85 -list [range] -- show technical job details.
35     -postMortem [range] -- provide a file with information useful for post-mortem analysis of the jobs.
36     -printId [range] -- print the job SID or Task Unique ID while using the server.
37     -createJdl [range] -- provide files with a complete Job Description (JDL).
38     -validateCfg [fname] -- parse the ParameterSet using the framework's Python API.
39     -continue|-c [dir] -- Apply command to task stored in [dir].
40     -h [format] -- Detailed help. Formats: man (default), tex, html, txt.
41     -cfg fname -- Configuration file name. Default is 'crab.cfg'.
42     -debug N -- set the verbosity level to N.
43     -v -- Print version and exit.
44 nsmirnov 1.1
45 slacapra 1.4 "range" has syntax "n,m,l-p" which corresponds to [n,m,l,l+1,...,p-1,p] and all possible combinations
46    
47 nsmirnov 1.1 Example:
48 slacapra 1.26 crab -create -submit 1
49 nsmirnov 1.1 """
50 slacapra 1.43 print usa_string
51 nsmirnov 1.1 sys.exit(2)
52    
53     ###########################################################################
54     def help(option='man'):
55     help_string = """
56     =pod
57    
58     =head1 NAME
59    
60     B<CRAB>: B<C>ms B<R>emote B<A>nalysis B<B>uilder
61    
62 slacapra 1.3 """+common.prog_name+""" version: """+common.prog_version_str+"""
63 nsmirnov 1.1
64 slacapra 1.19 This tool B<must> be used from a User Interface and the user is supposed to
65 fanzago 1.37 have a valid Grid certificate.
66 nsmirnov 1.1
67     =head1 SYNOPSIS
68    
69 slacapra 1.13 B<"""+common.prog_name+"""> [I<options>] [I<command>]
70 nsmirnov 1.1
71     =head1 DESCRIPTION
72    
73 ewv 1.52 CRAB is a Python program intended to simplify the creation and submission of CMS analysis jobs to the Grid environment.
74 nsmirnov 1.1
75 slacapra 1.3 Parameters for CRAB usage and configuration are provided by the user by editing the configuration file B<crab.cfg>.
76 nsmirnov 1.1
77 spiga 1.48 CRAB generates scripts and additional data files for each job. The produced scripts are submitted directly to the Grid. CRAB makes use of BossLite to interface to the Grid scheduler, as well as for logging and bookkeeping.
78 nsmirnov 1.1
79 ewv 1.52 CRAB supports any CMSSW based executable, with any modules/libraries, including user provided ones, and deals with the output produced by the executable. CRAB provides an interface to CMS data discovery services (DBS and DLS), which are completely hidden from the final user. It also splits a task (such as analyzing a whole dataset) into smaller jobs, according to user requirements.
80 nsmirnov 1.1
81 slacapra 1.46 CRAB can be used in two ways: StandAlone and with a Server.
82     The StandAlone mode is suited for small tasks, of the order of 100 jobs: it submits the jobs directly to the scheduler, and these jobs are under user responsibility.
83 ewv 1.52 In the Server mode, suited for larger tasks, the jobs are prepared locally and then passed to a dedicated CRAB server, which interacts with the scheduler on behalf of the user and provides additional services, such as automatic resubmission, status caching, output retrieval, and more.
84 slacapra 1.46 The CRAB commands are exactly the same in both cases.
85    
86 slacapra 1.13 The CRAB web page is available at
87    
88 spiga 1.94 I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab>
89 slacapra 1.6
90 slacapra 1.19 =head1 HOW TO RUN CRAB FOR THE IMPATIENT USER
91    
92 ewv 1.52 Please read all the way through in any case!
93 slacapra 1.19
94     Source B<crab.(c)sh> from the CRAB installation area, which has been set up either by you or by someone else for you.
95    
96 ewv 1.52 Modify the CRAB configuration file B<crab.cfg> according to your needs: see below for a complete list. A commented template B<crab.cfg> can be found in B<$CRABDIR/python/crab.cfg>
97 slacapra 1.19
98 ewv 1.44 ~>crab -create
99 slacapra 1.19 create all jobs (no submission!)
100    
101 spiga 1.25 ~>crab -submit 2 -continue [ui_working_dir]
102 slacapra 1.19 submit 2 jobs, the ones already created (-continue)
103    
104 slacapra 1.26 ~>crab -create -submit 2
105 slacapra 1.19 create _and_ submit 2 jobs
106    
107 spiga 1.25 ~>crab -status
108 slacapra 1.19 check the status of all jobs
109    
110 spiga 1.25 ~>crab -getoutput
111 slacapra 1.19 get back the output of all jobs
112    
113 ewv 1.44 ~>crab -publish
114     publish all user outputs in the DBS specified in the crab.cfg (dbs_url_for_publication) or given as argument of this option
115 fanzago 1.42
116 slacapra 1.20 =head1 RUNNING CMSSW WITH CRAB
117 nsmirnov 1.1
118 slacapra 1.3 =over 4
119    
120     =item B<A)>
121    
122 ewv 1.52 Develop your code in your CMSSW working area. Do anything which is needed to run your executable interactively, including the run time environment setup (I<eval `scramv1 runtime -sh|csh`>), a suitable I<ParameterSet>, etc. It seems silly, but B<be extra sure that you actually did compile your code> (I<scramv1 b>).
123 slacapra 1.3
124 ewv 1.44 =item B<B)>
125 slacapra 1.3
126 slacapra 1.20 Source B<crab.(c)sh> from the CRAB installation area, which has been set up either by you or by someone else for you. Modify the CRAB configuration file B<crab.cfg> according to your needs: see below for a complete list.
127    
128     The most important parameters are the following (see below for complete description of each parameter):
129    
130     =item B<Mandatory!>
131    
132     =over 6
133    
134     =item B<[CMSSW]> section: datasetpath, pset, splitting parameters, output_file
135    
136     =item B<[USER]> section: output handling parameters, such as return_data, copy_data etc...
137    
138     =back
139    
140     =item B<Run it!>
141    
142 fanzago 1.37 You must have a valid voms-enabled Grid proxy. See CRAB web page for details.
143 slacapra 1.20
144     =back
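
As an illustration, a minimal B<crab.cfg> sketch covering the mandatory parameters above might look like the following. All values here (dataset path, file names, job counts) are hypothetical placeholders, not defaults; see the CONFIGURATION PARAMETERS section for the meaning of each key.

```ini
[CRAB]
jobtype   = cmssw
scheduler = glitecoll

[CMSSW]
# hypothetical dataset path, in the /PrimaryDataset/DataTier/Process format
datasetpath            = /MyPrimaryDataset/RECO/MyProcess
pset                   = myAnalysis_cfg.py
# splitting: exactly two of the three splitting parameters must be given
total_number_of_events = -1
number_of_jobs         = 10
output_file            = output.root

[USER]
# return the (small) output via the sandbox
return_data = 1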
145    
146 spiga 1.94 =head1 RUNNING MULTICRAB
147    
148     MultiCRAB is a CRAB extension to submit the same job to multiple datasets in one go.
149    
150     The typical use case is analysis code that you want to run on several datasets, e.g. some signals plus some backgrounds (for MC studies),
151     or on different streams/configurations/runs for real data taking. You want to run exactly the same code, and the crab.cfg files differ in only a few keys:
152     certainly datasetpath, but possibly other keys, such as total_number_of_events, in case you want to run on all signals but only a fraction of background, or anything else.
153     Without MultiCRAB you would have to create one crab.cfg for each dataset you want to access, and submit several instances of CRAB, saving the output to different locations.
154     MultiCRAB is meant to automate this procedure.
155     In addition to the usual crab.cfg, there is a new configuration file called multicrab.cfg. The syntax is very similar to that of crab.cfg, namely
156     [SECTION] <crab.cfg Section>.Key=Value
157    
158     Please note that it is mandatory to prefix each key explicitly with its crab.cfg [SECTION].
159     The role of multicrab.cfg is to apply modifications to the template crab.cfg, some of which are common to all tasks and some of which are task specific.
160    
161     =head2 There are thus two kinds of sections:
162    
163     =over 2
164    
165     =item B<[COMMON]>
166    
167     section: applies to all tasks, and is fully equivalent to modifying the template crab.cfg directly
168    
169     =item B<[DATASET]>
170    
171     section: there can be an arbitrary number of these, one for each dataset you want to run. The names are free (except COMMON and MULTICRAB); each name is used as the ui_working_dir for its task, and is appended to the user_remote_dir in case of output copy to a remote SE. So the task corresponding to, say, section [SIGNAL] will be placed in directory SIGNAL, and its output will be put under /SIGNAL/, i.e. SIGNAL will be added as the last subdirectory in the user_remote_dir.
172    
173     =back
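
As a sketch, a B<multicrab.cfg> defining one common setting plus two dataset sections could look like the following. The section names ([SIGNAL], [BACKGROUND]) and dataset paths are invented for illustration; any names except COMMON and MULTICRAB may be used.

```ini
[COMMON]
# applied to every task, as if edited in the template crab.cfg
CMSSW.total_number_of_events = -1

[SIGNAL]
CMSSW.datasetpath = /MySignal/RECO/MyProcess

[BACKGROUND]
CMSSW.datasetpath = /MyBackground/RECO/MyProcess
# task-specific override: only a fraction of the background
CMSSW.total_number_of_events = 100000
```

With this configuration the two tasks would be created in directories SIGNAL and BACKGROUND, and those names would be appended to user_remote_dir when copying output to a remote SE.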
174    
175     For further details please visit
176    
177     I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideMultiCrab>
178    
179 slacapra 1.19 =head1 HOW TO RUN ON CONDOR-G
180    
181     The B<Condor-G> mode for B<CRAB> is a special submission mode next to the standard Resource Broker submission. It is designed to submit jobs directly to a site, without using the Resource Broker.
182    
183 ewv 1.52 Due to the nature of B<Condor-G> submission, the B<Condor-G> mode is restricted to OSG sites within the CMS Grid, currently the 7 US T2: Florida(ufl.edu), Nebraska(unl.edu), San Diego(ucsd.edu), Purdue(purdue.edu), Wisconsin(wisc.edu), Caltech(ultralight.org), MIT(mit.edu).
184 slacapra 1.19
185     =head2 B<Requirements:>
186    
187     =over 2
188    
189     =item installed and running local Condor scheduler
190    
191     (either installed by the local Sysadmin or self-installed using the VDT user interface: http://www.uscms.org/SoftwareComputing/UserComputing/Tutorials/vdt.html)
192    
193     =item locally available LCG or OSG UI installation
194    
195 ewv 1.44 for authentication via Grid certificate proxies ("voms-proxy-init -voms cms" should result in valid proxy)
196 slacapra 1.19
197 ewv 1.52 =item set the environment variable EDG_WL_LOCATION to the edg directory of the local LCG or OSG UI installation
198 slacapra 1.19
199     =back
200    
201     =head2 B<What the Condor-G mode can do:>
202    
203     =over 2
204    
205 ewv 1.52 =item submission directly to multiple OSG sites,
206 slacapra 1.19
207 ewv 1.52 the requested dataset must be published correctly by the site in the local and global services.
208     Previous restrictions on submitting only to a single site have been removed. SE and CE whitelisting
209     and blacklisting work as in the other modes.
210 slacapra 1.19
211     =back
212    
213     =head2 B<What the Condor-G mode cannot do:>
214    
215     =over 2
216    
217     =item submit jobs if no condor scheduler is running on the submission machine
218    
219     =item submit jobs if the local condor installation does not provide Condor-G capabilities
220    
221 ewv 1.52 =item submit jobs to an LCG site
222 slacapra 1.19
223 fanzago 1.37 =item support Grid certificate proxy renewal via the myproxy service
224 slacapra 1.19
225     =back
226    
227     =head2 B<CRAB configuration for Condor-G mode:>
228    
229 ewv 1.52 The CRAB configuration for the Condor-G mode only requires one change in crab.cfg:
230 nsmirnov 1.1
231 slacapra 1.19 =over 2
232 slacapra 1.3
233 slacapra 1.19 =item select condor_g Scheduler:
234 slacapra 1.4
235 slacapra 1.19 scheduler = condor_g
236 slacapra 1.4
237 slacapra 1.19 =back
238 slacapra 1.4
239 ewv 1.52 =head1 COMMANDS
240 slacapra 1.4
241     =over 4
242    
243 slacapra 1.26 =item B<-create>
244 slacapra 1.4
245 slacapra 1.26 Create the jobs: from version 1_3_0 it is only possible to create all jobs.
246 ewv 1.52 The maximum number of jobs depends on the dataset and splitting directives. This set of identical jobs accessing the same dataset is defined as a task.
247 slacapra 1.4 This command creates a directory with default name I<crab_0_date_time> (can be changed via the ui_working_dir parameter, see below). Everything needed to submit your jobs is placed inside this directory, and the output of your jobs (once finished) will be placed there as well (see below). Do not remove this directory by hand: use -clean instead (see below).
248     See also I<-continue>.
249    
250 slacapra 1.46 =item B<-submit [range]>
251 slacapra 1.4
252 slacapra 1.46 Submit n jobs: 'n' is either a positive integer or 'all' or a [range]. Default is all.
253 ewv 1.52 If 'n' is passed as argument, the first 'n' suitable jobs will be submitted. Please note that this behaviour is different from other commands, where -command N means apply the command to job N, and not to the first N jobs. If a [range] is passed, the selected jobs will be submitted.
254 slacapra 1.46 This option must be used in conjunction with -create (to create and submit immediately) or with -continue (which is assumed by default), to submit previously created jobs. Failure to do so will stop CRAB and generate an error message. See also I<-continue>.
255 slacapra 1.4
256     =item B<-continue [dir] | -c [dir]>
257    
258     Apply the action on the task stored in directory [dir]. If the task directory is the standard one (crab_0_date_time), the most recent one is taken. Any other directory must be specified.
259 slacapra 1.46 Basically all commands (except -create) need -continue, so it is automatically assumed. Of course, the standard task directory is used in this case.
260 slacapra 1.4
261 slacapra 1.26 =item B<-status>
262 nsmirnov 1.1
263 spiga 1.48 Check the status of the jobs, in all states. All the info (e.g. application and wrapper exit codes) will be available only after the output retrieval.
264 nsmirnov 1.1
265 slacapra 1.20 =item B<-getoutput|-get [range]>
266 nsmirnov 1.1
267 slacapra 1.20 Retrieve the output declared by the user via the output sandbox. By default the output will be put in task working dir under I<res> subdirectory. This can be changed via config parameters. B<Be extra sure that you have enough free space>. See I<range> below for syntax.
268 nsmirnov 1.1
269 fanzago 1.42 =item B<-publish [dbs_url]>
270    
271     Publish user output in a local DBS instance after the output has been retrieved. By default the publication uses the dbs_url_for_publication specified in the crab.cfg file; otherwise you can give it as argument of this option.
272    
273 slacapra 1.4 =item B<-resubmit [range]>
274 nsmirnov 1.1
275 fanzago 1.37 Resubmit jobs which have been previously submitted and have either been I<killed> or have I<aborted>. See I<range> below for syntax.
276 nsmirnov 1.1
277 spiga 1.60 =item B<-extend>
278    
279 ewv 1.64 Create new jobs for an existing task, checking if new blocks are available for the given dataset.
280 spiga 1.60
281 slacapra 1.4 =item B<-kill [range]>
282 nsmirnov 1.1
283 slacapra 1.4 Kill (cancel) jobs which have been submitted to the scheduler. A range B<must> be used in all cases, no default value is set.
284 nsmirnov 1.1
285 spiga 1.74 =item B<-copyData [range]>
286 slacapra 1.58
287 ewv 1.78 Copy locally (to the current working directory) the output previously stored on a remote SE by the jobs. Of course, this works only if the copy_data option has been set.
288 slacapra 1.58
289 spiga 1.80 =item B<-renewCredential >
290 mcinquil 1.59
291 spiga 1.80 If using the server modality, this command allows you to delegate a valid credential (proxy/token) to the server associated with the task.
292 mcinquil 1.59
293 spiga 1.85 =item B<-match|-testJdl [range]>
294 nsmirnov 1.1
295 fanzago 1.71 Check if the job can find compatible resources. It is the equivalent of doing I<edg-job-list-match> on edg.
296 nsmirnov 1.1
297 slacapra 1.20 =item B<-printId [range]>
298    
299 slacapra 1.82 Just print the job identifier, which can be the SID (Grid job identifier) of the job(s), the taskId if you are using CRAB with the server, or the local scheduler Id. If [range] is "full", the SID of all the jobs is printed, also in the case of submission with the server.
300 slacapra 1.20
301 spiga 1.53 =item B<-printJdl [range]>
302    
303 ewv 1.64 Collect the full Job Description in a file located under the share directory. The file base name is File- .
304 spiga 1.53
305 slacapra 1.4 =item B<-postMortem [range]>
306 nsmirnov 1.1
307 slacapra 1.46 Try to collect more information about the job from the scheduler's point of view.
308 nsmirnov 1.1
309 slacapra 1.13 =item B<-list [range]>
310    
311 ewv 1.52 Dump technical information about jobs: for developers only.
312 slacapra 1.13
313 slacapra 1.89 =item B<-report>
314    
315     Print a short report about the task, namely the total number of events and files processed/requested/available, the name of the datasetpath, a summary of the status of the jobs, the list of runs and lumi sections, and so on. In principle it should contain all the info needed for analysis. Work in progress.
316    
317 slacapra 1.4 =item B<-clean [dir]>
318 nsmirnov 1.1
319 slacapra 1.26 Clean up (i.e. erase) the task working directory after a check whether there are still running jobs. If there are, you are notified and asked to kill them or retrieve their output. B<Warning>: this will possibly delete also the output produced by the task (if any)!
320 nsmirnov 1.1
321 slacapra 1.4 =item B<-help [format] | -h [format]>
322 nsmirnov 1.1
323 slacapra 1.4 This help. It can be produced in three different I<formats>: I<man> (default), I<tex> and I<html>.
324 nsmirnov 1.1
325 slacapra 1.4 =item B<-v>
326 nsmirnov 1.1
327 slacapra 1.4 Print the version and exit.
328 nsmirnov 1.1
329 slacapra 1.4 =item B<range>
330 nsmirnov 1.1
331 slacapra 1.13 The range to be used in many of the above commands has the following syntax: it is a comma separated list of job ranges, each of which may be a single job number or a range of the form first-last.
332 slacapra 1.4 Example: 1,3-5,8 = {1,3,4,5,8}
333 nsmirnov 1.1
334 ewv 1.44 =back
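
The expansion rule above can be sketched in a few lines of Python (a hypothetical illustration, not part of the CRAB code itself):

```python
def expand_range(spec):
    """Expand a CRAB-style job range like "1,3-5,8" into an explicit job list."""
    jobs = set()
    for part in spec.split(','):
        if '-' in part:
            # a sub-range of the form first-last, both ends included
            first, last = part.split('-')
            jobs.update(range(int(first), int(last) + 1))
        else:
            # a single job number
            jobs.add(int(part))
    return sorted(jobs)

print(expand_range("1,3-5,8"))  # [1, 3, 4, 5, 8]
```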
335 slacapra 1.6
336 slacapra 1.4 =head1 OPTIONS
337 nsmirnov 1.1
338 slacapra 1.6 =over 4
339    
340 slacapra 1.4 =item B<-cfg [file]>
341 nsmirnov 1.1
342 slacapra 1.4 Configuration file name. Default is B<crab.cfg>.
343 nsmirnov 1.1
344 slacapra 1.4 =item B<-debug [level]>
345 nsmirnov 1.1
346 slacapra 1.13 Set the debug level: high number for high verbosity.
347 nsmirnov 1.1
348 ewv 1.44 =back
349 slacapra 1.6
350 slacapra 1.5 =head1 CONFIGURATION PARAMETERS
351    
352 spiga 1.25 All the parameters described in this section can be defined in the CRAB configuration file. The configuration file has different sections: [CRAB], [USER], etc. Each parameter must be defined in its proper section. An alternative way to pass a config parameter to CRAB is via the command line interface; the syntax is: crab -SECTION.key value . For example I<crab -USER.outputdir MyDirWithFullPath> .
353 slacapra 1.5 The parameters passed to CRAB at the creation step are stored, so they cannot be changed by changing the original crab.cfg . On the other hand the task is protected from any accidental change. If you want to change any parameter, a new task must be created.
354 slacapra 1.6 Mandatory parameters are flagged with a *.
355 slacapra 1.5
356     B<[CRAB]>
357 slacapra 1.6
358 slacapra 1.13 =over 4
359 slacapra 1.5
360 slacapra 1.6 =item B<jobtype *>
361 slacapra 1.5
362 slacapra 1.26 The type of the job to be executed: I<cmssw> jobtypes are supported
363 slacapra 1.6
364     =item B<scheduler *>
365    
366 ewv 1.52 The scheduler to be used: I<glitecoll> is the more efficient grid scheduler and should be used. Other choices are I<glite>, same as I<glitecoll> but without bulk submission (and so slower), I<condor_g> (see the specific paragraph) and I<edg>, the former Grid scheduler, which will be dismissed in the future.
367     From version 210, local schedulers are also supported, for the time being only at CERN: I<LSF> is the standard CERN local scheduler, and I<CAF> is LSF dedicated to the CERN Analysis Facilities.
368 slacapra 1.5
369 slacapra 1.81 =item B<use_server>
370    
371     To use the server for job handling (recommended): 0=no (default), 1=yes. The server to be used will be found automatically from a list of available ones: it can also be specified explicitly by using I<server_name> (see below)
372    
373 mcinquil 1.35 =item B<server_name>
374    
375 slacapra 1.81 To use the CRAB-server support, fill this key with the server name as <Server_DOMAIN> (e.g. cnaf, fnal). If this is set, I<use_server> is set to true automatically.
376     If I<server_name=None> crab works in standalone mode, same as using I<use_server=0> and no I<server_name>.
377 spiga 1.48 The servers available to users can be found on the CRAB web page.
378 mcinquil 1.35
379 slacapra 1.5 =back
380    
381 slacapra 1.20 B<[CMSSW]>
382    
383     =over 4
384    
385 slacapra 1.22 =item B<datasetpath *>
386 slacapra 1.20
387 slacapra 1.22 the path of the dataset to be processed, as defined in DBS. It comes in the format I</PrimaryDataset/DataTier/Process> . In case no input is needed, I<None> must be specified.
388 slacapra 1.20
389 spiga 1.90 =item B<ads *>
390    
391     you may want to run over an AnalysisDataSet. After defining the related path in I<datasetpath>, take care to specify ads=1.
392    
393 afanfani 1.50 =item B<runselection *>
394 ewv 1.52
395 afanfani 1.50 within a dataset you can restrict processing to a specific run number or run number range. For example runselection=XYZ or runselection=XYZ1-XYZ2 .
396    
397 spiga 1.57 =item B<use_parent *>
398    
399 spiga 1.90 within a dataset you can ask to run over the related parent files too. E.g., this will give you access to the RAW data while running over a RECO sample. Setting use_parent=1, CRAB determines the parent files from DBS and adds secondaryFileNames = cms.untracked.vstring( <LIST of parent Files> ) to the PoolSource section of your parameter set.
400 spiga 1.57
401 slacapra 1.22 =item B<pset *>
402 slacapra 1.20
403 ewv 1.64 the ParameterSet to be used. Both .cfg and .py parameter sets are supported for the relevant versions of CMSSW.
404 slacapra 1.20
405 slacapra 1.26 =item I<Of the following three parameters exactly two must be used, otherwise CRAB will complain.>
406 slacapra 1.20
407 slacapra 1.22 =item B<total_number_of_events *>
408    
409 slacapra 1.26 the number of events to be processed. To access all available events, use I<-1>. Of course, the latter option is not viable in case of no input. In this case, the total number of events will be used to split the task in jobs, together with I<events_per_job>.
410 slacapra 1.22
411 slacapra 1.26 =item B<events_per_job *>
412 slacapra 1.22
413 slacapra 1.26 number of events to be accessed by each job. Since a job cannot cross the boundary of a fileblock, it might be that the actual number of events per job is not exactly what you asked for. It can also be used with no input.
414 slacapra 1.22
415     =item B<number_of_jobs *>
416    
417     Define the number of jobs to be run for the task. The number of events for each job is computed taking into account the total number of events required as well as the granularity of EventCollections. Can also be used with no input.
418    
419 spiga 1.90 =item B<split_by_run *>
420    
421     to activate run-based splitting (each job will access a different run) use I<split_by_run>=1. You can also define I<number_of_jobs> and/or I<runselection>. NOTE: the run-based split combined with the event-based split is not yet available.
422    
423 slacapra 1.22 =item B<output_file *>
424    
425 slacapra 1.63 the output files produced by your application (comma separated list). From CRAB 2_2_2 onward, if TFileService is defined in user Pset, the corresponding output file is automatically added to the list of output files. User can avoid this by setting B<skip_TFileService_output> = 1 (default is 0 == file included). The Edm output produced via PoolOutputModule can be automatically added by setting B<get_edm_output> = 1 (default is 0 == no)
426 slacapra 1.61
427     =item B<skip_TFileService_output>
428    
429     Force CRAB to skip the inclusion of the file produced by TFileService in the list of output files. Default is I<0>, namely the file is included.
430 slacapra 1.20
431 slacapra 1.63 =item B<get_edm_output>
432    
433     Force CRAB to add the EDM output file, as defined in the PSet in PoolOutputModule (if any), to the list of output files. Default is 0 (== no inclusion)
434    
435 ewv 1.47 =item B<increment_seeds>
436    
437     Specifies a comma separated list of seeds to increment from job to job. The initial value is taken
438     from the CMSSW config file. I<increment_seeds=sourceSeed,g4SimHits> will set sourceSeed=11,12,13 and g4SimHits=21,22,23 on
439     subsequent jobs if the values of the two seeds are 10 and 20 in the CMSSW config file.
440    
441     See also I<preserve_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
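
For instance, a [CMSSW] fragment combining the two keys (using the seed names from the examples above, which are assumed rather than universal) might read:

```ini
[CMSSW]
# bump the generator seed on every job, keep the GEANT seed fixed
increment_seeds = sourceSeed
preserve_seeds  = g4SimHits
```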
442    
443     =item B<preserve_seeds>
444    
445 ewv 1.78 Specifies a comma separated list of seeds which CRAB will not change from their values in the user
446 ewv 1.47 CMSSW config file. I<preserve_seeds=sourceSeed,g4SimHits> will leave the Pythia and GEANT seeds the same for every job.
447    
448     See also I<increment_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
449    
450 slacapra 1.30 =item B<first_run>
451    
452     First run to be generated in a generation job. Relevant only for the no-input workflow.
453    
454 ewv 1.78 =item B<generator>
455 ewv 1.79
456     Name of the generator your MC job is using. Some generators require CRAB to skip events, others do not.
457     Possible values are pythia, comphep, and madgraph. This will skip events in your generator input file.
458 ewv 1.78
459 slacapra 1.31 =item B<executable>
460 slacapra 1.30
461 slacapra 1.31 The name of the executable to be run on the remote WN. The default is cmsRun. The executable is either to be found in the release area of the WN, or has been built in the user working area on the UI and is (automatically) shipped to the WN. If you want to run a script (which might internally call I<cmsRun>), use B<USER.script_exe> instead.
462 slacapra 1.30
463     =item I<DBS and DLS parameters:>
464    
465 slacapra 1.26 =item B<dbs_url>
466 slacapra 1.6
467 slacapra 1.40 The URL of the DBS query page. For experts only.
468 slacapra 1.13
469 spiga 1.84 =item B<show_prod>
470    
471 spiga 1.86 To enable CRAB to show data hosted on Tier1 sites, specify I<show_prod> = 1. By default those data are masked.
472    
473     =item B<no_block_boundary>
474    
475     To remove fileblock boundaries in job splitting specify I<no_block_boundary> = 1.
476 spiga 1.84
477 slacapra 1.13 =back
478    
479     B<[USER]>
480    
481     =over 4
482    
483 slacapra 1.6 =item B<additional_input_files>
484    
485 spiga 1.67 Any additional input file you want to ship to WN: comma separated list. IMPORTANT NOTE: they will be placed in the WN working dir, and not in ${CMS_SEARCH_PATH}. Specific files required by CMSSW application must be placed in the local data directory, which will be automatically shipped by CRAB itself. You do not need to specify the I<ParameterSet> you are using, which will be included automatically. Wildcards are allowed.
486 slacapra 1.6
487 slacapra 1.31 =item B<script_exe>
488    
489 ewv 1.78 A user script that will be run on the WN (instead of the default cmsRun). It is up to the user to set up the script properly to run in the WN environment. CRAB guarantees that the CMSSW environment is set up (e.g. scram is in the path) and that the modified pset.cfg will be placed in the working directory, with name CMSSW.cfg . The user must ensure that a job report named crab_fjr.xml will be written. This can be guaranteed by passing the arguments "-j crab_fjr.xml" to cmsRun in the script. The script itself will be added automatically to the input sandbox, so the user MUST NOT add it within B<USER.additional_input_files>.
490 slacapra 1.31
491 slacapra 1.6 =item B<ui_working_dir>
492    
493 ewv 1.52 Name of the working directory for the current task. By default, a name I<crab_0_(date)_(time)> will be used. If this card is set, any CRAB command which requires I<-continue> needs to specify also the name of the working directory. A special syntax is also possible, to reuse the name of the dataset provided before: I<ui_working_dir : %(dataset)s> . In this case, if e.g. the dataset is SingleMuon, the ui_working_dir will be set to SingleMuon as well.
494 slacapra 1.6
495 mcinquil 1.35 =item B<thresholdLevel>
496    
497     This has to be a value between 0 and 100 that indicates the percentage of task completeness (jobs in an ended state are complete, even if failed). The server will notify the user by e-mail (see the field B<eMail>) when the task reaches the specified threshold. Works only in server mode.
498    
499     =item B<eMail>
500    
501 ewv 1.52 The server will notify the specified e-mail address when the task reaches the specified B<thresholdLevel>. A notification is also sent when the task reaches 100% completeness. This field can also be a list of e-mail addresses: "B<eMail = user1@cern.ch, user2@cern.ch>". Works only in server mode.
502 mcinquil 1.35
503 slacapra 1.6 =item B<return_data *>
504    
505 ewv 1.52 The output produced by the executable on the WN is returned (via output sandbox) to the UI by issuing the I<-getoutput> command. B<Warning>: this option should be used only for I<small> outputs, say less than 10MB, since the sandbox cannot accommodate big files. Depending on the Resource Broker used, a size limit on the output sandbox can be applied: bigger files will be truncated. To be used as an alternative to I<copy_data>.
506 slacapra 1.6
507     =item B<outputdir>
508    
509 ewv 1.52 To be used together with I<return_data>. Directory on user interface where to store the output. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
510 slacapra 1.6
511     =item B<logdir>
512    
513 ewv 1.52 To be used together with I<return_data>. Directory on user interface where to store the standard output and error. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
514 slacapra 1.6
515     =item B<copy_data *>
516    
517 ewv 1.52 The output (only that produced by the executable, not the std-out and err) is copied to a Storage Element of your choice (see below). To be used as an alternative to I<return_data> and recommended in case of large output.
518 slacapra 1.6
519     =item B<storage_element>
520    
521 fanzago 1.71 To be used with <copy_data>=1
522     If you want to copy the output of your analysis to an official CMS Tier2 or Tier3, you have to write the CMS Site Name of the site, as written in the SiteDB https://cmsweb.cern.ch/sitedb/reports/showReport?reportid=se_cmsname_map.ini (e.g. T2_IT_legnaro). You also have to specify the <remote_dir> (see below)
523    
524 ewv 1.78 If you want to copy the output in a not_official_CMS remote site you have to specify the complete storage element name (i.e se.xxx.infn.it).You have also to specify the <storage_path> and the <storage_port> if you do not use the default one(see below).
525 fanzago 1.71
526     =item B<user_remote_dir>
527    
528     To be used with <copy_data>=1 and an official CMS <storage_element>.
529 spiga 1.73 This is the directory or tree of directories where your output will be stored. This directory will be created under the mountpoint (which will be discovered by CRAB if an official CMS Storage Element is used, or taken from crab.cfg as specified by the user). B<NOTE>: this part of the path will be used as the logical file name of your files in case of publication without an official CMS Storage Element.
530 slacapra 1.6
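For example, a minimal crab.cfg sketch of stage-out to an official CMS site (the site name and directory below are illustrative, not a recommendation):

    [USER]
    copy_data = 1
    storage_element = T2_IT_Legnaro
    user_remote_dir = my_analysis_dir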
531     =item B<storage_path>
532    
533 fanzago 1.71 To be used with <copy_data>=1 and a non-official CMS <storage_element>.
534     This is the full path of the Storage Element, writeable by all, i.e. the mountpoint of the SE (e.g. /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/)
535    
536 slacapra 1.6
537 fanzago 1.72 =item B<storage_pool>
538    
539     If you are using the CAF scheduler, you can specify the storage pool where your output will be written.
540     The default is cmscafuser. If you do not want to use the default, you can override it by specifying None.
541    
542 spiga 1.70 =item B<storage_port>
543    
544     To choose the storage port specify I<storage_port> = N (default is 8443).
545    
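By contrast, a sketch for a non-official site, combining the storage element, path and port parameters described above (the hostname and path are the illustrative ones from this document, not real values):

    [USER]
    copy_data = 1
    storage_element = se.xxx.infn.it
    storage_path = /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/
    storage_port = 8443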
546 fanzago 1.71 =item B<publish_data*>
547    
548     To be used with <copy_data>=1.
549     To publish your produced output in a local instance of DBS, set publish_data = 1.
550 fanzago 1.77 All the details about how to use this functionality are written in https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabForPublication
551 ewv 1.78 N.B. 1) if you are using an official CMS site to store data, the remote dir will not be considered. The directory where data will be stored is decided by CRAB, following the CMS policy, in order to be able to re-read published data.
552     2) if you are using a non-official CMS site to store data, you have to check the <lfn>, which will be part of the logical file name of your published files, in order to be able to re-read the data.
553 fanzago 1.71
554     =item B<publish_data_name>
555    
556     Your produced output will be published in your local DBS with dataset name <primarydataset>/<publish_data_name>/USER
557    
558     =item B<dbs_url_for_publication>
559    
560     Specify the URL of your local DBS instance where CRAB has to publish the output files
561    
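Putting the publication parameters together, a sketch of the relevant crab.cfg fragment (the dataset name is illustrative and the DBS URL is a placeholder you must replace with your own local DBS writer instance):

    [USER]
    copy_data = 1
    publish_data = 1
    publish_data_name = my_publish_name
    dbs_url_for_publication = <URL of your local DBS writer instance>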
562 spiga 1.93 =item B<pubilish_zero_event>
563    
564     To force the publication of zero-event files, specify I<pubilish_zero_event> = 1
565    
566 spiga 1.55 =item B<srm_version>
567 slacapra 1.46
568 spiga 1.69 To choose the SRM version, specify I<srm_version> = srmv1 or srmv2.
569 slacapra 1.46
570 spiga 1.51 =item B<xml_report>
571    
572     To be used to switch off the screen report during the status query, enabling the DB serialization in a file. Specifying I<xml_report> = FileName, CRAB will serialize the DB into CRAB_WORKING_DIR/share/FileName.
573 slacapra 1.6
574 spiga 1.55 =item B<usenamespace>
575    
576 ewv 1.64 To use the automatic namespace definition (performed by CRAB), set I<usenamespace>=1. The same policy used for the stage out in case of data publication will be applied.
577 spiga 1.54
578 spiga 1.55 =item B<debug_wrapper>
579    
580 spiga 1.87 To enable a higher verbosity level in the job wrapper, specify I<debug_wrapper> = 1. The Pset contents before and after the CRAB manipulation will be written out, together with other useful information.
581 spiga 1.54
582 spiga 1.75 =item B<deep_debug>
583    
584 ewv 1.78 To be used in case of an unexpected job crash where the stdout and stderr files are lost. Submitting the same jobs again with I<deep_debug> = 1, these files will be reported back. NOTE: it works only in standalone mode, for debugging purposes.
585 spiga 1.75
586 slacapra 1.68 =item B<dontCheckSpaceLeft>
587    
588     Set it to 1 to skip the check of the free space left in your working directory before attempting to get the output back. Default is 0 (=False)
589    
590 spiga 1.91
591     =item B<local_stage_out>
592 spiga 1.92
593     To use the local stage out (i.e. to the closeSE), in case of remote stage out failure, set I<local_stage_out> = 1 .
594 spiga 1.91
595 slacapra 1.6 =back
596    
597 slacapra 1.5 B<[EDG]>
598 nsmirnov 1.1
599 slacapra 1.13 =over 4
600 slacapra 1.6
601 slacapra 1.13 =item B<RB>
602 slacapra 1.6
603 ewv 1.52 Which RB you want to use instead of the default one, as defined in the configuration of your UI. The ones available for CMS are I<CERN> and I<CNAF>. They are actually identical, being a collection of all RB/WMS available for CMS: the configuration files needed to change the broker will be automatically downloaded from CRAB web page and used.
604     You can use any other RB which is available, provided you supply the proper configuration files. E.g., for RB XYZ, you should provide I<edg_wl_ui.conf.CMS_XYZ> and I<edg_wl_ui_cmd_var.conf.CMS_XYZ> for the EDG RB, or I<glite.conf.CMS_XYZ> for the gLite WMS. These files are searched for in the current working directory and, if not found, on the CRAB web page. So, if you put your private configuration files in the working directory, they will be used, even if they are not available on the CRAB web page.
605 slacapra 1.29 Please get in contact with the CRAB team if you wish to provide your RB or WMS as a service to the CMS community.
606 slacapra 1.6
607 slacapra 1.14 =item B<proxy_server>
608    
609     The proxy server to which you delegate the responsibility to renew your proxy once expired. The default is I<myproxy.cern.ch> : change only if you B<really> know what you are doing.
610    
611 slacapra 1.26 =item B<role>
612    
613     The role to be set in the VOMS. See VOMS documentation for more info.
614    
615 slacapra 1.27 =item B<group>
616    
617     The group to be set in the VOMS. See VOMS documentation for more info.
618    
619 slacapra 1.28 =item B<dont_check_proxy>
620    
621 ewv 1.52 If you do not want CRAB to check your proxy. The creation of the proxy (with proper length) and its delegation to a myproxy server are your responsibility.
622 slacapra 1.28
623 slacapra 1.6 =item B<requirements>
624    
625     Any other requirements to be added to the JDL. Must be written in compliance with JDL syntax (see the LCG user manual for further info). No requirement on the Computing Element must be set here.
626    
627 slacapra 1.27 =item B<additional_jdl_parameters:>
628    
629 spiga 1.48 Any other parameters you want to add to the JDL file: semicolon-separated list; each
630 ewv 1.44 item B<must> be complete, including the closing ";".
631 spiga 1.48
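For example, a sketch of such a semicolon-separated list (the JDL attributes shown are illustrative choices, not a recommendation; check the JDL specification of your middleware before using them):

    [EDG]
    additional_jdl_parameters = AllowZippedISB = true; PerusalFileEnable = true;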
632     =item B<wms_service>
633    
634 fanzago 1.71 With this field it is also possible to specify which WMS you want to use (https://hostname:port/pathcode), where "hostname" is the WMS name, "port" is generally 7443 and "pathcode" should be something like "glite_wms_wmproxy_server".
635 slacapra 1.27
636 slacapra 1.6 =item B<max_cpu_time>
637    
638     Maximum CPU time needed to finish one job. It will be used to select a suitable queue on the CE. Time in minutes.
639    
640     =item B<max_wall_clock_time>
641    
642     Same as previous, but with real time, and not CPU one.
643    
644 spiga 1.88 =item B<ce_black_list>
645 slacapra 1.6
646 ewv 1.66 All the CEs (Computing Elements) whose name contains any of the following strings (comma-separated list) will not be considered for submission. Use the DNS domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl,....). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
647 slacapra 1.6
648 spiga 1.88 =item B<ce_white_list>
649 slacapra 1.6
650 ewv 1.66 Only the CEs (Computing Elements) whose name contains any of the following strings (comma-separated list) will be considered for submission. Use the DNS domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl,....). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings. Please note that if the selected CE(s) do not contain the data you want to access, no submission can take place.
651 slacapra 1.27
652 spiga 1.88 =item B<se_black_list>
653 slacapra 1.27
654 ewv 1.66 All the SEs (Storage Elements) whose name contains any of the following strings (comma-separated list) will not be considered for submission. It works only if a datasetpath is specified. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
655 slacapra 1.27
656 spiga 1.88 =item B<se_white_list>
657 slacapra 1.27
658 ewv 1.66 Only the SEs (Storage Elements) whose name contains any of the following strings (comma-separated list) will be considered for submission. It works only if a datasetpath is specified. Please note that if the selected SE(s) do not contain the data you want to access, no submission can take place. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
659 slacapra 1.6
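A sketch combining the lists above (the site names are the examples used in this document, not a suggested selection):

    [EDG]
    ce_black_list = fnal, fzk
    se_white_list = T2_DE_DESY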
660 spiga 1.73 =item B<remove_default_blacklist>
661    
662 ewv 1.78 CRAB enforces a black list of the T1 Computing Elements. By default it is appended to the user-defined I<ce_black_list>. To remove the enforced T1 black list, set I<remove_default_blacklist>=1.
663 spiga 1.73
664 slacapra 1.6 =item B<virtual_organization>
665    
666 spiga 1.94 You do not want to change this: it is cms!
667 slacapra 1.6
668     =item B<retry_count>
669    
670 fanzago 1.37 Number of times the Grid will try to resubmit your job in case of Grid-related problems.
671 slacapra 1.6
672 slacapra 1.27 =item B<shallow_retry_count>
673    
674 fanzago 1.37 Number of shallow resubmissions the Grid will try: resubmissions are tried B<only> if the job aborted B<before> starting. So you are guaranteed that your jobs run strictly once.
675 slacapra 1.27
676 slacapra 1.30 =item B<maxtarballsize>
677    
678     Maximum size of the tar-ball in MB. If bigger, an error will be generated. The actual limit is that of the RB input sandbox. Default is 9.5 MB (the sandbox limit is 10 MB)
679    
680 spiga 1.55 =item B<skipwmsauth>
681    
682 ewv 1.64 Temporary parameter useful to allow WMSAuthorisation handling. Specifying I<skipwmsauth> = 1, the pyopenssl problems will disappear. It is needed when working on a gLite UI outside of CERN.
683 spiga 1.55
684 slacapra 1.6 =back
685    
686 spiga 1.55 B<[LSF]> or B<[CAF]>
687 slacapra 1.46
688     =over 4
689    
690     =item B<queue>
691    
692 ewv 1.52 The LSF queue you want to use: if none, the default one will be used. For CAF, the proper queue will be automatically selected.
693 slacapra 1.46
694     =item B<resource>
695    
696     The resources to be used within a LSF queue. Again, for CAF, the right one is selected.
697    
698 spiga 1.55 =item B<copyCommand>
699    
700     To define the command to be used to copy both the Input and Output sandboxes to their final location. Default is cp.
701 slacapra 1.46
702     =back
703    
704 nsmirnov 1.1 =head1 FILES
705    
706 slacapra 1.6 I<crab> uses a configuration file I<crab.cfg> which contains configuration parameters. This file is written in the INI-style. The default filename can be changed by the I<-cfg> option.
707 nsmirnov 1.1
708 slacapra 1.6 I<crab> creates by default a working directory 'crab_0_E<lt>dateE<gt>_E<lt>timeE<gt>'
709 nsmirnov 1.1
710     I<crab> saves all command lines in the file I<crab.history>.
711    
712     =head1 HISTORY
713    
714 ewv 1.52 B<CRAB> is a tool for CMS analysis in the Grid environment. It is based on ideas from CMSprod, a production tool originally implemented by Nikolai Smirnov.
715 nsmirnov 1.1
716     =head1 AUTHORS
717    
718     """
719     author_string = '\n'
720     for auth in common.prog_authors:
721     #author = auth[0] + ' (' + auth[2] + ')' + ' E<lt>'+auth[1]+'E<gt>,\n'
722     author = auth[0] + ' E<lt>' + auth[1] +'E<gt>,\n'
723     author_string = author_string + author
724     pass
725     help_string = help_string + author_string[:-2] + '.'\
726     """
727    
728     =cut
729 slacapra 1.19 """
730 nsmirnov 1.1
731     pod = tempfile.mktemp()+'.pod'
732     pod_file = open(pod, 'w')
733     pod_file.write(help_string)
734     pod_file.close()
735    
736     if option == 'man':
737     man = tempfile.mktemp()
738     pod2man = 'pod2man --center=" " --release=" " '+pod+' >'+man
739     os.system(pod2man)
740     os.system('man '+man)
741     pass
742     elif option == 'tex':
743     fname = common.prog_name+'-v'+common.prog_version_str
744     tex0 = tempfile.mktemp()+'.tex'
745     pod2tex = 'pod2latex -full -out '+tex0+' '+pod
746     os.system(pod2tex)
747     tex = fname+'.tex'
748     tex_old = open(tex0, 'r')
749     tex_new = open(tex, 'w')
750     for s in tex_old.readlines():
751     if string.find(s, '\\begin{document}') >= 0:
752     tex_new.write('\\title{'+common.prog_name+'\\\\'+
753     '(Version '+common.prog_version_str+')}\n')
754     tex_new.write('\\author{\n')
755     for auth in common.prog_authors:
756     tex_new.write(' '+auth[0]+
757     '\\thanks{'+auth[1]+'} \\\\\n')
758     tex_new.write('}\n')
759     tex_new.write('\\date{}\n')
760     elif string.find(s, '\\tableofcontents') >= 0:
761     tex_new.write('\\maketitle\n')
762     continue
763     elif string.find(s, '\\clearpage') >= 0:
764     continue
765     tex_new.write(s)
766     tex_old.close()
767     tex_new.close()
768     print 'See '+tex
769     pass
770     elif option == 'html':
771     fname = common.prog_name+'-v'+common.prog_version_str+'.html'
772     pod2html = 'pod2html --title='+common.prog_name+\
773     ' --infile='+pod+' --outfile='+fname
774     os.system(pod2html)
775     print 'See '+fname
776     pass
777 slacapra 1.33 elif option == 'txt':
778     fname = common.prog_name+'-v'+common.prog_version_str+'.txt'
779     pod2text = 'pod2text '+pod+' '+fname
780     os.system(pod2text)
781     print 'See '+fname
782     pass
783 nsmirnov 1.1
784     sys.exit(0)