ViewVC Help
View File | Revision Log | Show Annotations | Root Listing
root/cvsroot/COMP/CRAB/python/crab_help.py
Revision: 1.113
Committed: Wed Sep 2 15:17:56 2009 UTC (15 years, 8 months ago) by mcinquil
Content type: text/x-python
Branch: MAIN
CVS Tags: CRAB_2_7_0_pre1
Changes since 1.112: +0 -4 lines
Log Message:
Removing old copyCommand definition from help

File Contents

# User Rev Content
1 nsmirnov 1.1
2     ###########################################################################
3     #
4     # H E L P F U N C T I O N S
5     #
6     ###########################################################################
7    
8     import common
9    
10     import sys, os, string
11 spiga 1.34
12 nsmirnov 1.1 import tempfile
13    
14     ###########################################################################
15     def usage():
16 slacapra 1.43 print 'in usage()'
17 nsmirnov 1.1 usa_string = common.prog_name + """ [options]
18 slacapra 1.3
19     The most useful general options (use '-h' to get complete help):
20    
21 spiga 1.100 -create -- Create all the jobs.
22     -submit n -- Submit the first n available jobs. Default is all.
23 slacapra 1.102 -status -- check status of all jobs.
24 spiga 1.100 -getoutput|-get [range] -- get back the output of all jobs: if range is defined, only of selected jobs.
25     -extend -- Extend an existing task to run on new fileblocks if there.
26     -publish -- after the getouput, publish the data user in a local DBS instance.
27 ewv 1.104 -checkPublication [dbs_url datasetpath] -- checks if a dataset is published in a DBS.
28 spiga 1.100 -kill [range] -- kill submitted jobs.
29     -resubmit [range] -- resubmit killed/aborted/retrieved jobs.
30     -copyData [range] -- copy locally the output stored on remote SE.
31     -renewCredential -- renew credential on the server.
32     -clean -- gracefully cleanup the directory of a task.
33     -match|-testJdl [range] -- check if resources exist which are compatible with jdl.
34     -report -- print a short report about the task
35     -list [range] -- show technical job details.
36     -postMortem [range] -- provide a file with information useful for post-mortem analysis of the jobs.
37     -printId [range] -- print the job SID or Task Unique ID while using the server.
38     -createJdl [range] -- provide files with a complete Job Description (JDL).
39     -validateCfg [fname] -- parse the ParameterSet using the framework's Python API.
40     -continue|-c [dir] -- Apply command to task stored in [dir].
41     -h [format] -- Detailed help. Formats: man (default), tex, html, txt.
42     -cfg fname -- Configuration file name. Default is 'crab.cfg'.
43     -debug N -- set the verbosity level to N.
44     -v -- Print version and exit.
45 nsmirnov 1.1
46 slacapra 1.4 "range" has syntax "n,m,l-p" which correspond to [n,m,l,l+1,...,p-1,p] and all possible combination
47    
48 nsmirnov 1.1 Example:
49 slacapra 1.26 crab -create -submit 1
50 nsmirnov 1.1 """
51 slacapra 1.43 print usa_string
52 nsmirnov 1.1 sys.exit(2)
53    
54     ###########################################################################
55     def help(option='man'):
56     help_string = """
57     =pod
58    
59     =head1 NAME
60    
61     B<CRAB>: B<C>ms B<R>emote B<A>nalysis B<B>uilder
62    
63 slacapra 1.3 """+common.prog_name+""" version: """+common.prog_version_str+"""
64 nsmirnov 1.1
65 slacapra 1.19 This tool B<must> be used from an User Interface and the user is supposed to
66 fanzago 1.37 have a valid Grid certificate.
67 nsmirnov 1.1
68     =head1 SYNOPSIS
69    
70 slacapra 1.13 B<"""+common.prog_name+"""> [I<options>] [I<command>]
71 nsmirnov 1.1
72     =head1 DESCRIPTION
73    
74 ewv 1.52 CRAB is a Python program intended to simplify the process of creation and submission of CMS analysis jobs to the Grid environment .
75 nsmirnov 1.1
76 slacapra 1.3 Parameters for CRAB usage and configuration are provided by the user changing the configuration file B<crab.cfg>.
77 nsmirnov 1.1
78 spiga 1.48 CRAB generates scripts and additional data files for each job. The produced scripts are submitted directly to the Grid. CRAB makes use of BossLite to interface to the Grid scheduler, as well as for logging and bookkeeping.
79 nsmirnov 1.1
80 ewv 1.52 CRAB supports any CMSSW based executable, with any modules/libraries, including user provided ones, and deals with the output produced by the executable. CRAB provides an interface to CMS data discovery services (DBS and DLS), which are completely hidden to the final user. It also splits a task (such as analyzing a whole dataset) into smaller jobs, according to user requirements.
81 nsmirnov 1.1
82 slacapra 1.46 CRAB can be used in two ways: StandAlone and with a Server.
83     The StandAlone mode is suited for small task, of the order of O(100) jobs: it submits the jobs directly to the scheduler, and these jobs are under user responsibility.
84 ewv 1.52 In the Server mode, suited for larger tasks, the jobs are prepared locally and then passed to a dedicated CRAB server, which then interacts with the scheduler on behalf of the user, including additional services, such as automatic resubmission, status caching, output retrieval, and more.
85 slacapra 1.46 The CRAB commands are exactly the same in both cases.
86    
87 slacapra 1.13 CRAB web page is available at
88    
89 spiga 1.94 I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrab>
90 slacapra 1.6
91 slacapra 1.19 =head1 HOW TO RUN CRAB FOR THE IMPATIENT USER
92    
93 ewv 1.52 Please, read all the way through in any case!
94 slacapra 1.19
95     Source B<crab.(c)sh> from the CRAB installation area, which have been setup either by you or by someone else for you.
96    
97 ewv 1.52 Modify the CRAB configuration file B<crab.cfg> according to your need: see below for a complete list. A template and commented B<crab.cfg> can be found on B<$CRABDIR/python/crab.cfg>
98 slacapra 1.19
99 ewv 1.44 ~>crab -create
100 slacapra 1.19 create all jobs (no submission!)
101    
102 spiga 1.25 ~>crab -submit 2 -continue [ui_working_dir]
103 slacapra 1.19 submit 2 jobs, the ones already created (-continue)
104    
105 slacapra 1.26 ~>crab -create -submit 2
106 slacapra 1.19 create _and_ submit 2 jobs
107    
108 spiga 1.25 ~>crab -status
109 slacapra 1.19 check the status of all jobs
110    
111 spiga 1.25 ~>crab -getoutput
112 slacapra 1.19 get back the output of all jobs
113    
114 ewv 1.44 ~>crab -publish
115     publish all user outputs in the DBS specified in the crab.cfg (dbs_url_for_publication) or written as argument of this option
116 fanzago 1.42
117 slacapra 1.20 =head1 RUNNING CMSSW WITH CRAB
118 nsmirnov 1.1
119 slacapra 1.3 =over 4
120    
121     =item B<A)>
122    
123 ewv 1.52 Develop your code in your CMSSW working area. Do anything which is needed to run interactively your executable, including the setup of run time environment (I<eval `scramv1 runtime -sh|csh`>), a suitable I<ParameterSet>, etc. It seems silly, but B<be extra sure that you actually did compile your code> I<scramv1 b>.
124 slacapra 1.3
125 ewv 1.44 =item B<B)>
126 slacapra 1.3
127 slacapra 1.20 Source B<crab.(c)sh> from the CRAB installation area, which have been setup either by you or by someone else for you. Modify the CRAB configuration file B<crab.cfg> according to your need: see below for a complete list.
128    
129     The most important parameters are the following (see below for complete description of each parameter):
130    
131     =item B<Mandatory!>
132    
133     =over 6
134    
135     =item B<[CMSSW]> section: datasetpath, pset, splitting parameters, output_file
136    
137     =item B<[USER]> section: output handling parameters, such as return_data, copy_data etc...
138    
139     =back
140    
141     =item B<Run it!>
142    
143 fanzago 1.37 You must have a valid voms-enabled Grid proxy. See CRAB web page for details.
144 slacapra 1.20
145     =back
146    
147 spiga 1.94 =head1 RUNNING MULTICRAB
148    
149 ewv 1.98 MultiCRAB is a CRAB extension to submit the same job to multiple datasets in one go.
150 spiga 1.94
151 ewv 1.98 The use case for multicrab is when you have your analysis code that you want to run on several datasets, typically some signals plus some backgrounds (for MC studies)
152 spiga 1.94 or on different streams/configuration/runs for real data taking. You want to run exactly the same code, and also the crab.cfg are different only for few keys:
153 ewv 1.98 for sure datasetpath but also other keys, such as eg total_number_of_events, in case you want to run on all signals but only a fraction of background, or anything else.
154 spiga 1.94 So far, you would have to create a set of crab.cfg, one for each dataset you want to access, and submit several instances of CRAB, saving the output to different locations.
155     Multicrab is meant to automatize this procedure.
156     In addition to the usual crab.cfg, there is a new configuration file called multicrab.cfg. The syntax is very similar to that of crab.cfg, namely
157     [SECTION] <crab.cfg Section>.Key=Value
158    
159     Please note that it is mandatory to add explicitly the crab.cfg [SECTION] in front of [KEY].
160     The role of multicrab.cfg is to apply modification to the template crab.cfg, some which are common to all tasks, and some which are task specific.
161    
162     =head2 So there are two sections:
163    
164     =over 2
165    
166 ewv 1.98 =item B<[COMMON]>
167 spiga 1.94
168     section: which applies to all task, and which is fully equivalent to modify directly the template crab.cfg
169    
170 ewv 1.98 =item B<[DATASET]>
171 spiga 1.94
172 ewv 1.98 section: there could be an arbitrary number of sections, one for each dataset you want to run. The names are free (but COMMON and MULTICRAB), and they will be used as ui_working_dir for the task as well as an appendix to the user_remote_dir in case of output copy to remote SE. So, the task corresponding to section, say [SIGNAL] will be placed in directory SIGNAL, and the output will be put on /SIGNAL/, so SIGNAL will be added as last subdir in the user_remote_dir.
173 spiga 1.94
174     =back
175    
176     For further details please visit
177    
178     I<https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideMultiCrab>
179    
180 slacapra 1.19 =head1 HOW TO RUN ON CONDOR-G
181    
182     The B<Condor-G> mode for B<CRAB> is a special submission mode next to the standard Resource Broker submission. It is designed to submit jobs directly to a site and not using the Resource Broker.
183    
184 ewv 1.52 Due to the nature of B<Condor-G> submission, the B<Condor-G> mode is restricted to OSG sites within the CMS Grid, currently the 7 US T2: Florida(ufl.edu), Nebraska(unl.edu), San Diego(ucsd.edu), Purdue(purdue.edu), Wisconsin(wisc.edu), Caltech(ultralight.org), MIT(mit.edu).
185 slacapra 1.19
186     =head2 B<Requirements:>
187    
188     =over 2
189    
190     =item installed and running local Condor scheduler
191    
192     (either installed by the local Sysadmin or self-installed using the VDT user interface: http://www.uscms.org/SoftwareComputing/UserComputing/Tutorials/vdt.html)
193    
194     =item locally available LCG or OSG UI installation
195    
196 ewv 1.44 for authentication via Grid certificate proxies ("voms-proxy-init -voms cms" should result in valid proxy)
197 slacapra 1.19
198 spiga 1.96 =item set the environment variable GRID_WL_LOCATION to the edg directory of the local LCG or OSG UI installation
199 slacapra 1.19
200     =back
201    
202     =head2 B<What the Condor-G mode can do:>
203    
204     =over 2
205    
206 ewv 1.52 =item submission directly to multiple OSG sites,
207 slacapra 1.19
208 ewv 1.52 the requested dataset must be published correctly by the site in the local and global services.
209     Previous restrictions on submitting only to a single site have been removed. SE and CE whitelisting
210     and blacklisting work as in the other modes.
211 slacapra 1.19
212     =back
213    
214     =head2 B<What the Condor-G mode cannot do:>
215    
216     =over 2
217    
218     =item submit jobs if no condor scheduler is running on the submission machine
219    
220     =item submit jobs if the local condor installation does not provide Condor-G capabilities
221    
222 ewv 1.52 =item submit jobs to an LCG site
223 slacapra 1.19
224 fanzago 1.37 =item support Grid certificate proxy renewal via the myproxy service
225 slacapra 1.19
226     =back
227    
228     =head2 B<CRAB configuration for Condor-G mode:>
229    
230 ewv 1.52 The CRAB configuration for the Condor-G mode only requires one change in crab.cfg:
231 nsmirnov 1.1
232 slacapra 1.19 =over 2
233 slacapra 1.3
234 slacapra 1.19 =item select condor_g Scheduler:
235 slacapra 1.4
236 slacapra 1.19 scheduler = condor_g
237 slacapra 1.4
238 slacapra 1.19 =back
239 slacapra 1.4
240 ewv 1.52 =head1 COMMANDS
241 slacapra 1.4
242     =over 4
243    
244 slacapra 1.26 =item B<-create>
245 slacapra 1.4
246 slacapra 1.26 Create the jobs: from version 1_3_0 it is only possible to create all jobs.
247 ewv 1.52 The maximum number of jobs depends on dataset and splitting directives. This set of identical jobs accessing the same dataset are defined as a task.
248 slacapra 1.4 This command create a directory with default name is I<crab_0_date_time> (can be changed via ui_working_dir parameter, see below). Inside this directory it is placed whatever is needed to submit your jobs. Also the output of your jobs (once finished) will be place there (see after). Do not cancel by hand this directory: rather use -clean (see).
249     See also I<-continue>.
250    
251 slacapra 1.46 =item B<-submit [range]>
252 slacapra 1.4
253 ewv 1.98 Submit n jobs: 'n' is either a positive integer or 'all' or a [range]. The default is all.
254     If 'n' is passed as an argument, the first 'n' suitable jobs will be submitted. Please note that this is behaviour is different from other commands, where -command N means act the command to the job N, and not to the first N jobs. If a [range] is passed, the selected jobs will be submitted.
255     This option may be used in conjunction with -create (to create and submit immediately) or with -continue (which is assumed by default) to submit previously created jobs. Failure to do so will stop CRAB and generate an error message. See also I<-continue>.
256 slacapra 1.4
257     =item B<-continue [dir] | -c [dir]>
258    
259 ewv 1.98 Apply the action on the task stored in directory [dir]. If the task directory is the standard one (crab_0_date_time), the most recent in time is assumed. Any other directory must be specified.
260     Basically all commands (except -create) need -continue, so it is automatically assumed. Of course, the standard task directory is used in this case.
261 slacapra 1.4
262 slacapra 1.102 =item B<-status [v|verbose]>
263 nsmirnov 1.1
264 slacapra 1.102 Check the status of the jobs, in all states. With the server, the full status, including application and wrapper exit codes, is available as soon as the jobs end. In StandAlone mode it is necessary to retrieve (-get) the job output first. With B<v|verbose> some more information is displayed.
265 nsmirnov 1.1
266 slacapra 1.20 =item B<-getoutput|-get [range]>
267 nsmirnov 1.1
268 slacapra 1.102 Retrieve the output declared by the user via the output sandbox. By default the output will be put in task working dir under I<res> subdirectory. This can be changed via config parameters. B<Be extra sure that you have enough free space>. From version 2_3_x, the available free space is checked in advance. See I<range> below for syntax.
269 nsmirnov 1.1
270 spiga 1.100 =item B<-publish>
271 fanzago 1.42
272 ewv 1.98 Publish user output in a local DBS instance after the retrieval of output. By default publish uses the dbs_url_for_publication specified in the crab.cfg file, otherwise you can supply it as an argument of this option.
273 fanzago 1.42
274 fanzago 1.97 =item B<-checkPublication [-USER.dbs_url_for_publication=dbs_url -USER.dataset_to_check=datasetpath -debug]>
275    
276 ewv 1.98 Check if a dataset is published in a DBS. This option is automaticaly called at the end of the publication step, but it can be also used as a standalone command. By default it reads the parameters (USER.dbs_url_for_publication and USER.dataset_to_check) in your crab.cfg. You can overwrite the defaults in crab.cfg by passing these parameters as option. Using the -debug option, you will get detailed info about the files of published blocks.
277 fanzago 1.97
278 slacapra 1.4 =item B<-resubmit [range]>
279 nsmirnov 1.1
280 fanzago 1.37 Resubmit jobs which have been previously submitted and have been either I<killed> or are I<aborted>. See I<range> below for syntax.
281 nsmirnov 1.1
282 spiga 1.60 =item B<-extend>
283    
284 ewv 1.64 Create new jobs for an existing task, checking if new blocks are available for the given dataset.
285 spiga 1.60
286 slacapra 1.4 =item B<-kill [range]>
287 nsmirnov 1.1
288 slacapra 1.4 Kill (cancel) jobs which have been submitted to the scheduler. A range B<must> be used in all cases, no default value is set.
289 nsmirnov 1.1
290 spiga 1.74 =item B<-copyData [range]>
291 slacapra 1.58
292 ewv 1.78 Copy locally (on current working directory) the output previously stored on remote SE by the jobs. Of course, only if copy_data option has been set.
293 slacapra 1.58
294 spiga 1.80 =item B<-renewCredential >
295 mcinquil 1.59
296 spiga 1.80 If using the server modality, this command allows to delegate a valid credential (proxy/token) to the server associated with the task.
297 mcinquil 1.59
298 spiga 1.85 =item B<-match|-testJdl [range]>
299 nsmirnov 1.1
300 fanzago 1.71 Check if the job can find compatible resources. It is equivalent of doing I<edg-job-list-match> on edg.
301 nsmirnov 1.1
302 slacapra 1.20 =item B<-printId [range]>
303    
304 slacapra 1.82 Just print the job identifier, which can be the SID (Grid job identifier) of the job(s) or the taskId if you are using CRAB with the server or local scheduler Id. If [range] is "full", the the SID of all the jobs are printed, also in the case of submission with server.
305 slacapra 1.20
306 spiga 1.53 =item B<-printJdl [range]>
307    
308 ewv 1.64 Collect the full Job Description in a file located under share directory. The file base name is File- .
309 spiga 1.53
310 slacapra 1.4 =item B<-postMortem [range]>
311 nsmirnov 1.1
312 slacapra 1.46 Try to collect more information of the job from the scheduler point of view.
313 nsmirnov 1.1
314 slacapra 1.13 =item B<-list [range]>
315    
316 ewv 1.52 Dump technical information about jobs: for developers only.
317 slacapra 1.13
318 slacapra 1.89 =item B<-report>
319    
320     Print a short report about the task, namely the total number of events and files processed/requested/available, the name of the datasetpath, a summary of the status of the jobs, the list of runs and lumi sections, and so on. In principle it should contain all the info needed for analysis. Work in progress.
321    
322 slacapra 1.4 =item B<-clean [dir]>
323 nsmirnov 1.1
324 slacapra 1.26 Clean up (i.e. erase) the task working directory after a check whether there are still running jobs. In case, you are notified and asked to kill them or retrieve their output. B<Warning> this will possibly delete also the output produced by the task (if any)!
325 nsmirnov 1.1
326 calloni 1.110 =item B<-refreshCache>
327    
328 ewv 1.112 Clean up (i.e. erase) the SiteDb, WMS and CrabServer caches in your submitting directory
329 calloni 1.110
330 slacapra 1.4 =item B<-help [format] | -h [format]>
331 nsmirnov 1.1
332 slacapra 1.4 This help. It can be produced in three different I<format>: I<man> (default), I<tex> and I<html>.
333 nsmirnov 1.1
334 slacapra 1.4 =item B<-v>
335 nsmirnov 1.1
336 slacapra 1.4 Print the version and exit.
337 nsmirnov 1.1
338 slacapra 1.4 =item B<range>
339 nsmirnov 1.1
340 slacapra 1.13 The range to be used in many of the above commands has the following syntax. It is a comma separated list of jobs ranges, each of which may be a job number, or a job range of the form first-last.
341 slacapra 1.4 Example: 1,3-5,8 = {1,3,4,5,8}
342 nsmirnov 1.1
343 ewv 1.44 =back
344 slacapra 1.6
345 slacapra 1.4 =head1 OPTION
346 nsmirnov 1.1
347 slacapra 1.6 =over 4
348    
349 slacapra 1.4 =item B<-cfg [file]>
350 nsmirnov 1.1
351 slacapra 1.4 Configuration file name. Default is B<crab.cfg>.
352 nsmirnov 1.1
353 slacapra 1.4 =item B<-debug [level]>
354 nsmirnov 1.1
355 slacapra 1.13 Set the debug level: high number for high verbosity.
356 nsmirnov 1.1
357 ewv 1.44 =back
358 slacapra 1.6
359 slacapra 1.5 =head1 CONFIGURATION PARAMETERS
360    
361 spiga 1.25 All the parameter describe in this section can be defined in the CRAB configuration file. The configuration file has different sections: [CRAB], [USER], etc. Each parameter must be defined in its proper section. An alternative way to pass a config parameter to CRAB is via command line interface; the syntax is: crab -SECTION.key value . For example I<crab -USER.outputdir MyDirWithFullPath> .
362 slacapra 1.5 The parameters passed to CRAB at the creation step are stored, so they cannot be changed by changing the original crab.cfg . On the other hand the task is protected from any accidental change. If you want to change any parameters, this require the creation of a new task.
363 slacapra 1.6 Mandatory parameters are flagged with a *.
364 slacapra 1.5
365     B<[CRAB]>
366 slacapra 1.6
367 slacapra 1.13 =over 4
368 slacapra 1.5
369 slacapra 1.6 =item B<jobtype *>
370 slacapra 1.5
371 slacapra 1.26 The type of the job to be executed: I<cmssw> jobtypes are supported
372 slacapra 1.6
373     =item B<scheduler *>
374    
375 ewv 1.52 The scheduler to be used: I<glitecoll> is the more efficient grid scheduler and should be used. Other choice are I<glite>, same as I<glitecoll> but without bulk submission (and so slower) or I<condor_g> (see specific paragraph) or I<edg> which is the former Grid scheduler, which will be dismissed in some future
376     From version 210, also local scheduler are supported, for the time being only at CERN. I<LSF> is the standard CERN local scheduler or I<CAF> which is LSF dedicated to CERN Analysis Facilities.
377 slacapra 1.5
378 slacapra 1.81 =item B<use_server>
379    
380     To use the server for job handling (recommended) 0=no (default), 1=true. The server to be used will be found automatically from a list of available ones: it can also be specified explicitly by using I<server_name> (see below)
381    
382 mcinquil 1.35 =item B<server_name>
383    
384 slacapra 1.81 To use the CRAB-server support it is needed to fill this key with server name as <Server_DOMAIN> (e.g. cnaf,fnal). If this is set, I<use_server> is set to true automatically.
385     If I<server_name=None> crab works in standalone way, same as using I<use_server=0> and no I<server_name>.
386 spiga 1.48 The server available to users can be found from CRAB web page.
387 mcinquil 1.35
388 slacapra 1.5 =back
389    
390 slacapra 1.20 B<[CMSSW]>
391    
392     =over 4
393    
394 slacapra 1.22 =item B<datasetpath *>
395 slacapra 1.20
396 ewv 1.108 The path of the processed or analysis dataset as defined in DBS. It comes with the format I</PrimaryDataset/DataTier/Process[/OptionalADS]>. If no input is needed I<None> must be specified. When running on an analysis dataset, the job splitting must be specified by luminosity block rather than event. Analysis datasets are only treated accurately on a lumi-by-lumi level with CMSSW 3_1_x and later.
397 spiga 1.90
398 afanfani 1.50 =item B<runselection *>
399 ewv 1.52
400 ewv 1.108 Within a dataset you can restrict to run on a specific run number or run number range. For example runselection=XYZ or runselection=XYZ1-XYZ2 .
401 afanfani 1.50
402 spiga 1.57 =item B<use_parent *>
403    
404 ewv 1.108 Within a dataset you can ask to run over the related parent files too. E.g., this will give you access to the RAW data while running over a RECO sample. Setting use_parent=1 CRAB determines the parent files from DBS and will add secondaryFileNames = cms.untracked.vstring( <LIST of parent FIles> ) to the pool source section of your parameter set.
405 spiga 1.57
406 slacapra 1.22 =item B<pset *>
407 slacapra 1.20
408 ewv 1.112 The python ParameterSet to be used.
409 slacapra 1.20
410 ewv 1.111 =item B<pycfg_params *>
411    
412     These parameters are passed to the python config file, as explained in https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideAboutPythonConfigFile#Passing_Command_Line_Arguments_T
413    
414 slacapra 1.26 =item I<Of the following three parameter exactly two must be used, otherwise CRAB will complain.>
415 slacapra 1.20
416 slacapra 1.22 =item B<total_number_of_events *>
417    
418 ewv 1.108 The number of events to be processed. To access all available events, use I<-1>. Of course, the latter option is not viable in case of no input. In this case, the total number of events will be used to split the task in jobs, together with I<events_per_job>.
419 slacapra 1.22
420 slacapra 1.26 =item B<events_per_job*>
421 slacapra 1.22
422 ewv 1.108 The number of events to be accessed by each job. Since a job cannot cross the boundary of a fileblock it might be that the actual number of events per job is not exactly what you asked for. It can be used also with no input.
423    
424     =item B<total_number_of_lumis *>
425    
426     The number of luminosity blocks to be processed. This option is only valid when using analysis datasets. Since a job cannot access less than a whole file, it may be that the actual number of lumis per job is more than you asked for. Two of I<total_number_of_lumis>, I<lumis_per_job>, and I<number_of_jobs> must be supplied to run on an analysis dataset.
427    
428     =item B<lumis_per_job*>
429    
430     The number of luminosity blocks to be accessed by each job. This option is only valid when using analysis datasets. Since a job cannot access less than a whole file, it may be that the actual number of lumis per job is more than you asked for.
431 slacapra 1.22
432     =item B<number_of_jobs *>
433    
434 ewv 1.108 Define the number of jobs to be run for the task. The number of event for each job is computed taking into account the total number of events required as well as the granularity of EventCollections. Can be used also with No input.
435 slacapra 1.22
436 spiga 1.90 =item B<split_by_run *>
437    
438 ewv 1.108 To activate the split run based (each job will access a different run) use I<split_by_run>=1. You can also define I<number_of_jobs> and/or I<runselection>. NOTE: the Run Based combined with Event Based split is not yet available.
439 spiga 1.90
440 slacapra 1.22 =item B<output_file *>
441    
442 ewv 1.108 The output files produced by your application (comma separated list). From CRAB 2_2_2 onward, if TFileService is defined in user Pset, the corresponding output file is automatically added to the list of output files. User can avoid this by setting B<skip_TFileService_output> = 1 (default is 0 == file included). The Edm output produced via PoolOutputModule can be automatically added by setting B<get_edm_output> = 1 (default is 0 == no). B<warning> it is not allowed to have a PoolOutputSource and not save it somewhere, since it is a waste of resource on the WN. In case you really want to do that, and if you really know what you are doing (hint: you dont!) you can user I<ignore_edm_output=1>.
443 slacapra 1.61
444     =item B<skip_TFileService_output>
445    
446     Force CRAB to skip the inclusion of file produced by TFileService to list of output files. Default is I<0>, namely the file is included.
447 slacapra 1.20
448 slacapra 1.63 =item B<get_edm_output>
449    
450     Force CRAB to add the EDM output file, as defined in PSET in PoolOutputModule (if any) to be added to the list of output files. Default is 0 (== no inclusion)
451    
452 ewv 1.47 =item B<increment_seeds>
453    
454     Specifies a comma separated list of seeds to increment from job to job. The initial value is taken
455     from the CMSSW config file. I<increment_seeds=sourceSeed,g4SimHits> will set sourceSeed=11,12,13 and g4SimHits=21,22,23 on
456     subsequent jobs if the values of the two seeds are 10 and 20 in the CMSSW config file.
457    
458     See also I<preserve_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
459    
460     =item B<preserve_seeds>
461    
462 ewv 1.78 Specifies a comma separated list of seeds to which CRAB will not change from their values in the user
463 ewv 1.47 CMSSW config file. I<preserve_seeds=sourceSeed,g4SimHits> will leave the Pythia and GEANT seeds the same for every job.
464    
465     See also I<increment_seeds>. Seeds not listed in I<increment_seeds> or I<preserve_seeds> are randomly set for each job.
466    
467 slacapra 1.30 =item B<first_run>
468    
469     First run to be generated in a generation jobs. Relevant only for no-input workflow.
470    
471 ewv 1.78 =item B<generator>
472 ewv 1.79
473     Name of the generator your MC job is using. Some generators require CRAB to skip events, others do not.
474 ewv 1.104 Possible values are pythia (default), comphep, lhe, and madgraph. This will skip events in your generator input file.
475 ewv 1.78
476 slacapra 1.31 =item B<executable>
477 slacapra 1.30
478 slacapra 1.31 The name of the executable to be run on remote WN. The default is cmsrun. The executable is either to be found on the release area of the WN, or has been built on user working area on the UI and is (automatically) shipped to WN. If you want to run a script (which might internally call I<cmsrun>, use B<USER.script_exe> instead.
479 slacapra 1.30
480     =item I<DBS and DLS parameters:>
481    
482 slacapra 1.26 =item B<dbs_url>
483 slacapra 1.6
484 slacapra 1.40 The URL of the DBS query page. For expert only.
485 slacapra 1.13
486 spiga 1.84 =item B<show_prod>
487    
488 ewv 1.98 To enable CRAB to show data hosted on Tier1s sites specify I<show_prod> = 1. By default those data are masked.
489 spiga 1.86
490     =item B<no_block_boundary>
491    
492 ewv 1.98 To remove fileblock boundaries in job splitting specify I<no_block_boundary> = 1.
493 spiga 1.84
494 slacapra 1.13 =back
495    
496     B<[USER]>
497    
498     =over 4
499    
500 slacapra 1.6 =item B<additional_input_files>
501    
502 spiga 1.67 Any additional input file you want to ship to WN: comma separated list. IMPORTANT NOTE: they will be placed in the WN working dir, and not in ${CMS_SEARCH_PATH}. Specific files required by CMSSW application must be placed in the local data directory, which will be automatically shipped by CRAB itself. You do not need to specify the I<ParameterSet> you are using, which will be included automatically. Wildcards are allowed.
503 slacapra 1.6
504 slacapra 1.31 =item B<script_exe>
505    
506 ewv 1.112 A user script that will be run on WN (instead of default cmsrun). It is up to the user to setup properly the script itself to run on WN enviroment. CRAB guarantees that the CMSSW environment is setup (e.g. scram is in the path) and that the modified pset.py will be placed in the working directory, with name CMSSW.py . The user must ensure that a job report named crab_fjr.xml will be written. This can be guaranteed by passing the arguments "-j crab_fjr.xml" to cmsRun in the script. The script itself will be added automatically to the input sandbox so user MUST NOT add it within the B<USER.additional_input_files>.
507 slacapra 1.31
508 spiga 1.105 =item B<script_arguments>
509    
510     Any arguments you want to pass to the B<USER.script_exe>: comma separated list.
511    
512 slacapra 1.6 =item B<ui_working_dir>
513    
514 ewv 1.52 Name of the working directory for the current task. By default, a name I<crab_0_(date)_(time)> will be used. If this card is set, any CRAB command which require I<-continue> need to specify also the name of the working directory. A special syntax is also possible, to reuse the name of the dataset provided before: I<ui_working_dir : %(dataset)s> . In this case, if e.g. the dataset is SingleMuon, the ui_working_dir will be set to SingleMuon as well.
515 slacapra 1.6
516 mcinquil 1.35 =item B<thresholdLevel>
517    
518     This has to be a value between 0 and 100, that indicates the percentage of task completeness (jobs in a ended state are complete, even if failed). The server will notify the user by e-mail (look at the field: B<eMail>) when the task will reach the specified threshold. Works just with the server_mode = 1.
519    
520     =item B<eMail>
521    
522 ewv 1.52 The server will notify the specified e-mail when the task will reaches the specified B<thresholdLevel>. A notification is also sent when the task will reach the 100\% of completeness. This field can also be a list of e-mail: "B<eMail = user1@cern.ch, user2@cern.ch>". Works just with the server_mode = 1.
523 mcinquil 1.35
524 slacapra 1.6 =item B<return_data *>
525    
526 ewv 1.52 The output produced by the executable on WN is returned (via output sandbox) to the UI, by issuing the I<-getoutput> command. B<Warning>: this option should be used only for I<small> output, say less than 10MB, since the sandbox cannot accommodate big files. Depending on Resource Broker used, a size limit on output sandbox can be applied: bigger files will be truncated. To be used in alternative to I<copy_data>.
527 slacapra 1.6
528     =item B<outputdir>
529    
530 ewv 1.52 To be used together with I<return_data>. Directory on user interface where to store the output. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
531 slacapra 1.6
532     =item B<logdir>
533    
534 ewv 1.52 To be used together with I<return_data>. Directory on user interface where to store the standard output and error. Full path is mandatory, "~/" is not allowed: the default location of returned output is ui_working_dir/res .
535 slacapra 1.6
536     =item B<copy_data *>
537    
538 ewv 1.52 The output (only that produced by the executable, not the std-out and err) is copied to a Storage Element of your choice (see below). To be used as an alternative to I<return_data> and recommended in case of large output.
539 slacapra 1.6
540     =item B<storage_element>
541    
542 fanzago 1.71 To be used with <copy_data>=1
543     If you want to copy the output of your analysis in a official CMS Tier2 or Tier3, you have to write the CMS Site Name of the site, as written in the SiteDB https://cmsweb.cern.ch/sitedb/reports/showReport?reportid=se_cmsname_map.ini (i.e T2_IT_legnaro). You have also to specify the <remote_dir>(see below)
544    
545 ewv 1.78 If you want to copy the output in a not_official_CMS remote site you have to specify the complete storage element name (i.e se.xxx.infn.it).You have also to specify the <storage_path> and the <storage_port> if you do not use the default one(see below).
546 fanzago 1.71
547     =item B<user_remote_dir>
548    
549     To be used with <copy_data>=1 and <storage_element> official CMS sites.
550 ewv 1.104 This is the directory or tree of directories where your output will be stored. This directory will be created under the mountpoint ( which will be discover by CRAB if an official CMS storage Element has been used, or taken from the crab.cfg as specified by the user). B<NOTE> This part of the path will be used as logical file name of your files in the case of publication without using an official CMS storage Element. Generally it should start with "/store".
551 slacapra 1.6
552     =item B<storage_path>
553    
554 fanzago 1.71 To be used with <copy_data>=1 and <storage_element> not official CMS sites.
555     This is the full path of the Storage Element writeable by all, the mountpoint of SE (i.e /srm/managerv2?SFN=/pnfs/se.xxx.infn.it/yyy/zzz/)
556    
557 slacapra 1.6
558 fanzago 1.72 =item B<storage_pool>
559    
560     If you are using CAF scheduler, you can specify the storage pool where to write your output.
561     The default is cmscafuser. If you do not want to use the default, you can overwrite it specifing None
562    
563 spiga 1.70 =item B<storage_port>
564    
565     To choose the storage port specify I<storage_port> = N (default is 8443) .
566    
567 fanzago 1.101 =item B<local_stage_out *>
568    
569 ewv 1.104 This option enables the local stage out of produced output to the "close storage element" where the job is running, in case of failure of the remote copy to the Storage element decided by the user in che crab.cfg. It has to be used with the copy_data option. In the case of backup copy, the publication of data is forbidden. Set I<local_stage_out> = 1
570 fanzago 1.101
571 fanzago 1.71 =item B<publish_data*>
572    
573     To be used with <copy_data>=1
574     To publish your produced output in a local istance of DBS set publish_data = 1
575 fanzago 1.77 All the details about how to use this functionality are written in https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCrabForPublication
576 ewv 1.78 N.B 1) if you are using an official CMS site to stored data, the remote dir will be not considered. The directory where data will be stored is decided by CRAB, following the CMS policy in order to be able to re-read published data.
577     2) if you are using a not official CMS site to store data, you have to check the <lfn>, that will be part of the logical file name of you published files, in order to be able to re-read the data.
578 fanzago 1.71
579 fanzago 1.106 =item B<publish_with_import_all_parents>
580 ewv 1.108
581 fanzago 1.107 To publish your data in your local DBS importing also the complete parents tree, set publish_with_import_all_parents=1, otherwise 0. In this last case only the dataset that you have analyzed will be imported as parent in your local DBS. Default value is 1.
582 fanzago 1.106
583 fanzago 1.71 =item B<publish_data_name>
584    
585     You produced output will be published in your local DBS with dataset name <primarydataset>/<publish_data_name>/USER
586    
587     =item B<dbs_url_for_publication>
588    
589     Specify the URL of your local DBS istance where CRAB has to publish the output files
590    
591 fanzago 1.101 =item B<publish_zero_event>
592 spiga 1.93
593 fanzago 1.101 T0 force zero event files publication specify I<publish_zero_event> = 1
594 spiga 1.93
595 spiga 1.55 =item B<srm_version>
596 slacapra 1.46
597 spiga 1.69 To choose the srm version specify I<srm_version> = (srmv1 or srmv2).
598 slacapra 1.46
599 spiga 1.51 =item B<xml_report>
600    
601     To be used to switch off the screen report during the status query, enabling the db serialization in a file. Specifying I<xml_report> = FileName CRAB will serialize the DB into CRAB_WORKING_DIR/share/FileName.
602 slacapra 1.6
603 spiga 1.55 =item B<usenamespace>
604    
605 ewv 1.64 To use the automate namespace definition (perfomed by CRAB) it is possible to set I<usenamespace>=1. The same policy used for the stage out in case of data publication will be applied.
606 spiga 1.54
607 spiga 1.55 =item B<debug_wrapper>
608    
609 spiga 1.87 To enable the higer verbose level on wrapper specify I<debug_wrapper> = 1. The Pset contents before and after the CRAB maipulation will be written together with other useful infos.
610 spiga 1.54
611 spiga 1.75 =item B<deep_debug>
612    
613 ewv 1.78 To be used in case of unexpected job crash when the sdtout and stderr files are lost. Submitting again the same jobs specifying I<deep_debug> = 1 these files will be reported back. NOTE: it works only on standalone mode for debugging purpose.
614 spiga 1.75
615 slacapra 1.68 =item B<dontCheckSpaceLeft>
616    
617     Set it to 1 to skip the check of free space left on your working directory before attempting to get the output back. Default is 0 (=False)
618    
619 slacapra 1.6 =back
620    
621 spiga 1.96 B<[GRID]>
622 nsmirnov 1.1
623 slacapra 1.13 =over 4
624 slacapra 1.6
625 slacapra 1.13 =item B<RB>
626 slacapra 1.6
627 spiga 1.96 Which RB you want to use instead of the default one, as defined in the configuration of your UI. The ones available for CMS are I<CERN> and I<CNAF>. They are actually identical, being a collection of all WMSes available for CMS: the configuration files needed to change the broker will be automatically downloaded from CRAB web page and used.
628     You can use any other RB which is available, if you provide the proper configuration files. E.g., for gLite WMS XYZ, you should provide I<glite.conf.CMS_XYZ>. These files are searched for in the current working directory, and, if not found, on crab web page. So, if you put your private configuration files in the working directory, they will be used, even if they are not available on crab web page.
629 slacapra 1.29 Please get in contact with crab team if you wish to provide your RB or WMS as a service to the CMS community.
630 slacapra 1.6
631 slacapra 1.14 =item B<proxy_server>
632    
633     The proxy server to which you delegate the responsibility to renew your proxy once expired. The default is I<myproxy.cern.ch> : change only if you B<really> know what you are doing.
634    
635 slacapra 1.26 =item B<role>
636    
637     The role to be set in the VOMS. See VOMS documentation for more info.
638    
639 slacapra 1.27 =item B<group>
640    
641     The group to be set in the VOMS, See VOMS documentation for more info.
642    
643 slacapra 1.28 =item B<dont_check_proxy>
644    
645 ewv 1.52 If you do not want CRAB to check your proxy. The creation of the proxy (with proper length), its delegation to a myproxyserver is your responsibility.
646 slacapra 1.28
647 spiga 1.95 =item B<dont_check_myproxy>
648    
649     If you want to to switch off only the proxy renewal set I<dont_check_myproxy>=1. The proxy delegation to a myproxyserver is your responsibility.
650    
651 slacapra 1.6 =item B<requirements>
652    
653     Any other requirements to be add to JDL. Must be written in compliance with JDL syntax (see LCG user manual for further info). No requirement on Computing element must be set.
654    
655 slacapra 1.27 =item B<additional_jdl_parameters:>
656    
657 spiga 1.48 Any other parameters you want to add to jdl file:semicolon separated list, each
658 ewv 1.44 item B<must> be complete, including the closing ";".
659 spiga 1.48
660     =item B<wms_service>
661    
662 fanzago 1.71 With this field it is also possible to specify which WMS you want to use (https://hostname:port/pathcode) where "hostname" is WMS name, the "port" generally is 7443 and the "pathcode" should be something like "glite_wms_wmproxy_server".
663 slacapra 1.27
664 slacapra 1.6 =item B<max_cpu_time>
665    
666     Maximum CPU time needed to finish one job. It will be used to select a suitable queue on the CE. Time in minutes.
667    
668     =item B<max_wall_clock_time>
669    
670     Same as previous, but with real time, and not CPU one.
671    
672 spiga 1.88 =item B<ce_black_list>
673 slacapra 1.6
674 ewv 1.66 All the CE (Computing Element) whose name contains the following strings (comma separated list) will not be considered for submission. Use the dns domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl,....). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
675 slacapra 1.6
676 spiga 1.88 =item B<ce_white_list>
677 slacapra 1.6
678 ewv 1.66 Only the CE (Computing Element) whose name contains the following strings (comma separated list) will be considered for submission. Use the dns domain (e.g. fnal, cern, ifae, fzk, cnaf, lnl,....). You may use hostnames or CMS Site names (T2_DE_DESY) or substrings. Please note that if the selected CE(s) does not contain the data you want to access, no submission can take place.
679 slacapra 1.27
680 spiga 1.88 =item B<se_black_list>
681 slacapra 1.27
682 ewv 1.66 All the SE (Storage Element) whose name contains the following strings (comma separated list) will not be considered for submission.It works only if a datasetpath is specified. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
683 slacapra 1.27
684 spiga 1.88 =item B<se_white_list>
685 slacapra 1.27
686 ewv 1.66 Only the SE (Storage Element) whose name contains the following strings (comma separated list) will be considered for submission.It works only if a datasetpath is specified. Please note that if the selected CE(s) does not contain the data you want to access, no submission can take place. You may use hostnames or CMS Site names (T2_DE_DESY) or substrings.
687 slacapra 1.6
688 spiga 1.73 =item B<remove_default_blacklist>
689    
690 ewv 1.78 CRAB enforce the T1s Computing Eelements Black List. By default it is appended to the user defined I<CE_black_list>. To remove the enforced T1 black lists set I<remove_default_blacklist>=1.
691 spiga 1.73
692 slacapra 1.6 =item B<virtual_organization>
693    
694 spiga 1.94 You do not want to change this: it is cms!
695 slacapra 1.6
696     =item B<retry_count>
697    
698 fanzago 1.37 Number of time the Grid will try to resubmit your job in case of Grid related problem.
699 slacapra 1.6
700 slacapra 1.27 =item B<shallow_retry_count>
701    
702 fanzago 1.37 Number of time shallow resubmission the Grid will try: resubmissions are tried B<only> if the job aborted B<before> start. So you are guaranteed that your jobs run strictly once.
703 slacapra 1.27
704 slacapra 1.30 =item B<maxtarballsize>
705    
706     Maximum size of tar-ball in Mb. If bigger, an error will be generated. The actual limit is that on the RB input sandbox. Default is 9.5 Mb (sandbox limit is 10 Mb)
707    
708 spiga 1.55 =item B<skipwmsauth>
709    
710 ewv 1.64 Temporary useful parameter to allow the WMSAuthorisation handling. Specifying I<skipwmsauth> = 1 the pyopenssl problmes will disappear. It is needed working on gLite UI outside of CERN.
711 spiga 1.55
712 slacapra 1.6 =back
713    
714 spiga 1.55 B<[LSF]> or B<[CAF]>
715 slacapra 1.46
716     =over 4
717    
718     =item B<queue>
719    
720 ewv 1.52 The LSF queue you want to use: if none, the default one will be used. For CAF, the proper queue will be automatically selected.
721 slacapra 1.46
722     =item B<resource>
723    
724     The resources to be used within a LSF queue. Again, for CAF, the right one is selected.
725    
726     =back
727    
728 nsmirnov 1.1 =head1 FILES
729    
730 slacapra 1.6 I<crab> uses a configuration file I<crab.cfg> which contains configuration parameters. This file is written in the INI-style. The default filename can be changed by the I<-cfg> option.
731 nsmirnov 1.1
732 slacapra 1.6 I<crab> creates by default a working directory 'crab_0_E<lt>dateE<gt>_E<lt>timeE<gt>'
733 nsmirnov 1.1
734     I<crab> saves all command lines in the file I<crab.history>.
735    
736     =head1 HISTORY
737    
738 ewv 1.52 B<CRAB> is a tool for the CMS analysis on the Grid environment. It is based on the ideas from CMSprod, a production tool originally implemented by Nikolai Smirnov.
739 nsmirnov 1.1
740     =head1 AUTHORS
741    
742     """
743     author_string = '\n'
744     for auth in common.prog_authors:
745     #author = auth[0] + ' (' + auth[2] + ')' + ' E<lt>'+auth[1]+'E<gt>,\n'
746     author = auth[0] + ' E<lt>' + auth[1] +'E<gt>,\n'
747     author_string = author_string + author
748     pass
749     help_string = help_string + author_string[:-2] + '.'\
750     """
751    
752     =cut
753 slacapra 1.19 """
754 nsmirnov 1.1
755     pod = tempfile.mktemp()+'.pod'
756     pod_file = open(pod, 'w')
757     pod_file.write(help_string)
758     pod_file.close()
759    
760     if option == 'man':
761     man = tempfile.mktemp()
762     pod2man = 'pod2man --center=" " --release=" " '+pod+' >'+man
763     os.system(pod2man)
764     os.system('man '+man)
765     pass
766     elif option == 'tex':
767     fname = common.prog_name+'-v'+common.prog_version_str
768     tex0 = tempfile.mktemp()+'.tex'
769     pod2tex = 'pod2latex -full -out '+tex0+' '+pod
770     os.system(pod2tex)
771     tex = fname+'.tex'
772     tex_old = open(tex0, 'r')
773     tex_new = open(tex, 'w')
774     for s in tex_old.readlines():
775     if string.find(s, '\\begin{document}') >= 0:
776     tex_new.write('\\title{'+common.prog_name+'\\\\'+
777     '(Version '+common.prog_version_str+')}\n')
778     tex_new.write('\\author{\n')
779     for auth in common.prog_authors:
780     tex_new.write(' '+auth[0]+
781     '\\thanks{'+auth[1]+'} \\\\\n')
782     tex_new.write('}\n')
783     tex_new.write('\\date{}\n')
784     elif string.find(s, '\\tableofcontents') >= 0:
785     tex_new.write('\\maketitle\n')
786     continue
787     elif string.find(s, '\\clearpage') >= 0:
788     continue
789     tex_new.write(s)
790     tex_old.close()
791     tex_new.close()
792     print 'See '+tex
793     pass
794     elif option == 'html':
795     fname = common.prog_name+'-v'+common.prog_version_str+'.html'
796     pod2html = 'pod2html --title='+common.prog_name+\
797     ' --infile='+pod+' --outfile='+fname
798     os.system(pod2html)
799     print 'See '+fname
800     pass
801 slacapra 1.33 elif option == 'txt':
802     fname = common.prog_name+'-v'+common.prog_version_str+'.txt'
803     pod2text = 'pod2text '+pod+' '+fname
804     os.system(pod2text)
805     print 'See '+fname
806     pass
807 nsmirnov 1.1
808     sys.exit(0)