[ViewVC] Diff of: cvsroot/UserCode/MitHzz4l/Documentation/LeptonSelection.tex

Comparing UserCode/MitHzz4l/Documentation/LeptonSelection.tex (file contents):
Revision 1.3 by dkralph, Mon Nov 21 19:38:07 2011 UTC vs.
Revision 1.5 by khahn, Wed Nov 23 03:08:59 2011 UTC

#	Line 9 \| Line 9
9		%__________________________________________________
10		\subsubsection{Offline Muon Selection}\label{sec:muOffline}
11		%__________________________________________________
12	<	We select offline muon candidates that satisfy the requirements given in Tables~\ref{tab:muonID} and~\ref{tab:muonIso}. The main difference between these criteria and those of~\cite{baseline} is our inclusion of Tracker muons, which provide us with a high-efficiency low-$p_{T}$ reconstruction path. We also introduce additional quality requirements intended to reduce non-prompt backgrounds and we impose $\eta/p_{T}$ dependent, per-muon PF isolation requirements.
12	>	We select offline muon candidates that satisfy the requirements given in Tables~\ref{tab:muonID} and~\ref{tab:muonIso}. The main difference between these criteria and those of~\cite{baseline} is our inclusion of Tracker muons, which provide a high-efficiency reconstruction path at low $p_{T}$. We also introduce quality requirements to reduce non-prompt backgrounds, and we impose $\eta/p_{T}$-dependent, per-muon PF relative isolation requirements.
13
14		%-------------------------------------------------
15		\begin{table}[tbh]
16		\begin{center}
17		\begin{tabular}{c|c}
18	–
18		\hline
19		\multicolumn{2}{c}{General Muon Requirements} \\
20		\hline
#	Line 67 \| Line 66 \| $< 20$ & $> 1.48$ & $ < 0.05
66		\end{table}
67		%-------------------------------------------------
68
69	<	%
71	<	We measure the efficiency of this selection using samples of $Z \rightarrow \mu\mu$ events and the ``Tag \& Probe'' technique~\cite{TP}. The $\mathcal{L} = 2.1\rm~fb^{-1}$ dataset contains a sufficient number of $Z$ events for us to obtain selection efficiencies for muons below $10\rm~GeV$, thus we do not utilize separate samples of low-mass resonances for this $p_{T}$ region. We require events containing at least one muon candidate that passes the full set of muon identification criteria (the tag) and at least one additional reconstructed Global or Tracker muon candidate (the probe). The sample is split according to whether the probe passes of fails our selection. We determine efficiency in MC by simply counting the number of events that pass or fail the selection in bins of $p_{T}$ and $\eta$. Efficiency is extracted in data by fitting with MC signal shapes and empirical function for the background. Figures~\ref{fig:muTPhighpt} and~\ref{fig:muTPlowpt} respectively show fits results in the central region for high and low $p_{T}$ bins.
69	>	We measure the efficiency of this selection using samples of $Z \rightarrow \mu\mu$ events and the ``Tag \& Probe'' technique~\cite{TP}. The $\mathcal{L} = 4.7\rm~fb^{-1}$ dataset contains a sufficient number of $Z$ events for us to obtain selection efficiencies for $p_{T} < 10\rm~GeV$ muons, so we do not utilize separate samples of low-mass resonances for this $p_{T}$ region. We require events that contain at least one muon candidate (the tag) that satisfies the full set of muon identification criteria and passes a singleMuon trigger. We then require one additional reconstructed Global or Tracker muon candidate to serve as the probe. We determine efficiency in MC by simply counting the number of probes that pass or fail the selection in bins of $p_{T}$ and $\eta$. Binned efficiencies are determined in data from simultaneous shape fits to the $m(\mu_{tag}\mu_{probe})$ distributions of events in the pass and fail categories. We use MC signal shape templates and an empirical function that describes the background when fitting data. Figures~\ref{fig:muTPhighpt} and~\ref{fig:muTPlowpt} show fit results for the high and low $p_{T}$ bins for muons in the central region.
70
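The MC counting step described in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not the analysis code: the function name, the bin labels and the probe counts are hypothetical, and the simple binomial uncertainty is an assumption that the text does not specify.

```python
import math


def counting_efficiency(n_pass, n_fail):
    """Return (efficiency, binomial uncertainty) for the probes in one bin."""
    n_total = n_pass + n_fail
    if n_total == 0:
        raise ValueError("no probes in this bin")
    eff = n_pass / n_total
    # Assumption: simple binomial uncertainty; the note does not say which
    # error treatment is used for the MC efficiencies.
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


# Hypothetical (passing, failing) probe counts per (pT, eta) bin.
probe_counts = {
    ("5-10 GeV", "central"): (450, 50),
    ("10-20 GeV", "central"): (900, 100),
}

# Efficiency and uncertainty for every bin.
mc_efficiency = {
    bin_label: counting_efficiency(n_pass, n_fail)
    for bin_label, (n_pass, n_fail) in probe_counts.items()
}
```

In data the same pass and fail yields would come from the simultaneous fits rather than from direct counting.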
71		%-------------------------------------------------
72		\begin{figure}[htb]
#	Line 89 \| Line 87 \| We measure the efficiency of this select
87		\end{figure}
88		%-------------------------------------------------
89
90	<	We divide the $p_{T}/\eta$-binned efficiencies from data with corresponding values from MC to determine data/MC efficiency scale factors, $f_{ID,Iso}$. We use these factors to weight selected muons in our MC samples, as is discussed in Sections~\ref{sec:Signal}. Figure~\ref{fig:muEff} shows $f_{ID,Iso}$ for the central and forward regions as a function of $p_{T}$. Table~\ref{tab:musf} lists values for $f_{ID,Iso}$ in our $p_{T}/\eta$ bins.
90	>	We divide the $p_{T}/\eta$-binned efficiencies from data by the corresponding values from MC to determine data/MC efficiency scale factors, $f_{ID,Iso}$. We use these factors to weight selected muons in our MC samples, as discussed in Section~\ref{sec:Signal}. Figure~\ref{fig:muEff} shows $f_{ID,Iso}$ for the central and forward regions as a function of $p_{T}$. Values for $f_{ID,Iso}$ in each of our $p_{T}/\eta$ bins are given in Table~\ref{tab:musf}.
91
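As a rough illustration of the scale-factor definition above, the Python sketch below divides a data efficiency by the corresponding MC efficiency for one bin. The function name and the numbers in the usage line are hypothetical, and the uncertainty propagation assumes uncorrelated data and MC errors, which the text does not state.

```python
import math


def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """Return (f, uncertainty on f) with f = eff_data / eff_mc for one bin."""
    if eff_data <= 0.0 or eff_mc <= 0.0:
        raise ValueError("efficiencies must be positive")
    f = eff_data / eff_mc
    # Assumption: data and MC uncertainties are uncorrelated, so the
    # relative uncertainties add in quadrature.
    rel_err = math.sqrt((err_data / eff_data) ** 2 + (err_mc / eff_mc) ** 2)
    return f, f * rel_err


# Hypothetical efficiencies for a single (pT, eta) bin.
f_id_iso, f_id_iso_err = scale_factor(0.90, 0.01, 0.92, 0.005)
```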
92		%-------------------------------------------------
93		\begin{figure}[htb]
#	Line 129 \| Line 127 \| Identification and isolation efficiencie
127		%__________________________________________________
128		\subsubsection{Online Muon Selection}\label{sec:muOnline}
129		%__________________________________________________
130	<	We use Tag \& Probe to also measure $p_{T}/\eta$-binned per-leg efficiencies for the \verb|HLT_DoubleMu_7| and \verb|HLT_Mu_13_8| triggers. The trigger efficiencies are calculated with respect to muon candidates that pass the offline requirements described in Section~\ref{sec:muOnline}. We do not use the emulation of these triggers in MC and instead correct the simulation with the absolute efficiencies measured in data. Backgrounds after offline selection are small, so trigger efficiency is determined by simply counting events. Tables~\ref{tab:trigEffMu7}-\ref{tab:trigEffMu13_8_trailing} provide the per-leg efficiencies for our various $p_{T}/eta$ bins.
130	>	Tag \& Probe is also used to measure $p_{T}/\eta$-binned per-leg efficiencies for the \verb|HLT_DoubleMu_7| and \verb|HLT_Mu_13_8| triggers. We calculate trigger efficiencies with respect to muon candidates that pass the offline requirements described in Section~\ref{sec:muOffline}. We do not use the emulation of these triggers in MC and instead correct the simulation with the absolute efficiencies measured in data. Backgrounds after offline selection are small, so trigger efficiency is determined by simply counting events. Tables~\ref{tab:trigEffMu7}--\ref{tab:trigEffMu13_8_trailing} provide the per-leg efficiencies for our various $p_{T}/\eta$ bins.
131
132		% figs/mueff/Run2011A_HLT_DoubleMu7/default/extra/dat_eff_table.tex
133		%KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
#	Line 209 \| Line 207 \| $100 < p_T < 7000$ & $0.9662 \pm 0.0054$
207		%__________________________________________________
208		\subsection{Offline Selection}
209		%__________________________________________________
210	<	We select electron candidates for the analysis using a multivariate (MV) technique. Our method was developed in concert with an MV-based electron ID scheme for the WW analysis~\cite{si}. The two methods are equivalent, modulo small differences in implementation that address the relative severity of ``fake'' electron backgrounds in the respective analyses.
210	>	We select electron candidates for the analysis using a multivariate (MV) technique. Our method was developed together with an MV-based electron ID scheme for the WW analysis~\cite{si}. The two methods are equivalent, modulo small differences in implementation that address the relative severity of ``fake'' electron backgrounds in the respective analyses.
211
212	<	We utilize a TMVA Boosted Decision Tree (BDT) for MV identification. The BDT is trained with separate samples of candidate objects that are enriched in either fake or real electrons. Candidates are defined as reconstructed electrons that pass the minimal set of selection criteria listed in Table~\ref{tab:eleFO}. We construct a signal training sample from pairs of candidates in the DoubleElectron dataset with $|m_{\ell\ell} - M_{Z}| < 15~\rm GeV$. Candidates in the background training sample are selected from events that pass a single-electron trigger. We require a $\Delta R(\eta,\phi) >1~\rm$ jet and reject events with $\rm MET > 20~GeV$, or containing more than one candidate. We also veto conversions to suppress real electron contamination.
212	>	We utilize a TMVA Boosted Decision Tree (BDT) for MV identification. The BDT is trained on separate samples of candidate objects that are enriched in either fake or real electrons. Candidates are defined as reconstructed electrons that pass the minimal set of selection criteria listed in Table~\ref{tab:eleFO}. We construct a signal training sample from pairs of candidates in the DoubleElectron dataset with $|m_{\ell\ell} - M_{Z}| < 15~\rm GeV$. Candidates in the background training sample are selected from events that pass a single-electron trigger. We require a $\Delta R(\eta,\phi) >1~\rm$ jet and reject events with $\rm MET > 20~GeV$, or containing more than one electron candidate. Conversion candidates are vetoed to further suppress real electron contamination.
213
214		%-------------------------------------------------
215		\begin{table}[tbh]
216		\begin{center}
217		\begin{tabular}{c|c}
218	+	\hline
219		{\bf Quantity} & {\bf Requirement}\\
220		\hline
221		$\|dz\|$ & $< 0.1\rm~cm$ \\
#	Line 231 \| Line 230 \| $iso_{had}$ & $<0.3$
230		\end{table}
231		%-------------------------------------------------
232
233	<	MV discrimination is performed using the following variables :
235	<
236	<	\begin{itemize}
237	<	\item $\sigma_{i\eta i\eta}$
238	<	\item $\sigma_{i\phi i\phi}$
239	<	\item $\Delta\eta_{in}$
240	<	\item $\Delta\phi_{in}$
241	<	\item fBrem
242	<	\item nBrem
243	<	\item $E/P$
244	<	\item D0
245	<	\item $E_{seed}/P_{out}$
246	<	\item $E_{seed}/P_{in}$
247	<	\item $1/E - 1/P$
248	<	\end{itemize}
249	<
250	<	{\bf Cuts on these guys? Show correlation plot to motivate BDT?}
233	>	MV discrimination is performed using the following variables: $\sigma_{i\eta i\eta}$, $\sigma_{i\phi i\phi}$, $\Delta\eta_{in}$, $\Delta\phi_{in}$, $f_{Brem}$, $n_{Brem}$, $E/P$, $d_{0}$, $E_{seed}/P_{out}$, $E_{seed}/P_{in}$, $1/E - 1/P$. {\bf Cuts on these guys? Show correlation plot to motivate BDT?}
234
235		We train and validate the BDT using statistically independent subsets of events from the samples described above. Training and testing are performed separately for six $\eta/p_{T}$ bins. A cut on the resulting BDT discriminant translates to a specific combination of signal and background efficiency. The locus of signal/background efficiencies yields the performance ({\it i.e.}, ROC) curves shown in Figure~\ref{fig:ROC}.
236
#	Line 261 \| Line 244 \| We train and validate the BDT using stat
244		\end{figure}
245		%-------------------------------------------------
246
247	<	The plots in Figure~\ref{fig:ROC} include efficiency points corresponding to the ``Cuts in Categories'' (CIC) loose, medium and tight working points defined in~\cite{CIC}. BDT and CIC performances are comparable in the high $p_{T}$ bins, whereas the BDT outperforms CIC at low $p_{T}$. We define a set of loose, medium and tight BDT working points for this analysis by stipulating background efficiencies equivalent to those of the corresponding CIC working points.
247	>	The plots in Figure~\ref{fig:ROC} include efficiency points that correspond to the ``Cuts in Categories'' (CIC) loose, medium and tight working points defined in~\cite{CIC}. BDT and CIC performances are comparable in the high $p_{T}$ bins; however, the BDT outperforms CIC at low $p_{T}$. We define a set of loose, medium and tight BDT working points for this analysis by stipulating background efficiencies that are equivalent to those of the corresponding CIC working points.
248
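The working-point definition above amounts to choosing, in each bin, the BDT cut whose background efficiency matches a target value and then reading off the signal efficiency at that cut. The Python sketch below illustrates the idea only; the function names, the scores and the target efficiency are hypothetical, and the convention that a candidate passes when its score is strictly above the cut is an assumption.

```python
def efficiency(scores, cut):
    """Fraction of candidates whose BDT score is strictly above the cut."""
    return sum(1 for s in scores if s > cut) / len(scores)


def cut_for_background_efficiency(bkg_scores, target_eff):
    """Return the lowest cut whose background efficiency is <= target_eff.

    Assumes background scores without ties at the chosen cut value.
    """
    ordered = sorted(bkg_scores, reverse=True)
    # Number of background candidates allowed to pass the cut.
    n_keep = int(target_eff * len(ordered))
    if n_keep >= len(ordered):
        return float("-inf")
    # Everything strictly above this score passes: exactly n_keep candidates.
    return ordered[n_keep]


# Hypothetical BDT scores for one eta/pT bin.
bkg_scores = [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
sig_scores = [0.6, 0.8, 0.2, 0.95]

# Match a (hypothetical) CIC background efficiency of 20%.
bdt_cut = cut_for_background_efficiency(bkg_scores, 0.2)
sig_eff = efficiency(sig_scores, bdt_cut)
```

Scanning the cut over its full range instead of fixing one target gives the signal/background efficiency pairs that make up a ROC curve.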
249		%% BDT and CIC signal efficiencies for the various working points are compared in Table~\ref{tab:WPs}.
250
#	Line 282 \| Line 265 \| The plots in Figure~\ref{fig:ROC} includ
265		%% \end{table}
266		%% %-------------------------------------------------
267
268	<	The efficiencies shown in Figure~\ref{fig:ROC} are determined with respect to the candidate definition in Table~\ref{tab:eleFO}. While these values are useful for performance comparison, efficiencies for the analysis must be taken with respect to reconstructed GSF electrons. As with muons, we calculate electron identification/isolation efficiencies for the analysis using Tag \& Probe. Figures~\ref{fig:eleTPmediumhighpt} and ~\ref{fig:eleTPmediumlowpt} (~\ref{fig:eleTPloosehighpt} and ~\ref{fig:eleTPlooselowpt}) show fit results for our medium (loose) MV selection in the central region. %The complete set of offline selection fits from Tag \& Probe are included in Appendix~\ref{app:}.
268	>	The efficiencies shown in Figure~\ref{fig:ROC} are determined with respect to the candidate definition in Table~\ref{tab:eleFO}. Selection performance can be easily compared with this efficiency definition; however, efficiencies for the analysis must be taken with respect to reconstructed GSF electrons. As with muons, we calculate electron identification/isolation efficiencies for the analysis using Tag \& Probe. Figures~\ref{fig:eleTPmediumhighpt} and~\ref{fig:eleTPmediumlowpt} (\ref{fig:eleTPloosehighpt} and~\ref{fig:eleTPlooselowpt}) show fit results for our medium (loose) MV selection in the central region. %The complete set of offline selection fits from Tag \& Probe are included in Appendix~\ref{app:}.
269
270		%-------------------------------------------------
271		\begin{figure}[htb]
#	Line 326 \| Line 309 \| The efficiencies shown in Figure~\ref{fi
309		\end{figure}
310		%-------------------------------------------------
311
312	<	We divide the binned data efficiencies with corresponding values from MC to obtain offline efficiency scale factors, $f_{ID,Iso}$. Tables~\ref{tab:eleSFmedium}-~\ref{tab:eleSFloose} list these factors for the medium and loose offline selections. Figures~\ref{fig:eleSFmedium} and ~\ref{fig:eleSFloose} plot the $f_{ID,Iso}$ as functions of $p_{T}$ for the central and forward regions.
330	<
312	>	We divide the binned efficiencies from data by the corresponding values from MC to obtain offline efficiency scale factors, $f_{ID,Iso}$. Tables~\ref{tab:eleSFmedium}--\ref{tab:eleSFloose} list these factors for the medium and loose offline selections. Figures~\ref{fig:eleSFmedium} and~\ref{fig:eleSFloose} plot the $f_{ID,Iso}$ as functions of $p_{T}$ for the central and forward regions.
313
314		%eleeff/Run2011A_EleWPEffTP-medium/default/extra/sf_table.tex
315		%KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
#	Line 454 \| Line 436 \| $100 < p_T < 7000$ & $0.9662 \pm 0.0054$
436
437		%KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
438
439	+	\clearpage

Diff Legend

–	Removed lines
+	Added lines
<	Changed lines
>	Changed lines
