[ViewVC] Diff of: cvsroot/UserCode/MitHzz4l/Documentation/LeptonSelection.tex

Comparing UserCode/MitHzz4l/Documentation/LeptonSelection.tex (file contents):
Revision 1.4 by khahn, Tue Nov 22 01:57:58 2011 UTC vs.
Revision 1.6 by dkralph, Fri Nov 25 20:20:15 2011 UTC

#	Line 9 \| Line 9
9		%__________________________________________________
10		\subsubsection{Offline Muon Selection}\label{sec:muOffline}
11		%__________________________________________________
12	<	We select offline muon candidates that satisfy the requirements given in Tables~\ref{tab:muonID} and~\ref{tab:muonIso}. The main difference between these criteria and those of~\cite{baseline} is our inclusion of Tracker muons, which provide a high-efficiency reconstruction path at low-$p_{T}$. We also introduce additional quality requirements designed to reduce non-prompt backgrounds and we impose $\eta/p_{T}$ dependent, per-muon PF relative isolation.
12	>	We select offline muon candidates that satisfy the requirements given in Tables~\ref{tab:muonID} and~\ref{tab:muonIso}. The main difference between these criteria and those of~\cite{baseline} is our inclusion of Tracker muons, which provide a high-efficiency reconstruction path at low-$p_{T}$. We also introduce quality requirements to reduce non-prompt backgrounds and we impose $\eta/p_{T}$ dependent, per-muon PF relative isolation.
13
14		%-------------------------------------------------
15		\begin{table}[tbh]
16		\begin{center}
17		\begin{tabular}{c|c}
18	–
18		\hline
19		\multicolumn{2}{c}{General Muon Requirements} \\
20		\hline
#	Line 67 \| Line 66 \| $< 20$ & $> 1.48$ & $ < 0.05
66		\end{table}
67		%-------------------------------------------------
68
69	<	%
71	<	We measure the efficiency of this selection using samples of $Z \rightarrow \mu\mu$ events and the ``Tag \& Probe'' technique~\cite{TP}. The $\mathcal{L} = 4.7\rm~fb^{-1}$ dataset contains a sufficient number of $Z$ events to obtain selection efficiencies for muons below $10\rm~GeV$ -- we do not utilize separate samples of low-mass resonances for this $p_{T}$ region. We require events that contain at least one muon candidate passing the full set of muon identification criteria (the tag) and at least one additional reconstructed Global or Tracker muon candidate (the probe). The sample is split according to whether the probe passes or fails our selection. We determine efficiency in MC by simply counting the number of events that pass or fail the selection in bins of $p_{T}$ and $\eta$. Efficiency is extracted in data by fitting with MC signal shapes and an empirical function for the background. Figures~\ref{fig:muTPhighpt} and~\ref{fig:muTPlowpt} respectively show fit results in the central region for high and low $p_{T}$ bins.
69	>	We measure the efficiency of this selection using samples of $Z \rightarrow \mu\mu$ events and the ``Tag \& Probe'' technique~\cite{TP}. The $\mathcal{L} = 4.7\rm~fb^{-1}$ dataset contains a sufficient number of $Z$ events for us to obtain selection efficiencies for $p_{T} < 10\rm~GeV$ muons; we therefore do not utilize separate samples of low-mass resonances for this $p_{T}$ region. We require events that contain at least one muon candidate (the tag) that satisfies the full set of muon identification criteria and passes a singleMuon trigger. We then require one additional reconstructed Global or Tracker muon candidate to serve as the probe. We determine efficiency in MC by simply counting the number of probes that pass or fail selection in bins of $p_{T}$ and $\eta$. Binned efficiencies are determined in data from simultaneous shape fits to the $m(\mu_{tag}\mu_{probe})$ distributions of events in the pass and fail categories. We use MC signal shape templates and an empirical function that describes background when fitting data. Figures~\ref{fig:muTPhighpt} and~\ref{fig:muTPlowpt} show fit results for the high and low $p_{T}$ bins for muons in the central region.
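The MC counting step described above can be sketched as follows. This is a minimal illustration, not the analysis code: the function name `counting_efficiency` is hypothetical, and the simple binomial uncertainty is an assumption, since the note does not state which interval is used.

```python
import math


def counting_efficiency(n_pass, n_fail):
    """Efficiency and uncertainty from probe counts in one (pT, eta) bin.

    Assumption: a plain binomial (normal-approximation) uncertainty.
    The note does not say which interval the analysis actually uses.
    """
    n_total = n_pass + n_fail
    if n_total <= 0:
        raise ValueError("need at least one probe in the bin")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


# Made-up counts, for illustration only (not taken from the note).
eff, err = counting_efficiency(n_pass=90, n_fail=10)
print(f"eff = {eff:.3f} +/- {err:.3f}")
```

The normal approximation becomes unreliable when the efficiency is close to 0 or 1 or when a bin holds few probes; an exact interval would be the safer choice in those bins.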
70
71		%-------------------------------------------------
72		\begin{figure}[htb]
#	Line 89 \| Line 87 \| We measure the efficiency of this select
87		\end{figure}
88		%-------------------------------------------------
89
90	<	We divide the $p_{T}/\eta$-binned efficiencies from data by corresponding values from MC to determine data/MC efficiency scale factors, $f_{ID,Iso}$. We use these factors to weight selected muons in our MC samples, as is discussed in Section~\ref{sec:Signal}. Figure~\ref{fig:muEff} shows $f_{ID,Iso}$ for the central and forward regions as a function of $p_{T}$. Table~\ref{tab:musf} lists values for $f_{ID,Iso}$ in our $p_{T}/\eta$ bins.
90	>	We divide the $p_{T}/\eta$-binned efficiencies from data by corresponding values from MC to determine data/MC efficiency scale factors, $f_{ID,Iso}$. We use these factors to weight selected muons in our MC samples, as discussed in Section~\ref{sec:Signal}. Figure~\ref{fig:muEff} shows $f_{ID,Iso}$ for the central and forward regions as a function of $p_{T}$. Values for $f_{ID,Iso}$ in each of our $p_{T}/\eta$ bins are given in Table~\ref{tab:musf}.
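The scale-factor ratio can be sketched as follows. The function name `scale_factor` is hypothetical, and treating the data and MC uncertainties as uncorrelated with first-order error propagation is an assumption; the note does not specify how the uncertainty on $f_{ID,Iso}$ is derived.

```python
import math


def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """Data/MC scale factor f = eff_data / eff_mc for one (pT, eta) bin.

    Assumption: data and MC uncertainties are uncorrelated and are
    propagated to first order.
    """
    if eff_mc <= 0.0:
        raise ValueError("MC efficiency must be positive")
    f = eff_data / eff_mc
    # df/d(eff_data) = 1/eff_mc ; df/d(eff_mc) = -eff_data/eff_mc**2
    sigma_f = math.hypot(err_data / eff_mc, eff_data * err_mc / eff_mc**2)
    return f, sigma_f


# Made-up inputs, for illustration only (not taken from the note).
f, sigma_f = scale_factor(0.90, 0.01, 0.95, 0.005)
print(f"f = {f:.4f} +/- {sigma_f:.4f}")
```

Each selected MC muon would then be weighted by the factor for its $p_{T}/\eta$ bin.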
91
92		%-------------------------------------------------
93		\begin{figure}[htb]
#	Line 129 \| Line 127 \| Identification and isolation efficiencie
127		%__________________________________________________
128		\subsubsection{Online Muon Selection}\label{sec:muOnline}
129		%__________________________________________________
130	<	We use Tag \& Probe to also measure $p_{T}/\eta$-binned per-leg efficiencies for the \verb|HLT_DoubleMu_7| and \verb|HLT_Mu_13_8| triggers. The trigger efficiencies are calculated with respect to muon candidates that pass the offline requirements described in Section~\ref{sec:muOffline}. We do not use the emulation of these triggers in MC and instead correct the simulation with the absolute efficiencies measured in data. Backgrounds after offline selection are small, so trigger efficiency is determined by simply counting events. Tables~\ref{tab:trigEffMu7}--\ref{tab:trigEffMu13_8_trailing} provide the per-leg efficiencies for our various $p_{T}/\eta$ bins.
130	>	Tag \& Probe is also used to measure $p_{T}/\eta$-binned per-leg efficiencies for the \verb|HLT_DoubleMu_7| and \verb|HLT_Mu_13_8| triggers. We calculate trigger efficiencies with respect to muon candidates that pass the offline requirements described in Section~\ref{sec:muOffline}. We do not use the emulation of these triggers in MC and instead correct the simulation with the absolute efficiencies measured in data. Backgrounds after offline selection are small, so trigger efficiency is determined by simply counting events. Tables~\ref{tab:trigEffMu7}--\ref{tab:trigEffMu13_8_trailing} provide the per-leg efficiencies for our various $p_{T}/\eta$ bins.
131
132		% figs/mueff/Run2011A_HLT_DoubleMu7/default/extra/dat_eff_table.tex
133		%KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
#	Line 209 \| Line 207 \| $100 < p_T < 7000$ & $0.9662 \pm 0.0054$
207		%__________________________________________________
208		\subsection{Offline Selection}
209		%__________________________________________________
210	<	We select electron candidates for the analysis using a multivariate (MV) technique. Our method was developed in concert with an MV-based electron ID scheme for the WW analysis~\cite{si}. The two methods are equivalent, modulo small differences in implementation that address the relative severity of ``fake'' electron backgrounds in the respective analyses.
210	>	We select electron candidates for the analysis using a multivariate (MV) technique. Our method was developed together with an MV-based electron ID scheme for the WW analysis~\cite{si}. The two methods are equivalent, modulo small differences in implementation that address the relative severity of ``fake'' electron backgrounds in the respective analyses.
211
212	<	We utilize a TMVA Boosted Decision Tree (BDT) for MV identification. The BDT is trained with separate samples of candidate objects that are enriched in either fake or real electrons. Candidates are defined as reconstructed electrons that pass the minimal set of selection criteria listed in Table~\ref{tab:eleFO}. We construct a signal training sample from pairs of candidates in the DoubleElectron dataset with $|m_{\ell\ell} - M_{Z}| < 15~\rm GeV$. Candidates in the background training sample are selected from events that pass a single-electron trigger. We require a jet separated from the candidate by $\Delta R(\eta,\phi) > 1$ and reject events with $\rm MET > 20~GeV$ or containing more than one candidate. We also veto conversions to suppress real electron contamination.
212	>	We utilize a TMVA Boosted Decision Tree (BDT) for MV identification. The BDT is trained on separate samples of candidate objects that are enriched in either fake or real electrons. Candidates are defined as reconstructed electrons that pass the minimal set of selection criteria listed in Table~\ref{tab:eleFO}. We construct a signal training sample from pairs of candidates in the DoubleElectron dataset with $|m_{\ell\ell} - M_{Z}| < 15~\rm GeV$. Candidates in the background training sample are selected from events that pass a single-electron trigger. We require a jet separated from the candidate by $\Delta R(\eta,\phi) > 1$ and reject events with $\rm MET > 20~GeV$ or containing more than one electron candidate. Conversion candidates are vetoed to further suppress real electron contamination.
213
214		%-------------------------------------------------
215		\begin{table}[tbh]
216		\begin{center}
217		\begin{tabular}{c|c}
218	+	\hline
219		{\bf Quantity} & {\bf Requirement}\\
220		\hline
221	<	$|dz|$ & $< 0.1\rm~cm$ \\
221	>	$|dz|$ & $< 0.1\rm~cm$ \\
222		$H/E$ & $< 0.12(0.1) EB(EE)$ \\
223		$iso_{trk}$ & $<0.3$ \\
224		$iso_{em}$ & $<0.3$ \\
#	Line 231 \| Line 230 \| $iso_{had}$ & $<0.3$
230		\end{table}
231		%-------------------------------------------------
232
233	<	MV discrimination is performed using the following variables:
233	>	MV discrimination is performed using the following variables: $\sigma_{i\eta i\eta}$, $\sigma_{i\phi i\phi}$, $\Delta\eta_{in}$, $\Delta\phi_{in}$, $f_{Brem}$, $n_{Brem}$, $E/P$, $d_{0}$, $E_{seed}/P_{out}$, $E_{seed}/P_{in}$, $1/E - 1/P$. As can be seen in Figure~\ref{fig:bdtInput}, these variables exhibit substantial correlations, of which the BDT is able to make full use. The same figure also displays the input distributions for signal and background for several representative variables.
234
235	<	\begin{itemize}
236	<	\item $\sigma_{i\eta i\eta}$
237	<	\item $\sigma_{i\phi i\phi}$
238	<	\item $\Delta\eta_{in}$
239	<	\item $\Delta\phi_{in}$
240	<	\item fBrem
241	<	\item nBrem
242	<	\item $E/P$
243	<	\item D0
244	<	\item $E_{seed}/P_{out}$
246	<	\item $E_{seed}/P_{in}$
247	<	\item $1/E - 1/P$
248	<	\end{itemize}
235	>	%-------------------------------------------------
236	>	\begin{figure}[tbp]
237	>	\begin{center}
238	>	\includegraphics[width=0.4\linewidth]{figs/bdt-correl-sig.png}
239	>	\includegraphics[width=0.4\linewidth]{figs/bdt-correl-bkg.png}
240	>	\includegraphics[width=0.4\linewidth]{figs/bdt-input-OneOverEMinusOneOverP.png}
241	>	\includegraphics[width=0.4\linewidth]{figs/bdt-input-DEtaIn.png}
242	>	\caption{ \label{fig:bdtInput} }
243	>	\end{center}
244	>	\end{figure}
245
246		{\bf Cuts on these guys? Show correlation plot to motivate BDT?}
247
#	Line 261 \| Line 257 \| We train and validate the BDT using stat
257		\end{figure}
258		%-------------------------------------------------
259
260	<	The plots in Figure~\ref{fig:ROC} include efficiency points corresponding to the ``Cuts in Categories'' (CIC) loose, medium and tight working points defined in~\cite{CIC}. BDT and CIC performances are comparable in the high $p_{T}$ bins, whereas the BDT outperforms CIC at low $p_{T}$. We define a set of loose, medium and tight BDT working points for this analysis by stipulating background efficiencies equivalent to those of the corresponding CIC working points.
260	>	The plots in Figure~\ref{fig:ROC} include efficiency points that correspond to the ``Cuts in Categories'' (CIC) loose, medium and tight working points defined in~\cite{CIC}. BDT and CIC performances are comparable in the high $p_{T}$ bins; however, the BDT outperforms CIC at low $p_{T}$. We define a set of loose, medium and tight BDT working points for this analysis by stipulating background efficiencies that are equivalent to those of the corresponding CIC working points.
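The working-point definition can be sketched as a threshold scan: pick the BDT cut whose background efficiency does not exceed that of the reference CIC point, then read off the signal efficiency at that cut. The function names, the strict `score > cut` convention, and the toy scores are all assumptions made for illustration; the note does not describe the actual procedure at this level.

```python
import math


def cut_for_background_efficiency(bkg_scores, target_bkg_eff):
    """Return a BDT cut with background efficiency <= target_bkg_eff.

    Assumption: a candidate passes if score > cut (strict), so tied
    scores can only lower the efficiency below the target.
    """
    if not bkg_scores:
        raise ValueError("need at least one background score")
    ordered = sorted(bkg_scores, reverse=True)
    n_keep = math.floor(target_bkg_eff * len(ordered) + 1e-9)
    if n_keep >= len(ordered):
        return -math.inf  # every background candidate may pass
    return ordered[n_keep]


def efficiency(scores, cut):
    """Fraction of candidates with score strictly above the cut."""
    return sum(1 for s in scores if s > cut) / len(scores)


# Toy scores, for illustration only (not taken from the note).
bkg = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
sig = [0.5, 0.85, 0.9, 0.95]
cut = cut_for_background_efficiency(bkg, target_bkg_eff=0.2)
print(cut, efficiency(bkg, cut), efficiency(sig, cut))
```

In practice this scan would be repeated in each $p_{T}/\eta$ bin, with the target taken from the CIC background efficiency measured in the same bin.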
261
262		%% BDT and CIC signal efficiencies for the various working points are compared in Table~\ref{tab:WPs}.
263
#	Line 282 \| Line 278 \| The plots in Figure~\ref{fig:ROC} includ
278		%% \end{table}
279		%% %-------------------------------------------------
280
281	<	The efficiencies shown in Figure~\ref{fig:ROC} are determined with respect to the candidate definition in Table~\ref{tab:eleFO}. While these values are useful for performance comparison, efficiencies for the analysis must be taken with respect to reconstructed GSF electrons. As with muons, we calculate electron identification/isolation efficiencies for the analysis using Tag \& Probe. Figures~\ref{fig:eleTPmediumhighpt} and~\ref{fig:eleTPmediumlowpt} (\ref{fig:eleTPloosehighpt} and~\ref{fig:eleTPlooselowpt}) show fit results for our medium (loose) MV selection in the central region. %The complete set of offline selection fits from Tag \& Probe are included in Appendix~\ref{app:}.
281	>	The efficiencies shown in Figure~\ref{fig:ROC} are determined with respect to the candidate definition in Table~\ref{tab:eleFO}. Selection performance can be easily compared with this efficiency definition; however, efficiencies for the analysis must be taken with respect to reconstructed GSF electrons. As with muons, we calculate electron identification/isolation efficiencies for the analysis using Tag \& Probe. Figures~\ref{fig:eleTPmediumhighpt} and~\ref{fig:eleTPmediumlowpt} (\ref{fig:eleTPloosehighpt} and~\ref{fig:eleTPlooselowpt}) show fit results for our medium (loose) MV selection in the central region. %The complete set of offline selection fits from Tag \& Probe are included in Appendix~\ref{app:}.
282
283		%-------------------------------------------------
284		\begin{figure}[htb]
#	Line 326 \| Line 322 \| The efficiencies shown in Figure~\ref{fi
322		\end{figure}
323		%-------------------------------------------------
324
325	<	We divide the binned data efficiencies by corresponding values from MC to obtain offline efficiency scale factors, $f_{ID,Iso}$. Tables~\ref{tab:eleSFmedium}--\ref{tab:eleSFloose} list these factors for the medium and loose offline selections. Figures~\ref{fig:eleSFmedium} and~\ref{fig:eleSFloose} plot the $f_{ID,Iso}$ as functions of $p_{T}$ for the central and forward regions.
330	<
325	>	We divide the binned efficiencies from data by corresponding values from MC to obtain offline efficiency scale factors, $f_{ID,Iso}$. Tables~\ref{tab:eleSFmedium}--\ref{tab:eleSFloose} list these factors for the medium and loose offline selections. Figures~\ref{fig:eleSFmedium} and~\ref{fig:eleSFloose} plot the $f_{ID,Iso}$ as functions of $p_{T}$ for the central and forward regions.
326
327		%eleeff/Run2011A_EleWPEffTP-medium/default/extra/sf_table.tex
328		%KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
#	Line 454 \| Line 449 \| $100 < p_T < 7000$ & $0.9662 \pm 0.0054$
449
450		%KSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKSKS
451
452	+	\clearpage

Diff Legend

–	Removed lines
+	Added lines
<	Changed lines
>	Changed lines
