
Comparing UserCode/benhoob/cmsnotes/OSPAS2011/datadriven.tex (file contents):
Revision 1.4 by benhoob, Wed Jun 15 08:26:17 2011 UTC vs.
Revision 1.5 by benhoob, Wed Jun 15 09:18:50 2011 UTC

# Line 1 | Line 1
1 < \section{Background Estimates from Data}
1 > \section{Counting Experiments}
2   \label{sec:datadriven}
3  
4   To look for possible BSM contributions, we define two signal regions that preserve about
# Line 12 | Line 12 | To look for possible BSM contributions,
12   For the high \MET\ (high \Ht) signal region, the MC predicts 2.6 (2.5) SM events,
13   dominated by dilepton $t\bar{t}$; the expected LM1 yield is 17 (14) and the
14   expected LM3 yield is 6.4 (6.7). The signal regions are indicated in Fig.~\ref{fig:met_ht}.
15 + These signal regions are tighter than those used in our published 2010 analysis, since
16 + with the larger data sample they give improved sensitivity to contributions from new physics.
17  
18 < We use three independent methods to estimate from data the background in the signal region.
18 > We perform counting experiments in these signal regions, using three independent methods to estimate the background from data.
19   The first method is a novel technique based on the ABCD method, which we used in our 2010 analysis~\cite{ref:ospaper},
20   and exploits the fact that \HT\ and $y \equiv \MET/\sqrt{H_T}$ are nearly uncorrelated for the $t\bar{t}$ background;
21   this method is referred to as the ABCD' technique. First, we extract the $y$ and \Ht\ distributions
# Line 21 | Line 23 | $f(y)$ and $g(H_T)$ from data, using eve
23   Because $y$ and \Ht\ are weakly correlated, the distribution of events in the $y$ vs. \Ht\ plane is described by:
24  
25   \begin{equation}
26 + \label{eq:abcdprime}
27   \frac{\partial^2 N}{\partial y \partial H_T} = f(y)g(H_T),
28   \end{equation}
29  
30   allowing us to deduce the number of events falling in any region of this plane. In particular,
31   we can deduce the number of events falling in our signal regions defined by requirements on \MET\ and \Ht.
32  
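The factorization in Eq.~\ref{eq:abcdprime} can be illustrated with a short pseudo-event sketch: sample $y$ and \Ht\ independently from binned $f(y)$ and $g(H_T)$, and count the fraction of pseudo-events satisfying the signal-region requirements. All bin edges and contents below are illustrative placeholders, not the distributions measured in data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binned f(y) and g(HT) (placeholder edges and contents).
y_edges = np.linspace(0.0, 20.0, 21)        # y = MET/sqrt(HT), in GeV^(1/2)
ht_edges = np.linspace(100.0, 1000.0, 19)   # HT in GeV
f_y = rng.poisson(50, size=len(y_edges) - 1).astype(float)
g_ht = rng.poisson(50, size=len(ht_edges) - 1).astype(float)

def sample(edges, contents, n, rng):
    """Draw n values from a binned distribution, uniform within each bin."""
    idx = rng.choice(len(contents), size=n, p=contents / contents.sum())
    return rng.uniform(edges[idx], edges[idx + 1])

# Generate pseudo-events, assuming y and HT are independent (the Eq. ansatz).
n_pseudo = 200_000
y = sample(y_edges, f_y, n_pseudo, rng)
ht = sample(ht_edges, g_ht, n_pseudo, rng)
met = y * np.sqrt(ht)

# High-MET signal region from the text: HT > 300 GeV and MET > 275 GeV.
frac_signal = ((ht > 300.0) & (met > 275.0)).mean()
print(f"fraction of pseudo-events in signal region: {frac_signal:.4f}")
```

Scaling this fraction to an observed control-region count then yields a background prediction for the signal region.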
33 < We measure the $f(y)$ and $g(H_T)$ distributions using events in the regions indicated in Fig.~\ref{fig:abcdprimedata}
34 < Next, we randomly sample values of $y$ and \Ht\ from these distributions; each pair of $y$ and \Ht\ values is a pseudo-event.
35 < We generate a large ensemble of pseudo-events, and find the ratio $R_{S/C}$, the ratio of the
36 < number of pseudo-events falling in the signal region to the number of pseudo-events
37 < falling in a control region defined by the same requirements used to select events
38 < to measure $f(y)$ and $g(H_T)$. We then
39 < multiply this ratio by the number of events which fall in the control region in data
40 < to get the predicted yield, i.e.\ $N_{pred} = R_{S/C} \times N({\rm control})$.
41 < To estimate the statistical uncertainty in the predicted background, we smear the bin contents
42 < of $f(y)$ and $g(H_T)$ according to their uncertainties. We repeat the prediction 20 times
43 < with these smeared distributions, and take the RMS of the deviation from the nominal prediction
33 > We measure the $f(y)$ and $g(H_T)$ distributions using events in the regions indicated in Fig.~\ref{fig:abcdprimedata},
34 > and predict the background yields in the signal regions using Eq.~\ref{eq:abcdprime}.
35 > %Next, we randomly sample values of $y$ and \Ht\ from these distributions; each pair of $y$ and \Ht\ values is a pseudo-event.
36 > %We generate a large ensemble of pseudo-events, and find the ratio $R_{S/C}$, the ratio of the
37 > %number of pseudo-events falling in the signal region to the number of pseudo-events
38 > %falling in a control region defined by the same requirements used to select events
39 > %to measure $f(y)$ and $g(H_T)$. We then
40 > %multiply this ratio by the number of events which fall in the control region in data
41 > %to get the predicted yield, i.e.\ $N_{pred} = R_{S/C} \times N({\rm control})$.
42 > To estimate the statistical uncertainty in the predicted background, the bin contents
43 > of $f(y)$ and $g(H_T)$ are smeared according to their Poisson uncertainties, the prediction is repeated 20 times
44 > with these smeared distributions, and the RMS of the deviation from the nominal prediction is taken
45   as the statistical uncertainty. We have tested this technique in toy MC studies based on
46   event samples of similar size to the expected yield in data for 1 fb$^{-1}$.
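The smearing procedure for the statistical uncertainty can be sketched as follows. The bin contents and the `predict` function are hypothetical stand-ins for the measured distributions and the Eq.-based extrapolation, and drawing Poisson variates with the nominal bin contents as means is one plausible reading of "smeared according to their Poisson uncertainties".

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative bin contents of f(y) and g(HT) (not the analysis values).
f_y = np.array([120.0, 80.0, 40.0, 15.0, 5.0, 2.0])
g_ht = np.array([200.0, 90.0, 30.0, 10.0, 4.0])

def predict(f, g, n_control=100.0):
    # Hypothetical stand-in for the factorized extrapolation: fraction of
    # f(y) above a y threshold times fraction of g(HT) above an HT
    # threshold, scaled to a control-region count.
    return n_control * (f[3:].sum() / f.sum()) * (g[2:].sum() / g.sum())

nominal = predict(f_y, g_ht)

# Smear each bin by its Poisson uncertainty and repeat the prediction 20 times.
trials = []
for _ in range(20):
    f_smear = rng.poisson(f_y).astype(float)
    g_smear = rng.poisson(g_ht).astype(float)
    trials.append(predict(f_smear, g_smear))

# RMS of the deviation from the nominal prediction = statistical uncertainty.
stat_unc = np.sqrt(np.mean((np.array(trials) - nominal) ** 2))
print(f"nominal = {nominal:.2f}, stat. uncertainty = {stat_unc:.2f}")
```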
47   Based on these studies we correct the predicted background yields by factors of 1.2 $\pm$ 0.5
# Line 55 | Line 59 | reliably  accounted   for.   We then  us
59   $\pt(\ell\ell)$ distribution to  model the $\pt(\nu\nu)$ distribution,
60   which is  identified with \MET.  Thus,  we use the  number of observed
61   events  with $\HT > 300\GeV$ and $\pt(\ell\ell)  > 275\GeV$
62 < ($\HT > 600\GeV$ and $\pt(\ell\ell)  > 200\GeV^{1/2}$ )
62 > ($\HT > 600\GeV$ and $\pt(\ell\ell)  > 200\GeV$ )
63   to predict the  number of  background events  with
64   $\HT >  300\GeV$ and  $\MET > 275\GeV$ ($\HT >  600\GeV$ and  $\MET > 200\GeV$).  
65 < In  practice, two corrections must be applied to this prediction, as described below.
66 <
67 < %
68 < % Now describe the corrections
65 < %
66 < The first correction  accounts for the $\MET >  50\GeV$ requirement in the
67 < preselection, which is needed to  reduce the DY background.  We
68 < rescale  the  prediction by  a  factor equal  to  the  inverse of  the
69 < fraction  of  events  passing  the preselection which  also  satisfy  the
70 < requirement  $\pt(\ell\ell) >  50\GeVc$.  
71 < For the \Ht\ $>$ 300 GeV requirement corresponding to the high \MET\ signal region,
72 < we determine this correction from data and find  $K_{50}=1.5 \pm 0.3$.  
73 < For the \Ht\ $>$ 600 GeV requirement corresponding to the high \Ht\ signal region,
74 < we do not have enough events in data to determine this correction with statistical
75 < precision, so we instead extract it from MC and find $K_{50}=1.3 \pm 0.2$.
76 < The  second  correction ($K_C$) is  associated with the  known polarization  of the  $W$, which
77 < introduces a difference  between the $\pt(\ell\ell)$ and $\pt(\nu\nu)$
78 < distributions. The correction $K_C$ also takes into account detector effects such as the hadronic energy
79 < scale and  resolution which affect  the \MET\ but  not $\pt(\ell\ell)$.
80 < The  total correction factor  is $K_{50}  \times K_C  = 2.2  \pm 0.9$ ($1.7 \pm 0.6$) for the
81 < high \MET (high \Ht) signal regions, where the uncertainty includes the statistical uncertainty
82 < in the extraction of $K$ and $K_C$ and the 5\%  uncertainty in  the hadronic energy scale~\cite{ref:jes}.
65 > In  practice, we apply two corrections to this prediction, following the same procedure as in Ref.~\cite{ref:ospaper}.
66 > The first correction is $K_{50}=1.5 \pm 0.3$ ($1.3 \pm 0.2$) for the high \MET\ (high \Ht) signal region.
67 > The  second correction factor  is $K_C  = 1.5  \pm 0.5$ ($1.3 \pm 0.4$) for the
68 > high \MET (high \Ht) signal region.
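Numerically, the corrected prediction amounts to scaling the observed control count by $K_{50} \times K_C$ and propagating their uncertainties. A minimal sketch with the high-\MET\ factors quoted above; the observed control count is a hypothetical placeholder.

```python
import math

# Correction factors for the high-MET region (from the text):
k50, dk50 = 1.5, 0.3   # MET > 50 GeV preselection correction
kc, dkc = 1.5, 0.5     # W polarization + detector effects correction

# Hypothetical observed count with HT > 300 GeV and pt(ll) > 275 GeV.
n_obs = 4

n_pred = n_obs * k50 * kc

# Propagate the (assumed uncorrelated) relative uncertainties of the factors.
rel = math.hypot(dk50 / k50, dkc / kc)
dn_pred = n_pred * rel
print(f"N_pred = {n_pred:.1f} +/- {dn_pred:.1f} (correction uncertainties only)")
```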
69  
70   Our third background estimation method is based on the fact that many models of new physics
71 < produce an excess of SF with respect to OF lepton pairs. In SUSY, such an excess may be produced
72 < in the decay $\chi_2^0 \to \chi_1^0 \ell^+\ell^-$ or in the decay of $Z$ bosons produced in
73 < the cascade decays of heavy, colored objects. In contrast, for the \ttbar\ background the
74 < rates of SF and OF lepton pairs are the same, as is also the case for other SM backgrounds
89 < such as $W^+W^-$ or DY$\to\tau^+\tau^-$. We quantify the excess of SF vs. OF pairs using the
71 > produce an excess of SF with respect to OF lepton pairs, while for the \ttbar\ background the
72 > rates of SF and OF lepton pairs are the same. Hence we make use of the OF subtraction technique
73 > discussed in Sec.~\ref{sec:fit} in which we performed a shape analysis of the dilepton mass distribution.
74 > Here we perform a counting experiment by quantifying the excess of SF vs. OF pairs using the
75   quantity
76  
77   \begin{equation}
78   \label{eq:ofhighpt}
79 < \Delta = R_{\mu e}N(ee) + \frac{1}{R_{\mu e}}N(\mu\mu) - N(e\mu),
79 > \Delta = R_{\mu e}N(ee) + \frac{1}{R_{\mu e}}N(\mu\mu) - N(e\mu).
80   \end{equation}
81  
82 < where $R_{\mu e} = 1.13 \pm 0.05$ is the ratio of muon to electron selection efficiencies,
82 > Here $R_{\mu e} = 1.13 \pm 0.05$ is the ratio of muon to electron selection efficiencies,
83   evaluated by taking the square root of the ratio of the number of
84   $Z \to \mu^+\mu^-$ to $Z \to e^+e^-$ events in data, in the mass range 76-106 GeV with no jets or
85   \met\ requirements. The quantity $\Delta$ is predicted to be 0 for processes with
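As a numerical illustration of Eq.~\ref{eq:ofhighpt}, with all event counts below chosen as hypothetical placeholders:

```python
import math

# R_mue from the sqrt of the ratio of Z->mumu to Z->ee yields in data
# (counts here are illustrative, tuned to give roughly the quoted 1.13).
n_zmm, n_zee = 52000, 40700
r_mue = math.sqrt(n_zmm / n_zee)

# Hypothetical SF (ee, mumu) and OF (emu) signal-region yields.
n_ee, n_mm, n_em = 3, 4, 6
delta = r_mue * n_ee + n_mm / r_mue - n_em
print(f"R_mue = {r_mue:.2f}, Delta = {delta:.2f}")
```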
# Line 108 | Line 93 | All background estimation methods based
93   in the control regions, which tends to decrease the significance of a signal
94   which may be present in the data by increasing the background prediction.
95   In general, it is difficult to quantify these effects because we
96 < do not know what signal may be present in the data.  Having two
96 > do not know what signal may be present in the data.  Having three
97   independent methods (in addition to expectations from MC)
98   adds redundancy because signal contamination can have different effects
99   in the different control regions for the three methods.
