% Source: root/cvsroot/UserCode/benhoob/cmsnotes/OSPAS2011/datadriven.tex
% Revision 1.1, committed Mon Jun 13 12:37:13 2011 UTC by benhoob (initial commit)
\section{Background Estimates from Data}
\label{sec:datadriven}
We use three independent methods to estimate the background in the signal region from data.
The first method is a novel technique based on the ABCD method used in our 2010 analysis~\cite{ref:ospaper},
and exploits the fact that \HT\ and $y$ are nearly uncorrelated for the $t\bar{t}$ background;
this method is referred to as the ABCD' technique. First, we extract the $y$ and \Ht\ distributions
$f(y)$ and $g(H_T)$ from data, using events from control regions which are dominated by background.
Because $y$ and \Ht\ are weakly correlated, we can predict the distribution of events in the $y$ vs.\ \Ht\ plane as:

\begin{equation}
\frac{\partial^2 N}{\partial y \, \partial H_T} = f(y)g(H_T),
\end{equation}

allowing us to deduce the number of events falling in any region of this plane. In particular,
we can deduce the number of events falling in our signal regions, which are defined by requirements on \MET\ and \Ht.

We measure the $f(y)$ and $g(H_T)$ distributions using events in the regions indicated in Fig.~\ref{fig:abcdprime}.
Next, we randomly sample values of $y$ and \Ht\ from these distributions; each pair of $y$ and \Ht\ values constitutes a pseudo-event.
We generate a large ensemble of pseudo-events and find $R_{S/C}$, the ratio of the
number of pseudo-events falling in the signal region to the number of pseudo-events
falling in a control region defined by the same requirements used to select the events
from which $f(y)$ and $g(H_T)$ are measured. We then
multiply this ratio by the number of \ttbar\ MC events falling in the control region
to obtain the predicted yield, i.e.\ $N_{pred} = R_{S/C} \times N({\rm control})$.
To estimate the statistical uncertainty in the predicted background, we smear the bin contents
of $f(y)$ and $g(H_T)$ according to their uncertainties. We repeat the prediction 20 times
with these smeared distributions and take the RMS of the deviations from the nominal prediction
as the statistical uncertainty. We have validated this technique with toy MC studies based on
event samples of similar size to the expected yield in data for 1~fb$^{-1}$.
Based on these studies we correct the predicted background yields by factors of $1.2 \pm 0.5$
($1.0 \pm 0.5$) for the high \MET\ (high \Ht) signal region.

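The pseudo-event procedure described above can be sketched in a few lines. This is a minimal illustration only: the marginal distributions, bin choices, region boundaries, and the 5\% bin-content uncertainties below are placeholder assumptions, not the values used in the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_yield(f_y, y_centers, g_ht, ht_centers,
                  in_signal, in_control, n_control_mc, n_pseudo=200_000):
    """ABCD'-style prediction: sample (y, HT) pseudo-events from the
    factorized density f(y)*g(HT), then scale the signal-to-control
    ratio R_{S/C} by the MC yield in the control region."""
    y = rng.choice(y_centers, size=n_pseudo, p=f_y / f_y.sum())
    ht = rng.choice(ht_centers, size=n_pseudo, p=g_ht / g_ht.sum())
    r_sc = np.count_nonzero(in_signal(y, ht)) / np.count_nonzero(in_control(y, ht))
    return r_sc * n_control_mc

# Placeholder marginal distributions (toy falling spectra, not data).
y_centers = np.linspace(2.0, 12.0, 50)      # y = MET/sqrt(HT) bin centers
ht_centers = np.linspace(100.0, 900.0, 80)  # HT bin centers, GeV
f_y = np.exp(-y_centers / 3.0)
g_ht = np.exp(-ht_centers / 150.0)

# Hypothetical signal/control region boundaries, for illustration only.
in_signal = lambda y, ht: (y > 8.5) & (ht > 300.0)
in_control = lambda y, ht: (y < 6.0) & (ht > 300.0)

nominal = predict_yield(f_y, y_centers, g_ht, ht_centers,
                        in_signal, in_control, n_control_mc=500)

# Statistical uncertainty: smear each bin content (here by an assumed 5%
# fractional uncertainty), redo the prediction 20 times, and take the RMS
# of the deviations from the nominal prediction.
devs = []
for _ in range(20):
    f_s = np.clip(f_y * rng.normal(1.0, 0.05, f_y.size), 1e-12, None)
    g_s = np.clip(g_ht * rng.normal(1.0, 0.05, g_ht.size), 1e-12, None)
    devs.append(predict_yield(f_s, y_centers, g_s, ht_centers,
                              in_signal, in_control, n_control_mc=500) - nominal)
stat_unc = float(np.sqrt(np.mean(np.square(devs))))
```

Sampling $y$ and \Ht\ independently is precisely the factorization assumption; any residual correlation between the two variables in real events appears as a bias, which is what the toy-MC correction factors quoted above account for.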
The second background estimate, henceforth referred to as the dilepton transverse momentum ($\pt(\ell\ell)$) method,
is based on the idea~\cite{ref:victory} that in dilepton $t\bar{t}$ events the
\pt\ distributions of the charged leptons and of the neutrinos from $W$
decays are related, because of the common boosts from the top and $W$
decays. This relation is governed by the polarization of the $W$'s,
which is well understood in top
decays in the SM~\cite{Wpolarization,Wpolarization2} and can therefore be
reliably accounted for. We use the observed
$\pt(\ell\ell)$ distribution to model the $\pt(\nu\nu)$ distribution,
which is identified with \MET. Thus, we use the number of observed
events with $\HT > 300\GeV$ and $\pt(\ell\ell) > 275\GeV$
($\HT > 600\GeV$ and $\pt(\ell\ell) > 200\GeV$)
to predict the number of background events with
$\HT > 300\GeV$ and $\MET > 275\GeV$ ($\HT > 600\GeV$ and $\MET > 200\GeV$).
In practice, two corrections must be applied to this prediction, as described below.

%
% Now describe the corrections
%
The first correction accounts for the $\MET > 50\GeV$ requirement in the
preselection, which is needed to reduce the DY background. We
rescale the prediction by a factor $K_{50}$ equal to the inverse of the
fraction of events passing the preselection which also satisfy the
requirement $\pt(\ell\ell) > 50\GeVc$.
For the \Ht $>$ 300 GeV requirement corresponding to the high \MET\ signal region,
we determine this correction from data and find $K_{50}=1.5 \pm 0.3$.
For the \Ht $>$ 600 GeV requirement corresponding to the high \Ht\ signal region,
we do not have enough events in data to determine this correction with statistical
precision, so we instead extract it from MC and find $K_{50}=1.3 \pm 0.2$.
The second correction ($K_C$) is associated with the known polarization of the $W$, which
introduces a difference between the $\pt(\ell\ell)$ and $\pt(\nu\nu)$
distributions. The correction $K_C$ also takes into account detector effects, such as the hadronic energy
scale and resolution, which affect \MET\ but not $\pt(\ell\ell)$.
The total correction factor is $K_{50} \times K_C = 2.2 \pm 0.9$ ($1.7 \pm 0.6$) for the
high \MET\ (high \Ht) signal region, where the uncertainty includes the MC statistical uncertainty
in the extraction of $K_C$ and the 5\% uncertainty in the hadronic energy scale~\cite{ref:jes}.

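The two-step correction amounts to simple arithmetic, sketched below. The observed $\pt(\ell\ell)$ count is hypothetical; the total correction factor and its uncertainty are the high-\MET-region values quoted above, and the error model (Poisson counting uncertainty combined in quadrature with the correction uncertainty) is an assumption, not something specified in the text.

```python
import math

# Hypothetical observed count with HT > 300 GeV and pt(ll) > 275 GeV.
n_ptll = 8

# Total correction K_50 * K_C for the high-MET region, as quoted above.
k_tot, k_tot_unc = 2.2, 0.9

# Predicted background yield with MET > 275 GeV.
n_pred = k_tot * n_ptll

# Assumed error model: Poisson uncertainty on the observed count,
# combined in quadrature with the relative correction uncertainty.
n_pred_unc = n_pred * math.sqrt(1.0 / n_ptll + (k_tot_unc / k_tot) ** 2)
```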
Our third background estimation method is based on the fact that many models of new physics
produce an excess of SF with respect to OF lepton pairs. In SUSY, such an excess may be produced
in the decay $\chi_2^0 \to \chi_1^0 \ell^+\ell^-$ or in the decay of $Z$ bosons produced in
the cascade decays of heavy, colored objects. In contrast, for the \ttbar\ background the
rates of SF and OF lepton pairs are the same, as is also the case for other SM backgrounds
such as $W^+W^-$ or DY$\to\tau^+\tau^-$. We quantify the excess of SF vs.\ OF pairs using the
quantity

\begin{equation}
\label{eq:ofhighpt}
\Delta = R_{\mu e}N(ee) + \frac{1}{R_{\mu e}}N(\mu\mu) - N(e\mu),
\end{equation}

where $R_{\mu e} = 1.13 \pm 0.05$ is the ratio of muon to electron selection efficiencies.
This ratio is evaluated by taking the square root of the ratio of the number of observed
$Z \to \mu^+\mu^-$ to $Z \to e^+e^-$ events in the mass range 76--106 GeV, with no jet or
\met\ requirements. The quantity $\Delta$ is predicted to be 0 for processes with
uncorrelated lepton flavors. In order for this technique to work, the kinematic selection
applied to events in all dilepton flavor channels must be the same, which is not the case
for our default selection because the $Z$ mass veto is applied only to the same-flavor channels.
Therefore, when applying the OF subtraction technique we also apply the $Z$ mass veto
to the $e\mu$ channel.

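The determination of $R_{\mu e}$ and the evaluation of Eq.~\ref{eq:ofhighpt} reduce to a few lines of arithmetic, sketched here. The $Z$-peak and signal-region yields are hypothetical, and the statistical error propagation (Poisson counting errors added in quadrature) is an assumed model.

```python
import math

# R_mue = sqrt( N(Z->mumu) / N(Z->ee) ), measured in the 76-106 GeV mass
# window with no jet or MET requirements. Yields here are hypothetical.
n_zmumu, n_zee = 52000, 40700
r_mue = math.sqrt(n_zmumu / n_zee)

def delta(n_ee, n_mumu, n_emu, r=r_mue):
    """Delta = R*N(ee) + N(mumu)/R - N(emu); consistent with 0 for
    processes with uncorrelated lepton flavors (ttbar, WW, DY->tautau)."""
    return r * n_ee + n_mumu / r - n_emu

def delta_stat_unc(n_ee, n_mumu, n_emu, r=r_mue):
    """Assumed Poisson propagation of the three counting uncertainties."""
    return math.sqrt(r**2 * n_ee + n_mumu / r**2 + n_emu)
```

Note that an efficiency ratio $R_{\mu e} \ne 1$ reweights the two SF channels in opposite directions, so the OF yield is compared to an efficiency-corrected SF yield rather than to the raw counts.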
All background estimation methods based on data are in principle subject to signal contamination
in the control regions, which tends to decrease the significance of a signal
that may be present in the data by increasing the background prediction.
In general, it is difficult to quantify these effects because we
do not know what signal may be present in the data. Having several
independent methods (in addition to expectations from MC)
adds redundancy, because signal contamination can have different effects
in the different control regions of the different methods.
For example, in the extreme case of a
BSM signal with identical distributions of $\pt(\ell \ell)$ and \MET, an excess of events might be seen
in the ABCD' method but not in the $\pt(\ell \ell)$ method.