\section{Counting Experiments}
\label{sec:datadriven}

To look for possible BSM contributions, we define two signal regions: a high \MET\ region
requiring $\MET > 275\GeV$ and $\HT > 300\GeV$, and a high \Ht\ region requiring
$\MET > 200\GeV$ and $\HT > 600\GeV$.
For the high \MET\ (high \Ht) signal region, the MC predicts 2.6 (2.5) SM events,
dominated by dilepton $t\bar{t}$; the expected LM1 yield is 17 (14) and the
expected LM3 yield is 6.4 (6.7). The signal regions are indicated in Fig.~\ref{fig:met_ht}.
These signal regions are tighter than the one used in our published 2010 analysis:
with the larger data sample, they provide improved sensitivity to contributions from new physics.

We perform counting experiments in these signal regions, using three independent methods
to estimate from data the background in each signal region.
The first method is a novel technique based on the ABCD method used in our 2010
analysis~\cite{ref:ospaper}, and exploits the fact that \HT\ and $y \equiv \MET/\sqrt{H_T}$
are nearly uncorrelated for the $t\bar{t}$ background; this method is referred to as the
ABCD' technique. First, we extract the $y$ and \Ht\ distributions, $f(y)$ and $g(H_T)$, from data.
Because $y$ and \Ht\ are weakly correlated, the distribution of events in the $y$ vs.\ \Ht\ plane is described by:
|
\begin{equation}
\label{eq:abcdprime}
\frac{\partial^2 N}{\partial y \partial H_T} = f(y)g(H_T),
\end{equation}
|
allowing us to deduce the number of events falling in any region of this plane, and in
particular in our signal regions defined by requirements on \MET\ and \Ht.
|
We measure the $f(y)$ and $g(H_T)$ distributions using events in the regions indicated
in Fig.~\ref{fig:abcdprimedata}, and predict the background yields in the signal regions
using Eq.~\ref{eq:abcdprime}.
To estimate the statistical uncertainty in the predicted background, the bin contents
of $f(y)$ and $g(H_T)$ are smeared according to their Poisson uncertainties, the prediction
is repeated 20 times with these smeared distributions, and the RMS of the deviation from
the nominal prediction is taken
as the statistical uncertainty. We have validated this technique with toy MC studies based on
event samples of similar size to the expected yield in data for 1 fb$^{-1}$.
Based on these studies we correct the predicted background yields by a factor of 1.2 $\pm$ 0.5.
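
For illustration, the prediction of Eq.~\ref{eq:abcdprime} and its statistical uncertainty
can be evaluated with pseudo-events sampled from the measured $f(y)$ and $g(H_T)$ histograms,
as in the following Python sketch. The control-region cuts and all names here are
illustrative assumptions, not the actual analysis code.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def sample_hist(contents, edges, n):
    # Draw n values from a 1D histogram, uniform within each bin.
    prob = contents / contents.sum()
    bins = rng.choice(len(contents), size=n, p=prob)
    return rng.uniform(edges[bins], edges[bins + 1])

def predict(f, f_edges, g, g_edges, n_control_data,
            met_cut, ht_cut, n_pseudo=1_000_000):
    # Pseudo-events: independent (y, HT) pairs, as implied by
    # dN/(dy dHT) = f(y) g(HT).
    y = sample_hist(f, f_edges, n_pseudo)
    ht = sample_hist(g, g_edges, n_pseudo)
    met = y * np.sqrt(ht)                    # y = MET / sqrt(HT)
    in_sig = (met > met_cut) & (ht > ht_cut)
    in_con = (met > 50.0) & (ht > 125.0)     # illustrative control cuts
    # Rescale the observed control yield by the pseudo-event ratio.
    return in_sig.sum() / in_con.sum() * n_control_data

def stat_uncertainty(f, f_edges, g, g_edges, n_control_data,
                     met_cut, ht_cut, n_trials=20):
    nominal = predict(f, f_edges, g, g_edges, n_control_data,
                      met_cut, ht_cut)
    # Smear bin contents by their Poisson uncertainties and repeat.
    devs = [predict(rng.poisson(f).astype(float), f_edges,
                    rng.poisson(g).astype(float), g_edges,
                    n_control_data, met_cut, ht_cut) - nominal
            for _ in range(n_trials)]
    return np.sqrt(np.mean(np.square(devs)))  # RMS of deviations
\end{verbatim}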

Our second background estimation method uses the observed
$\pt(\ell\ell)$ distribution to model the $\pt(\nu\nu)$ distribution, |
which is identified with \MET. Thus, we use the number of observed |
events with $\HT > 300\GeV$ and $\pt(\ell\ell) > 275\GeV$ |
($\HT > 600\GeV$ and $\pt(\ell\ell) > 200\GeV$)
to predict the number of background events with |
$\HT > 300\GeV$ and $\MET > 275\GeV$ ($\HT > 600\GeV$ and $\MET > 200\GeV$). |
In practice, we apply two corrections to this prediction, following the same procedure as in
Ref.~\cite{ref:ospaper}. The first correction, $K_{50}=1.5 \pm 0.3$ ($1.3 \pm 0.2$) for the
high \MET\ (high \Ht) signal region, accounts for the $\MET > 50\GeV$ requirement in the
preselection. The second correction, $K_C = 1.5 \pm 0.5$ ($1.3 \pm 0.4$), accounts for the
known polarization of the $W$, which introduces a difference between the $\pt(\ell\ell)$ and
$\pt(\nu\nu)$ distributions, as well as for detector effects, such as the hadronic energy
scale and resolution, which affect \MET\ but not $\pt(\ell\ell)$.
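
That is, the corrected background prediction in the high \MET\ signal region, for example, is
\[
N_{\rm pred} = K_{50}\, K_C\, N(\HT > 300\GeV,\ \pt(\ell\ell) > 275\GeV).
\]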
|
Our third background estimation method is based on the fact that many models of new physics
produce an excess of SF with respect to OF lepton pairs, while for the \ttbar\ background the
rates of SF and OF lepton pairs are the same. Hence we make use of the OF subtraction technique
discussed in Sec.~\ref{sec:fit}, in which we performed a shape analysis of the dilepton mass
distribution. Here we perform a counting experiment by quantifying the excess of SF vs.\ OF
pairs using the quantity
|
\begin{equation}
\label{eq:ofhighpt}
\Delta = R_{\mu e}N(ee) + \frac{1}{R_{\mu e}}N(\mu\mu) - N(e\mu).
\end{equation}
|
Here $R_{\mu e} = 1.13 \pm 0.05$ is the ratio of muon to electron selection efficiencies.
The quantity $\Delta$ is predicted to be 0 for processes with
uncorrelated lepton flavors. In order for this technique to work, the kinematic selection |
applied to events in all dilepton flavor channels must be the same, which is not the case |
for our default selection because the $Z$ mass veto is applied only to same-flavor channels. |
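
As a numerical illustration, the following sketch (with hypothetical event counts) evaluates
Eq.~\ref{eq:ofhighpt} and its statistical uncertainty, propagating the Poisson errors on the
counts and the uncertainty on $R_{\mu e}$:

\begin{verbatim}
import math

def delta_and_error(n_ee, n_mm, n_em, r=1.13, r_err=0.05):
    # Delta = R*N(ee) + N(mumu)/R - N(emu); consistent with zero
    # for backgrounds with uncorrelated lepton flavors.
    delta = r * n_ee + n_mm / r - n_em
    var = r**2 * n_ee + n_mm / r**2 + n_em       # Poisson terms
    var += (n_ee - n_mm / r**2)**2 * r_err**2    # efficiency-ratio term
    return delta, math.sqrt(var)

# Hypothetical counts: 15 ee, 20 mumu, 35 emu
print(delta_and_error(15, 20, 35))   # ~ (-0.35, 8.4): no SF excess
\end{verbatim}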

All of these methods may be affected by the presence of signal events
in the control regions, which tends to decrease the significance of a signal |
which may be present in the data by increasing the background prediction. |
In general, it is difficult to quantify these effects because we |
do not know what signal may be present in the data. Having three |
independent methods (in addition to expectations from MC) |
adds redundancy because signal contamination can have different effects |
in the different control regions for the three methods. |
For example, in the extreme case of a |
BSM signal with identical distributions of $\pt(\ell \ell)$ and \MET, an excess of events might be seen |
in the ABCD' method but not in the $\pt(\ell \ell)$ method. |