
Comparing UserCode/benhoob/cmsnotes/OSPAS2011/datadriven.tex (file contents):
Revision 1.2 by benhoob, Mon Jun 13 16:39:03 2011 UTC vs.
Revision 1.8 by benhoob, Mon Jun 20 11:46:53 2011 UTC

# Line 1 | Line 1
1 < \section{Background Estimates from Data}
1 > \section{Counting Experiments}
2   \label{sec:datadriven}
3 < We use three independent methods to estimate from data the background in the signal region.
4 < The first method is a novel technique based on the ABCD method used in our 2010 analysis~\cite{ref:ospaper},
5 < and exploits the fact that \HT\ and $y$ are nearly uncorrelated for the $t\bar{t}$ background;
3 >
4 > To look for possible BSM contributions, we define two signal regions that reject all but
5 > 0.1\% of the dilepton $t\bar{t}$ events by adding requirements of large \MET\ and \Ht:
6 >
7 > \begin{itemize}
8 > \item high \MET\ signal region: \MET\ $>$ 275~GeV, \Ht\ $>$ 300~GeV,
9 > \item high \Ht\ signal region:  \MET\ $>$ 200~GeV, \Ht\ $>$ 600~GeV.
10 > \end{itemize}
11 >
12 > For the high \MET\ (high \Ht) signal region, the MC predicts 2.6 (2.5) SM events,
13 > dominated by dilepton $t\bar{t}$; the expected LM1 yield is 17 (14) and the
14 > expected LM3 yield is 6.4 (6.7). The signal regions are indicated in Fig.~\ref{fig:met_ht}.
15 > These signal regions are tighter than the one used in our published 2010 analysis since
16 > with the larger data sample they allow us to explore phase space farther from the core
17 > of the SM distributions.
18 >
19 >
20 > We perform counting experiments in these signal regions, and use three independent methods to estimate the background in the signal regions from data.
21 > The first method is a novel technique, a variation of the ABCD method used in our 2010 analysis~\cite{ref:ospaper},
22 > and exploits the fact that \HT\ and $y \equiv \MET/\sqrt{H_T}$ are nearly uncorrelated for the $t\bar{t}$ background;
23   this method is referred to as the ABCD' technique. First, we extract the $y$ and \Ht\ distributions
24   $f(y)$ and $g(H_T)$ from data, using events from control regions which are dominated by background.
25 < Because $y$ and \Ht\ are weakly-correlated, we can predict the distribution of events in the $y$ vs. \Ht\ plane as:
25 > Because $y$ and \Ht\ are weakly correlated, the distribution of events in the $y$ vs. \Ht\ plane is described by:
26  
27   \begin{equation}
28 + \label{eq:abcdprime}
29   \frac{\partial^2 N}{\partial y \partial H_T} = f(y)g(H_T),
30   \end{equation}
31  
32   allowing us to deduce the number of events falling in any region of this plane. In particular,
33   we can deduce the number of events falling in our signal regions defined by requirements on \MET\ and \Ht.
34  
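The factorized prediction of Eq.~\ref{eq:abcdprime} can be implemented numerically by drawing pseudo-events from the measured $f(y)$ and $g(H_T)$ shapes, as the commented-out passage below describes. A minimal sketch follows; the binned spectra, the control-region cuts, and the observed control yield are all hypothetical placeholders, not the analysis inputs:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical binned shapes for f(y) and g(HT) (illustrative only,
# not the measured control-region distributions).
y_edges = np.linspace(0.0, 20.0, 21)        # y = MET / sqrt(HT), GeV^(1/2)
ht_edges = np.linspace(100.0, 900.0, 33)    # HT, GeV
f_counts = np.exp(-0.4 * y_edges[:-1])      # falling y spectrum
g_counts = np.exp(-0.005 * ht_edges[:-1])   # falling HT spectrum

def sample(edges, weights, n):
    """Draw n values from a binned distribution, uniform within each bin."""
    idx = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return rng.uniform(edges[idx], edges[idx + 1])

# Each (y, HT) pair is a pseudo-event; y and HT are drawn independently,
# which is exactly the factorization assumption of Eq. (abcdprime).
n_pseudo = 500_000
y = sample(y_edges, f_counts, n_pseudo)
ht = sample(ht_edges, g_counts, n_pseudo)
met = y * np.sqrt(ht)                       # invert y = MET / sqrt(HT)

in_signal = (met > 275.0) & (ht > 300.0)    # high-MET signal region
in_control = (met > 50.0) & (ht > 125.0)    # hypothetical control-region cuts

r_sc = in_signal.sum() / in_control.sum()   # R_{S/C}
n_control_obs = 400                         # hypothetical observed control yield
n_pred = r_sc * n_control_obs               # N_pred = R_{S/C} x N(control)
```

Because $y$ and \Ht\ are sampled independently, the pseudo-event ensemble realizes the product $f(y)g(H_T)$ by construction; any residual correlation in data enters as a systematic on the method.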
35 < We measure the $f(y)$ and $g(H_T)$ distributions using events in the regions indicated in Fig.~\ref{fig:abcdprimedata}
36 < Next, we randomly sample values of $y$ and \Ht\ from these distributions; each pair of $y$ and \Ht\ values is a pseudo-event.
37 < We generate a large ensemble of pseudo-events, and find the ratio $R_{S/C}$, the ratio of the
38 < number of pseudo-events falling in the signal region to the number of pseudo-events
39 < falling in a control region defined by the same requirements used to select events
40 < to measure $f(y)$ and $g(H_T)$. We then
41 < multiply this ratio by the number of \ttbar\ MC events which fall in the control region
42 < to get the predicted yield, ie. $N_{pred} = R_{S/C} \times N({\rm control})$.
43 < To estimate the statistical uncertainty in the predicted background, we smear the bin contents
44 < of $f(y)$ and $g(H_T)$ according to their uncertainties. We repeat the prediction 20 times
45 < with these smeared distributions, and take the RMS of the deviation from the nominal prediction
46 < as the statistical uncertainty. We have studied this technique using toy MC studies based on
47 < similar event samples of similar size to the expected yield in data for 1 fb$^{-1}$.
48 < Based on these studies we correct the predicted backgrounds yields by factors of 1.2 $\pm$ 0.5
35 > We measure the $f(y)$ and $g(H_T)$ distributions using events in the regions indicated in Fig.~\ref{fig:abcdprimedata},
36 > and predict the background yields in the signal regions using Eq.~\ref{eq:abcdprime}.
37 > %Next, we randomly sample values of $y$ and \Ht\ from these distributions; each pair of $y$ and \Ht\ values is a pseudo-event.
38 > %We generate a large ensemble of pseudo-events, and find the ratio $R_{S/C}$, the ratio of the
39 > %number of pseudo-events falling in the signal region to the number of pseudo-events
40 > %falling in a control region defined by the same requirements used to select events
41 > %to measure $f(y)$ and $g(H_T)$. We then
42 > %multiply this ratio by the number events which fall in the control region in data
43 > %to get the predicted yield, ie. $N_{pred} = R_{S/C} \times N({\rm control})$.
44 > To estimate the statistical uncertainty in the predicted background,  the bin contents
45 > of $f(y)$ and $g(H_T)$ are smeared according to their Poisson uncertainties.
46 > We have studied this technique using toy MC studies based on
47 > event samples of similar size to the expected yield in data for 1 fb$^{-1}$.
48 > Based on these studies we correct the predicted background yields by factors of 1.2 $\pm$ 0.5
49   (1.0 $\pm$ 0.5) for the high \MET\ (high \Ht) signal region.
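The statistical-uncertainty procedure described above (fluctuate each bin of $f(y)$ and $g(H_T)$ by its Poisson uncertainty, redo the prediction, and take the RMS of the deviations from the nominal result) can be sketched as follows; the bin contents and the stand-in prediction function are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical bin contents for f(y) and g(HT) (illustrative only).
f_counts = np.array([120.0, 80.0, 45.0, 20.0, 8.0, 3.0])
g_counts = np.array([200.0, 150.0, 90.0, 40.0, 15.0, 5.0])

def predict(f, g):
    """Stand-in for the ABCD' prediction: a product integral over the
    bins corresponding to a hypothetical signal region (last two bins)."""
    return f[-2:].sum() * g[-2:].sum() / (f.sum() * g.sum()) * 1000.0

nominal = predict(f_counts, g_counts)

# Smear each bin according to its Poisson uncertainty, repeat the
# prediction (e.g. 20 times), and take the RMS deviation from nominal.
deviations = []
for _ in range(20):
    f_sm = rng.poisson(f_counts).astype(float)
    g_sm = rng.poisson(g_counts).astype(float)
    deviations.append(predict(f_sm, g_sm) - nominal)
stat_unc = float(np.sqrt(np.mean(np.square(deviations))))
```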
50  
51  
# Line 41 | Line 59 | decays in the SM~\cite{Wpolarization,Wpo
59   reliably  accounted   for.   We then  use   the  observed
60   $\pt(\ell\ell)$ distribution to  model the $\pt(\nu\nu)$ distribution,
61   which is  identified with \MET.  Thus,  we use the  number of observed
62 < events  with $\HT > 300\GeV$ and $\pt(\ell\ell)  > 275\GeV^{1/2}$
63 < ($\HT > 600\GeV$ and $\pt(\ell\ell)  > 200\GeV^{1/2}$ )
62 > events  with $\HT > 300\GeV$ and $\pt(\ell\ell)  > 275\GeV$
63 > ($\HT > 600\GeV$ and $\pt(\ell\ell)  > 200\GeV$ )
64   to predict the  number of  background events  with
65 < $\HT >  300\GeV$ and  $\MET = > 275\GeV^{1/2}$ ($\HT >  600\GeV$ and  $\MET = > 200\GeV^{1/2}$).  
66 < In  practice, two corrections must be applied to this prediction, as described below.
67 <
68 < %
69 < % Now describe the corrections
52 < %
53 < The first correction  accounts for the $\MET >  50\GeV$ requirement in the
54 < preselection, which is needed to  reduce the DY background.  We
55 < rescale  the  prediction by  a  factor equal  to  the  inverse of  the
56 < fraction  of  events  passing  the preselection which  also  satisfy  the
57 < requirement  $\pt(\ell\ell) >  50\GeVc$.  
58 < For the \Ht $>$ 300 GeV requirement corresponding to the high \MET\ signal region,
59 < we determine this correction from data and find  $K_{50}=1.5 \pm 0.3$.  
60 < For the \Ht $>$ 600 GeV requirement corresponding to the high \Ht\ signal region,
61 < we do not have enough events in data to determine this correction with statistical
62 < precisions, so we instead extract it from MC and find $K_{50}=1.3 \pm 0.2$.
63 < The  second  correction ($K_C$) is  associated with the  known polarization  of the  $W$, which
64 < introduces a difference  between the $\pt(\ell\ell)$ and $\pt(\nu\nu)$
65 < distributions. The correction $K_C$ also takes into account detector effects such as the hadronic energy
66 < scale and  resolution which affect  the \MET\ but  not $\pt(\ell\ell)$.
67 < The  total correction factor  is $K_{50}  \times K_C  = 2.2  \pm 0.9$ ($1.7 \pm 0.6$) for the
68 < high \MET (high \Ht) signal regions, where the uncertainty includes the MC statistical uncertainty
69 < in the extraction of $K_C$ and the 5\%  uncertainty in  the hadronic energy scale~\cite{ref:jes}.
65 > $\HT >  300\GeV$ and  $\MET > 275\GeV$ ($\HT >  600\GeV$ and  $\MET > 200\GeV$).  
66 > In  practice, we apply two corrections to this prediction, following the same procedure as in Ref.~\cite{ref:ospaper}.
67 > The first correction is $K_{50}=1.5 \pm 0.3$ ($1.3 \pm 0.2$) for the high \MET\ (high \Ht) signal region.
68 > The  second correction factor  is $K_C  = 1.5  \pm 0.5$ ($1.3 \pm 0.4$) for the
69 > high \MET (high \Ht) signal region.
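The corrected $\pt(\ell\ell)$ prediction is a simple product, $N_{pred} = K_{50} \times K_C \times N(\pt(\ell\ell) > {\rm cut})$. A sketch with the correction factors quoted above; the observed yields are hypothetical placeholders, and the error treatment (relative uncertainties of the two corrections added in quadrature, neglecting the Poisson error on the yield) is a simplification:

```python
import math

# Correction factors quoted in the text, (central value, uncertainty).
k50 = {"high_met": (1.5, 0.3), "high_ht": (1.3, 0.2)}  # preselection (MET > 50)
kc  = {"high_met": (1.5, 0.5), "high_ht": (1.3, 0.4)}  # W polarization / detector

# Hypothetical observed yields with pt(ll) above the MET thresholds.
n_obs = {"high_met": 4, "high_ht": 3}

pred = {}
for region in n_obs:
    k1, dk1 = k50[region]
    k2, dk2 = kc[region]
    central = k1 * k2 * n_obs[region]
    # Relative uncertainties of the corrections added in quadrature
    # (simplified; neglects the Poisson error on n_obs).
    rel = math.sqrt((dk1 / k1) ** 2 + (dk2 / k2) ** 2)
    pred[region] = (central, central * rel)
```

Note that the products $K_{50} \times K_C$ are $2.25$ and $1.69$, consistent with the total factors $2.2 \pm 0.9$ and $1.7 \pm 0.6$ quoted in the earlier revision.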
70  
71   Our third background estimation method is based on the fact that many models of new physics
72 < produce an excess of SF with respect to OF lepton pairs. In SUSY, such an excess may produced
73 < in the decay $\chi_2^0 \to \chi_1^0 \ell^+\ell^-$ or in the decay of $Z$ bosons produced in
74 < the cascade decays of heavy, colored objects. In contrast, for the \ttbar\ background the
75 < rates of SF and OF lepton pairs are the same, as is also the case for other SM backgrounds
76 < such as $W^+W^-$ or DY$\to\tau^+\tau^-$. We quantify the excess of SF vs. OF pairs using the
72 > produce an excess of SF with respect to OF lepton pairs, while for the \ttbar\ background the
73 > rates of SF and OF lepton pairs are the same. Hence we make use of the OF subtraction technique
74 > discussed in Sec.~\ref{sec:fit} in which we performed a shape analysis of the dilepton mass distribution.
75 > Here we perform a counting experiment by quantifying the excess of SF vs. OF pairs using the
76   quantity
77  
78   \begin{equation}
79   \label{eq:ofhighpt}
80 < \Delta = R_{\mu e}N(ee) + \frac{1}{R_{\mu e}}N(\mu\mu) - N(e\mu),
80 > \Delta = R_{\mu e}N(ee) + \frac{1}{R_{\mu e}}N(\mu\mu) - N(e\mu).
81   \end{equation}
82  
83 < where $R_{\mu e} = 1.13 \pm 0.05$ is the ratio of muon to electron selection efficiencies.
85 < This quantity is evaluated by taking the square root of the ratio of the number of observed
86 < $Z \to \mu^+\mu^-$ to $Z \to e^+e^-$ events, in the mass range 76-106 GeV with no jets or
87 < \met\ requirements. The quantity $\Delta$ is predicted to be 0 for processes with
83 > This quantity is predicted to be 0 for processes with
84   uncorrelated lepton flavors. In order for this technique to work, the kinematic selection
85   applied to events in all dilepton flavor channels must be the same, which is not the case
86 < for our default selection because the $Z$ mass veto is applied only to same-flavor channels.Therefore when applying the OF subtraction technique we also apply the $Z$ mass veto also
86 > for our default selection because the $Z$ mass veto is applied only to same-flavor channels.
87 > Therefore when applying the OF subtraction technique we also apply the $Z$ mass veto
88   to the $e\mu$ channel.
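Eq.~\ref{eq:ofhighpt} is straightforward to evaluate from the observed counts; a minimal sketch, where $R_{\mu e}$ is taken as an input (the earlier revision quotes $1.13 \pm 0.05$) and the statistical error formula is simple Poisson propagation neglecting the uncertainty on $R_{\mu e}$ itself:

```python
import math

def delta(n_ee, n_mumu, n_emu, r_mue):
    """OF-subtraction estimator of Eq. (ofhighpt):
    Delta = R_mue * N(ee) + N(mumu) / R_mue - N(emu)."""
    return r_mue * n_ee + n_mumu / r_mue - n_emu

def delta_stat(n_ee, n_mumu, n_emu, r_mue):
    """Poisson statistical uncertainty on Delta (simple propagation;
    neglects the uncertainty on R_mue itself)."""
    return math.sqrt(r_mue**2 * n_ee + n_mumu / r_mue**2 + n_emu)
```

For flavor-uncorrelated backgrounds such as \ttbar, the SF and OF rates agree and $\Delta$ is consistent with zero within its uncertainty; a significant positive $\Delta$ would indicate an SF excess.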
89  
90   All background estimation methods based on data are in principle subject to signal contamination
91   in the control regions, which tends to decrease the significance of a signal
92   which may be present in the data by increasing the background prediction.
93   In general, it is difficult to quantify these effects because we
94 < do not know what signal may be present in the data.  Having two
94 > do not know what signal may be present in the data.  Having three
95   independent methods (in addition to expectations from MC)
96   adds redundancy because signal contamination can have different effects
97 < in the different control regions for the two methods.
97 > in the different control regions for the three methods.
98   For example, in the extreme case of a
99   BSM signal with identical distributions of $\pt(\ell \ell)$ and \MET, an excess of events might be seen
100 < in the ABCD method but not in the $\pt(\ell \ell)$ method.
100 > in the ABCD' method but not in the $\pt(\ell \ell)$ method.
101  
