\documentclass{article}

%\title{New techniques for decay mode reconstruction and identification of
%hadronic tau lepton decays [outline]}
\title{The Tau Neural Classifier algorithm: tau identification and decay mode reconstruction using neural networks}
\author{Evan K. Friis}

\begin{document}

\maketitle 
\tableofcontents 

\begin{abstract}
The Tau Neural Classifier (TaNC) is a novel algorithm for the identification of
hadronic tau decays.  The algorithm has two components: the reconstruction
of tau lepton hadronic decay modes and the discrimination of hadronic tau
decays from quark and gluon jets.  The decay mode reconstruction is based
on the individual charged hadrons and photons reconstructed by the
particle--flow algorithm.  The reconstructed decay mode is then used in the
discrimination step to train a set of neural networks, each using input
variables that are sensitive to a particular decay mode.  We observe a
significant improvement in identification performance in comparison to
previous algorithms.
\end{abstract}

\section{Introduction}

Good tau identification performance is important for the discovery potential
of many possible new physics signals at the LHC.
\begin{itemize}
\item tau leptons typically arise in the signal processes of interest
\item quark and gluon jets, which can be misidentified as hadronic tau decays, are produced with significantly larger cross--sections
\item efficient identification of hadronic tau decays together with a low misidentification rate for quark and gluon jets is
      thus essential for many searches for new physics
\end{itemize}

New physics signals may be discovered via hadronic tau lepton decays in early CMS data.
\begin{itemize}
\item for example, MSSM Higgs bosons decaying to tau pairs, the production cross--section of which is enhanced at large $\tan\beta$
\item good tau identification performance is also important for the discovery of the Standard Model Higgs boson,
      as $H \rightarrow \tau\tau$ decays have the second largest branching fraction among fermionic decays (after $H \rightarrow b\bar{b}$) for a light Higgs boson
\end{itemize}

Tau leptons are the only leptons heavy enough to decay into hadrons.
\begin{itemize}
\item mean proper decay length $c\tau \approx 87~\mu$m
\item $\mathrm{BR}(\tau \rightarrow e) \approx \mathrm{BR}(\tau \rightarrow \mu) \approx 17\%$
\item $\mathrm{BR}(\tau \rightarrow \mathrm{hadrons}) \approx 65\%$;
      mostly one or three charged pions plus zero to two neutral pions,
      the latter decaying almost instantaneously to photons
\end{itemize}

In this note we concentrate on the identification of hadronic tau decays.
\begin{itemize}
\item tau decays to electrons and muons are difficult to distinguish from electrons and muons produced directly in $pp$ collisions
     (the strategy depends on the analysis; leptonic tau decays are typically identified by requiring
      two leptons of different flavor)
\item the discrimination of hadronic tau decays against electrons and muons is described in PFT--08--001
\item the ``signal'' signature whose identification we aim to improve with the Tau Neural Classifier (TaNC)
      is a collimated jet containing one or three tracks reconstructed in the pixel and silicon strip tracker,
      plus a low number of neutral electromagnetic showers reconstructed in the ECAL
\end{itemize}

\subsection{TaNC motivation}
The different hadronic decay modes of the tau proceed through different intermediate resonances.
This provides additional information: the search for hadronic tau decays can be recast as a
search for $\rho$, $a_1$, etc.
\begin{itemize}
   \item Each decay mode has a different topology and different possibilities
      for discrimination.
   \item Hadronic tau decays contain one or three charged pions and a number of $\pi^0$s.
   \item Each decay mode (charged and neutral pion multiplicity) maps directly to a resonance (at the 95\%
      level).
   \item This note presents two complementary techniques: a method to
      reconstruct the decay mode and an ensemble of neural network discriminants
      used to classify tau--candidates.
   \item Plot: True visible invariant mass for different decay modes
\end{itemize}

\section{Decay Mode Reconstruction}
The photons in the signal cone (see the shrinking cone note, CMS AN--2008/026)
are merged into candidate $\pi^0$s, and the candidates are
subject to a minimum $p_T$ quality requirement to remove contamination from various
sources.
\begin{itemize}
   \item $\pi^0$s decay promptly to photons.
   \item The number of photons in the signal cone has a long tail due to the
      underlying event (UE), pile-up (PU), showers and photon conversions.
   \item Plot: number of photons versus number of $\pi^0$s
\end{itemize}

\subsection{Photon Merging}
Photons are merged into composite $\pi^0$ candidates based on the invariant mass of each
photon pair.
\begin{itemize}
   \item Only photon pairs with an invariant mass below 0.2~GeV are considered.
   \item The granularity of the CMS ECAL and the particle--flow clustering provide excellent
      resolution on the diphoton invariant mass.
   \item Plot: diphoton invariant mass for decay mode 1.
\end{itemize}
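The merging step can be sketched in a few lines of Python.  This is a minimal illustration, not the TaNC implementation.  It assumes that photons are massless and given as $(p_T, \eta, \phi)$, so that the pair mass is $m^2 = 2\,p_{T,1}\,p_{T,2}\,(\cosh\Delta\eta - \cos\Delta\phi)$.  The greedy ordering (pairs closest to the $\pi^0$ mass are merged first) and the helper names \texttt{pair\_mass} and \texttt{merge\_photons} are hypothetical choices.  Only the 0.2~GeV pair-mass requirement is taken from the text.

\begin{verbatim}
```python
import math
from itertools import combinations

PI0_MASS = 0.135      # GeV, approximate neutral-pion mass
MAX_PAIR_MASS = 0.2   # GeV, pair-mass requirement quoted in the text


def pair_mass(g1, g2):
    """Invariant mass of two massless photons, each given as (pt, eta, phi)."""
    pt1, eta1, phi1 = g1
    pt2, eta2, phi2 = g2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def merge_photons(photons):
    """Greedily merge photon pairs into pi0 candidates (hypothetical ordering).

    Returns (pairs, singles): index pairs merged into pi0 candidates and the
    indices of photons left unmerged.
    """
    candidates = []
    for i, j in combinations(range(len(photons)), 2):
        m = pair_mass(photons[i], photons[j])
        if m < MAX_PAIR_MASS:
            candidates.append((abs(m - PI0_MASS), i, j))
    used, pairs = set(), []
    for _, i, j in sorted(candidates):  # pairs closest to the pi0 mass first
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    singles = [k for k in range(len(photons)) if k not in used]
    return pairs, singles
```
\end{verbatim}

Unmerged photons in \texttt{singles} remain candidates in their own right, and the quality requirement of the next subsection is applied to them as well.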

\subsection{Quality requirements}
To remove contamination from pile-up and the underlying event, a minimum $p_T$ quality
requirement is applied to the remaining photon candidates.
\begin{itemize}
   \item The lowest-$p_T$ photon is required to carry at least 10\% of the composite visible
      $p_T$
   \item This removes contaminant photons while preserving single photons that
      correspond to $\pi^0$s
   \item Plots: photon $p_T$ fraction for DM0 and DM1
\end{itemize}
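A sketch of the quality requirement follows, under stated assumptions.  The composite visible $p_T$ is approximated by the scalar sum of the charged and the remaining neutral objects.  The softest neutral object is removed iteratively until it carries at least 10\% of that sum.  The iterative procedure, the scalar-sum approximation and the function name \texttt{apply\_pt\_fraction\_cut} are assumptions made for illustration.

\begin{verbatim}
```python
def apply_pt_fraction_cut(charged_pts, neutral_pts, min_fraction=0.10):
    """Drop the softest neutral objects until the softest remaining one carries
    at least `min_fraction` of the visible pt.

    Assumption: the visible pt is approximated by the scalar sum of the charged
    and remaining neutral pts, recomputed after each removal.
    """
    neutrals = sorted(neutral_pts, reverse=True)  # hardest first
    while neutrals:
        visible_pt = sum(charged_pts) + sum(neutrals)
        if neutrals[-1] >= min_fraction * visible_pt:
            break
        neutrals.pop()  # remove the current softest neutral object
    return neutrals
```
\end{verbatim}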

\subsection{Results}
The decay mode reconstruction algorithm dramatically improves the determination
of the decay mode.
\begin{itemize}
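   \item Tails in the reconstructed $\pi^0$ multiplicity are removed
   \item The mean reconstructed $\pi^0$ multiplicity is closer to the true value
   \item Plot: correlation between the reconstructed and true decay modes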
\end{itemize}
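The correlation plot can be summarized as a row-normalized migration matrix, in which entry $(t, r)$ is the fraction of taus with true decay mode $t$ that are reconstructed in mode $r$.  The sketch below assumes that decay modes are encoded as integer indices.  \texttt{migration\_matrix} is a hypothetical helper.

\begin{verbatim}
```python
import numpy as np


def migration_matrix(true_modes, reco_modes, n_modes):
    """Entry [t, r]: fraction of taus with true mode t reconstructed as mode r."""
    counts = np.zeros((n_modes, n_modes))
    np.add.at(counts, (np.asarray(true_modes), np.asarray(reco_modes)), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without any true taus are left at zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```
\end{verbatim}

The diagonal then gives the probability of reconstructing each true decay mode correctly.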

The decay mode distribution differs between signal and background, and the
reconstructed decay mode depends slightly on $p_T$ and $\eta$.

\begin{itemize}
   \item The turn-on in $p_T$ is due to the $p_T$ quality thresholds and the cone size
   \item Increase of the 1prong1pi0 fraction near $|\eta| = 2.5$: due to the end of the tracker
      acceptance while the ECAL coverage continues?
   \item NB: the decay mode distribution itself is an additional handle available to the
      TaNC.
   \item Plot: decay mode fractions for signal and background versus $p_T$ and $\eta$
\end{itemize}

\section{Neural network classification}
A separate neural network is used for each decay mode.
\begin{itemize}
   \item The five decay modes we use constitute 95\% of hadronic decays.
   \item Table of the five decay modes
   \item Tau candidates reconstructed in other decay modes are rejected.
   \item Each neural network has input variables that are specific to its decay mode.
   \item Each neural network is trained on tau--candidates reconstructed in the
      associated decay mode.
   \item During the final discrimination, the neural network associated with the
      reconstructed decay mode of the tau candidate is used for the
      classification.
   \item Since five neural networks are used, a strategy is needed to select
      the cut applied to each neural network output.
\end{itemize}
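The per-decay-mode dispatch can be sketched as follows.  The set of five decay modes used here is assumed for illustration: one prong with zero, one or two $\pi^0$s, and three prongs with zero or one $\pi^0$.  The authoritative list is the table of the five decay modes.  The trained networks are represented by arbitrary callables, and \texttt{tanc\_output} is a hypothetical name.

\begin{verbatim}
```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumed set of five decay modes, keyed by (n_charged, n_pi0).
ALLOWED_MODES: Dict[Tuple[int, int], str] = {
    (1, 0): "1prong0pi0",
    (1, 1): "1prong1pi0",
    (1, 2): "1prong2pi0",
    (3, 0): "3prong0pi0",
    (3, 1): "3prong1pi0",
}

# A trained network is modelled as any callable: features -> output.
Network = Callable[[Sequence[float]], float]


def tanc_output(n_charged: int, n_pi0: int, features: Sequence[float],
                networks: Dict[str, Network]) -> Optional[float]:
    """Evaluate the network of the reconstructed decay mode.

    Returns None for candidates reconstructed in any other decay mode,
    which are rejected.
    """
    mode = ALLOWED_MODES.get((n_charged, n_pi0))
    if mode is None:
        return None
    return networks[mode](features)
```
\end{verbatim}

Keying on the $(n_{\mathrm{charged}}, n_{\pi^0})$ pair avoids any ambiguity from packing the multiplicities into a single integer index.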

\subsection{Neural network discriminants}
The neural networks use observables specific to each decay mode as input variables.
The full list of observables is given in the appendix.
Common observables include:
\begin{itemize}
   \item $p_T$ and $\eta$ of the tau candidate
   \item Invariant mass
   \item $p_T$ and $\Delta R$ from the tau axis of the signal objects
   \item $p_T$ and $\Delta R$ from the tau axis of the isolation objects
   \item Number of charged isolation objects
   \item Sum of the $p_T$ of charged objects in the isolation region
   \item For three-body decays, the two Dalitz variables
   \item Include separation and correlation plots for all variables?
CV: yes, please (in appendix)
\end{itemize}
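For illustration, the kinematic building blocks of several of these observables can be computed from four-vectors as sketched below: the invariant mass, $\Delta R$ from the tau axis, and the Dalitz variables of a three-body system.  The four-vector representation $(E, p_x, p_y, p_z)$ and the helper names are assumptions.  The choice of which two pair masses serve as Dalitz variables ($m^2_{12}$ and $m^2_{13}$ here) is also hypothetical.

\begin{verbatim}
```python
import math


def p4(pt, eta, phi, m=0.0):
    """Four-vector (E, px, py, pz) from pt, eta, phi and mass."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def add(*vectors):
    """Component-wise sum of four-vectors."""
    return tuple(sum(c) for c in zip(*vectors))


def mass(v):
    e, px, py, pz = v
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def eta_phi(v):
    _, px, py, pz = v
    return math.asinh(pz / math.hypot(px, py)), math.atan2(py, px)


def delta_r(v1, v2):
    """Distance in (eta, phi) between two four-vectors, phi difference wrapped."""
    eta1, phi1 = eta_phi(v1)
    eta2, phi2 = eta_phi(v2)
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def dalitz(p1, p2, p3):
    """Two Dalitz variables of a three-body system (hypothetical pair choice)."""
    return mass(add(p1, p2)) ** 2, mass(add(p1, p3)) ** 2
```
\end{verbatim}

The tau axis can be taken, for example, as the sum of the signal four-vectors; $\Delta R$ of each signal or isolation object is then \texttt{delta\_r(obj, axis)}.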

\subsection{Neural network training} 

The signal and background samples are split into five subsamples, one for
each decay mode.
\begin{itemize}
   \item Signal: $Z \rightarrow \tau\tau$ events, with tau--candidates matched to hadronic tau decays; background: QCD dijet events
   \item The leading-pion $p_T$ requirement is applied.
   \item Table of signal/background training events for each mode.
\end{itemize}

The decay mode depends on $p_T$ and $\eta$, and this dependence must be invisible
to the neural networks.
\begin{itemize}
   \item The $p_T$ and $\eta$ distributions are very different for signal and background
   \item The neural networks must be prevented from learning these differences
   \item Weights are applied so that the weighted $p_T$ and $\eta$ distributions of signal and background are identical
   \item Since the probability for a given decay mode to occur depends on the kinematics,
      the weights are computed on the subset of the sample that
      corresponds to the ensemble of allowed decay modes.
\end{itemize}
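The sketch below illustrates the kinematic reweighting.  It assumes binned $(p_T, \eta)$ histograms and that the background sample is reweighted to the signal shape.  Following the last item above, the function would be called once on the candidates of all allowed decay modes combined, and the resulting per-candidate weights then carried into the per-mode training samples.  The binning and the name \texttt{kinematic\_weights} are hypothetical.

\begin{verbatim}
```python
import numpy as np


def kinematic_weights(sig_pt, sig_eta, bkg_pt, bkg_eta, pt_bins, eta_bins):
    """Per-candidate background weights such that the weighted background
    (pt, eta) distribution matches the signal one (both at unit area).
    Candidates outside the binning get weight 0."""
    pt_bins, eta_bins = np.asarray(pt_bins, float), np.asarray(eta_bins, float)
    bkg_pt, bkg_eta = np.asarray(bkg_pt, float), np.asarray(bkg_eta, float)
    h_sig, _, _ = np.histogram2d(sig_pt, sig_eta, bins=[pt_bins, eta_bins])
    h_bkg, _, _ = np.histogram2d(bkg_pt, bkg_eta, bins=[pt_bins, eta_bins])
    h_sig /= h_sig.sum()
    h_bkg /= h_bkg.sum()
    ratio = np.divide(h_sig, h_bkg, out=np.zeros_like(h_sig), where=h_bkg > 0)
    # Bin index of each background candidate (last edge belongs to last bin).
    ipt = np.clip(np.digitize(bkg_pt, pt_bins) - 1, 0, len(pt_bins) - 2)
    ieta = np.clip(np.digitize(bkg_eta, eta_bins) - 1, 0, len(eta_bins) - 2)
    in_range = ((bkg_pt >= pt_bins[0]) & (bkg_pt <= pt_bins[-1])
                & (bkg_eta >= eta_bins[0]) & (bkg_eta <= eta_bins[-1]))
    return np.where(in_range, ratio[ipt, ieta], 0.0)
```
\end{verbatim}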

The neural networks are implemented as TMVA neural networks trained with back-propagation.
\begin{itemize}
   \item Number of hidden nodes following the Kolmogorov prescription: $N + 1$ ($2N + 1$) for $N$ input variables
   \item 500 training epochs, with a test for over-training every ten epochs
   \item No over-training is detected. (need plots?)
CV: yes, please show the NN output error on the training and on the validation dataset
(two curves overlaid on the same plot, with the training epoch on the x--axis and the NN output error on the y--axis)
for at least one of the decay modes/neural networks (as an example)
\end{itemize}
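The periodic over-training test can be sketched as a comparison of the training and validation errors recorded every ten epochs.  The criterion used below is an assumption for illustration, not the TMVA test: over-training is flagged when the validation error rises for several consecutive checks while the training error keeps falling.  The \texttt{patience} parameter and the function name are likewise hypothetical.

\begin{verbatim}
```python
def overtraining_epoch(train_err, val_err, test_every=10, patience=3):
    """Return the epoch at which over-training is first flagged, or None.

    train_err[k] and val_err[k] are the errors recorded at epoch
    (k + 1) * test_every.  Hypothetical criterion: over-training is flagged
    when the validation error rises for `patience` consecutive checks while
    the training error keeps falling.
    """
    streak = 0
    for k in range(1, len(val_err)):
        diverging = val_err[k] > val_err[k - 1] and train_err[k] < train_err[k - 1]
        streak = streak + 1 if diverging else 0
        if streak >= patience:
            return (k + 1) * test_every
    return None
```
\end{verbatim}

With 500 epochs and a test every ten epochs, each network yields 50 pairs of error values, which are also exactly the two curves requested for the over-training plot.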

\subsection{Individual neural network performance} 
The separation power differs between the individual neural networks.  The overall separation
power of the algorithm depends both on the separation performance of the individual networks
and on the differences between the signal and background decay mode distributions.
\begin{itemize}
   \item Plots: separation for each decay mode
   \item Example: the 1prong1pi0 network has no discrimination power against isolated one-prong
      QCD jets
\end{itemize}
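One option for quantifying the separation of each network is the separation $\langle S^2 \rangle = \frac{1}{2}\int \frac{(\hat{y}_S(y) - \hat{y}_B(y))^2}{\hat{y}_S(y) + \hat{y}_B(y)}\,dy$, which is also reported by TMVA.  It is 0 for identical and 1 for non-overlapping signal and background output distributions.  The binned sketch below assumes network outputs in $[0, 1]$.  The binning and the function name are hypothetical.

\begin{verbatim}
```python
import numpy as np


def separation(sig_outputs, bkg_outputs, bins=50, value_range=(0.0, 1.0)):
    """Binned <S^2> = 1/2 * sum_i (s_i - b_i)^2 / (s_i + b_i), unit-area histograms."""
    s, _ = np.histogram(sig_outputs, bins=bins, range=value_range)
    b, _ = np.histogram(bkg_outputs, bins=bins, range=value_range)
    s = s / s.sum()
    b = b / b.sum()
    total = s + b
    nonempty = total > 0  # skip bins that are empty in both samples
    return 0.5 * float(np.sum((s[nonempty] - b[nonempty]) ** 2 / total[nonempty]))
```
\end{verbatim}

Evaluating this once per decay mode on the corresponding network outputs gives one number per network, which complements the separation plots.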

\subsection{Neural network output selections}
Since there are five neural networks, a discrimination working point requires
the selection of a point in a five-dimensional space of cuts.
\begin{itemize}
   \item Cut points are selected by Monte Carlo (random) sampling
   \item A 5D point is added to the performance curve if it has a higher
      signal efficiency than the current point with the same background mis-tag
      rate.
   \item Separate samples are used to select the 5D curve and to evaluate
      its performance.
\end{itemize}
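The Monte Carlo cut point selection can be sketched as a random scan over the five cut values.  For each bin of background mis-tag rate, the scan keeps the cut vector with the highest signal efficiency.  Uniform sampling in $[0, 1]$, the number of trials, the binning of the mis-tag rate and the function name are assumptions.  As stated above, the resulting curve should be evaluated on a sample independent of the one used in the scan.

\begin{verbatim}
```python
import numpy as np


def mc_cut_scan(sig, bkg, n_trials=10000, n_fake_bins=100, seed=0):
    """Random scan of per-mode cuts, keeping the best efficiency per fake-rate bin.

    sig, bkg: dicts mapping decay mode -> array of NN outputs of the candidates
    reconstructed in that mode (outputs assumed to lie in [0, 1]).
    Returns a dict: fake-rate bin -> (signal efficiency, fake rate, cuts).
    """
    rng = np.random.default_rng(seed)
    modes = sorted(sig)
    n_sig = sum(len(sig[m]) for m in modes)
    n_bkg = sum(len(bkg[m]) for m in modes)
    best = {}
    for _ in range(n_trials):
        cuts = {m: rng.uniform(0.0, 1.0) for m in modes}  # one random 5D point
        eff = sum(np.count_nonzero(np.asarray(sig[m]) > cuts[m]) for m in modes) / n_sig
        fake = sum(np.count_nonzero(np.asarray(bkg[m]) > cuts[m]) for m in modes) / n_bkg
        k = min(int(fake * n_fake_bins), n_fake_bins - 1)  # "same mis-tag rate" bin
        if k not in best or eff > best[k][0]:
            best[k] = (eff, fake, cuts)
    return best
```
\end{verbatim}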

The 5D performance curve can also be parameterized using the probability for a
tau--candidate to be reconstructed in a given decay mode.
\begin{itemize}
   \item The method transforms the output of each neural network according to the
      decay mode probability
   \item The decay mode probability depends on $p_T$ and $\eta$
   \item Derivation of transform
   \item The net discriminator output is then a single continuous variable
   \item This is the recommended way of using the TaNC
   \item Plot: comparison of the transform to the MC-determined optimal curve
\end{itemize}
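The following construction illustrates such a transform; it is consistent with the description above but is an assumption, not taken from the TaNC code.  Each network's binned output likelihood is weighted by the decay mode probability:
$t = f_S(m)\,p_S(o \mid m) / [f_S(m)\,p_S(o \mid m) + f_B(m)\,p_B(o \mid m)]$.
Here $m$ is the reconstructed decay mode, $o$ the network output, and $f_{S,B}(m)$ the probability for a signal or background candidate in the relevant $p_T$, $\eta$ bin to be reconstructed in mode $m$.  Since $t$ is monotonic in the likelihood ratio of $(m, o)$, a single cut on $t$ corresponds to one 5D working point.  All names below are hypothetical.

\begin{verbatim}
```python
import numpy as np


def make_transform(sig_hist, bkg_hist, bin_edges, f_sig, f_bkg):
    """Return t(mode, nn_output) = f_S p_S / (f_S p_S + f_B p_B).

    sig_hist[mode], bkg_hist[mode]: NN-output histograms (common bin_edges)
        of signal / background candidates reconstructed in `mode`, each
        normalized to unit area.
    f_sig[mode], f_bkg[mode]: probability for a signal / background candidate
        in the kinematic (pt, eta) bin of interest to be reconstructed in `mode`.
    """
    edges = np.asarray(bin_edges, dtype=float)

    def transform(mode, nn_output):
        # Index of the histogram bin containing nn_output (clipped to range).
        i = int(np.clip(np.searchsorted(edges, nn_output, side="right") - 1,
                        0, len(edges) - 2))
        s = f_sig[mode] * sig_hist[mode][i]
        b = f_bkg[mode] * bkg_hist[mode][i]
        return s / (s + b) if s + b > 0 else 0.0

    return transform
```
\end{verbatim}

Because the decay mode probabilities depend on $p_T$ and $\eta$, one such transform would be built per kinematic bin.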

\subsection{Algorithm Performance}
The TaNC algorithm identifies true hadronic tau decays with a much higher purity
than algorithms previously used in CMS analyses.
\begin{itemize}
   \item Plot: performance curve
   \item With the transform, the working point is set by a cut on a single continuous variable
   \item Comparison with the shrinking-cone and fixed-cone algorithms
\end{itemize}

\section{Future work}
The TaNC algorithm has been optimized for the initial stages of LHC operation.
\begin{itemize}
   \item The networks will need to be retrained when the luminosity changes
   \item Once enough data have been collected, the background training samples will be taken from data
\end{itemize}

\end{document}
% CVS revision 1.4 (branch MAIN), committed Wed Mar 31 01:22:22 2010 UTC by friis; log message: "Slow but steady"