% Friis/TancNote/outline.tex

\documentclass{article}

%\title{New techniques for decay mode reconstruction and identification of
%hadronic tau lepton decays [outline]}
\title{The Tau Neural Classifier algorithm: tau identification and decay mode reconstruction using neural networks}
\author{Evan K. Friis}

\begin{document}

\maketitle 
\tableofcontents 

\abstract{
%Description of a new method for identifying hadronically decaying taus that
%improves the tau identification efficiency on hadroncially decaying taus from
%Z->tautau events while lowering the number of quark and gluon jets from QCD
%di--jet events that are mis-tagged as taus.
%jets.
%\begin{itemize} 
%   \item Reconstructs the decay mode of the tau
%   \item Novel neural networks corresponding to different decay modes of the tau
%\end{itemize}
The Tau Neural Classifier (TaNC) is a novel algorithm for identification of hadronic tau decays.
The algorithm includes two components: the reconstruction of tau lepton hadronic decay modes
and the discrimination of tau lepton hadronic decays from quark and gluon jets.
The reconstruction of decay modes is based on the reconstruction of individual charged hadrons and photons
by the particle--flow algorithm
and is utilized in the discrimination to train a set of neural networks using input variables
that are sensitive to particular decay modes.
We observe a significant improvement in identification performance in comparison to previous algorithms.
}

\section{Introduction}
%Taus are an important part of the physics program at CMS.  
%\begin{itemize} 
%   \item Higgs Boson have an enhanced coupling to taus due to their high mass.
%   \item In MSSM, this coupling is enhanced by tanBeta
%   \item For certain Higgs mass ranges, the tau decay channel offers best
%      discovery potential.
%   \item Tau leptons can decay to electrons or muons.
%   \item But Tau leptons are unique in that their are the only lepton that can decay
%      to hadrons. (1 or 3 pions)
%   \item In this paper we describe a novel method for identifying hadronic
%      decays of taus.
%   \item Methods for discriminating against electron and muons are described in
%      PFT-08-001
%\end{itemize}
%
%Identifying taus is difficult at hadron colliders.
%\begin{itemize}
%   \item Taus production in channels of interest is a relatively rare
%      phenomenon.  
%   \item The decay signature of the tau lepton is very similar to electron,
%      muon, quark and gluon jets which are produced in abundance.
%\end{itemize}
%
%\subsection{Tau Identification}
%A description of the tau identification algorithms used in past CMS physics analysis.
%We propose an extension to these methods.
%\begin{itemize}
%   \item CaloTaus versus PFTaus
%   \item ParticleFlow blurb
%   \item PFTau have better ET and angular resolution and can resolve individual
%      photons
%   \item To remove QCD, and isolation requirement is applied, described in
%      PFT-08-001
%   \item A Et dependent signal cone has been developed to separate
%      signal and isolation regions.
%   \item Performance is on the order of O(0.01)
%   \item Plot: Shrinking Cone performance from PFT-08-001
%\end{itemize}

A good tau identification performance is important for the discovery potential of many possible new physics signals at the LHC.
\begin{itemize}
\item processes with tau leptons in the final state are typically the signal processes
\item quark and gluon jets are produced with significantly larger cross--sections
\item efficient identification of hadronic tau decays and a low misidentification rate for quarks and gluons
      are thus essential for many searches for new physics
\end{itemize}

New physics signals may be discovered via tau lepton hadronic decays in early CMS data.
\begin{itemize}
\item for example, MSSM Higgs $\rightarrow \tau\tau$, the production cross--section of which is enhanced by $\tan\beta$
\item but also for discovery of the Standard Model Higgs, a good tau identification performance is important,
      as Higgs $\rightarrow$ tau decays have the second largest branching fraction
\end{itemize}

Tau leptons are unique in that they are the only leptons heavy enough to decay to hadrons.
\begin{itemize}
\item lifetime $c\tau = 87~\mu\mathrm{m}$
\item BR($e$) $\approx$ BR($\mu$) $\approx 17\%$
\item BR(hadrons) $\approx 65\%$;
      mostly either one or three charged pions plus zero to two neutral pions,
      which almost instantaneously decay to photons
\end{itemize}

In this note, we will concentrate on the identification of hadronic tau decays.
\begin{itemize}
\item tau decays to electrons and muons are difficult to distinguish from electrons and muons produced in $pp$ collisions
     (strategy depends on analysis, tau decays to electrons and muons are typically identified by requiring
      two leptons of different flavor)
\item discrimination of hadronic tau decays from electrons and muons is described in PFT--08--001
\item the ``signal'' signature, the identification of which we aim to improve with the Tau Neural Classifier (TaNC),
      is a collimated jet containing either one or three tracks reconstructed in the Pixel and silicon Strip tracker,
      plus a low number of neutral electromagnetic showers reconstructed in the ECAL
\end{itemize}

\subsection{TaNC motivation}
The different hadronic decay modes of the tau proceed through different resonances.  This provides
additional information: the search can be re-framed as a search for rhos, a1s, etc.
\begin{itemize}
   \item Each decay mode has a different topology and different possibilities
      for discrimination.
   \item The tau decay can have one or three charged pions and a number of pi0s.
   \item Each decay mode multiplicity maps directly to a resonance (at the 95\%
      level)
   \item This note presents two complementary techniques: a method to
      reconstruct the decay mode and an ensemble of neural network discriminants 
      used to classify tau--candidates.
   \item Plot: True visible invariant mass for different decay modes
\end{itemize}

\section{Decay Mode Reconstruction}
The signal cone photons are merged into candidate pi0s and the candidates are
subject to a minimum pT quality requirement to remove contamination from various
sources.
CV: add reference to shrinking cone note CMS AN--2008/026
\begin{itemize}
   \item pi0s undergo prompt decay to photons.
   \item The number of photons present in the signal cone has a long tail due to
      UE, PU, showers, photon conversions.
   \item Plot: number of photons versus number of pi-zeros
\end{itemize}

\subsection{Photon Merging}
Photons are merged into composite pi0s by looking at the invariant mass of each
combination of photons.
\begin{itemize}
   \item Only photon pairs that have an invariant mass less than 0.2~GeV are considered.
   \item CMS ECAL granularity and particle flow clustering provide excellent
      resolution.
   \item Plot: di photon mass for decay mode 1.
\end{itemize}
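
As a sketch of the merging criterion above (the symbols $E_1$, $E_2$ and $\theta_{12}$ are notation introduced here for illustration and are not taken from the note):
for two massless photons with energies $E_1$ and $E_2$ and opening angle $\theta_{12}$, the pair invariant mass is
\[
   m_{\gamma\gamma} = \sqrt{2 E_1 E_2 \left( 1 - \cos\theta_{12} \right)} ,
\]
and a pair would be considered as a pi0 candidate only if $m_{\gamma\gamma} < 0.2$~GeV.
For comparison, the nominal pi0 mass is about 0.135~GeV, so the window is meant to leave room for resolution effects.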

\subsection{Quality requirements}
To remove contamination from pile-up and underlying event, a minimum pt quality
requirement is applied to the remaining photon candidates.
\begin{itemize}
   \item The lowest pt photon is required to carry at least 10\% of the composite visible
      pt
   \item This removes contaminant photons while preserving single photons that
      correspond to pi0s
   \item Plots: photon pt fraction for DM0 and DM1
\end{itemize}
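
Written as a formula, the requirement above might read as follows.
This is a sketch under the assumption that $p_T^{\mathrm{vis}}$ denotes the transverse momentum of the sum of all signal objects of the tau--candidate;
this definition and the symbols are introduced here for illustration only.
The lowest pt photon candidate is kept if
\[
   \frac{p_T^{\gamma}}{p_T^{\mathrm{vis}}} \ge 0.1 .
\]
For example, for a tau--candidate with $p_T^{\mathrm{vis}} = 20$~GeV, the lowest pt photon would need to carry at least 2~GeV.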

\subsection{Results}
The decay mode reconstruction algorithm dramatically improves the determination
of the decay mode.
\begin{itemize}
   \item Tails removed
   \item Mean improved
   \item Plot: correlation plot
\end{itemize}

The distribution of the decay modes is different for signal and background.  The
decay mode determination is slightly dependent on pt and eta.

\begin{itemize}
   \item pt turn-on curve is due to pt quality thresholds and cone size
   \item Blowup of 1prong1pi0 fraction at eta = 2.5 due to loss of tracker + no
      loss of ECAL?  
   \item NB that the distribution of the decay modes is another handle that the
      TaNC has.
   \item Plot: Decay mode for sig/bkg vs. pt and eta
\end{itemize}

\section{Neural network classification}
For each decay mode, a different neural network is used.  
\begin{itemize}
   \item The five decay modes we use constitute 95\% of hadronic decays.
   \item Table of the five decays
   \item Other decay modes are discarded.
   \item Each neural net has inputs that are specific to that decay mode.
   \item Each neural net is trained on tau--candidates reconstructed with the
      associated decay mode.
   \item During final discrimination, the neural network associated with the
      reconstructed decay mode of the tau candidate is used to do the
      classification.
   \item Since five neural networks are used, a strategy is needed to select
      the cut applied to each neural network output.
\end{itemize}

\subsection{Neural network discriminants}
The neural networks use observables specific to each decay mode as input variables.
The observables are listed in the appendix.
Common observables include:
\begin{itemize}
   \item Pt/Eta
   \item Invariant mass
   \item Pt and DR from axis of signal objects
   \item Pt and DR from axis of isolation objects
   \item Number of charged isolation objects
   \item Sum charged pt in isolation
   \item For three-body decays, the two Dalitz variables
   \item Include separation and correlation plots for all variables?
CV: yes, please (in appendix)
\end{itemize}
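
For reference, a sketch of how two of the observables above are conventionally defined.
The notation is introduced here, and the assignment of the indices to the decay products is an assumption for illustration that may differ from the choice made for the TaNC inputs.
The angular distance of an object from the tau--candidate axis is
\[
   \Delta R = \sqrt{ (\Delta\eta)^2 + (\Delta\phi)^2 } ,
\]
and for a three-body decay with four-momenta $p_1$, $p_2$, $p_3$ the two Dalitz variables can be taken as
\[
   s_{12} = (p_1 + p_2)^2 , \qquad s_{13} = (p_1 + p_3)^2 .
\]
The third combination carries no additional information, since
$s_{12} + s_{13} + s_{23} = M^2 + m_1^2 + m_2^2 + m_3^2$,
where $M$ is the invariant mass of the three-body system and $m_i$ are the masses of the decay products.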

\subsection{Neural network training} 

The signal and background samples are split into five subsamples corresponding
to each decay mode.   
\begin{itemize}
   \item Ztautau matched to hadronic taus for signal, QCD Dijet for bkg
   \item The leading pion pt requirement is applied.
   \item Table of signal/background training events for each mode.
\end{itemize}

The decay mode is dependent on pt and eta and this dependence must be invisible
to the neural network.
\begin{itemize}
   \item The kinematics are very different for signal/background
   \item We want to prevent the NN from training on these differences
   \item Weighting is applied so the weighted pt/eta distributions are identical
   \item Since the probability for a given decay mode to occur is kinematically
      dependent, the weighting is applied to the subset of the sample that
      corresponds to the ensemble of allowed decay modes.
\end{itemize}
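
A minimal sketch of one possible weighting scheme, not necessarily the one implemented for the TaNC:
let $f_s(p_T, \eta)$ and $f_b(p_T, \eta)$ denote the normalized, binned pt/eta distributions of signal and background tau--candidates (notation introduced here).
Assigning each candidate of class $c \in \{s, b\}$ the weight
\[
   w_c(p_T, \eta) = \frac{\min\left( f_s(p_T, \eta), f_b(p_T, \eta) \right)}{f_c(p_T, \eta)}
\]
gives weighted distributions $f_c \, w_c = \min(f_s, f_b)$ that are identical for both classes,
so that the networks cannot separate signal from background using the kinematics alone.
With this choice all weights are at most one, which avoids a few candidates in sparsely populated bins dominating the training.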

The neural networks are implemented as TMVA back-propagating neural networks.
\begin{itemize}
   \item Number of hidden nodes = Kolmogorov function $N + 1$ ($2N + 1$)
   \item 500 training epochs, testing for over-training every ten epochs
   \item No over-training is detected. (need plots?)
CV: yes, please show NN output error on training and on validation dataset 
(two curves overlaid on same plot which has training epoch on the x--axis and NN output error on the y--axis)
for at least one of the decay modes/neural networks (as example)
\end{itemize}

\subsection{Individual neural network performance} 
The separation power differs between the individual neural nets.  The ultimate separation
power of the algorithm depends on both the individual neural net separation
performance and decay mode distribution differences between signal and
background.
\begin{itemize}
   \item Plots of each decay mode separation
   \item Example: 1prong1pi0 has no discrimination power for isolated OneProng
      QCD
\end{itemize}

\subsection{Neural network output selections}
Since there are five neural networks, a discrimination working point requires
selection of a point in five-dimensional (5D) space.
\begin{itemize}
   \item Monte Carlo cut point selection
   \item A 5D point is added to the performance curve if it has a higher
      signal efficiency than the current point with the same background mis-tag
      rate.
   \item Separate samples are used for selecting the 5D curve, and evaluating
      its performance.
\end{itemize}
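
To make the definition of a working point concrete, a sketch in notation introduced here for illustration:
let $f_i^{s}$ ($f_i^{b}$) be the fraction of signal (background) tau--candidates reconstructed in decay mode $i$,
and $\epsilon_i^{s}(c_i)$ ($\epsilon_i^{b}(c_i)$) the fraction of those candidates passing the cut $c_i$ on the output of neural network $i$.
A 5D cut point $\vec{c} = (c_1, \ldots, c_5)$ then corresponds to
\[
   \epsilon^{s}(\vec{c}) = \sum_{i=1}^{5} f_i^{s} \, \epsilon_i^{s}(c_i) ,
   \qquad
   \epsilon^{b}(\vec{c}) = \sum_{i=1}^{5} f_i^{b} \, \epsilon_i^{b}(c_i) ,
\]
and the performance curve consists of those points for which no other sampled point reaches a higher $\epsilon^{s}$ at the same $\epsilon^{b}$.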

The 5D performance curve can also be parameterized using the probability for a
tau--candidate to be reconstructed in a given decay mode.
\begin{itemize}
   \item The method transforms the output of each neural net according to the
      decay mode probability
   \item The decay mode probability is dependent on pt/eta
   \item Derivation of transform
   \item Net discriminant output is now a single continuous variable
   \item Recommended method of using the TaNC
   \item Plot: comparison of transform to MC-determined optimal curve
\end{itemize}
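
The derivation of the transform is not given in this outline.
As an illustration only, the following is a standard prior correction, which may differ from the transform actually used by the TaNC; all symbols are assumptions introduced here.
Suppose the output $x$ of neural network $i$ approximates the signal probability for a training sample in which signal and background carry equal weight,
so that the likelihood ratio is $x / (1 - x)$.
If $\rho_i^{s}(p_T, \eta)$ and $\rho_i^{b}(p_T, \eta)$ are the probabilities for a signal and a background tau--candidate to be reconstructed in decay mode $i$,
the corrected output is
\[
   x' = \frac{\rho_i^{s} \, x}{\rho_i^{s} \, x + \rho_i^{b} \, (1 - x)} .
\]
A single cut on $x'$ then selects a point on each of the five network outputs simultaneously.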

\subsection{Algorithm Performance}
The TaNC algorithm identifies true hadronic tau decays with a much higher purity
than algorithms previously used in CMS analyses.
\begin{itemize}
   \item Plot: performance curve
   \item With transform, cut is a continuous variable
   \item Comparison with shrinking/fixed cone
\end{itemize}

\section{Future work}
The TaNC algorithm has been optimized for the initial stages of LHC operation.
\begin{itemize}
   \item Will need to be retrained when the luminosity changes
   \item Once enough data are available, the networks will be trained on background events taken from data
\end{itemize}

\end{document}
Revision:	1.3
Committed:	Fri Mar 19 11:29:20 2010 UTC (15 years, 1 month ago) by veelken
Content type:	application/x-tex
Branch:	MAIN
Changes since 1.2:	+108 -52 lines
Log Message:	changed title, improved abstract and introduction, added a few comments