The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched
to generator level hadronic tau decays in simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of
reconstructed tau--candidates in simulated QCD $2\rightarrow2$ hard scattering
events. The QCD $P_T$ spectrum is steeply falling, and to obtain sufficient
statistics across a broad range of $P_T$ the sample is split into different
$\hat{P}_T$ bins. Each QCD sub-sample imposes a generator level cut on the
transverse energy of the hard interaction.

The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5 GeV$/c$. A large number of
QCD training events is required, as the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are both effective discriminants. For each
subsample, 10000 signal and background tau--candidates are reserved to be used
internally by the TMVA software to test for over--training. The number of
signal and background entries used for each decay mode subsample is given in
table~\ref{tab:trainingEvents}.

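For illustration, the leading pion requirement can be expressed as the
following sketch; the \texttt{TauCandidate} and \texttt{Constituent}
structures and their field names are hypothetical stand-ins for the
reconstructed objects, not the actual reconstruction code.

\begin{verbatim}
from collections import namedtuple

# Hypothetical stand-ins for the reconstructed tau-candidate objects.
Constituent = namedtuple("Constituent", ["particle_type", "pt"])
TauCandidate = namedtuple("TauCandidate", ["constituents"])

LEAD_PION_PT_MIN = 5.0  # GeV/c

def passes_leading_pion(tau):
    """True if any charged hadron or gamma constituent has pT > 5 GeV/c."""
    return any(c.pt > LEAD_PION_PT_MIN
               for c in tau.constituents
               if c.particle_type in ("chargedHadron", "gamma"))

# Example: a candidate with a 6 GeV/c charged hadron passes the selection.
tau = TauCandidate([Constituent("chargedHadron", 6.0),
                    Constituent("gamma", 1.2)])
assert passes_leading_pion(tau)
\end{verbatim}
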
\begin{table}
\centering
\begin{tabular}{lcc}
 & Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay Mode & \multicolumn{2}{c}{Training Events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 136464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53257 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training for each
selected decay mode.}
\label{tab:trainingEvents}
\end{table}

In both the signal and background samples, 20\% of the events are reserved as
a statistically independent sample to evaluate the performance of the neural
nets after the training is completed. The TaNC uses the ``MLP'' neural network
implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network with two layers of hidden nodes and a single node in the output layer.
The number of hidden nodes in the first and second layers is determined by
Kolmogorov's theorem~\cite{kolmogorovsTheorem}: the first (second) hidden
layer contains $N+1$ ($2N+1$) nodes, where $N$ is the number of input
observables. Each neural network is trained for 500 epochs, and no evidence of
over--training is observed (see figure~\ref{fig:overTrainCheck}).

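A minimal sketch of how such a network can be booked through the TMVA Python
bindings is shown below. The file, tree, and variable names are placeholders,
and the \texttt{Factory} interface shown is that of the standalone TMVA
version contemporary with this note; newer ROOT releases route the input
specification through a \texttt{TMVA::DataLoader}.

\begin{verbatim}
import ROOT

# Sketch of booking the TMVA "MLP" classifier. File, tree, and variable
# names are placeholders for the actual training setup.
out_file = ROOT.TFile("tanc_training.root", "RECREATE")
factory = ROOT.TMVA.Factory("TaNC", out_file,
                            "!V:AnalysisType=Classification")

variables = ["pt", "eta"]  # plus the decay-mode-specific observables
# ... factory.AddVariable(v, "F") for each variable, and the
# AddSignalTree(...) / AddBackgroundTree(...) calls go here ...

# Two hidden layers of N+1 and 2N+1 nodes, trained for 500 epochs.
n = len(variables)
factory.BookMethod(ROOT.TMVA.Types.kMLP, "MLP",
                   "NCycles=500:HiddenLayers={},{}".format(n + 1, 2*n + 1))

factory.TrainAllMethods()
factory.TestAllMethods()
\end{verbatim}
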
\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
\caption{Neural network classification error for the training and testing
samples over 500 training epochs for each decay mode neural network.
Classifier over--training would be evidenced by a divergence of the
classification errors of the training and testing samples, indicating that
the neural net was optimizing about statistical fluctuations in the training
sample.}
\label{fig:overTrainCheck}
\end{center}
\end{figure}

The neural nets use as input variables the transverse momentum and $\eta$ of
the tau--candidates. These variables are included because their correlations
with other observables can increase the separation power of the ensemble of
observables. For example, the opening angle in $\Delta R$ for signal
tau--candidates is inversely related to the transverse momentum, while for
background events the correlation is very small (see~\cite{DavisTau}). In the
training signal and background samples, there is significant discrimination
power in the $P_T$ spectrum. However, it is desirable to eliminate any
systematic dependence of the neural network output on $P_T$ and $\eta$, as in
practice the TaNC will be presented with tau--candidates whose $P_T$--$\eta$
spectrum is analysis dependent. The dependence on $P_T$ and $\eta$ is removed
by applying a $P_T$- and $\eta$-dependent weight to the tau--candidates when
training the neural nets.

The weights are defined such that in any region of $P_T$--$\eta$ space where
the signal and background probability density functions differ, the sample
with the higher probability density is weighted down so that the two samples
have identical $P_T$--$\eta$ probability distributions. This removes regions
of $P_T$--$\eta$ space where the training sample is exclusively signal or
background. The weights are computed as
\begin{align*}
W(P_T, \eta) &= \min\left(p_{sig}(P_T, \eta),\, p_{bkg}(P_T, \eta)\right)\\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability
densities of the signal and background samples after the ``leading pion'' and
decay mode selections. Figure~\ref{fig:nnTrainingWeights} shows the signal
and background training $P_T$ distributions before and after the weighting is
applied.

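A numerical sketch of this weighting procedure is given below, assuming the
$(P_T, \eta)$ values of each sample are available as arrays; the binning and
the toy input distributions are illustrative only, as the note does not
specify the actual grid.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the (pT, eta) values of the two training samples.
sig_pt = rng.exponential(30.0, 10000)
sig_eta = rng.uniform(-2.5, 2.5, 10000)
bkg_pt = rng.exponential(15.0, 10000)
bkg_eta = rng.uniform(-2.5, 2.5, 10000)

pt_bins = np.linspace(0.0, 100.0, 26)   # illustrative binning
eta_bins = np.linspace(-2.5, 2.5, 26)

def density(pt, eta):
    """Normalized probability density in the (pT, eta) plane."""
    h, _, _ = np.histogram2d(pt, eta, bins=(pt_bins, eta_bins),
                             density=True)
    return h

p_sig = density(sig_pt, sig_eta)
p_bkg = density(bkg_pt, bkg_eta)

# W = min(p_sig, p_bkg); dividing by a sample's own density down-weights
# whichever sample is locally more abundant, so that after weighting both
# samples share the same pT-eta distribution.
W = np.minimum(p_sig, p_bkg)
w_sig = np.divide(W, p_sig, out=np.zeros_like(W), where=p_sig > 0)
w_bkg = np.divide(W, p_bkg, out=np.zeros_like(W), where=p_bkg > 0)

def event_weights(pt, eta, w):
    """Look up the weight of each event from its (pT, eta) bin."""
    i = np.clip(np.digitize(pt, pt_bins) - 1, 0, len(pt_bins) - 2)
    j = np.clip(np.digitize(eta, eta_bins) - 1, 0, len(eta_bins) - 2)
    return w[i, j]

sig_w = event_weights(sig_pt, sig_eta, w_sig)
bkg_w = event_weights(bkg_pt, bkg_eta, w_bkg)
\end{verbatim}
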
\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
\end{picture}
\caption{Transverse momentum spectrum of signal and background
tau--candidates used in neural net training before (left) and after (right)
the application of the $P_T$--$\eta$ dependent weight function. Application
of the weights lowers the training significance of tau--candidates in regions
of $P_T$--$\eta$ phase space where either the signal or background sample has
an excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}