The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched
to generator level hadronic tau decays from simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of
reconstructed tau--candidates in simulated QCD $2\rightarrow2$ hard scattering
events. The QCD $P_T$ spectrum is steeply falling; to obtain sufficient
statistics across a broad range of $P_T$, the sample is split into several
$\hat P_{T}$ bins, each of which imposes a generator level cut on the
transverse momentum of the hard interaction. During the evaluation of
discrimination performance, the QCD samples are weighted according to their
respective integrated luminosities to remove any effect of the binning.

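For illustration, the luminosity weighting of the binned QCD samples can be
sketched in Python as follows; this is a minimal sketch, and the sample names,
event counts, and cross sections are hypothetical placeholders rather than the
values used in this note.

\begin{verbatim}
# Weight each binned QCD sample by its equivalent integrated
# luminosity, L_i = N_i / sigma_i, so the concatenated bins behave
# like a single sample with the full steeply falling spectrum.
# All numbers below are hypothetical placeholders.
qcd_bins = {
    # name: (generated events, cross section in pb)
    "qcd_pthat_30_50":  (1000000, 5.3e7),
    "qcd_pthat_50_80":  (1000000, 6.4e6),
    "qcd_pthat_80_120": ( 500000, 7.8e5),
}

target_lumi = 1.0  # common integrated luminosity (pb^-1)
event_weights = {name: target_lumi * xsec / n_gen
                 for name, (n_gen, xsec) in qcd_bins.items()}
\end{verbatim}
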
The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5~GeV$/c$. A large number of
QCD training events is required, as both the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are effective discriminants. For each subsample,
80\% of the signal and background tau--candidates are used to train the neural
networks with the TMVA software, with half of these (40\% of the total) set
aside as a validation sample to ensure that the neural network is not
over--trained. The number of signal and background entries used for training
and validation in each decay mode subsample is given in
table~\ref{tab:trainingEvents}.

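This partition (40\% training, 40\% validation, with the remaining 20\%
reserved for the performance evaluation described below) can be written
compactly as follows; the sketch assumes the tau--candidates for one decay
mode subsample are available as an array, and all names are illustrative.

\begin{verbatim}
import numpy as np

def split_candidates(candidates, seed=0):
    """Partition into training (40%), validation (40%), and a
    statistically independent evaluation sample (20%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(candidates))
    n = len(candidates)
    train = candidates[idx[:int(0.4 * n)]]
    valid = candidates[idx[int(0.4 * n):int(0.8 * n)]]
    test = candidates[idx[int(0.8 * n):]]
    return train, valid, test
\end{verbatim}
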
\begin{table}
\centering
\begin{tabular}{lcc}
 & Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay mode & \multicolumn{2}{c}{Training events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 136464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53257 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training and validation for
each selected decay mode.}
\label{tab:trainingEvents}
\end{table}

The remaining 20\% of the signal and background samples are reserved as a
statistically independent sample to evaluate the performance of the neural
nets after the training is completed. The TaNC uses the ``MLP'' neural
network implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network using the hyperbolic tangent as the neuron activation function.

The neural networks used in the TaNC have two hidden layers and a single node
in the output layer. The number of nodes in the first and second hidden layers
is chosen to be $N+1$ and $2N+1$, respectively, where $N$ is the number of
input observables for that neural network. According to Kolmogorov's
theorem~\cite{Kolmogorov}, any continuous function $g(x)$ defined on a vector
space of dimension $d$ spanned by $x$ can be represented by
\begin{equation}
g(x) = \sum_{j=1}^{2d+1} \Phi_j \left(\sum_{i=1}^{d} \phi_{ij}(x_i) \right)
\label{eq:Kolmogorov}
\end{equation}
for suitably chosen functions $\Phi_j$ and $\phi_{ij}$. As the form of
equation~\ref{eq:Kolmogorov} is similar to the topology of a two hidden--layer
neural network, Kolmogorov's theorem suggests that \emph{any} classification
problem can be solved with a neural network with two hidden layers containing
an appropriate number of nodes.

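The resulting topology can be sketched schematically; the following is a toy
forward pass with randomly initialized weights, not the TMVA implementation,
and all names are illustrative.

\begin{verbatim}
import numpy as np

def build_layers(n_inputs, seed=0):
    """Weights for the TaNC topology: N -> N+1 -> 2N+1 -> 1."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, n_inputs + 1, 2 * n_inputs + 1, 1]
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """tanh activation on every layer, so the output lies in
    (-1, +1), matching the -1/+1 background/signal targets."""
    for w, b in layers:
        x = np.tanh(x @ w + b)
    return x
\end{verbatim}
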
The neural network is trained for 500 epochs. At ten--epoch intervals, the
neural network error is computed using the validation sample to check for
over--training (see figure~\ref{fig:overTrainCheck}). The neural network error
$E$ is defined~\cite{TMVA} as
\begin{equation}
E = \frac{1}{2} \sum_{i=1}^{N} \left(y_{ANN,i} - \hat y_i\right)^2
\label{eq:NNerrorFunc}
\end{equation}
where $N$ is the number of training events, $y_{ANN,i}$ is the neural network
output for the $i$th training event, and $\hat y_i$ is the desired output
($-1$ for background, $+1$ for signal) for the $i$th event. No evidence of
over--training is observed.

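The factor--of--four effect of the $(-1,+1)$ target convention, noted in the
caption of figure~\ref{fig:overTrainCheck}, can be verified with a toy
computation of equation~\ref{eq:NNerrorFunc}; the network outputs below are
hypothetical.

\begin{verbatim}
import numpy as np

def nn_error(y_ann, y_target):
    """E = 1/2 * sum_i (y_ANN_i - y_target_i)^2."""
    return 0.5 * np.sum((y_ann - y_target) ** 2)

# Toy outputs and targets in the (-1, +1) convention.
y_ann = np.array([0.8, -0.6, 0.9])
y_tgt = np.array([1.0, -1.0, 1.0])

# Mapping both onto (0, 1) halves every residual, so the
# computed error shrinks by exactly a factor of four.
e_pm = nn_error(y_ann, y_tgt)
e_01 = nn_error((y_ann + 1) / 2, (y_tgt + 1) / 2)
assert np.isclose(e_pm, 4 * e_01)
\end{verbatim}
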
\begin{figure}[thbp]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
\caption{
Neural network classification error for the training (solid red) and testing
(dashed blue) samples at ten--epoch intervals over the 500 training epochs for
each decay mode neural network. The vertical axis represents the
classification error, defined by equation~\ref{eq:NNerrorFunc}. N.B. the
choice of hyperbolic tangent neuron activation functions results in desired
outputs for signal and background of $+1$ and $-1$, respectively, making the
computed neural network error larger by a factor of four than in the case
where the desired outputs are (0, 1). Classifier over--training would be
evidenced by a divergence of the classification errors of the training and
testing samples, indicating that the neural net was optimizing on statistical
fluctuations in the training sample.
}
\label{fig:overTrainCheck}
\end{center}
\end{figure}

The neural networks use as input observables the transverse momentum and
$\eta$ of the tau--candidates. These observables are included because their
correlations with other observables can increase the separation power of the
ensemble of observables. For example, the opening angle in $\Delta R$ for
signal tau--candidates is inversely related to the transverse momentum, while
for background events the correlation is very small~\cite{DavisTau}. In the
training signal and background samples, there is significant discrimination
power in the $P_T$ spectrum. However, it is desirable to eliminate any
systematic dependence of the neural network output on $P_T$ and $\eta$, as in
practice the TaNC will be presented with tau--candidates whose $P_T$--$\eta$
spectrum is analysis dependent. The dependence on $P_T$ and $\eta$ is removed
by applying a $P_T$ and $\eta$ dependent weight to the tau--candidates when
training the neural nets.

The weights are defined such that in any region of the space spanned by $P_T$
and $\eta$ where the signal and background probability density functions
differ, the sample with the higher probability density is down--weighted so
that the two samples have identical $P_T$--$\eta$ probability distributions.
This removes regions of $P_T$--$\eta$ space where the training sample is
exclusively signal or background. The weights are computed according to
\begin{align*}
W(P_T, \eta) &= \min\left(p_{sig}(P_T, \eta),\, p_{bkg}(P_T, \eta)\right) \\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability
densities of the signal and background samples after the ``leading pion'' and
dominant decay mode selections. Figure~\ref{fig:nnTrainingWeights} shows the
signal and background training $P_T$ distributions before and after the
weighting is applied.

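This weight computation can be sketched on a binned $(P_T, \eta)$ grid as
follows; the binning and names are illustrative, not the actual TaNC code.

\begin{verbatim}
import numpy as np

def pt_eta_weights(sig, bkg, pt_bins, eta_bins):
    """Per-bin weights equalizing the signal and background
    (P_T, eta) densities: W = min(p_sig, p_bkg), w = W / p.
    `sig` and `bkg` are arrays of (pt, eta) pairs."""
    p_sig, _, _ = np.histogram2d(sig[:, 0], sig[:, 1],
                                 bins=(pt_bins, eta_bins), density=True)
    p_bkg, _, _ = np.histogram2d(bkg[:, 0], bkg[:, 1],
                                 bins=(pt_bins, eta_bins), density=True)
    w_map = np.minimum(p_sig, p_bkg)
    with np.errstate(divide="ignore", invalid="ignore"):
        w_sig = np.where(p_sig > 0, w_map / p_sig, 0.0)
        w_bkg = np.where(p_bkg > 0, w_map / p_bkg, 0.0)
    # Each candidate takes the weight of its (P_T, eta) bin.
    return w_sig, w_bkg
\end{verbatim}
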
\begin{figure}[thbp]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
\end{picture}
\caption{Transverse momentum spectrum of signal and background
tau--candidates used in neural net training before (left) and after (right)
the application of the $P_T$--$\eta$ dependent weight function. Application of
the weights lowers the training significance of tau--candidates in regions of
$P_T$--$\eta$ phase space where either the signal or background sample has an
excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}