% root/cvsroot/UserCode/Friis/TancNote/note/tanc_nn_training.tex
% Revision 1.5, committed Fri Apr 23 00:07:48 2010 UTC by friis
% Log: Some bug fixes, and improving the description of the neural network error
The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched
to generator-level hadronic tau decays in simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of
reconstructed tau--candidates in simulated QCD $2\rightarrow2$ hard-scattering
events. The QCD $P_T$ spectrum is steeply falling; to obtain sufficient
statistics across a broad range of $P_T$, the sample is split into different
$\hat P_{T}$ bins. Each QCD sub-sample imposes a generator-level cut on the
transverse energy of the hard interaction.

The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5~GeV$/c$. A large number of
QCD training events is required, as the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are both effective discriminants. For each
subsample, 10000 signal and background tau--candidates are reserved to be used
internally by the TMVA software to test for over--training. The number of
signal and background entries used for each decay mode subsample is given in
table~\ref{tab:trainingEvents}.
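The ``leading pion'' requirement above can be sketched as a simple predicate
over the constituents of a tau--candidate. This is an illustration only: the
dictionary layout and field names are hypothetical, not the actual CMSSW/TaNC
data structures.

```python
# Minimal sketch of the "leading pion" selection: accept a tau-candidate
# if any charged hadron or gamma constituent has pt > 5 GeV/c.
# The candidate representation here is illustrative only.

LEAD_PION_PT_MIN = 5.0  # GeV/c

def passes_leading_pion(candidate):
    """True if any charged hadron or gamma constituent exceeds 5 GeV/c."""
    constituents = candidate["charged_hadron_pts"] + candidate["gamma_pts"]
    return any(pt > LEAD_PION_PT_MIN for pt in constituents)

# Example candidates: the first has a 12.3 GeV/c charged hadron and passes,
# the second has no constituent above threshold and fails.
cand_pass = {"charged_hadron_pts": [12.3, 1.1], "gamma_pts": [0.8]}
cand_fail = {"charged_hadron_pts": [2.0, 1.5], "gamma_pts": [3.2]}
```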

%Chained 100 signal files.
%Chained 208 background files.
%Total signal entries: 874266
%Total background entries: 9526176
%Pruning non-relevant entries.
%After pruning, 584895 signal and 644315 background entries remain.
%**********************************************************************************
%*********************************** Summary **************************************
%**********************************************************************************
%* NumEvents with weight > 0 (Total NumEvents) *
%*--------------------------------------------------------------------------------*
%*shrinkingConePFTauDecayModeProducer ThreeProngNoPiZero: Signal: 53257(53271) Background:155793(155841)
%*shrinkingConePFTauDecayModeProducer ThreeProngOnePiZero: Signal: 13340(13342) Background:135871(135942)
%*shrinkingConePFTauDecayModeProducer OneProngTwoPiZero: Signal: 34780(34799) Background:51181(51337)
%*shrinkingConePFTauDecayModeProducer OneProngOnePiZero: Signal: 136464(138171) Background:137739(139592)
%*shrinkingConePFTauDecayModeProducer OneProngNoPiZero: Signal: 300951(345312) Background:144204(161603)

\begin{table}
\centering
\begin{tabular}{lcc}
%\multirow{2}{*}{} & \multicolumn{2}{c}{Events} \\
& Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay Mode & \multicolumn{2}{c}{Training Events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 135464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53247 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training for each
selected decay mode.}
\label{tab:trainingEvents}
\end{table}

In both the signal and background samples, 20\% of the events are reserved as
a statistically independent sample to evaluate the performance of the neural
nets after the training is completed. The TaNC uses the ``MLP'' neural network
implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network with two layers of hidden nodes and a single node in the output layer.
The hyperbolic tangent is used as the neuron activation function. The numbers
of hidden nodes in the first and second layers are determined by Kolmogorov's
theorem~\cite{kolmogorovsTheorem}: the number of hidden nodes in the first
(second) layer is $N+1$ ($2N+1$), where $N$ is the number of input
observables. The neural network is trained for 500 epochs. At ten-epoch
intervals, the neural network error is computed to check for overtraining (see
figure~\ref{fig:overTrainCheck}). The neural network error $E$ is
defined~\cite{TMVA} as
\begin{equation}
E = \frac{1}{2} \sum_{i=1}^N (y_{ANN,i} - \hat y_i)^2
\label{eq:NNerrorFunc}
%note - not right for weighted dists?
\end{equation}
where $N$ is the number of training events, $y_{ANN,i}$ is the neural network
output for the $i$th training event, and $\hat y_i$ is the desired output
($-1$ for background, $+1$ for signal) for the $i$th event. No evidence of
overtraining is observed.
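The error of equation~\ref{eq:NNerrorFunc} can be sketched directly, which
also illustrates the factor-of-four remark in the caption of
figure~\ref{fig:overTrainCheck}: with $(-1,+1)$ targets every residual is
twice as large as for the equivalent $(0,1)$ targets. This is an illustrative
sketch, not the actual TMVA implementation.

```python
# Sketch of the quadratic network error E = 1/2 * sum_i (y_i - t_i)^2,
# with targets -1 (background) and +1 (signal) matching the tanh output
# range. Illustration only; not the TMVA code.

def nn_error(outputs, targets):
    """Quadratic error summed over training events."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))

# The same two classification mistakes expressed in both conventions:
outputs_pm = [0.8, -0.6]   # tanh-range network outputs
targets_pm = [1.0, -1.0]   # signal, background targets in (-1, +1)
outputs_01 = [0.9, 0.2]    # same outputs rescaled via y01 = (y + 1) / 2
targets_01 = [1.0, 0.0]    # signal, background targets in (0, 1)

# Residuals in the (-1, +1) convention are twice those in (0, 1),
# so the summed error is larger by a factor of four.
```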

\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
\caption{
Neural network classification error for the training (blue) and testing
(red) samples at ten-epoch intervals over the 500 training epochs for each
decay mode neural network. The vertical axis shows the classification
error defined by equation~\ref{eq:NNerrorFunc}. Note that the choice of
the hyperbolic tangent as the neuron activation function makes the desired
outputs for signal and background $+1$ and $-1$, respectively. As a result,
the computed neural network error is larger by a factor of four than in
the case where the desired outputs are (0, 1). Classifier over--training
would be evidenced by a divergence of the classification errors of the
training and testing samples, indicating that the neural net was optimizing
on statistical fluctuations in the training sample.
}
\label{fig:overTrainCheck}
\end{center}
\end{figure}

The neural nets use the transverse momentum and $\eta$ of the tau--candidates
as input variables. These variables are included because their correlations
with other observables can increase the separation power of the ensemble of
observables. For example, the opening angle in $\Delta R$ for signal
tau--candidates is inversely related to the transverse momentum, while for
background events the correlation is very small (see~\cite{DavisTau}). In the
training signal and background samples, there is significant discrimination
power in the $P_T$ spectrum. However, it is desirable to eliminate any
systematic dependence of the neural network output on $P_T$ and $\eta$, as in
use the TaNC will be presented with tau--candidates whose $P_T$--$\eta$
spectrum is analysis dependent. The dependence on $P_T$ and $\eta$ is removed
by applying a $P_T$- and $\eta$-dependent weight to the tau--candidates when
training the neural nets.

The weights are defined such that in any region of $P_T$--$\eta$ space where
the signal and background probability density functions differ, the sample
with the higher probability density is down-weighted so that the two samples
have identical $P_T$--$\eta$ probability distributions. This removes regions
of $P_T$--$\eta$ space where the training sample is exclusively signal or
background. The weights are computed as
\begin{align*}
W(P_T, \eta) &= \min(p_{sig}(P_T, \eta), p_{bkg}(P_T, \eta))\\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability
densities of the signal and background samples after the ``leading pion'' and
decay mode selections. Figure~\ref{fig:nnTrainingWeights} shows the signal
and background training $P_T$ distributions before and after the weighting is
applied.
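On binned densities, the weight definitions above reduce to a per-bin minimum
and two divisions. The sketch below assumes the $P_T$--$\eta$ plane has been
flattened into a list of bin densities; the binning and function name are
illustrative, not part of the TaNC code.

```python
# Sketch of the pt-eta training weights: per bin,
#   W      = min(p_sig, p_bkg)
#   w_sig  = W / p_sig,  w_bkg = W / p_bkg
# so the weighted signal and background densities coincide bin by bin.
# Inputs are flattened lists of normalized bin densities (illustrative).

def training_weights(p_sig, p_bkg):
    """Per-bin weights equalizing the signal and background densities."""
    w_common = [min(s, b) for s, b in zip(p_sig, p_bkg)]
    # Guard against empty bins: a zero density gets weight zero.
    w_sig = [w / s if s > 0 else 0.0 for w, s in zip(w_common, p_sig)]
    w_bkg = [w / b if b > 0 else 0.0 for w, b in zip(w_common, p_bkg)]
    return w_sig, w_bkg

# Example: three bins where signal and background densities differ.
p_sig = [0.2, 0.5, 0.3]
p_bkg = [0.4, 0.5, 0.1]
w_sig, w_bkg = training_weights(p_sig, p_bkg)
```

After weighting, both samples follow the common density
$\min(p_{sig}, p_{bkg})$, so no region is exclusively signal or background.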


\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
%\put(-5.5, 112.5){\small (a)}
%\put(72.0, 112.5){\small (b)}
%\put(-5.5, 54.5){\small (c)}
%\put(72.0, 54.5){\small (d)}
\end{picture}
\caption{Transverse momentum spectrum of signal and background
tau--candidates used in neural net training before (left) and after (right)
the application of the $P_T$--$\eta$ dependent weight function. Application
of the weights lowers the training significance of tau--candidates in regions
of $P_T$--$\eta$ phase space where either the signal or background sample has
an excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}