root/cvsroot/UserCode/Friis/TancNote/note/tanc_nn_training.tex
Revision: 1.7
Committed: Tue Apr 27 05:13:16 2010 UTC (15 years ago) by friis
Content type: application/x-tex
Branch: MAIN
Changes since 1.6: +13 -11 lines
Log Message:
Almost complete

The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched
to generator level hadronic tau decays coming from simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of
reconstructed tau--candidates in simulated QCD $2\rightarrow2$ hard scattering
events. The QCD $P_T$ spectrum is steeply falling, and to obtain sufficient
statistics across a broad range of $P_T$, the sample is split into different
$\hat P_{T}$ bins. Each QCD sub--sample imposes a generator level cut on the
transverse energy of the hard interaction. During evaluation of discrimination
performance the QCD sub--samples are weighted according to their respective
integrated luminosities to remove any effect of the binning.

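
The luminosity weighting of the sub--samples can be sketched as follows. This
is an illustrative Python sketch, not the actual analysis code; the bin names,
cross sections, and event counts are made-up placeholders.

```python
# Illustrative sketch of weighting QCD p_T-hat sub-samples to a common
# integrated luminosity. Cross sections, bin names, and event counts are
# made-up placeholders, not the values used in this note.

def lumi_weights(samples, target_lumi_pb):
    """Per-bin event weights equalizing the effective luminosity of each bin."""
    weights = {}
    for name, info in samples.items():
        # Effective luminosity of a generated sample: N_generated / sigma.
        eff_lumi_invpb = info["n_events"] / info["xsec_pb"]
        weights[name] = target_lumi_pb / eff_lumi_invpb
    return weights

# Hypothetical p_T-hat bins (placeholder numbers).
qcd_bins = {
    "pthat_30_50": {"xsec_pb": 1.6e8, "n_events": 5_000_000},
    "pthat_50_80": {"xsec_pb": 2.2e7, "n_events": 5_000_000},
}
```

Events from a steeply falling spectrum binned this way receive a weight
proportional to the bin cross section, so the binning drops out of any
luminosity-normalized distribution.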
The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5 GeV$/c$. A large number of
QCD training events is required, as the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are both effective discriminants. For each
subsample, half of the signal and background tau--candidates are reserved to
be used internally by the TMVA software to test for over--training. The number
of signal and background entries used for each decay mode subsample is given
in table~\ref{tab:trainingEvents}.

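
A minimal sketch of the decay mode splitting and leading pion requirement
follows; the dictionary layout and mode labels are assumptions made for this
example, not the actual TaNC data structures.

```python
# Illustrative sketch (not the actual TaNC code) of splitting tau-candidates
# by reconstructed decay mode and applying the "leading pion" requirement.
# The dictionary layout and mode labels are assumptions for this example.

LEADING_PION_MIN_PT = 5.0  # GeV/c, from the selection described above

def passes_leading_pion(tau, min_pt=LEADING_PION_MIN_PT):
    """True if any charged hadron or gamma constituent has pt > min_pt."""
    pts = [c["pt"] for c in tau["constituents"]
           if c["type"] in ("charged_hadron", "gamma")]
    return bool(pts) and max(pts) > min_pt

def split_by_decay_mode(taus, modes):
    """Group candidates passing the leading pion cut by decay mode."""
    subsamples = {mode: [] for mode in modes}
    for tau in taus:
        if tau["decay_mode"] in subsamples and passes_leading_pion(tau):
            subsamples[tau["decay_mode"]].append(tau)
    return subsamples
```

Candidates failing either the mode match or the leading pion cut are discarded,
which is why the QCD sample must start out so much larger than the signal one.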
%Chained 100 signal files.
%Chained 208 background files.
%Total signal entries: 874266
%Total background entries: 9526176
%Pruning non-relevant entries.
%After pruning, 584895 signal and 644315 background entries remain.
%**********************************************************************************
%*********************************** Summary **************************************
%**********************************************************************************
%* NumEvents with weight > 0 (Total NumEvents) *
%*--------------------------------------------------------------------------------*
%*shrinkingConePFTauDecayModeProducer ThreeProngNoPiZero: Signal: 53257(53271) Background:155793(155841)
%*shrinkingConePFTauDecayModeProducer ThreeProngOnePiZero: Signal: 13340(13342) Background:135871(135942)
%*shrinkingConePFTauDecayModeProducer OneProngTwoPiZero: Signal: 34780(34799) Background:51181(51337)
%*shrinkingConePFTauDecayModeProducer OneProngOnePiZero: Signal: 136464(138171) Background:137739(139592)
%*shrinkingConePFTauDecayModeProducer OneProngNoPiZero: Signal: 300951(345312) Background:144204(161603)

\begin{table}
\centering
\begin{tabular}{lcc}
%\multirow{2}{*}{} & \multicolumn{2}{c}{Events} \\
& Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay Mode & \multicolumn{2}{c}{Training Events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 135464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53247 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training in each
selected decay mode.}
\label{tab:trainingEvents}
\end{table}

In both signal and background samples, 20\% of the events are reserved as a
statistically independent sample to evaluate the performance of the neural
nets after the training is completed. The TaNC uses the ``MLP'' neural network
implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network with two layers of hidden nodes and a single node in the output layer.
The hyperbolic tangent is used as the neuron activation function. The number
of hidden nodes in each layer is chosen according to Kolmogorov's
theorem~\cite{kolmogorovsTheorem}: the first (second) hidden layer has $N+1$
($2N+1$) nodes, where $N$ is the number of input observables. The neural
network is trained for 500 epochs. At ten--epoch intervals, the neural network
error is computed to check for over--training (see
figure~\ref{fig:overTrainCheck}). The neural network error $E$ is
defined~\cite{TMVA} as
\begin{equation}
E = \frac{1}{2} \sum_{i=1}^N (y_{ANN,i} - \hat y_i)^2
\label{eq:NNerrorFunc}
%note - not right for weighted dists?
\end{equation}
where $N$ is the number of training events, $y_{ANN,i}$ is the neural network
output for the $i$th training event, and $\hat y_i$ is the desired output
($-1$ for background, $1$ for signal) for the $i$th event. No evidence of
over--training is observed.

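
The error definition above, and the factor-of-four remark in the caption of
figure~\ref{fig:overTrainCheck}, can be illustrated with a short sketch
(illustrative only, not TMVA code):

```python
import numpy as np

# Sketch of the network error E: half the summed squared difference between
# network outputs and desired targets (-1 for background, +1 for signal).
# Illustrative only, not TMVA code.

def nn_error(y_ann, y_target):
    y_ann = np.asarray(y_ann, dtype=float)
    y_target = np.asarray(y_target, dtype=float)
    return 0.5 * np.sum((y_ann - y_target) ** 2)
```

Mapping the same outputs and targets onto $(0, 1)$ halves every residual and
therefore shrinks the squared error by a factor of four, which is the factor
noted in the figure caption for the $(-1, 1)$ targets.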
\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
\caption{
Neural network classification error for the training (solid red) and testing
(dashed blue) samples at ten--epoch intervals over the 500 training epochs for
each decay mode neural network. The vertical axis shows the classification
error defined by equation~\ref{eq:NNerrorFunc}. Note that the choice of the
hyperbolic tangent as the neuron activation function makes the desired outputs
for signal and background $1$ and $-1$, respectively, so the computed neural
network error is larger by a factor of four than in the case where the desired
outputs are $(0, 1)$. Classifier over--training would be evidenced by a
divergence of the classification errors of the training and testing samples,
indicating that the neural net was optimizing on statistical fluctuations in
the training sample.
}
\label{fig:overTrainCheck}
\end{center}
\end{figure}


The neural nets use the transverse momentum and $\eta$ of the tau--candidates
as input variables. These variables are included because their correlations
with the other observables can increase the separation power of the ensemble
of observables. For example, the opening angle in $\Delta R$ for signal
tau--candidates is inversely related to the transverse momentum, while for
background events the correlation is very small~\cite{DavisTau}. In the
training signal and background samples, there is significant discrimination
power in the $P_T$ spectrum. However, it is desirable to eliminate any
systematic dependence of the neural network output on $P_T$ and $\eta$, as in
use the TaNC will be presented with tau--candidates whose $P_T$--$\eta$
spectrum is analysis dependent. The dependence on $P_T$ and $\eta$ is removed
by applying a $P_T$-- and $\eta$--dependent weight to the tau--candidates when
training the neural nets.

The weights are defined such that in any region of $P_T$--$\eta$ space where
the signal and background probability densities differ, the sample with the
higher probability density is down--weighted so that the two samples have
identical $P_T$--$\eta$ probability distributions. This removes regions of
$P_T$--$\eta$ space where the training sample is exclusively signal or
background. The weights are computed as
\begin{align*}
W(P_T, \eta) &= \min\left(p_{sig}(P_T, \eta), p_{bkg}(P_T, \eta)\right)\\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability
densities of the signal and background samples after the ``leading pion'' and
decay mode selections. Figure~\ref{fig:nnTrainingWeights} shows the signal and
background training $P_T$ distributions before and after the weighting is
applied.
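
A binned version of this weighting can be sketched as follows, assuming the
probability densities are estimated with simple 2D histograms; the binning and
inputs are illustrative assumptions, not the actual TaNC procedure.

```python
import numpy as np

# Binned sketch of the pt-eta training weights: estimate the signal and
# background densities with a 2D histogram, take the bin-wise minimum W,
# and divide. Binning and inputs are illustrative assumptions.

def pt_eta_weights(sig, bkg, pt_bins, eta_bins):
    """Return per-bin weights (w_sig, w_bkg) for arrays with columns (pt, eta)."""
    p_sig, _, _ = np.histogram2d(sig[:, 0], sig[:, 1], bins=(pt_bins, eta_bins))
    p_bkg, _, _ = np.histogram2d(bkg[:, 0], bkg[:, 1], bins=(pt_bins, eta_bins))
    p_sig /= p_sig.sum()   # normalize to probability distributions
    p_bkg /= p_bkg.sum()
    w = np.minimum(p_sig, p_bkg)  # W(pt, eta)
    with np.errstate(divide="ignore", invalid="ignore"):
        w_sig = np.where(p_sig > 0, w / p_sig, 0.0)
        w_bkg = np.where(p_bkg > 0, w / p_bkg, 0.0)
    return w_sig, w_bkg
```

By construction, the weighted distributions $p_{sig} w_{sig}$ and
$p_{bkg} w_{bkg}$ are both equal to $W$ bin by bin, so the samples become
indistinguishable in $P_T$--$\eta$.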

\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
\end{picture}
\caption{Transverse momentum spectrum of the signal and background
tau--candidates used in neural net training before (left) and after (right)
the application of the $P_T$--$\eta$ dependent weight function. Application of
the weights lowers the training significance of tau--candidates in regions of
$P_T$--$\eta$ phase space where either the signal or the background sample has
an excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}