The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched
to generator level hadronic tau decays in simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of
reconstructed tau--candidates in simulated QCD $2\rightarrow2$ hard scattering
events. The QCD $P_T$ spectrum is steeply falling, and to obtain sufficient
statistics across a broad range of $P_T$ the sample is split into different
$\hat{P}_T$ bins. Each QCD sub-sample imposes a generator level cut on the
transverse energy of the hard interaction.

The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5 GeV$/c$. A large number of
QCD training events is required, as the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are both effective discriminants. For each
subsample, 10000 signal and background tau--candidates are reserved to be used
internally by the TMVA software to test for over--training. The number of
signal and background entries used for each decay mode subsample is given in
table~\ref{tab:trainingEvents}.

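For illustration, the leading pion requirement can be expressed as the
following sketch; the \texttt{TauCandidate} and \texttt{Constituent}
structures and their field names are hypothetical stand-ins for the
reconstructed objects, not the actual reconstruction code.

\begin{verbatim}
from collections import namedtuple

# Hypothetical stand-ins for the reconstructed tau-candidate objects.
Constituent = namedtuple("Constituent", ["particle_type", "pt"])
TauCandidate = namedtuple("TauCandidate", ["constituents"])

LEAD_PION_PT_MIN = 5.0  # GeV/c

def passes_leading_pion(tau):
    """True if any charged hadron or gamma constituent has pT > 5 GeV/c."""
    return any(c.pt > LEAD_PION_PT_MIN
               for c in tau.constituents
               if c.particle_type in ("chargedHadron", "gamma"))

# Example: a candidate with a 6 GeV/c charged hadron passes the selection.
tau = TauCandidate([Constituent("chargedHadron", 6.0),
                    Constituent("gamma", 1.2)])
assert passes_leading_pion(tau)
\end{verbatim}
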
\begin{table}
\centering
\begin{tabular}{lcc}
 & Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay Mode & \multicolumn{2}{c}{Training Events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 136464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53257 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training for each
selected decay mode.}
\label{tab:trainingEvents}
\end{table}

In both the signal and background samples, 20\% of the events are reserved as
a statistically independent sample to evaluate the performance of the neural
nets after the training is completed. The TaNC uses the ``MLP'' neural network
implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network with two layers of hidden nodes and a single node in the output layer.
The number of hidden nodes in the first and second layers is determined by
Kolmogorov's theorem~\cite{kolmogorovsTheorem}: the first (second) hidden
layer contains $N+1$ ($2N+1$) nodes, where $N$ is the number of input
observables. Each neural network is trained for 500 epochs, and no evidence of
over--training is observed (see figure~\ref{fig:overTrainCheck}).

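A minimal sketch of how such a network can be booked through the TMVA Python
bindings is shown below. The file, tree, and variable names are placeholders,
and the \texttt{Factory} interface shown is that of the standalone TMVA
version contemporary with this note; newer ROOT releases route the input
specification through a \texttt{TMVA::DataLoader}.

\begin{verbatim}
import ROOT

# Sketch of booking the TMVA "MLP" classifier. File, tree, and variable
# names are placeholders for the actual training setup.
out_file = ROOT.TFile("tanc_training.root", "RECREATE")
factory = ROOT.TMVA.Factory("TaNC", out_file,
                            "!V:AnalysisType=Classification")

variables = ["pt", "eta"]  # plus the decay-mode-specific observables
# ... factory.AddVariable(v, "F") for each variable, and the
# AddSignalTree(...) / AddBackgroundTree(...) calls go here ...

# Two hidden layers of N+1 and 2N+1 nodes, trained for 500 epochs.
n = len(variables)
factory.BookMethod(ROOT.TMVA.Types.kMLP, "MLP",
                   "NCycles=500:HiddenLayers={},{}".format(n + 1, 2*n + 1))

factory.TrainAllMethods()
factory.TestAllMethods()
\end{verbatim}
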
\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
\caption{Neural network classification error for the training and testing
samples over 500 training epochs for each decay mode neural network.
Classifier over--training would be evidenced by a divergence of the
classification errors of the training and testing samples, indicating that
the neural net was optimizing about statistical fluctuations in the training
sample.}
\label{fig:overTrainCheck}
\end{center}
\end{figure}

The neural nets use as input variables the transverse momentum and $\eta$ of
the tau--candidates. These variables are included because their correlations
with other observables can increase the separation power of the ensemble of
observables. For example, the opening angle in $\Delta R$ for signal
tau--candidates is inversely related to the transverse momentum, while for
background events the correlation is very small (see~\cite{DavisTau}). In the
training signal and background samples, there is significant discrimination
power in the $P_T$ spectrum. However, it is desirable to eliminate any
systematic dependence of the neural network output on $P_T$ and $\eta$, as in
practice the TaNC will be presented with tau--candidates whose $P_T$--$\eta$
spectrum is analysis dependent. The dependence on $P_T$ and $\eta$ is removed
by applying a $P_T$- and $\eta$-dependent weight to the tau--candidates when
training the neural nets.

The weights are defined such that in any region of $P_T$--$\eta$ space where
the signal and background probability density functions differ, the sample
with the higher probability density is weighted down so that the two samples
have identical $P_T$--$\eta$ probability distributions. This removes regions
of $P_T$--$\eta$ space where the training sample is exclusively signal or
background. The weights are computed as
\begin{align*}
W(P_T, \eta) &= \min\left(p_{sig}(P_T, \eta),\, p_{bkg}(P_T, \eta)\right)\\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability
densities of the signal and background samples after the ``leading pion'' and
decay mode selections. Figure~\ref{fig:nnTrainingWeights} shows the signal
and background training $P_T$ distributions before and after the weighting is
applied.

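A numerical sketch of this weighting procedure is given below, assuming the
$(P_T, \eta)$ values of each sample are available as arrays; the binning and
the toy input distributions are illustrative only, as the note does not
specify the actual grid.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for the (pT, eta) values of the two training samples.
sig_pt = rng.exponential(30.0, 10000)
sig_eta = rng.uniform(-2.5, 2.5, 10000)
bkg_pt = rng.exponential(15.0, 10000)
bkg_eta = rng.uniform(-2.5, 2.5, 10000)

pt_bins = np.linspace(0.0, 100.0, 26)   # illustrative binning
eta_bins = np.linspace(-2.5, 2.5, 26)

def density(pt, eta):
    """Normalized probability density in the (pT, eta) plane."""
    h, _, _ = np.histogram2d(pt, eta, bins=(pt_bins, eta_bins),
                             density=True)
    return h

p_sig = density(sig_pt, sig_eta)
p_bkg = density(bkg_pt, bkg_eta)

# W = min(p_sig, p_bkg); dividing by a sample's own density down-weights
# whichever sample is locally more abundant, so that after weighting both
# samples share the same pT-eta distribution.
W = np.minimum(p_sig, p_bkg)
w_sig = np.divide(W, p_sig, out=np.zeros_like(W), where=p_sig > 0)
w_bkg = np.divide(W, p_bkg, out=np.zeros_like(W), where=p_bkg > 0)

def event_weights(pt, eta, w):
    """Look up the weight of each event from its (pT, eta) bin."""
    i = np.clip(np.digitize(pt, pt_bins) - 1, 0, len(pt_bins) - 2)
    j = np.clip(np.digitize(eta, eta_bins) - 1, 0, len(eta_bins) - 2)
    return w[i, j]

sig_w = event_weights(sig_pt, sig_eta, w_sig)
bkg_w = event_weights(bkg_pt, bkg_eta, w_bkg)
\end{verbatim}
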
\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
\end{picture}
\caption{Transverse momentum spectrum of signal and background
tau--candidates used in neural net training before (left) and after (right)
the application of the $P_T$--$\eta$ dependent weight function. Application
of the weights lowers the training significance of tau--candidates in regions
of $P_T$--$\eta$ phase space where either the signal or background sample has
an excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}