The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched
to generator level hadronic tau decays in simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of
reconstructed tau--candidates in simulated QCD $2\rightarrow2$ hard scattering
events. The QCD $P_T$ spectrum is steeply falling; to obtain sufficient
statistics across a broad range of $P_T$, the sample is split into different
$\hat P_{T}$ bins, each of which imposes a generator level cut on the
transverse momentum of the hard interaction. During the evaluation of
discrimination performance, the QCD samples are weighted according to their
respective integrated luminosities to remove any effect of the binning.
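
As an illustration, a per-sample weight can be derived from the equivalent
integrated luminosity of each $\hat P_{T}$ bin. The following sketch assumes
hypothetical cross-sections and event counts; none of the numbers are taken
from the actual samples.

\begin{verbatim}
# Sketch: weight binned QCD samples by equivalent integrated luminosity.
# Cross-sections and event counts are placeholders, not measured values.
samples = {
    # pt_hat bin: (cross-section [pb], generated events)
    "30-50":  (1.6e8, 1.0e6),
    "50-80":  (2.2e7, 1.0e6),
    "80-120": (3.0e6, 1.0e6),
}

target_lumi_pb = 100.0  # normalize every bin to the same luminosity

weights = {}
for name, (xsec_pb, n_events) in samples.items():
    sample_lumi_pb = n_events / xsec_pb   # equivalent luminosity of this bin
    weights[name] = target_lumi_pb / sample_lumi_pb  # per-event weight

for name, w in weights.items():
    print(name, w)
\end{verbatim}
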
The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5 GeV$/c$. A large number of
QCD training events is required, as both the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are effective discriminants. For each subsample,
80\% of the signal and background tau--candidates are used by the TMVA
software to train the neural networks, with half of these (40\% of the total)
serving as a validation sample to ensure that the neural networks are not
over--trained. The number of signal and background entries used for training
and validation in each decay mode subsample is given in
table~\ref{tab:trainingEvents}.
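
A minimal sketch of this 40\%/40\%/20\% division is given below; the array of
candidate indices and the random seed are illustrative only.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
candidates = np.arange(100000)   # placeholder tau-candidate indices
rng.shuffle(candidates)

n = len(candidates)
train = candidates[:int(0.4 * n)]               # 40%: network training
valid = candidates[int(0.4 * n):int(0.8 * n)]   # 40%: over-training check
test  = candidates[int(0.8 * n):]               # 20%: final evaluation
\end{verbatim}
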
\begin{table}
\centering
\begin{tabular}{lcc}
 & Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay Mode & \multicolumn{2}{c}{Training Events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 136464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53257 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training and validation for
each selected decay mode.}
\label{tab:trainingEvents}
\end{table}

The remaining 20\% of the signal and background samples are reserved as a
statistically independent sample to evaluate the performance of the neural
nets after the training is completed. The TaNC uses the ``MLP'' neural
network implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network with two layers of hidden nodes and a single node in the output
layer. The hyperbolic tangent function is used as the neuron activation
function. The number of hidden nodes in the first (second) layer is chosen
to be $N+1$ ($2N+1$), where $N$ is the number of input observables, a choice
motivated by Kolmogorov's theorem~\fixme{need to find cite}.
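
For concreteness, a minimal sketch of this architecture is given below. The
layer sizes follow the $N+1$ and $2N+1$ rule; the weights and the input are
random placeholders rather than trained TMVA parameters.

\begin{verbatim}
import numpy as np

N = 10                      # number of input observables (example value)
rng = np.random.default_rng(0)

# Layer sizes: N inputs -> N+1 hidden -> 2N+1 hidden -> 1 output node.
sizes = [N, N + 1, 2 * N + 1, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

def mlp(x):
    """Feed-forward pass with tanh activations on every layer."""
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)
    return x[0]             # single output node, bounded in [-1, 1]

print(mlp(rng.normal(size=N)))
\end{verbatim}
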
The neural network is trained for 500 epochs. At ten epoch intervals, the
neural network error is computed using the validation sample to check for
over--training (see figure~\ref{fig:overTrainCheck}). The neural network
error $E$ is defined~\cite{TMVA} as

\begin{equation}
E = \frac{1}{2} \sum_{i=1}^{N_{train}} (y_{ANN,i} - \hat y_i)^2
\label{eq:NNerrorFunc}
\end{equation}
where $N_{train}$ is the number of training events, $y_{ANN,i}$ is the neural
network output for the $i$th event, and $\hat y_i$ is the desired output
($-1$ for background, $1$ for signal) for the $i$th event. No evidence of
over--training is observed.
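
The error of equation~\ref{eq:NNerrorFunc} and the over--training comparison
can be sketched as follows; the network outputs here are simulated stand-ins
for the classifier values.

\begin{verbatim}
import numpy as np

def nn_error(y_ann, y_hat):
    """E = 1/2 * sum_i (y_ann_i - y_hat_i)^2, eq. (NNerrorFunc)."""
    return 0.5 * np.sum((y_ann - y_hat) ** 2)

rng = np.random.default_rng(0)
y_hat = rng.choice([-1.0, 1.0], size=1000)  # desired outputs
y_train = np.clip(y_hat + rng.normal(0, 0.4, 1000), -1, 1)  # mock outputs
y_valid = np.clip(y_hat + rng.normal(0, 0.4, 1000), -1, 1)

# Over-training would show up as the training error continuing to fall
# while the validation error flattens out or rises.
print(nn_error(y_train, y_hat), nn_error(y_valid, y_hat))
\end{verbatim}
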

\begin{figure}[thbp]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
|
110 |
friis |
1.5 |
\caption{
|
111 |
friis |
1.6 |
Neural network classification error for training (solid red) and testing
|
112 |
|
|
(dashed blue) samples at ten epoch intervals over the 500 training epochs for each
|
113 |
friis |
1.5 |
decay mode neural network. The vertical axis represents the classification
|
114 |
|
|
error, defined by equation~\ref{eq:NNerrorFunc}. N.B. that the choice of
|
115 |
|
|
hyperbolic tangent for neuron activation functions results in the desired
|
116 |
friis |
1.10 |
outputs for signal and background to be 1 and -1, respectively. This results
|
117 |
friis |
1.5 |
in the computed neural network error being larger by a factor of four than
|
118 |
|
|
the case where the desired outputs are (0, 1). Classifier over--training
|
119 |
|
|
would be evidenced by divergence of the classification error of the training
|
120 |
|
|
and testing samples, indicating that the neural net was optimizing about
|
121 |
friis |
1.6 |
statistical fluctuations in the training sample.
|
122 |
friis |
1.4 |
}
|
123 |
|
|
\label{fig:overTrainCheck}
|
124 |
|
|
\end{center}
|
125 |
|
|
\end{figure}
|
126 |
|
|
|
The neural networks use the transverse momentum and $\eta$ of the
tau--candidates as input observables. These observables are included because
their correlations with other observables can increase the separation power
of the ensemble of observables. For example, the opening angle in $\Delta R$
for signal tau--candidates is inversely related to the transverse momentum,
while for background events the correlation is very small~\cite{DavisTau}.
In the training signal and background samples, there is significant
discrimination power in the $P_T$ spectrum. However, it is desirable to
eliminate any systematic dependence of the neural network output on $P_T$
and $\eta$, as in practice the TaNC will be presented with tau--candidates
whose $P_T$--$\eta$ spectrum is analysis dependent. The dependence on $P_T$
and $\eta$ is removed by applying a $P_T$ and $\eta$ dependent weight to the
tau--candidates when training the neural nets.

The weights are defined such that, in any region of the space spanned by
$P_T$ and $\eta$ where the signal and background probability density
functions differ, the sample with the higher probability density is
down-weighted so that the two samples have identical $P_T$--$\eta$
probability distributions. This removes regions of $P_T$--$\eta$ space where
the training sample is exclusively signal or background. The weights are
computed according to
\begin{align*}
W(P_T, \eta) &= \min\left(p_{sig}(P_T, \eta),\, p_{bkg}(P_T, \eta)\right)\\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability
densities of the signal and background samples after the ``leading pion''
and dominant decay mode selections. Figure~\ref{fig:nnTrainingWeights} shows
the signal and background training $P_T$ distributions before and after the
weighting is applied.
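
A minimal sketch of this weighting scheme, using two-dimensional histograms
as the density estimates, is given below; the binning and the toy
$P_T$--$\eta$ distributions are illustrative assumptions.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Toy pt-eta samples standing in for signal and background tau-candidates.
sig = np.column_stack([rng.exponential(40, 50000),
                       rng.uniform(-2.5, 2.5, 50000)])
bkg = np.column_stack([rng.exponential(60, 50000),
                       rng.uniform(-2.5, 2.5, 50000)])

pt_bins = np.linspace(0, 200, 41)
eta_bins = np.linspace(-2.5, 2.5, 21)

# Normalized 2D histograms approximate p_sig(pt, eta) and p_bkg(pt, eta).
p_sig, _, _ = np.histogram2d(sig[:, 0], sig[:, 1],
                             bins=(pt_bins, eta_bins), density=True)
p_bkg, _, _ = np.histogram2d(bkg[:, 0], bkg[:, 1],
                             bins=(pt_bins, eta_bins), density=True)

# W = min(p_sig, p_bkg); each sample's weight divides out its own density.
W = np.minimum(p_sig, p_bkg)
w_sig = np.where(p_sig > 0, W / np.where(p_sig > 0, p_sig, 1.0), 0.0)
w_bkg = np.where(p_bkg > 0, W / np.where(p_bkg > 0, p_bkg, 1.0), 0.0)

def weights_for(sample, w):
    """Look up each candidate's weight from its pt-eta bin."""
    i = np.clip(np.digitize(sample[:, 0], pt_bins) - 1, 0, len(pt_bins) - 2)
    j = np.clip(np.digitize(sample[:, 1], eta_bins) - 1, 0, len(eta_bins) - 2)
    return w[i, j]

sig_weights = weights_for(sig, w_sig)
bkg_weights = weights_for(bkg, w_bkg)
\end{verbatim}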

\begin{figure}[thbp]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
\end{picture}
\caption{Transverse momentum spectrum of the signal and background
tau--candidates used in neural net training before (left) and after (right)
the application of the $P_T$--$\eta$ dependent weight function. Application
of the weights lowers the training significance of tau--candidates in
regions of $P_T$--$\eta$ phase space where either the signal or background
sample has an excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}