The samples used to train the TaNC neural networks are typical of the signals
and backgrounds found in common physics analyses using taus. The signal--type
training sample is composed of reconstructed tau--candidates that are matched to
generator level hadronic tau decays coming from simulated $Z \rightarrow
\tau^{+}\tau^{-}$ events. The background training sample consists of reconstructed
tau--candidates in simulated QCD $2\rightarrow2$ hard scattering events. The
QCD $P_T$ spectrum is steeply falling; to obtain sufficient statistics
across a broad range of $P_T$, the sample is split into several $\hat P_{T}$
bins, each of which imposes a generator level cut on the transverse
energy of the hard interaction.

The signal and background samples are split into five subsamples corresponding
to each reconstructed decay mode. An additional selection is applied to each
subsample by requiring a ``leading pion'': either a charged hadron or gamma
candidate with transverse momentum greater than 5 GeV$/c$. A large number of
QCD training events is required, as the leading pion selection and the
requirement that the decay mode match one of the dominant modes given in
table~\ref{tab:decay_modes} are both effective discriminants. For each subsample,
10000 signal and background tau--candidates are reserved to be used internally
by the TMVA software to test for overtraining. The number of signal and
background entries used for each decay mode subsample is given in
table~\ref{tab:trainingEvents}.

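In code, the leading pion requirement reduces to a simple predicate over the
constituents of the tau--candidate. The following C++ sketch illustrates the
cut; the \texttt{Daughter} type and its accessors are hypothetical stand-ins
for the actual reconstruction classes.
\begin{verbatim}
#include <vector>

// Hypothetical constituent interface: particle type plus pT in GeV/c.
struct Daughter {
  bool isChargedHadron() const;
  bool isGamma() const;
  double pt() const;
};

// A tau-candidate passes the "leading pion" selection if any charged
// hadron or gamma constituent has transverse momentum above 5 GeV/c.
bool passesLeadingPion(const std::vector<Daughter>& daughters) {
  for (const Daughter& d : daughters) {
    if ((d.isChargedHadron() || d.isGamma()) && d.pt() > 5.0)
      return true;
  }
  return false;
}
\end{verbatim}
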
%Chained 100 signal files.
%Chained 208 background files.
%Total signal entries: 874266
%Total background entries: 9526176
%Pruning non-relevant entries.
%After pruning, 584895 signal and 644315 background entries remain.
%**********************************************************************************
%*********************************** Summary **************************************
%**********************************************************************************
%* NumEvents with weight > 0 (Total NumEvents) *
%*--------------------------------------------------------------------------------*
%*shrinkingConePFTauDecayModeProducer ThreeProngNoPiZero: Signal: 53257(53271) Background:155793(155841)
%*shrinkingConePFTauDecayModeProducer ThreeProngOnePiZero: Signal: 13340(13342) Background:135871(135942)
%*shrinkingConePFTauDecayModeProducer OneProngTwoPiZero: Signal: 34780(34799) Background:51181(51337)
%*shrinkingConePFTauDecayModeProducer OneProngOnePiZero: Signal: 136464(138171) Background:137739(139592)
%*shrinkingConePFTauDecayModeProducer OneProngNoPiZero: Signal: 300951(345312) Background:144204(161603)

\begin{table}
\centering
\begin{tabular}{lcc}
 & Signal & Background \\
\hline
Total number of tau--candidates & 874266 & 9526176 \\
Tau--candidates passing preselection & 584895 & 644315 \\
Tau--candidates with $W(P_T,\eta)>0$ & 538792 & 488917 \\
\hline
Decay Mode & \multicolumn{2}{c}{Training Events} \\
\hline
$\pi^{-}$ & 300951 & 144204 \\
$\pi^{-}\pi^0$ & 136464 & 137739 \\
$\pi^{-}\pi^0\pi^0$ & 34780 & 51181 \\
$\pi^{-}\pi^{-}\pi^{+}$ & 53257 & 155793 \\
$\pi^{-}\pi^{-}\pi^{+}\pi^0$ & 13340 & 135871 \\
\end{tabular}
\caption{Number of events used for neural network training for each
selected decay mode.}
\label{tab:trainingEvents}
\end{table}

In both signal and background samples, 20\% of the events are reserved as a
statistically independent sample to evaluate the performance of the neural nets
after the training is completed. The TaNC uses the ``MLP'' neural network
implementation provided by the TMVA software package, described
in~\cite{TMVA}. The ``MLP'' classifier is a feed-forward artificial neural
network. There are two layers of hidden nodes and a single node in
the output layer. The hyperbolic tangent function is used as the neuron
activation function. The number of hidden nodes in the first and second layers
is chosen according to Kolmogorov's theorem~\cite{kolmogorovsTheorem}; the number of
hidden nodes in the first (second) layer is $N+1$ ($2N+1$), where $N$ is the
number of input observables. The neural network is trained for 500 epochs. At
ten epoch intervals, the neural network error is computed to check for
overtraining (see figure~\ref{fig:overTrainCheck}). The neural network error $E$ is
defined~\cite{TMVA} as
\begin{equation}
E = \frac{1}{2} \sum_{i=1}^N (y_{ANN,i} - \hat y_i)^2
\label{eq:NNerrorFunc}
%note - not right for weighted dists?
\end{equation}
where $N$ is the number of training events, $y_{ANN,i}$ is the neural network
output for the $i$th training event, and $\hat y_i$ is the desired output
($+1$ for signal, $-1$ for background) for the $i$th event. No evidence of
overtraining is observed.

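The network topology and training schedule described above correspond to
standard TMVA booking options. The following is a minimal sketch, assuming the
classic single-\texttt{Factory} TMVA interface (pre-\texttt{DataLoader}) and
illustrative observable names; it is not the actual TaNC training code.
\begin{verbatim}
#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/Types.h"

// Book an MLP with tanh activations, hidden layers of N+1 and 2N+1 nodes
// (TMVA expands "N" to the number of input variables), 500 training
// epochs, and an overtraining check every 10 epochs.
void trainTanc(TTree* signalTree, TTree* backgroundTree) {
  TFile* outputFile = TFile::Open("tanc_training.root", "RECREATE");
  TMVA::Factory factory("TaNC", outputFile, "!V:AnalysisType=Classification");
  factory.AddVariable("pt",  'F');   // illustrative input observables
  factory.AddVariable("eta", 'F');
  factory.AddSignalTree(signalTree);
  factory.AddBackgroundTree(backgroundTree);
  factory.BookMethod(TMVA::Types::kMLP, "MLP",
      "NeuronType=tanh:HiddenLayers=N+1,2N+1:NCycles=500:TestRate=10");
  factory.TrainAllMethods();
  factory.TestAllMethods();
  outputFile->Close();
}
\end{verbatim}
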
\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150, 195)(0,0)
\put(0.5, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngNoPiZero.pdf}}}
\put(65, 130)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngOnePiZero.pdf}}}
\put(0.5, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_OneProngTwoPiZero.pdf}}}
\put(65, 65)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngNoPiZero.pdf}}}
\put(33, 0)
{\mbox{\includegraphics*[height=60mm]{figures/overtrainCheck_ThreeProngOnePiZero.pdf}}}
\end{picture}
\caption{
Neural network classification error for training (solid red) and testing
(dashed blue) samples at ten epoch intervals over the 500 training epochs for each
decay mode neural network. The vertical axis represents the classification
error, defined by equation~\ref{eq:NNerrorFunc}. Note that the choice of the
hyperbolic tangent as the neuron activation function results in desired
outputs for signal and background of $+1$ and $-1$, respectively; this makes
the computed neural network error larger by a factor of four than in the case
where the desired outputs are (0, 1). Classifier overtraining
would be evidenced by a divergence of the classification errors of the
training and testing samples, indicating that the neural net was optimizing
on statistical fluctuations in the training sample.
}
\label{fig:overTrainCheck}
\end{center}
\end{figure}

The neural nets use the transverse momentum and $\eta$ of the
tau--candidates as input variables. These variables are included as their
correlations with other observables can increase the separation power of the
ensemble of observables.
For example, the opening angle in $\Delta R$ for signal tau--candidates is
inversely related to the transverse momentum, while for background events the
correlation is very small~\cite{DavisTau}. In the training signal and
background samples, there is significant discrimination power in the $P_T$
spectrum. However, it is desirable to eliminate any systematic dependence of
the neural network output on $P_T$ and $\eta$, since in use the TaNC will be
presented with tau--candidates whose $P_T$--$\eta$ spectrum is analysis
dependent. The dependence on $P_T$ and $\eta$ is removed by applying a $P_T$-
and $\eta$-dependent weight to the tau--candidates when training the neural nets.

The weights are defined such that in any region of $P_T$--$\eta$ space where
the signal and background probability density functions differ, the sample with
the higher probability density is weighted down such that the two samples have
identical $P_T$--$\eta$ probability distributions. This removes regions of
$P_T$--$\eta$ space where the training sample is exclusively signal or
background. The weights are computed as
\begin{align*}
W(P_T, \eta) &= \min(p_{sig}(P_T, \eta), p_{bkg}(P_T, \eta))\\
w_{sig}(P_T, \eta) &= W(P_T, \eta)/p_{sig}(P_T, \eta) \\
w_{bkg}(P_T, \eta) &= W(P_T, \eta)/p_{bkg}(P_T, \eta)
\end{align*}
where $p_{sig}(P_T,\eta)$ and $p_{bkg}(P_T,\eta)$ are the probability densities of
the signal and background samples after the ``leading pion'' and decay mode
selections. Figure~\ref{fig:nnTrainingWeights} shows the signal and background
training $P_T$ distributions before and after the weighting is applied.

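In practice the probability densities can be estimated with normalized
two-dimensional histograms in $P_T$ and $\eta$. The following C++ sketch uses
ROOT's \texttt{TH2F}; the histogram handling and binning are illustrative
assumptions, not the actual TaNC implementation.
\begin{verbatim}
#include <algorithm>
#include "TH2F.h"

// Per-candidate training weight w = W/p with W = min(p_sig, p_bkg).
// hSig and hBkg must already be normalized to unit integral so that
// their bin contents approximate the pT-eta probability densities.
double trainingWeight(const TH2F& hSig, const TH2F& hBkg,
                      double pt, double eta, bool isSignal) {
  double pSig = hSig.GetBinContent(hSig.FindFixBin(pt, eta));
  double pBkg = hBkg.GetBinContent(hBkg.FindFixBin(pt, eta));
  double W = std::min(pSig, pBkg);   // common (minimum) density
  double p = isSignal ? pSig : pBkg;
  return (p > 0.) ? W / p : 0.;      // zero weight in empty regions
}
\end{verbatim}
Tau--candidates falling where either density vanishes receive zero weight;
this is why table~\ref{tab:trainingEvents} counts tau--candidates with
$W(P_T,\eta)>0$ separately.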

\begin{figure}[t]
\setlength{\unitlength}{1mm}
\begin{center}
\begin{picture}(150,60)(0,0)
\put(10.5, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_unweighted.pdf}}}
\put(86.0, 2){
\mbox{\includegraphics*[height=58mm]{figures/training_weights_weighted.pdf}}}
\end{picture}
\caption{Transverse momentum spectrum of signal and background
tau--candidates used in neural net training before (left) and after (right) the
application of the $P_T$--$\eta$ dependent weight function. Application of the
weights lowers the training significance of tau--candidates in regions of
$P_T$--$\eta$ phase space where either the signal or the background sample has
an excess of events.}
\label{fig:nnTrainingWeights}
\end{center}
\end{figure}
