An artificial neural network maps a point in the space of input observables to
a value $x$ of the neural network output. The neural network training error is
given by equation~\ref{eq:NNerrorFunc}. A given point in the vector space
spanned by the neural network input observables (referred to as ``feature space'')
contributes to the neural network training error $E$ by
\begin{equation}
E' = (1 - x)^2\cdot\rho^\tau + x^2\cdot\rho^{QCD}
\end{equation}
where $\rho^\tau$ ($\rho^{QCD}$) denotes the training sample density of the
$\tau$ signal (QCD--jet background) at that point in feature space.

The value $x$ assigned by the neural network to this region in feature space
should satisfy the requirement of minimal error:
\begin{align}
\frac{\partial E'}{\partial x} &= 0 \nonumber \\
0 &= -2(1-x)\cdot\rho^\tau+2x\cdot\rho^{QCD} \nonumber \\
x &= \frac{\rho^\tau} {\rho^\tau + \rho^{QCD}} \label{eq:probFracToX} \\
\rho^\tau &= x(\rho^\tau + \rho^{QCD}) \nonumber \\
\frac{\rho^{QCD}}{\rho^\tau} &= \frac{1}{x} - 1 \label{eq:rawTransformX}
\end{align}
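
As an illustrative check with made-up numbers (not values taken from the
analysis), suppose that at some point in feature space $\rho^\tau = 0.6$ and
$\rho^{QCD} = 0.2$. Equations~\ref{eq:probFracToX}
and~\ref{eq:rawTransformX} then give
\begin{align}
x &= \frac{0.6}{0.6 + 0.2} = 0.75 \nonumber \\
\frac{1}{x} - 1 &= \frac{1}{0.75} - 1 = \frac{1}{3} = \frac{0.2}{0.6} \nonumber
\end{align}
so the density ratio is recovered from the network output alone. The stationary
point is indeed a minimum, since
$\partial^2 E'/\partial x^2 = 2(\rho^\tau + \rho^{QCD}) > 0$ wherever the
training samples are populated.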

Note that the ratio $\frac{\rho^{QCD}}{\rho^\tau}$ corresponds to the ratio of
the normalized probability density functions of the background and signal input
observable distributions, i.e.\ $\int \rho^{\tau}\, d\vec x = \int \rho^{QCD}\, d\vec x = 1$,
with the integrals taken over feature space.

In the case of multiple neural networks, one can derive a formula that
transforms the output $x_j$ of the neural network corresponding to decay mode
$j$ according to the ``prior probabilities'' $p_j^\tau$ ($p_j^{QCD}$) for true
hadronic $\tau$ lepton decays (quark and gluon jets) to pass the preselection
criteria and be reconstructed with decay mode $j$.

Substituting $\rho^s \rightarrow \rho^s p_j^s$ for $s \in \{\tau, QCD\}$ in
equation~\ref{eq:probFracToX} gives the transformed output $x_j'$ in terms of
$p_j^\tau$ and $p_j^{QCD}$:
\begin{equation}
x_j' = \frac{\rho^\tau \cdot p_j^\tau}
{\rho^\tau \cdot p_j^\tau + \rho^{QCD} \cdot p_j^{QCD} }
= \frac{p_j^\tau}
{p_j^\tau + \frac{\rho^{QCD}}{\rho^\tau} \cdot p_j^{QCD} }
\label{eq:probFracToXWithPriors}
\end{equation}


Substituting equation~\ref{eq:rawTransformX} into
equation~\ref{eq:probFracToXWithPriors} yields the transformation of the output
$x_j$ of the neural network corresponding to any selected decay mode $j$
to a single discriminator output $x_j'$, which for a given point on the optimal
performance curve should be independent of $j$.

\begin{equation}
x_j' = \frac{p_j^\tau}
{p_j^\tau + \left(\frac{1}{x_j}-1\right)\cdot p_j^{QCD} }
\end{equation}
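
As a sketch with purely illustrative numbers (the priors below are assumptions
chosen for easy arithmetic, not measured values), take $p_j^\tau = 0.5$,
$p_j^{QCD} = 0.1$ and a raw network output $x_j = 0.75$:
\begin{equation}
x_j' = \frac{0.5}{0.5 + \left(\frac{1}{0.75}-1\right)\cdot 0.1}
= \frac{0.5}{0.5 + \frac{1}{30}} = \frac{15}{16} = 0.9375 \nonumber
\end{equation}
A decay mode in which true $\tau$ decays are reconstructed more often than
quark and gluon jets therefore moves the transformed output towards one. In the
special case $p_j^\tau = p_j^{QCD}$ the priors cancel and $x_j' = x_j$.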