% TancNote/note/transformation_derivation.tex

An artificial neural network maps a point in the space of input observables to
a value $x$ of the neural network output.  The neural network training error is
given by equation~\ref{eq:NNerrorFunc}.  A given point in the vector space
spanned by the neural network input observables (referred to as ``feature space'')
contributes to the neural network training error $E$ by
\begin{equation}
   E' = (1 - x)^2\cdot\rho^\tau + x^2\cdot\rho^{QCD}
\end{equation}
where $\rho^\tau$ ($\rho^{QCD}$) denotes the training sample density of the
$\tau$ signal (QCD-jet background) at that point in feature space.
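The form of $E'$ can be checked with a short Python sketch. It assumes (this is not spelled out here) that equation~\ref{eq:NNerrorFunc} is a sum of squared deviations from a target value of 1 for $\tau$ signal and 0 for QCD-jet background, and it uses hypothetical event counts in place of the densities $\rho^\tau$ and $\rho^{QCD}$; the function names are illustrative only:

```python
def squared_error(x, targets):
    """Summed squared deviation of a common output x from each event's target."""
    return sum((t - x) ** 2 for t in targets)


def error_contribution(x, rho_tau, rho_qcd):
    """E' = (1 - x)^2 * rho_tau + x^2 * rho_qcd for one point in feature space."""
    return (1.0 - x) ** 2 * rho_tau + x ** 2 * rho_qcd


# Hypothetical region of feature space: 3 signal events (target 1) and
# 5 background events (target 0), all mapped to the same output x.
targets = [1] * 3 + [0] * 5
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    per_event = squared_error(x, targets)
    closed_form = error_contribution(x, rho_tau=3, rho_qcd=5)
    assert abs(per_event - closed_form) < 1e-12
```

Summing the per-event squared errors reproduces $E'$ term by term, with the event counts playing the role of the densities.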

The value $x$ that the neural network assigns to this region of feature space
should minimize $E'$:
\begin{align}
   \frac{\partial E'}{\partial x} &= 0 \nonumber \\ 
   0 &= -2(1-x)\cdot\rho^\tau+2x\cdot\rho^{QCD} \nonumber \\ 
   x &= \frac{\rho^\tau} {\rho^\tau + \rho^{QCD}} \label{eq:probFracToX} \\ 
   \rho^\tau &= x(\rho^\tau + \rho^{QCD}) \nonumber \\ 
   \frac{\rho^{QCD}}{\rho^\tau} &= \frac{1}{x} - 1  \label{eq:rawTransformX}
\end{align}
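The minimum can also be located numerically. The following Python sketch (function names are illustrative, not taken from any existing code) scans $E'$ on a grid in $[0, 1]$ for example densities, compares the result with equation~\ref{eq:probFracToX}, and inverts the output with equation~\ref{eq:rawTransformX}:

```python
def error_contribution(x, rho_tau, rho_qcd):
    """E' = (1 - x)^2 * rho_tau + x^2 * rho_qcd."""
    return (1.0 - x) ** 2 * rho_tau + x ** 2 * rho_qcd


def optimal_output(rho_tau, rho_qcd):
    """Analytic minimiser of E' (eq:probFracToX)."""
    return rho_tau / (rho_tau + rho_qcd)


def density_ratio(x):
    """rho_QCD / rho_tau recovered from the output x (eq:rawTransformX); needs x > 0."""
    return 1.0 / x - 1.0


rho_tau, rho_qcd = 3.0, 1.0  # example densities at one point in feature space
grid = [i / 10000 for i in range(10001)]
x_scan = min(grid, key=lambda x: error_contribution(x, rho_tau, rho_qcd))
x_star = optimal_output(rho_tau, rho_qcd)

print(f"grid minimum {x_scan:.4f}, analytic {x_star:.4f}")
print(f"recovered rho_QCD/rho_tau = {density_ratio(x_star):.4f}")
```

The grid scan is only a cross-check; the closed-form expression is exact for this quadratic error.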

Note that the ratio $\frac{\rho^{QCD}}{\rho^\tau}$ corresponds to the ratio of
the normalized probability density functions of the background and signal input
observable distributions, i.e.\ the densities are taken to be normalized,
$\int \rho^{\tau}\, d\vec x = \int \rho^{QCD}\, d\vec x = 1$.
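
As an illustration of why the normalization matters (the symbols $N^s$ and
$f^s$ are introduced here only for this example), suppose the signal and
background training samples contain $N^\tau$ and $N^{QCD}$ entries drawn from
normalized probability density functions $f^\tau$ and $f^{QCD}$, so that the
unnormalized densities are $\rho^s = N^s f^s$ for $s \in \{\tau, QCD\}$. Then
\begin{equation*}
   \frac{\rho^{QCD}}{\rho^\tau} = \frac{N^{QCD}}{N^\tau} \cdot \frac{f^{QCD}}{f^\tau},
\end{equation*}
so equation~\ref{eq:rawTransformX} returns the ratio of normalized probability
density functions only if $N^\tau = N^{QCD}$ (or the samples are weighted to
equal total weight); otherwise the ratio is rescaled by the constant factor
$N^{QCD}/N^\tau$.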

When a separate neural network is trained for each reconstructed decay mode,
one can derive a formula that transforms the output $x_j$ of the neural network
corresponding to decay mode $j$ according to the ``prior probabilities''
$p_j^\tau$ ($p_j^{QCD}$) for hadronic decays of true $\tau$ leptons (quark and
gluon jets) to pass the preselection criteria and be reconstructed with decay
mode $j$.
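The prior probabilities can be estimated as the fraction of all true hadronic $\tau$ decays (quark and gluon jets) that pass the preselection and are reconstructed in decay mode $j$. A minimal Python sketch of this bookkeeping follows; the decay-mode labels and event counts are hypothetical placeholders, not numbers from this analysis:

```python
# Hypothetical counts: total number of true hadronic tau decays and QCD jets,
# and the number of each that pass preselection in each reconstructed mode j.
N_TOTAL = {"tau": 10000, "qcd": 50000}
N_PASS = {
    "tau": {"mode_a": 3000, "mode_b": 2000},
    "qcd": {"mode_a": 1000, "mode_b": 4000},
}


def prior_probabilities(n_total, n_pass):
    """Estimate p_j^s = N_pass(s, j) / N_total(s) for each sample s and mode j."""
    return {
        sample: {mode: count / n_total[sample] for mode, count in modes.items()}
        for sample, modes in n_pass.items()
    }


priors = prior_probabilities(N_TOTAL, N_PASS)
for mode in ("mode_a", "mode_b"):
    print(mode, priors["tau"][mode], priors["qcd"][mode])
```

Because candidates failing the preselection are counted only in the totals, the priors for one sample summed over decay modes stay at or below one.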

By substituting $\rho^s \rightarrow \rho^s p_j^s$ for $s \in \{\tau, QCD\}$ in
equation~\ref{eq:probFracToX}, the prior-weighted output $x_j'$ can be expressed
in terms of $p_j^\tau$ ($p_j^{QCD}$) as
\begin{equation}
   x_j' = \frac{\rho^\tau \cdot p_j^\tau} 
   {\rho^\tau \cdot p_j^\tau + \rho^{QCD} \cdot p_j^{QCD} }
   = \frac{p_j^\tau} 
   {p_j^\tau + \frac{\rho^{QCD}}{\rho^\tau} \cdot p_j^{QCD} }
   \label{eq:probFracToXWithPriors}
\end{equation}
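The two expressions in equation~\ref{eq:probFracToXWithPriors} differ only by dividing numerator and denominator by $\rho^\tau$. A short Python check, with arbitrary example values for the densities and priors and illustrative function names, confirms this and shows that equal priors $p_j^\tau = p_j^{QCD}$ reproduce equation~\ref{eq:probFracToX}:

```python
def weighted_output(rho_tau, rho_qcd, p_tau, p_qcd):
    """First form: rho_tau p_tau / (rho_tau p_tau + rho_qcd p_qcd)."""
    return rho_tau * p_tau / (rho_tau * p_tau + rho_qcd * p_qcd)


def weighted_output_from_ratio(rho_tau, rho_qcd, p_tau, p_qcd):
    """Second form, written in terms of the density ratio rho_qcd / rho_tau."""
    return p_tau / (p_tau + (rho_qcd / rho_tau) * p_qcd)


# Arbitrary example values (not taken from the analysis).
rho_tau, rho_qcd = 2.0, 3.0
p_tau, p_qcd = 0.3, 0.02

a = weighted_output(rho_tau, rho_qcd, p_tau, p_qcd)
b = weighted_output_from_ratio(rho_tau, rho_qcd, p_tau, p_qcd)
assert abs(a - b) < 1e-12

# With equal priors the weighting cancels and eq:probFracToX is recovered.
x_equal = weighted_output(rho_tau, rho_qcd, 0.5, 0.5)
assert abs(x_equal - rho_tau / (rho_tau + rho_qcd)) < 1e-12
```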

Substituting equation~\ref{eq:rawTransformX} into
equation~\ref{eq:probFracToXWithPriors} yields the transformation of the output
$x_j$ of the neural network corresponding to any selected decay mode $j$
to a common discriminator output $x_j'$, whose value at a given point on the
optimal performance curve should be independent of $j$:

\begin{equation}
   x_j' = \frac{p_j^\tau} 
   {p_j^\tau + \left(\frac{1}{x_j}-1\right)\cdot p_j^{QCD} }
\end{equation}
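A minimal Python sketch of this transformation follows. The function name and the input checks are illustrative assumptions. Multiplying numerator and denominator by $x_j$ gives the equivalent form $x_j' = p_j^\tau x_j / \left(p_j^\tau x_j + (1 - x_j)\, p_j^{QCD}\right)$, which the sketch uses so that $x_j = 0$ does not cause a division by zero:

```python
def transform_output(x_j, p_tau_j, p_qcd_j):
    """Map the output x_j of the decay-mode-j network to the common output x_j'.

    Implements x_j' = p_tau / (p_tau + (1/x_j - 1) * p_qcd) in the equivalent
    form p_tau * x_j / (p_tau * x_j + (1 - x_j) * p_qcd), valid for x_j in [0, 1].
    """
    if not 0.0 <= x_j <= 1.0:
        raise ValueError("x_j must lie in [0, 1]")
    if p_tau_j <= 0.0 or p_qcd_j <= 0.0:
        raise ValueError("prior probabilities must be positive")
    numerator = p_tau_j * x_j
    # Denominator is > 0: both terms are >= 0 and cannot vanish together.
    return numerator / (numerator + (1.0 - x_j) * p_qcd_j)


# Consistency check with example densities and priors (arbitrary values):
# a network trained on densities rho_tau, rho_qcd outputs x_j from
# eq:probFracToX; transforming it must give the prior-weighted output directly.
rho_tau, rho_qcd = 2.0, 3.0
p_tau, p_qcd = 0.3, 0.02
x_j = rho_tau / (rho_tau + rho_qcd)
direct = rho_tau * p_tau / (rho_tau * p_tau + rho_qcd * p_qcd)
assert abs(transform_output(x_j, p_tau, p_qcd) - direct) < 1e-12
```

Equal priors leave the output unchanged ($x_j' = x_j$), and since $x_j'$ increases monotonically with $x_j$ for fixed priors, the ordering of candidates within one decay mode is preserved; only the relative scale between decay modes is adjusted.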


% CVS revision 1.2 (branch MAIN, tag HEAD), committed Wed Apr 28 22:15:10 2010 UTC
% by friis; changes since 1.1: +16 -16 lines. Log message: Final text tweaks