% TancNote/note/transformation_derivation.tex

An artificial neural network maps each point in the space of input observables
to a value of the neural network output $x$.  The neural network training error
is given by equation~\ref{eq:NNerrorFunc}.  A given point in the vector space
spanned by the neural network input observables (the ``feature space'')
contributes to the neural network training error $E$ the term
\begin{equation}
   E' = (1 - x)^2\cdot\rho^\tau + x^2\cdot\rho^{QCD}
\end{equation}
where $\rho^\tau$ ($\rho^{QCD}$) denotes the training sample density of the
$\tau$ signal (QCD--jet background) at that point in feature space.

The value $x$ assigned by the neural network to this region in feature space
should satisfy the requirement of minimal error:
\begin{align}
   \frac{\partial E'}{\partial x} &= 0 \nonumber \\ 
   0 &= -2(1-x)\cdot\rho^\tau+2x\cdot\rho^{QCD} \nonumber \\ 
   x &= \frac{\rho^\tau} {\rho^\tau + \rho^{QCD}} \label{eq:probFracToX} \\ 
   \rho^\tau &= x(\rho^\tau + \rho^{QCD}) \nonumber \\ 
   \frac{\rho^{QCD}}{\rho^\tau} &= \frac{1}{x} - 1  \label{eq:rawTransformX}
\end{align}
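
As a consistency check, the stationary point found above is a minimum of $E'$,
since
\[
   \frac{\partial^2 E'}{\partial x^2} = 2\left(\rho^\tau + \rho^{QCD}\right) > 0
\]
wherever at least one of the two densities is non-zero.  For illustration
only, assume a point in feature space where $\rho^\tau = 3\,\rho^{QCD}$ (a
value chosen for this example, not taken from the analysis).  Then
\[
   x = \frac{3\,\rho^{QCD}}{3\,\rho^{QCD} + \rho^{QCD}} = \frac{3}{4},
   \qquad
   \frac{1}{x} - 1 = \frac{4}{3} - 1 = \frac{1}{3}
   = \frac{\rho^{QCD}}{\rho^\tau},
\]
so the density ratio is recovered from the network output.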

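Note that the densities in the ratio $\frac{\rho^{QCD}}{\rho^\tau}$ are the
normalized probability density functions of the background and signal input
observable distributions, i.e. $\int \rho^{\tau}\, d\vec x = 
\int \rho^{QCD}\, d\vec x = 1$, where $\vec x$ denotes a point in feature space
(not the neural network output $x$).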

In the case of multiple neural networks, one can derive a formula that
transforms the output $x_j$ of the neural network corresponding to decay mode
$j$ according to the ``prior probabilities'' $p_j^\tau$ ($p_j^{QCD}$) for true
hadronic $\tau$ lepton decays (quark and gluon jets) to pass the preselection
criteria and be reconstructed with decay mode $j$.

By substituting $\rho^s \rightarrow \rho^s p_j^s$ for $s \in \{\tau, QCD\}$ in
equation~\ref{eq:probFracToX}, the transformed output $x_j'$ can be related to
$p_j^\tau$ and $p_j^{QCD}$ by
\begin{equation}
   x_j' = \frac{\rho^\tau \cdot p_j^\tau} 
   {\rho^\tau \cdot p_j^\tau + \rho^{QCD} \cdot p_j^{QCD} }
   = \frac{p_j^\tau} 
   {p_j^\tau + \frac{\rho^{QCD}}{\rho^\tau} \cdot p_j^{QCD} }
   \label{eq:probFracToXWithPriors}
\end{equation}

Substituting equation~\ref{eq:rawTransformX} into
equation~\ref{eq:probFracToXWithPriors} yields the transformation of the output
$x_j$ of the neural network corresponding to any selected decay mode $j$
to a single discriminator output $x_j'$, which for a given point on the optimal
performance curve should be independent of $j$:
\begin{equation}
   x_j' = \frac{p_j^\tau} 
   {p_j^\tau + \left(\frac{1}{x_j}-1\right)\cdot p_j^{QCD} }
\end{equation}
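
As an illustrative sketch of the transformation, assume a raw output
$x_j = 0.8$ and priors $p_j^\tau = 0.5$, $p_j^{QCD} = 0.1$ (hypothetical
numbers chosen for this example, not taken from the analysis).  Then
\[
   x_j' = \frac{0.5}{0.5 + \left(\frac{1}{0.8} - 1\right)\cdot 0.1}
        = \frac{0.5}{0.525} \approx 0.952 .
\]
Two simple limits serve as a sanity check: for equal priors,
$p_j^\tau = p_j^{QCD}$, the transformation reduces to $x_j' = x_j$, and for
$x_j \rightarrow 1$ the transformed output approaches $x_j' \rightarrow 1$
regardless of the priors.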


% Revision 1.2, committed Wed Apr 28 22:15:10 2010 UTC by friis
% Branch: MAIN; CVS tags: HEAD; log message: Final text tweaks