% -*- mode:Latex -*-
%\VignetteIndexEntry{Estimation of Local False Discovery Rates}
%\VignetteDepends{splines,stats,golubEsets,vsn}
%\VignetteKeywords{Gene expression analysis}
%\VignettePackage{twilight}

\documentclass[11pt,a4paper,fleqn]{report}

\usepackage{compdiag}
\usepackage{amsmath}
\usepackage[bf]{caption}
\setlength{\captionmargin}{30pt}
\fboxsep=2mm
\SweaveOpts{eps=false}
\parindent0mm
\parskip1ex

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\Rmethod}[1]{{\textit{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}

\renewcommand{\textfraction}{0}

%---------------------------------------------------------------------------------%  
\title{Estimation of Local False Discovery Rates \bigskip \\ User's Guide to the Bioconductor Package \Rpackage{twilight}}

\author{Stefanie Scheid and Rainer Spang \bigskip \\
  \small email: \texttt{first.last@molgen.mpg.de}
  }

\reportnr{01}

\year{2004}

\abstract{This is the vignette of the Bioconductor compliant package \Rpackage{twilight}. We describe our implementation of a stochastic search algorithm to estimate the local false discovery rate. In addition, the package provides functions to test for differential gene expression in the common two-condition setting.}

\date{}
  
%---------------------------------------------------------------------------------%  
\begin{document}
\maketitle

<<eval=FALSE,echo=FALSE,results=hide>>=
q(save="no")

<<echo=FALSE,results=hide>>=
library(splines)
library(stats)
library(golubEsets)
library(vsn)
oldopt <- options(digits=3)
on.exit( {options(oldopt)} )
options(width=75)
@


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\chapter{Introduction}

In a typical microarray setting with gene expression data observed under two conditions, the local false discovery rate describes the probability that a gene is not differentially expressed between the two conditions given its corrresponding observed score or $p$-value level. The resulting curve of $p$-values versus local false discovery rate offers an insight into the \textbf{twilight zone} between clear differential and clear non-differential gene expression. The Bioconductor compliant package \Rpackage{twilight} contains two main functions: Function \Rfunction{twilight.pval} performs a two-condition test on differences in means for a given input matrix or expression set (\Rclass{ExpressionSet}) and computes permutation based $p$-values. Function \Rfunction{twilight} performs the successive exclusion procedure described in Scheid and Spang (2004) \cite{scheid04} to estimate local false discovery rates and effect size distributions. The package is also described in a short application note \cite{scheid05}. From version 1.2.0 on, the package includes a permutation filtering algorithm introduced in Scheid and Spang (2006) \cite{scheid06}.


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\section*{Acknowledgements}

This work was done within the context of the Berlin Center for Genome Based Bioinformatics (BCB), part of the German National Genome Network (NGFN), and supported by BMBF grants 031U109C and 03U117 of the German Federal Ministry of Education.


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\chapter{Implemented functions}
\section{\texttt{twilight.pval}: Testing effect sizes}\label{sec_testing}

\fbox{
\begin{minipage}{0.95\textwidth}
\Rfunction{twilight.pval(xin, yin, method="fc", paired=FALSE, B=1000, yperm=NULL, balance=FALSE, quant.ci=0.95, s0=NULL, verbose=TRUE)}
\end{minipage}
}
\bigskip       
       
The input object \Rfunarg{xin} is either a pre-processed gene expression set of class \Rclass{ExpressionSet} or any data matrix where rows correspond to genes and columns to samples. Each sample was taken under one of two distinct conditions, for example under treatment A or treatment B. The functions in package \Rpackage{twilight} are not limited to microarray data only but can be applied to any two-sample data matrix. However, it is necessary for both expression set or numerical matrix that values are on \textbf{additive scale} like log or arsinh scale. The function does not check or transform the data to additive scale. The input vector \Rfunarg{yin} contains condition labels of the samples. Vector \Rfunarg{yin} has to be numeric and dichotomous. Note that in terms of \textit{under}- and \textit{over}-expression, the samples of the higher labeled condition are compared to the samples of the lower labeled condition.

We are given a preprocessed matrix for samples belonging to two distinct conditions A and B, and gene expression values on additive scale. For gene $i$ in the experiment ($i=1,\dots,N$), $\bar \alpha_i$ is the mean expression under condition A and $\bar \beta_i$ is the mean expression under condition B. To test the null hypothesis of no differential gene expression, function \Rfunction{twilight.pval} compares the mean expression levels $\bar \alpha_i$ and $\bar \beta_i$. The current version offers three test variants: The classical $t$-test uses score $T_i$ with
\begin{equation}
  T_i=\frac{\bar \alpha_i - \bar \beta_i }{s_i},
\end{equation}
where $s_i$ denotes the pooled standard deviation. The $t$-test is called with \Rfunarg{method="t"}.

The $t$-test score can be misleadingly high if $s_i$ is very small. To overcome this problem, the $Z$-test enlarges the denominator by a fudge factor $s_0$ \cite{tusher01}, \cite{efron01}:
\begin{equation}
  Z_i=\frac{\bar \alpha_i - \bar \beta_i}{s_i + s_0}.
\end{equation}
The $Z$-test is called with \Rfunarg{method="z"}. Fudge factor $s_0$ is set to \Rfunarg{s0=NULL} by default and is only evaluated if \Rfunarg{method="z"}. In that case, it is the median of the pooled standard deviations $s_1,\dots,s_N$. However, the fudge factor can be chosen manually. Note that if \Rfunarg{method="z"} is chosen with \Rfunarg{s0=0}, the test call is altered to \Rfunarg{method="t"}, the $t$-test as described above.

The third variant is based on log ratios only with score
\begin{equation}
  F_i=\bar \alpha_i - \bar \beta_i.
\end{equation}
The distribution of scores $F_i$ under the alternative is called \textit{effect size distribution}. With expression values on log or arsinh scale, $\exp (|F_i|)$ is an estimator for the fold change. We call $\exp (|F_i|)$ the \textit{fold change equivalent score} \cite{scheid04}. Note that the package contains a function for plotting the effect size distribution which is only available if function \Rfunction{twilight.pval} was run with \Rfunarg{method="fc"}, the fold change equivalent test.

Function \Rfunction{twilight.pval} handles paired and unpaired data. In the unpaired case (\Rfunarg{paired=FALSE}), only one microarray was hybridized for each patient, like in a treatment and control group setting. In the paired case (\Rfunarg{paired=TRUE}), we observed expression values of the same patient under both conditions. The typical example are before and after treatment experiments, where each patient's expression was measured twice. %'
       The input arguments \Rfunarg{xin} and \Rfunarg{yin} do not need to be ordered in a specific manner. It is only necessary that samples within each group have the same order, such that the first samples of the two groups represent the first pair and so on. However, the order of the samples in \Rfunarg{xin} has to equal the order in \Rfunarg{yin}.
       
As an example, we apply function \Rfunction{twilight.pval} on the training set of the leukemia data of Golub et al.~(1999) \cite{golub99} as given in \Rfunction{library(golubEsets)}. For normalization, apply the variance-stabilizing method \Rfunction{vsn} in \Rfunction{library(vsn)} \cite{huber02}.
<<results=hide>>=
data(Golub_Train)
golubNorm <- justvsn(Golub_Train)
id <- as.numeric(Golub_Train$ALL.AML) #$
@

There are \Sexpr{length(id)} samples either expressing acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML). As the AML patients are labeled with ``2'' and ALL with ``1'', we compare AML to ALL expression.
<<>>=
Golub_Train$ALL.AML #$
id
@

Additionally to computation of scores, empirical $p$-values are calculated. Argument \Rfunarg{B} specifies the number of permutations with default set to \Rfunarg{B=1000}. The distribution of scores under the null hypothesis is estimated by computing test scores from the same input matrix with randomly permuted class labels. These permutations are either balanced or unbalanced, with default \Rfunarg{balance=FALSE}. The permutation options are described in detail in section \ref{twilight.combi}. For computing empirical $p$-values, we count for each gene how many of \textit{all} absolute permutation scores exceed the absolute observed score, and divide by \Rfunarg{B}$\cdot$(number of genes).

Permutation scores are also used to compute expected scores as described in Tusher et al.~(2001) \cite{tusher01}. In addition, we compute confidence bounds for the maximum absolute difference of each set of permutation scores to expected scores. The width of the confidence bound is chosen with \Rfunarg{quant.ci}. With default \Rfunarg{quant.ci=0.95}, the maximum absolute difference of permutation to expected scores exceeded the confidence bound in only 5\% of all permutations.

Using the optional argument \Rfunarg{yperm}, a user-specified permutation matrix can be passed to the function. In that case, \Rfunarg{yperm} has to be a \textit{binary} matrix where each row is one vector of permuted class labels. The label "1" in \Rfunarg{yperm} corresponds to the higher labeled original class. If the permutation matrix is specified, no other permutation is done and argument \Rfunarg{B} will be ignored. Besides \Rfunction{set.seed}, argument \Rfunarg{yperm} can be used to reproduce results by fixing the matrix of random permutations. Please note that the first row of \Rfunarg{yperm} must be the input vector \Rfunarg{yin}. Otherwise, the $p$-value calculation will be incorrect.

Continuing the example above, we perform a fold change test on the expression data in \Robject{golubNorm} which was transformed to arsinh scale by normalization with \Rfunction{vsn}. We do a quick example with few permutations.
<<>>=
library(twilight)
pval <- twilight.pval(golubNorm, id, B=100)
@

The function checks whether complete enumeration of all permutations is possible. Complete enumeration is performed as long as the number of permutations does not exceed the value set by \Rfunarg{B}. Thus, if you want to turn off the compulsive enumeration and use all possible permutations, you need to select a small \Rfunarg{B} or simply keep the default \Rfunarg{B=1000}. Details on the enumeration functions are given in section \ref{twilight.combi}.
The values in the accompanying data set \Robject{expval} were computed in the same manner as in the example above but with the complete data set \Rfunction{data(Golub\_Merge)} in \Rfunction{library(golubEsets)} and 1000 permutations.
<<>>=
data(expval)
expval
@

The output object of function \Rfunction{twilight.pval} is of class \Rclass{twilight} with several elements stored in a list.
<<>>=
class(expval)
names(expval)
@

The element \Robject{quant.ci} contains the corresponding input value which is passed to the plotting function. Element \Robject{ci.line} is used for plotting confidence bounds and contains the computed quantile of maximum absolute differences.  The output dataframe \Robject{result} contains a matrix with several columns.
<<>>=
names(expval$result) #$
@

The dataframe stores observed and expected scores and corresponding empirical $p$-values. The genes are ordered by absolute observed test scores. Genes with observed score exceeding the confidence bounds are marked as ``1'' in the binary vector \Robject{result\$candidate}. The output object is passed to function \Rfunction{plot.twilight} to produce a plot as in Tusher et al.~(2001) \cite{tusher01} with additional confidence lines and genes marked as candidates, see Figure \ref{fig_scores}.
{\small
<<>>=
expval$result[1:7,1:5] #$
@
}
 
<<echo=FALSE,results=hide>>=
bitmap(file="tr_2004_01-scores.png",width=6,height=4.5,pointsize=10)
plot(expval,which="scores",grayscale=F,legend=F)
dev.off()
bitmap(file="tr_2004_01-qvalues.png",width=6,height=4.5,pointsize=10)
plot(expval,which="qvalues")
dev.off()
@

      
\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-scores.png}
\caption{Expected versus observed test scores. Deviation from the diagonal line gives evidence for differential expression. The red lines mark the 95\% confidence interval on the absolute difference between oberved and expected scores. The plotting call is \texttt{plot(expval,which="scores",grayscale=F,legend=F)}.}\label{fig_scores}
\end{figure}


In addition, $q$-values and the estimated percentage of non-induced genes $\pi_0$ are computed as described in Remark B of Storey and Tibshirani (2003) \cite{storey03}. These are stored in \Robject{result\$qvalue} (see above) and \Robject{pi0}. The remaing output elements of \Robject{expval} are left free to be filled by function \Rfunction{twilight}. With \Rfunarg{"qvalues"}, Figure \ref{fig_qvalues} shows the plot of $q$-values against the corresponding number of rejected hypotheses.
<<>>=
expval$pi0 #$
@

\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-qvalues.png}
\caption{Stairplot of $q$-values against the resulting size of the list of significant genes. A list containing all genes with $q \leq q_0$ has an estimated global false discovery rate of $q_0$. The plotting call is \texttt{plot(expval,which="qvalues")}.}\label{fig_qvalues}
\end{figure}

Column \Robject{result\$index} contains the original gene ordering of the input object. With these numbers, resorting of the \Robject{result} table is possible without knowing the original order of the row names.


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\clearpage
\section{\texttt{twilight.pval}: Testing correlation}\label{sec_correlation}

\fbox{
\begin{minipage}{0.95\textwidth}
\Rfunction{twilight.pval(xin, yin, method="fc", B=1000, yperm=NULL, quant.ci=0.95, verbose=TRUE)}
\end{minipage}
}
\bigskip       

From version 1.1.0 on, function \Rfunction{twilight.pval} offers the computation of correlation scores instead of effect size scores. Now, vector \Rfunarg{yin} can be any clinical parameter consisting of numerical values and having length equal to the number of samples. With \Rfunarg{method="pearson"}, Pearson's coefficient of correlation to \Rfunarg{yin} is computed for every gene in \Rfunarg{xin}. With \Rfunarg{method="spearman"}, \Rfunarg{yin} and the rows of \Rfunarg{xin} are converted into ranks and Spearman's rank correlation is computed.

Note that most input arguments of \Rfunction{twilight.pval} will be ignored. Only \Rfunarg{B} takes effect and causes the computation of $p$-values based on \Rfunarg{B} random permutations of \Rfunarg{yin}. A matrix of user-specified permutations can be passed on using argument \Rfunarg{yperm}. Here, each row has to contain a permutation of \Rfunarg{yin}. Note that the values in \Rfunarg{yperm} have to be changed to ranks beforehand if Spearman correlation is to be computed. Please note that the first row of \Rfunarg{yperm} must be the input vector \Rfunarg{yin} (probably changed into ranks). Otherwise, the $p$-value calculation will be incorrect.

All successive analyses like expected scores, $p$- and $q$-values are kept as before. As an illustration, we search for genes with high correlation to the highest scoring gene found in the effect size test. Figure \ref{fig_corr} displays the resulting scores.

<<>>=
gene <- exprs(golubNorm)[pval$result$index[1],]
corr <- twilight.pval(golubNorm,gene,method="spearman",quant.ci=0.99,B=100)
corr
@

Note that the overall percentage of non-induced genes $\pi_0$ is now interpreted as the overall percentage of genes not correlated to the clinical parameter under the null hypothesis.

{\small
<<>>=
corr$result[1:10,1:5] #$
@
}
 

<<echo=FALSE,results=hide>>=
bitmap(file="tr_2004_01-corr.png",width=6,height=4.5,pointsize=10)
plot(corr,which="scores",grayscale=F,legend=F)
dev.off()
@

\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-corr.png}
\caption{Expected versus observed Spearman correlation scores. Deviation from the diagonal line gives evidence for significant correlation. The red lines mark the 99\% confidence interval on the absolute difference between oberved and expected scores. The plotting call is \texttt{plot(corr,which="scores",grayscale=F,legend=F)}.}\label{fig_corr}
\end{figure}


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\clearpage
\section{\texttt{twilight.filtering}: Filtering permutations}\label{sec_filtering}

\fbox{
\begin{minipage}{0.95\textwidth}
\Rfunction{twilight.pval(..., filtering = FALSE)}\\[0.3cm]
\Rfunction{twilight.filtering(xin, yin, method = "fc", paired = FALSE, s0 = 0, verbose = TRUE, num.perm = 1000, num.take = 50)}
\end{minipage}
}
\bigskip       

From version 1.2.0 on, we included a permutation filtering algorithm introduced in Scheid and Spang (2006) \cite{scheid06}. In a permutation based test approach, each permutation of the given class labels is thought to reflect the complete null model. However, in applications to real biological data, we often observe that certain permutations produce score distributions that still have larger margins than expected. Therefore, we treat each permutation as the original labeling, transform the permutation scores to pooled $p$-values and test the resulting distribution for uniformity. In an iterative search, we filter for a set of permutations whose $p$-value distributions fit well to a uniform distribution.

The filtering is added as an optional argument in function \Rfunction{twilight.pval}. Although large parts of the algorithm are written in C, the filtering is still time-consuming. Therefore, the default within \Rfunction{twilight.pval} is set to \Rfunarg{FALSE}. If \Rfunarg{filtering=TRUE}, the filtering is called internally with all the test parameters as given by the user. The only exception is the balancing parameter: The filtering is done on unbalanced permutations however \Rfunarg{balance} is specified. Balancing is just a simplier way to select a set of permutations that are not too close to the given labeling. However, this will not remove other sources of deviation from a complete null distribution as the filtering does.

Calling the filtering within \Rfunction{twilight.pval} is very convenient. If one wants to further examine the filtered permutations, function \Rfunarg{twilight.filtering} can be called directly. Most input arguments equal those of function \Rfunarg{twilight.pval}, see Sections \ref{sec_testing} and \ref{sec_correlation} for details on the different methods and formats. Only the fugde factor \Rfunarg{s0} differs. It takes effect only if \Rfunarg{method="z"} and is computed as the median pooled standard deviation if \Rfunarg{s0=0}.

The two input arguments \Rfunarg{num.perm} and \Rfunarg{num.take} are important. The first one is the number of wanted permutations. Within \Rfunction{twilight.pval}, it is set to \Rfunarg{B}. The argument \Rfunarg{num.take} specifies the number of valid permutations that are kept in each step of the iteration. Within each step, this number increases by \Rfunarg{num.take}. Hence, \Rfunarg{num.take} might be chosen such that \Rfunarg{num.take} is a divisor of \Rfunarg{num.perm}. Within \Rfunction{twilight.pval}, \Rfunarg{num.take} is set to the minimum of 50 and \Rfunarg{num.perm}/20.       

The output of function \Rfunction{twilight.filtering} is a matrix with the filtered permutations of \Rfunarg{yin} in rows. The number of rows is approximately \Rfunarg{num.perm}. The permutations are checked for uniqueness. If the number of possible unique permutations is less than \Rfunarg{num.perm}, the algorithm stops earlier. In this case, the result is likely to be just the set of all possible permutations and it is not sure whether all of these really produce uniform $p$-value distributions. Here, it is advisable to lower \Rfunarg{num.perm}.

The format of the output object complies with the needed format of the input argument \Rfunarg{yperm} in function \Rfunction{twilight.pval} where two-condition labels are binarized or numerical values are changed to ranks if \Rfunarg{method="spearman"}. Please note that the first row of the matrix always contains the original labeling \Rfunarg{yin} to be consistent with the other permutation functions described in Section \ref{twilight.combi}.
       
As an illustration, we proceed with a quick example of permutation filtering. We perform the filtering on log ratio scores and only filter for 50 permutations in steps of 10.

<<>>=
yperm <- twilight.filtering(golubNorm,id,method="fc",num.perm=50,num.take=10)
dim(yperm)
@

The filtering leads to a random subset of possible permutations. Next, we check whether one of these permutations really produces a uniform $p$-value distribution. As the first row of \Rfunarg{yperm} has to contain the labeling for which the $p$-values will be computed, we have to remove the current first row which is the original labeling \Rfunarg{yin}. Thus, we compute $p$-values for the first ``random'' permutation. The resulting histogram is shown in Figure \ref{fig_histo}.

<<>>=
yperm <- yperm[-1,]
b <- twilight.pval(golubNorm,yperm[1,],method="fc",yperm=yperm)
hist(b$result$pvalue,col="gray",br=20)
@

<<echo=FALSE,results=hide>>=
bitmap(file="tr_2004_01-hist.png",width=6,height=4.5,pointsize=10)
hist(b$result$pvalue,col="gray",br=20,main="",xlab="P-value")
dev.off()
@

\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-hist.png}
\caption{Histogram of $p$-values of one filtered permutation. The $p$-values are computed from pooling the scores of the set of filtered permutations. The resulting distribution appears to be uniform.}\label{fig_histo}
\end{figure}


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%         
\clearpage
\section{\Rfunction{twilight}: Estimating the local FDR}

\fbox{
\begin{minipage}{0.95\textwidth}
\Rfunction{twilight(xin, lambda=NULL, B=0, boot.ci=0.95, clus=NULL, verbose=TRUE)}
\end{minipage}
}
\bigskip       

Local false discovery rates (fdr) are estimated from a simple mixture model given the density $f(t)$ of observed scores $T=t$:
\begin{equation}
f(t) = \pi_0 \, f_0(t) + (1-\pi_0) \, f_1(t) \quad \Rightarrow \quad \mbox{fdr}(t)=\pi_0 \, \frac{f_0(t)}{f(t)},
\end{equation}
where $\pi_0 \in [0,1]$ is the overall percentage of non-induced genes. Terms $f_0$ and $f_1$ are score densities under no induction and under induction respectively. Assume that there exists a transformation $W$ such that $U=W(T)$ is uniformly distributed in $[0,1]$ for all genes not differentially expressed. In a multiple testing scenario these $u$-values are $p$-values corresponding to the set of observed scores. However, we do not regard the local false discovery rate as a multiple error rate but as an exploratory tool to describe a microarray experiment over the whole range of significance.

Mapping scores to $p$-values allows to assume $f_0(p)$ to be the uniform density instead of specifying the null density $f_0(t)$ with respect to a chosen scoring method. The implemented successive exclusion procedure (SEP) splits any vector of $p$-values into a uniformly distributed null part and an alternative part. The uniform part represents genes that are not differentially expressed. The proportion of the uniform part to the total number of genes in the experiment is a natural estimator for percentage $\pi_0$. We apply a smoothed density estimate based on the histogram counts of the observed mixture to estimate $f(p)$. Assuming uniformity leads to $f_0(p)=1$ for all $p \in [0,1]$. Hence, the ratio of the estimates $\widehat{\pi_0}$ and $\hat f(p)$ estimates the local false discovery rate for a certain $p$-value level.
       
The successive exclusion procedure is described in detail in Scheid and Spang (2004) \cite{scheid04}. The functionality of \Rfunction{twilight} is not limited to microarray experiments. In principle, any vector of $p$-values can be passed to \Rfunction{twilight} as long as the assumption of uniformity under the null hypothesis is valid.

The objective function in \Rfunction{twilight} includes a penalty term that is controlled by the regularization parameter $\lambda \geq 0$. The regularization ensures that we find a separation such that the uniform part contains as many $p$-values as possible. As percentage $\pi_0$ is often underestimated, the inclusion of a penalty term results in a more ``conservative'' estimate that is usually less biased. If not specified (\Rfunarg{lambda=NULL}), function \Rfunction{twilight.getlambda} finds a suitable $\lambda$.

The estimates for probability $\pi_0$ and the local false discovery rate are averaged over 10 runs of SEP. In addition, bootstrapping can be performed to give bootstrap estimates and bootstrap percentile confidence intervals on both $\pi_0$ and the local false discovery rate. The number of bootstrap samples is set by argument \Rfunarg{B}, and the width of the bootstrap confidence interval is set by argument \Rfunarg{boot.ci}.

Function \Rfunction{twilight} takes \Rclass{twilight} objects or any vector of $p$-values as input and returns a \Rclass{twilight} object. If the input is of class \Rclass{twilight}, the function works on the set of empirical $p$-values and fills in the remaining output elements. Note that the estimate for $\pi_0$ is replaced, and $q$-values are recalculated with the new estimate $\pi_0$.
As an example, we run SEP with 1000 bootstrap samples and $95\%$ boostrap confidence intervals: \Rfunction{twilight(xin=expval, B=1000, boot.ci=0.95)}, as was done for data set \Robject{exfdr}.
<<>>=
data(exfdr)
exfdr

<<>>=
exfdr$result[1:5,6:9] #$
@

The output elements \Robject{result\$fdr}, \Robject{result\$mean.fdr}, \Robject{result\$lower.fdr} and \linebreak[5] \Robject{result\$upper.fdr} contain the estimated local false discovery rate, the bootstrap average and upper and lower bootstrap confidence bounds. These values are used to produce the following plots which are only available after application of function \Rfunction{twilight}. First, we plot $p$-values against the corresponding conditional probabilities of being induced given the $p$-value level, that is $1-\mbox{fdr}$, see Figure \ref{fig_fdr}. Going back to observed scores, we produce a \textit{volcano plot}, that is observed scores versus local false discovery rate, see Figure \ref{fig_volcano}.

<<echo=FALSE,results=hide>>=
bitmap(file="tr_2004_01-fdr.png",width=6,height=4.5,pointsize=10)
plot(exfdr,which="fdr",grayscale=F,legend=T)
dev.off()
bitmap(file="tr_2004_01-volcano.png",width=6,height=4.5,pointsize=10)
plot(exfdr,which="volcano")
dev.off()
bitmap(file="tr_2004_01-effectsize.png",width=6,height=4.5,pointsize=10)
plot(exfdr,which="effectsize",legend=T)
dev.off()
@

\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-fdr.png}
\caption{Curve of estimated local false discovery over $p$-values. The red lines denote the bootstrap mean (solid line) and the 95\% bootstrap confidence interval on the local false discovery rate (dashed lines). The bottom ticks are 1\% quantiles of $p$-values. The plotting call is  \texttt{plot(exfdr,which="fdr",grayscale=F,legend=T)}.}\label{fig_fdr}
\end{figure}

\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-volcano.png}
\caption{Volcano plot of observed test scores versus local false discovery rate. The bottom ticks are 1\% quantiles of observed scores. The plotting call is \texttt{plot(exfdr,which="volcano")}.}\label{fig_volcano}
\end{figure}

Output element \Robject{effect} contains histogram information about the effect size distribution, that is log ratio under the alternative. One run of the successive exclusion procedure results in a split of the input $p$-value vector into a null and an alternative part. We estimate the effect size distribution from the distribution of log ratio scores corresponding to $p$-values in the alternative part. Again, this estimate is averaged over 10 runs of the procedure. Argument \Rfunarg{which="effectsize"} produces the histogram of all observed log ratios overlaid with the averaged histogram of log ratios in the alternative, see Figure \ref{fig_effectsize}. The x-axis is changed to fold change equivalent scores or rather to increase in effect size. Given an observed log ratio $F$, the increase in effect size is $(\exp (|F|)-1) \cdot \mbox{sign} (F) \cdot 100\%$. A value of 0\% corresponds to no change (fold change of 1), a value of 50\% to fold change 1.5 and so on. A value of -100\% corresponds to a 2-fold down-regulation, that is fold change of 0.5.

\begin{figure}[tp]
\centering
\includegraphics[width=0.7\textwidth]{tr_2004_01-effectsize.png}
\caption{Observed effect size distribution (gray histogram) overlaid with the estimated effect size distribution under the null hypothesis (black histogram). The plotting call is \texttt{plot(exfdr,which="effectsize",legend=T)}.}\label{fig_effectsize}
\end{figure}

The last plotting argument \Rfunarg{which="table"} tabulates the histogram information in terms of fold change equivalent scores and log ratios.
<<>>=
tab <- plot(exfdr,which="table")
tab[1:8,]
@

The input argument \Rfunarg{clus} of function \Rfunction{twilight} is used to perform parallel computation within \Rfunction{twilight}. Parallelization saves computation time which is especially useful if the number of bootstrap samples \Rfunarg{B} is large. With default \Rfunarg{clus=NULL}, no parallelization is done. If specified, \Rfunarg{clus} is passed as input argument to \Rfunction{makeCluster} in \Rfunction{library(snow)}. Please make sure that \Rfunction{makeCluster(clus)} works properly in your environment.


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\clearpage
\section{\Rfunction{twilight.combi}: Enumerating permutations of binary vectors}\label{twilight.combi}

\fbox{
\begin{minipage}{0.95\textwidth}
\Rfunction{twilight.combi(xin, pin, bin)}
\end{minipage}
}
\bigskip       

Function \Rfunction{twilight.combi} is used within \Rfunction{twilight.pval} to completely enumerate all permutations of a \textit{binary} input vector \Rfunarg{xin}. Argument \Rfunarg{pin} specifies whether the input vector corresponds to paired or unpaired data. Argument \Rfunarg{bin} specifies whether permutations are balanced or unbalanced. Note that the resulting permutations are always ``as balanced as possible'': The balancing is done for the smaller subsample. If its sample size is odd, say 7, \Rfunction{twilight.combi} computes all permutations with 3 and 4 samples unchanged.

As first example, compute all unbalanced permutations of an unpaired binary vector of length 5 with two zeros and three ones. The number of rows are 
\begin{equation}
m = \frac{5!}{2! \cdot 3!} = 10.
\end{equation}
<<>>=
x <- c(rep(0,2),rep(1,3))
x
@

<<>>=
twilight.combi(x,pin=FALSE,bin=FALSE)
@

Each row contains one permutation. The first row contains the input vector. In balanced permutations, we omit those rows where both original zeros have been shifted to the last three columns. The number of balanced rows is
\begin{equation}
m = {2 \choose 1} \cdot \frac{3!}{1! \cdot 2!} = 6.
\end{equation}
<<>>=
twilight.combi(x,pin=FALSE,bin=TRUE)
@
Note that the function returns six balanced rows \textit{and} the original input vector although it is not balanced.
       
Next, consider a paired input vector with four pairs. The first zero and the first one are the first pair and so on. In paired settings, values are flipped only within a pair. The number of rows is
\begin{equation}
m = \frac{1}{2} \cdot 2^4 = 2^3 = 8.
\end{equation}
<<>>=
y <- c(rep(0,4),rep(1,4))
y
@

<<>>=
twilight.combi(y,pin=TRUE,bin=FALSE)
@
       
The matrix above contains only half of all possible $2^4=16$ permutations. The reversed case \Rfunction{1 - twilight.combi(y, pin=TRUE, bin=FALSE)} is omitted as this will lead to the same absolute test scores as \Rfunction{twilight.combi(y, pin=TRUE, bin=FALSE)}. The same concept applies to balanced paired permutations. Now, two pairs are kept fixed and two pairs are flipped in each row. The number of balanced rows is
\begin{equation}
m = \frac{1}{2} \cdot {4 \choose 2} = 3.
\end{equation}
<<>>=
twilight.combi(y,pin=TRUE,bin=TRUE)
@
Again, the input vector is part of the output.

The complete enumeration of \Rfunction{twilight.combi} is limited by the sample sizes. The function returns \Rfunarg{NULL} if the resulting number of rows exceeds 10\,000. If \Rfunarg{NULL} is returned, function \Rfunction{twilight.pval} uses the functions \Rfunction{twilight.permute.unpair} and \Rfunction{twilight.permute.pair} which return a matrix of random permutations. For example, use the latter function to compute 7 balanced permutations of the paired vector \Rfunarg{y}. Similar to \Rfunction{twilight.combi}, these two functions return the input vector in the first row of their output matrices.
<<>>=
twilight.permute.pair(y,7,bal=TRUE)
@

<<echo=FALSE>>=
Sys.sleep(30)
@


%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%  
\chapter{Differences to earlier versions}

\section*{Changes in version 1.11.1}

Usual bump in version numbering. In addition, functions work on \Rclass{ExpressionSet} objects, too.

\section*{Changes in version 1.9.2}

Bug-fix for computation of fudge factor in permutation scores. The estimated fudge factor $s_0$ will now be returned by functions \Rfunction{twilight.pval} and \Rfunction{twilight.teststat}.

\Rfunction{library(snow)} was set to comment in \Rfunction{twilight}. This was necessary to complete the new R checks under Bioconductor. Note that no checks under the current version of \Rfunction{library(snow)} are performed any more. If any problem occurs, please report this.


\section*{Changes in version 1.9.1}

New version number due to Bioconductor Release 1.8.


\section*{Changes in version 1.6.2}

Adapted to changes of data package \Rpackage{golubEsets}.


\section*{Changes in version 1.6.1}

It is now possible to directly compute observed test statistics via function \Rfunction{twilight.teststat}. Additional minor cosmetic changes in the plot function. 

Within \Rfunction{twilight.pval}, the complete enumeration depends now on the value of argument \Rfunarg{B}. If complete enumeration would lead to a larger number of permutations than \Rfunarg{B}, it is not done but \Rfunarg{B} random permutations are taken instead.


\section*{Changes in version 1.5.1 and 1.5.2}

Minor cosmetic changes. The jump in version numbers is due to Bioconductor's version bumping regime for packages in the release and in the developmental repository.


\section*{Changes in version 1.2.3}

We updated the bootstrapping procedure in \Rfunction{twilight.getlambda} to get a reliable value for the regularization parameter when $\pi_0$ is small and many genes are truly differentially expressed.


\section*{Changes in version 1.2.2}

Changes in the C code which do not effect the results.


\section*{Changes in version 1.2.1}

Bug fixed on the calculation of Hamming distances.


\section*{Changes in version 1.2.0}

We added the argument \Rfunarg{filtering = FALSE} to function \Rfunction{twilight.pval} which (if set to \Rfunarg{TRUE}) invokes the filtering for class label permutations that produce uniform $p$-value distributions. The set of admissible permutations is found using function \Rfunarg{twilight.filtering} which is called internally in function \Rfunarg{twilight.pval}. However, it can also be used directly.

We changed the local FDR estimation in function \Rfunction{twilight} slightly. Instead of estimating both densities $f_0$ and $f$ from the output of SEP, we rely on the uniform assumption such that $f_0(p)=1$ for all $p \in [0,1]$. Hence, the 10 runs of SEP lead to 10 estimates $\widehat{\pi_0}$. The average of these is taken as the final value which is mutliplied with the density estimate of the mixture density $f$.

The mixture density estimation of $f$ also changed slightly. Still, the estimates are based on smoothed histogram counts. To improve the estimation for very small $p$-values, the histogram bins were changed from equidistant to quantile bins.
       
       
\section*{Changes in version 1.1.0}

The computation of $p$-values in \Rfunction{twilight.pval} changed from gene-wise to pooled $p$-values. For the computation of a gene-wise $p$-value for gene $i$, only the permutation scores of gene $i$ are taken into account. For pooled $p$-values, all permutation scores of all genes are taken as null distribution. This change has several advantages: First, gene-wise $p$-values were not monotonically increasing with scores because each gene had its own null distribution. Thus, two genes with almost equal scores might get quite different $p$-values. Now, the null distribution is the same for all genes, that is the union set of all permutation scores. Second, pooled $p$-values are less granular than gene-wise $p$-values. Gene-wise $p$-values are computed from \Rfunarg{B} permutation scores whereas pooled $p$-values are computed from \Rfunarg{B} $\cdot$ (number of genes) scores.

These two important features gave rise to further changes: The ordering of the \Robject{result} table is now more intuitive because the most significant genes on top have the highest scores, the lowest $p$- and $q$-values and are candidates (if there are any). In addition, the default value of the number of permutations \Rfunarg{B} is lowered to 1000 permutations. Computation of pooled $p$-values is slower than for gene-wise $p$-values. On the other hand, changing to pooled $p$-values increases the number of values in the null distribution by the factor of number of genes. Hence, even with less permutations, the number of null values is larger than before.

We integrated Pearson and Spearman correlation coefficients into \Rfunction{twilight.pval}. Each gene is correlated to an numerical input vector. Expected scores are computed from random permutations of the input vector.
       
The \Robject{result} table contains an additional \Robject{index} column with genes indices which comes in handy for sorting back to original ordering.

All output matrices of the permutation functions \Rfunction{twilight.combi}, \Rfunction{twilight.permute.pair} and \Rfunction{twilight.permute.unpair} have the original labeling vector as first row. This is also the case if balanced permutations are wanted, although the input vector is not balanced. Hence, the permutation matrix within \Rfunction{twilight.pval} now includes the original labeling even for balanced permutations implying that the smallest \textit{possible} $p$-value is 1/(number of permutations).


\section*{Changes in version 1.0.3}

A \Rfunction{print.twilight} function was added which produces a short information about the contents stored in the \Robject{twilight} object.


\section*{Changes in version 1.0.2}

The \Rfunarg{which} argument of the plot command changed from \Rfunction{plot1} style to more intuitive labels like \Rfunarg{scores} or \Rfunarg{fdr}.

       
%---------------------------------------------------------------------------------%  
%---------------------------------------------------------------------------------%         
\begin{thebibliography}{1}

\bibitem{efron01}
B. Efron, R. Tibshirani, J.D. Storey and V.G. Tusher, "Empirical Bayes Analysis of a Microarray Experiment", \emph{J. Am. Stat. Assoc.}, vol. 96, no. 456, pp. 1151-1160, 2001.

\bibitem{huber02}
W. Huber, A. von Heydebreck, H. S{\"u}ltmann, A. Poustka and M. Vingron, ``Variance stabilization applied to microarray data calibration and to the quantification of differential expression'', \emph{Bioinformatics}, vol. 18, suppl. 1, pp. S96-S104, 2002.

\bibitem{golub99}
T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield and E.S. Lander, "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring", \emph{Science}, vol. 286, pp. 531-537, 1999.

\bibitem{scheid04}
S. Scheid and R. Spang, "A stochastic downhill search algorithm for estimating the local false discovery rate", \emph{IEEE Transactions on Computational Biology and Bioinformatics}, vol. 1, no. 3, pp. 98-108, 2004.

\bibitem{scheid05}
S. Scheid and R. Spang, "twilight; a Bioconductor package for estimating the local false discovery rate", \emph{Bioinformatics}, vol. 21, no. 12, pp. 2921-2922, 2005.
       
\bibitem{scheid06}
S. Scheid and R. Spang R, "Permutation filtering: A novel concept for significance analysis of large-scale genomic data", in: Apostolico A, Guerra C, Istrail S, Pevzner P, and Waterman M (Eds.): \emph{Research in Computational Molecular Biology: 10th Annual International Conference, Proceedings of RECOMB 2006, Venice, Italy, April 2-5, 2006}. Lecture Notes in Computer Science vol. 3909, Springer, Heidelberg, pp. 338-347, 2006.

\bibitem{storey03}
J.D. Storey and R. Tibshirani, ``Statistical significance for genomewide studies'', \textit{Proc. Natl. Acad. Sci.}, vol. 100, no. 16, pp. 9440-9445, 2003.

\bibitem{tusher01}
V. Tusher, R. Tibshirani and C. Chu, ``Significance analysis of microarrays applied to ionizing radiation response'', \textit{Proc. Natl. Acad. Sci.}, vol. 98, no. 9, pp. 5116-5121, 2001.

\end{thebibliography}


\end{document}