%\VignetteIndexEntry{Introduction to the ddCt method for qRT-PCR data analysis: background, algorithm and example} %\VignetteDepends{Biobase,ddCt,lattice} %\VignetteKeywords{RT-PCR,ddCt,algorithm,statistics,visualization} %\VignettePackage{ddCt} \documentclass[11pt]{article} \usepackage{times} \usepackage{hyperref} %\usepackage{listings} %% windows server does not have listings \usepackage{geometry} \usepackage{longtable} \usepackage[pdftex]{graphicx} \SweaveOpts{keep.source=TRUE,eps=FALSE,pdf=TRUE,prefix=FALSE} \usepackage{natbib} \usepackage{times} %\textwidth=6.2in %\textheight=8.5in %\parskip=.3cm %\oddsidemargin=.1in %\evensidemargin=.1in %\headheight=-.3in \newcommand{\todo}[2]{\textit{\textbf{To do:} (#1) #2}} \newcommand{\fixme}[2]{\textit{\textbf{FIXME:} (#1) #2}} \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\R}[1]{{\textsf{#1}}} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \newcommand{\Metas}[1]{{\texttt{#1}}} \newcommand{\myincfig}[3]{% \begin{figure}[htbp] \begin{center} \includegraphics[width=#2]{#1} \caption{\label{#1}#3} \end{center} \end{figure} } \def\TReg{\textsuperscript{\textregistered}} \def\TCop{\textsuperscript{\textcopyright}} \def\TTra{\textsuperscript{\texttrademark}} \def\Ct{$C_T$ } \def\dCt{$dC_T$ } \def\ddCt{$ddC_T$ } \bibliographystyle{plain} \begin{document} \title{ddCt method for qRT--PCR data analysis} \author{Jitao David Zhang, Markus Ruschhaupt and Rudolf Biczok} \maketitle \begin{abstract} Here we describe the $2^{-\Delta \Delta C_{T}}$ algorithm implemented in the \Rpackage{ddCt} package. The package is designed for the data analysis of quantitative real--time PCR (qRT--PCR) experiemtns in \R{Bioconductor}. With the \Rpackage{ddCt} package, one can acquire the relative expression of the target gene in different samples. This vignette mainly dicusses the principles of the ddCt algorithm and demonstrates the functionality of the package with a compact example. Another vignette in the package, \emph{rtPCR-usage}, gives instructions to call the script for end--to--end analysis. \end{abstract} Both absolute and relative quantification have been used to analyse the data from the quantitative real--time PCR (qRT--PCR, or RT--PCR for short) experiments. The $2^{-\Delta \Delta C_{T}}$ algorithm, also known as the the \emph{delta-delta-Ct} or \emph{ddCt} algorithm, is a convenient method to analyze the relative changes in gene expression~\cite{Livak2001402}. It requires the assignment of one or more housekeeping genes, which are assumed to be uniformly and constantly expressed in all samples, as well as one or more reference samples. The expression of other samples is then compared to that in the reference sample\footnote{The \Rpackage{qpcrNorm} package in the \R{Bioconductor} repository introduces the data--driven normalization method for high--throughput qPCR data, which does not depend on the house--keeping genes but makes extra assumptions. See the help pages and the vignette of the \Rpackage{qpcrNorm} package for further information}. \section{RT--PCR}\label{sec:principle} Rich background knowledge about RT--PCR can be found at~\cite{rtPCRtutorial}, ~\cite{rtPCRambion} and ~\cite{rtPCRtaqman}. There are many variations in the experimental processes. Here we shortly summarize the general key steps in the TaqMan\TReg assay to help the understanding of following discussions. \begin{enumerate} \item RNA preparation: total or specific type of RNA are extracted from cell lines, tissues, biopsies, etc. \item RNA is Reversed Transcribed into DNA, which is also known as the \emph{RT-reaction}. \item qPCR probes (sometimes also known as 'primers') are added to the transcribed cDNA sample and the polymerase chain reaction takes place. This probe is an oligonucleotide with a reporter dye attached to the 5' end and a quencher dye attached to the 3' end. Till the time the probe is not hydrolized, the quencher and the fluorophore remain in proximity to each other, which does not completely quench the flourescence of the reporter dye and therefore only a background flourescence is observed. \item During PCR, the probe anneals specifically between the forward and reverse primer to an internal region of the PCR product. The polymerase then carries out the extension of the primer and replicates the template to which the TaqMan\TReg is bound. The 5' exonuclease activity of the polymerase cleaves the probe, releasing the reporter molecule away from the close vicinity of the quencher. The fluorescence intensity of the reporter dye, as a result increases. This process repeats in every cycle. \item As the cycle number increases, the detected fluorescence also increases. And when the fluorescence crosses an arbitrary line, the device recodes the cycle number until then, which is known as the \Ct value. \end{enumerate} In princple one could also report the \Ct values of the housekeeping gene and the sample gene(s) in the form of barplots to show their relative relation. However, this has two main drawbacks: \begin{itemize} \item This is only applicable in cases where more than one genes are compared in the same sample. In case of mutilple samples one has to calculate the relative expression to a specified \emph{reference sample} \item \Ct value is exponential. In case of a ideal amplification efficiency of $1$, increase of the \Ct value by $1$ indicates a two--fold expression. Therefore, it maybe misleading to illustrate the expression with the raw \Ct value. \end{itemize} \section{The \emph{ddCt} Algorithm} The \emph{ddCt} method was one of the first methods used to to calculate real--time PCR results. Different the standard curve ~\cite{rtPCRtutorial} and the Pfaffl method~\cite{Pfaffl2001}, \emph{ddCt} is an approximation method and makes various assumptions. However, it reduces lot of experiment effort by making these assumptions and is easy to implement, and in many cases they return results similarly to other non-approximation methods~\cite{Livak2001402}. \subsection{Deviation} The exponential amplification of the polymerase chain reaction (PCR) can be described by the equation~\ref{eq:expAmp}. \begin{equation} X_n = X_0 \times (1 + E_X)^n \label{eq:expAmp} \end{equation} \noindent where $X_n$ is the number of target molecules at cycle $n$ of the reaction, and $X_0$ is the number of target molecules initially. $E_x$ is the amplification efficiency of target amplification, and $n$ is the number of cycles. The threshold value (\Ct) records the fractional cycle number at which the fluorescence reaches a fixed threshold (see section~\ref{sec:principle}). Therefore \begin{equation} X_T = X_0 \times (1 + E_X)^{C_{T,X}} = K_X \label{eq:thr} \end{equation} \noindent where $X_T$ is the threshold number of target molecules, $C_{T,X}$ is the readout \Ct value, and $K_X$ is a constant. Similarly we can express the equation~\ref{eq:thr} for the endogenous reference gene (house-keeping genes) as \begin{equation} R_T = R_0 \times (1 + E_R)^{C_{T,R}} = K_R \label{eq:thrHKG} \end{equation} \noindent where $R_T$ is the threshold number of the reference molecules, $R_0$ is the initial number of reference molecules. $E_R$ is the efficiency of reference amplification, $C_{T,R}$ is the \Ct readout for the reference, and $K_R$ is a constant. Combining equation~\ref{eq:thr} and \ref{eq:thrHKG} we get \begin{equation} \frac{X_T}{R_T} = \frac{X_0 \times (1 + E_X)^{C_{T,X}}}{R_0 \times (1 + E_R)^{C_{T,R}}} = \frac{K_X}{K_R} = K \label{eq:XDivR} \end{equation} For qRT--PCR using TaqMan\TReg probes, the exact values of $X_T$ and $R_T$ depend on several factors including the chemistry of reporter dye, the sequence context effects on the fluorescence properties of the probe, the fficiency of probe cleavage, purity of the probe, and the setting of the fluorescence threshold~\cite{Livak2001402}. Therefor, the constant $K$ does not have to be equal to one. Assuming efficiencies of the target and the reference are the same, \begin{eqnarray} E_X = E_R = E \nonumber \\ \frac{X_0}{R_0} \times (1 + E)^{C_{T,X}-C_{T,R}} = K \end{eqnarray} \noindent or \begin{eqnarray} X_N \times (1 + E)^{\Delta C_T} = K \label{eq:deltaCt} \end{eqnarray} \noindent where $X_N$ is equal to the \emph{normalized} amount of target ($X_0/R_0$) and the \dCt is equal to the difference in the \Ct for target and reference ($C_{T,X}-C_{T,R}$). Equation~\ref{eq:deltaCt} can be rearranged as \begin{equation} X_N = K \times (1 + E)^{-\Delta C_T} \label{eq:deltaCtRe} \end{equation} The final step is to divide the $X_N$ in the equation~\ref{eq:deltaCtRe} for any sample $q$ by the reference sample (also known as the calibrator, \emph{cb}): \begin{equation} \frac{X_{N,q}}{X_{N,cb}} = \frac{K \times (1 + E)^{-\Delta C_{T,q}}}{K \times (1 + E)^{-\Delta C_{T,cb}}} = (1 + E)^{-\Delta \Delta C_T} \end{equation} \noindent Here $-\Delta \Delta C_T = -(\Delta C_{T,q} - \Delta C_{T,cb})$. For amplicons designed to be less than 150 basepairs and for which the primer and $Mg^{2+}$ concentration have been optimized, the efficiency $E$ is close to one. Therefore, the amount of target, normalized to the endogenous reference and relative to a reference sample, is given by \begin{equation} \mbox{amount of target} = 2^{-\Delta \Delta C_T} \end{equation} \textbf{Attention:} Note that for the \ddCt calculation to be valid, the amplification efficiencies of the target and reference must be approximately equal. \section{Application example} Here we show how to use the \Rpackage{ddCt} package by a short example. \subsection{File I/O setup} We have attached two SDS output files, 'Experiment1.txt' and 'Experiment2.txt', in the package directory. The sample annotation information is also provided as the tab-delimited text file 'sampleData.txt'. Any warning information (for example \emph{Undetermined} in reference sample) is saved as a text file specified by the parameter 'warningFile'. <>= library(Biobase) library(lattice) library(RColorBrewer) library(ddCt) datadir <- function(x) system.file("extdata", x, package="ddCt") savedir <- function(x) file.path(tempdir(), x) file.names <- c(datadir("Experiment1.txt"),datadir("Experiment2.txt")) info <- datadir("sampleData.txt") warningFile <- savedir("warnings.txt") @ \subsection{Reference sample and housekeeping gene} For the sake of simplicity, we choose \textbf{Sample1} and \textbf{Sample2} as reference samples (calibrators), and \textbf{Gene2} as the reference gene (housekeeping gene) respectively. This could happen, for example, if the \textbf{Sample1} and \textbf{Sample2} are untreated samples while the \textbf{Sample3} has been treated with certain drugs. And \textbf{Gene2} is a housekeeping gene which we assume is expressed constantly in all the samples. <>= name.reference.sample <- c("Sample1", "Sample2") name.reference.gene <- c("Gene2") @ Note that more than one reference sample or renference gene can be specified. \subsection{Read in data} \Rfunction{SDMFrame} function is called to read in experiment data.Optionally one could also read in the sample annotation, which is the \Robject{sampleInformation} object in the example. <>= library(Biobase) CtData <- SDMFrame(file.names) sampleInformation <- read.AnnotatedDataFrame(info,header=TRUE, row.names=NULL) @ Note that \Rfunction{SDMFrame} is able to accept one or more files as input. \subsection{Apply the \emph{ddCt} method} Next step we call \Rfunction{ddCtExpression} to perform \emph{ddCt} method on the data. <>= result <- ddCtExpression(CtData, calibrationSample=name.reference.sample, housekeepingGene=name.reference.gene, sampleInformation=sampleInformation, warningStream=warningFile) @ Please refer to the help page of \Robject{ddCtExpression-class} for the methods to access all the values calculated by the \emph{ddCt} method. For example the error (either standard deviation or median absolute deviation) of all replicates are accessible through <>= CtErr(result) @ \subsection{Visualization} \Rfunction{errBarchart} provides a simple way to visualize the experiment results with the modified \Rfunction{barchart} from the \Rpackage{lattice} package, as seen in the figure~\ref{fig:errbarchart} \begin{figure}[htbp] \begin{center} <>= br <- errBarchart(result) print(br) @ \caption{Barchart with error bars to visualize the expression fold change of target genes in samples. Each panel represents one target gene (Gene 1, 2 and 3 in this case), and the expression level in each sample is indicated by the height of bars. If one gene is not detected in one sample, a 'NA' symbol in grey will appear in the position of the bar (Gene3 in Sample4), which helps to differ from the situations where the expression is very low but not yet undetectable (for instance Gene3 in Sample3). See the help page of \Rfunction{errBarchart} for more details.} \label{fig:errbarchart} \end{center} \end{figure} \subsection{Write result to text file} Finally we can save the results as tab-delimited text files. <>= elistWrite(result,file=savedir("allValues.txt")) @ \section{Acknowledgement} We thank Florian Hahne, Wolfgang Huber, Andreas Buness and Stefan Wiemann for their suggestion and comments during the development of the package. The example data has been kindly provided by Ute Ernst and was produced in the Division of Molecular Genome Analysis, DKFZ Heidelberg, Germany. \newpage \section{Session Info} The script has been running in the following session: <>= toLatex(sessionInfo(), locale=FALSE) @ \bibliography{rtPCR}{} \end{document}