%\VignetteIndexEntry{07 Read Counting in RNA-seq -- Exercises}
%\VignetteEngine{knitr::knitr}

\documentclass{article}

<<style, eval=TRUE, echo=FALSE, results='asis'>>=
options(max.print=1000)
stopifnot(BiocInstaller::biocVersion() == "2.13")
BiocStyle::latex()
library(knitr)
opts_chunk$set(cache=TRUE, tidy=FALSE)
@ 

\title{Practical: Read Counting in RNA-seq}
\author{Herv\'e Pag\`es (\url{hpages@fhcrc.org})}
\date{5 February 2014}

\newcommand{\Hsap}{\emph{H.~sapiens}}
\newcommand{\Dmel}{\emph{D.~melanogaster}}

\usepackage{theorem}
\newtheorem{Ext}{Exercise}
\newenvironment{Exercise}{
  \renewcommand{\labelenumi}{\alph{enumi}.}\begin{Ext}%
}{\end{Ext}}
\newenvironment{Solution}{%
  \noindent\textbf{Solution:}\renewcommand{\labelenumi}{\alph{enumi}.}%
}{\bigskip}

\setlength{\abovecaptionskip}{6pt}
\setlength{\belowcaptionskip}{6pt}

\begin{document}

\maketitle
\tableofcontents


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


In the context of a high-throughput sequencing experiment, \emph{counting
the reads} means counting the number of reads per gene (or exon). It is
usually the preliminary step to a \emph{differential analysis} at the
gene level (or exon level).

The result of this counting will typically be organized as a matrix
where:
\begin{itemize}
  \item each row represents a gene (or exon);
  \item each column represents a sequencing run (usually a given sample);
  \item and each value is the raw number of reads from the sequencing run
        that were \emph{assigned} to the gene (or exon).
\end{itemize}

Different criteria can be used for assigning reads to genes. A common
one is to assign a read to a gene if the aligned read overlaps with that
gene and with that gene only.

\medskip

In this practical we learn how aligned reads stored in BAM files can be
counted with the \Rcode{summarizeOverlaps()} function from the
\Rpackage{GenomicRanges} package. With this function, the criteria used
for assigning reads to genes is controlled via 2 arguments: the \Rcode{mode}
and \Rcode{inter.feature} arguments. In addition to the man page for
\Rcode{summarizeOverlaps()}, the ``Counting reads with summarizeOverlaps''
vignette (located in the \Rpackage{GenomicRanges} package) is recommended
reading if you're planning to use this function for your work.

The output of \Rcode{summarizeOverlaps()} will be a
\Rclass{SummarizedExperiment} object containing the matrix of counts
together with information about the genes (or exons) in the \Rcode{rowData}
component and about the samples (e.g. patient ID, treatment, etc...) in
the \Rcode{colData} component. This object will be suitable input to
the \Rpackage{DESeq2} package for performing a \emph{differential analysis}.

\medskip

IMPORTANT NOTE: Starting with the upcoming version of \Bioconductor{}
(BioC 2.14, scheduled for April 2014), the \Rcode{summarizeOverlaps()}
function and its vignette will be located in the new
\Rpackage{GenomicAlignments} package.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{First look at some precomputed read counts}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Before we do our own read counting, we start by having a quick look at some
precomputed counts so we get an idea of what a \Rclass{SummarizedExperiment}
object looks like.

The \Rpackage{parathyroidSE} package contains RNA-seq data from the
publication of Haglund et al. \cite{Haglund2012Evidence}.
The \emph{paired-end} sequencing was performed on primary cultures from
parathyroid tumors of 4 patients at 2 time points over 3 conditions
(control, treatment with diarylpropionitrile (DPN) and treatment with
4-hydroxytamoxifen (OHT)). DPN is a selective estrogen receptor $\beta$ 1
agonist and OHT is a selective estrogen receptor modulator.
One sample (patient 4, 24 hours, control) was omitted by the paper
authors due to low quality.

The \Rpackage{parathyroidSE} package contains several data sets. One of them
is the \Rcode{parathyroidGenesSE} data set which contains the counts of reads
per gene.

\begin{Exercise}
  In this exercise, we load the \Rcode{parathyroidGenesSE} data set
  from the \Rpackage{parathyroidSE} package and perform some basic
  manipulations on it.
  \begin{enumerate}
    \item Load the \Rcode{parathyroidGenesSE} data set from the
          \Rpackage{parathyroidSE} package.

          What's the class of this object? What are its dimensions?

    \item The information in a \Rclass{SummarizedExperiment} object can
          be accessed with accessor functions. For example, to get the
          actual data (i.e., here, the read counts), we use the
          \Rcode{assay()} function.

          What's returned by \Rcode{assay()}? What are its dimensions.
          Display the top left corner of it (e.g. first 8 rows and columns).
          Does it have row names? Column names? What are the row names?

    \item In this matrix of read counts, each row represents an Ensembl gene,
          each column a sequencing run, and the values are the raw numbers
          of reads in each sequencing run that were assigned to the
          respective gene.

          How many reads were assigned to a gene in each sequencing run?
          How many genes have non-zero counts?

    \item Use \Rcode{rowData()} on \Rcode{parathyroidGenesSE}. What do you
          get? What's its length?

    \item Use \Rcode{colData()} on \Rcode{parathyroidGenesSE}. What do
          you get? How many rows does it have?

          Use \Rcode{table()} to summarize the number of runs for each
          treatment (Control, DPN, and OHT).
  \end{enumerate}
\end{Exercise}

\begin{Solution}
  \begin{enumerate}
    \item First we load the \Rpackage{parathyroidSE} package.
<<results='hide'>>=
library(parathyroidSE)
@
          Before we load the \Rcode{parathyroidGenesSE} data set, we can
          check what data sets are contained in the \Rpackage{parathyroidSE}
          package with:
<<eval=FALSE>>=
data(package="parathyroidSE")
@
          Load the \Rcode{parathyroidGenesSE} data set:
<<>>=
data(parathyroidGenesSE)
parathyroidGenesSE
class(parathyroidGenesSE)
dim(parathyroidGenesSE)
@
\medskip

    \item
<<>>=
class(assay(parathyroidGenesSE))
dim(assay(parathyroidGenesSE))
assay(parathyroidGenesSE)[1:8, 1:8]
@
          The row names are Ensembl gene ids. No column names:
<<>>=
colnames(parathyroidGenesSE)
@
\medskip

    \item To compute the number of reads that were assigned to a gene in
          each sequencing run, we just need to sum all the counts that are
          in a column and do this for each column:
<<>>=
colSums(assay(parathyroidGenesSE))
@
          Genes with non-zero counts:
<<>>=
sum(rowSums(assay(parathyroidGenesSE)) != 0)
@
\medskip

    \item
<<>>=
rowData(parathyroidGenesSE)
@
          We get a \Rclass{GRangesList} object with one list element per
          gene. Each list element is a \Rclass{GRanges} object containing
          the exon ranges for the gene.
\medskip

    \item
<<>>=
colData(parathyroidGenesSE)
@
          We get a \Rclass{DataFrame} object with one row per sequencing
          run.
<<>>=
table(colData(parathyroidGenesSE)$treatment)
@
  \end{enumerate}
\end{Solution}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Aligned reads and BAM files}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


To operate, the \Rcode{summarizeOverlaps()} function needs 2 data objects:
\begin{enumerate}
  \item one representing the genomic ranges of the genes (or exons);
  \item one representing the aligned reads.
\end{enumerate}

The aligned reads are typically stored in one BAM file per sequencing run.
In the next exercise we will have a quick look at the BAM files included
in the \Rpackage{parathyroidSE} package. The reads in these files are
\emph{paired-end reads} that were aligned using the TopHat aligner.
To keep the package to a reasonable size, only a subset of all the aligned
reads from the experiment have been placed in these files. More information
on how these BAM files were obtained can be found in the vignette located
in the \Rpackage{parathyroidSE} package.

To get the paths to these files, do:
<<bampaths>>=
bamdir <- system.file("extdata", package="parathyroidSE")
bampaths <- list.files(bamdir, pattern="bam$", full.names=TRUE)
bampaths
@

To load \emph{single-end reads} from a BAM file, we can use the
\Rcode{readGAlignmentsFromBam()} function from the \Rpackage{Rsamtools}
package. If the reads are \emph{paired-end} and we want to preserve
the pairing, the \Rcode{readGAlignmentPairsFromBam()} function can be used.
However, for downstream analyses where the pairing doesn't need to be
preserved (e.g. if we're only going to compute the coverage of the reads),
the reads can be loaded with \Rcode{readGAlignmentsFromBam()}, which is
faster and returns an object that is simpler and easier to manipulate.

One last thing before we start the exercise. By default
\Rcode{readGAlignmentsFromBam()} and \Rcode{readGAlignment\-Pairs\-From\-Bam()}
load PCR or optical duplicates as well as secondary alignments.
These alignments are generally discarded from the read counting step.
We can discard them up-front by filtering them out when we load the
alignments from the BAM files. This is done by creating and passing a
\Rclass{ScanBamParam} object to the \Rcode{param} argument of
\Rcode{readGAlignmentsFromBam()} or \Rcode{readGAlignmentPairsFromBam()}.

\begin{Exercise}
  In this exercise, we learn how to load paired-end reads and filter out
  the alignments that are not suitable for read counting.
  \begin{enumerate}
    \item Load the SRR479052.bam file included in the \Rpackage{parathyroidSE}
          package, first with \Rcode{readGAlignmentsFromBam()}, then with
          \Rcode{readGAlignmentPairsFromBam()}.

          What are the classes of the returned objects?

          How many pairs are there in the 2nd object?

    \item The first and last mate for each pair can be extracted from the
          \Rclass{GAlignmentPairs} object with the \Rcode{first()} and
          \Rcode{last()} accessor functions.

          Extract the first mates. Extract the last mates.

    \item See the man page for the \Rcode{ScanBamParam()} constructor
          in the \Rpackage{Rsamtools} package.

          Construct a \Rclass{ScanBamParam} object (that you will pass
          to \Rcode{readGAlignmentPairsFromBam()}) that will filter out
          PCR or optical duplicates as well as secondary alignments.

          Use it to load the pairs again.
  \end{enumerate}
\end{Exercise}

\begin{Solution}
  \begin{enumerate}
    \item Loading the BAM file first with \Rcode{readGAlignmentsFromBam()}:
<<>>=
library(Rsamtools)
gal0 <- readGAlignmentsFromBam(bampaths[1])
gal0
@
          then with \Rcode{readGAlignmentPairsFromBam()}:
<<>>=
galp0 <- readGAlignmentPairsFromBam(bampaths[1])
galp0
@
          \Rcode{gal} is a \Rclass{GAlignments} object. \Rcode{galp} is a
          \Rclass{GAlignmentPairs} object. A \Rclass{GAlignmentPairs} object
          is also vector-like object where each element represents an
          aligned \emph{paired-end read}. So the number of pairs in it is
          just:
<<>>=
length(galp0)
@
\medskip

    \item
<<>>=
first(galp0)
last(galp0)
@
\medskip

    \item
<<>>=
param <- ScanBamParam(flag=scanBamFlag(isDuplicate=FALSE,
                                       isNotPrimaryRead=FALSE))
readGAlignmentPairsFromBam(bampaths[1], param=param)
@
  \end{enumerate}
\end{Solution}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Choosing and loading a gene model}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


To operate, \Rcode{summarizeOverlaps()} needs access to the genomic ranges
of the genes (or exons). This information can be extracted from what we call
a \emph{gene model}. Gene models for various organisms are provided by many
annotation providers on the internet (UCSC, Ensembl, NCBI, TAIR, FlyBase,
WormBase, etc...) In \Bioconductor{} a gene model is typically represented
as a \Rclass{TranscriptDb} object. The \Rpackage{GenomicFeatures} package
contains tools for obtaining a gene model from these providers and store
it in a \Rclass{TranscriptDb} object (the container for gene models).
For convenience, the most commonly used gene models are available as
\Bioconductor{} data packages (called TxDb packages). Each TxDb package
contains a \Rclass{TranscriptDb} object ready to use.

According to the vignette located in the \Rpackage{parathyroidSE} package,
the reads in the BAM files were aligned to the GRCh37 human reference genome.
If we wanted to use the gene model for Human provided by Ensembl, we could
do:
<<makeTranscriptDbFromBiomart, eval=FALSE>>=
## Requires INTERNET ACCESS and takes about 6 min. Please don't try to run this!
library(GenomicFeatures)
txdb <- makeTranscriptDbFromBiomart(biomart="ensembl",
                                    dataset="hsapiens_gene_ensembl")
@
This would return a \Rclass{TranscriptDb} object containing the Ensembl
gene model for Human.

IMPORTANT NOTE: One must be careful to choose a gene model based on the
same reference genome that was used to align the reads. The annotations
provided by Ensembl are updated at each new Ensembl release, which typically
happens 2 or 3 times per year (current release is Ensembl 73).
The \Rcode{"hsapiens\_gene\_ensembl"} dataset is usually based on the
most recent version of the Human reference genome (currently GRCh37).
So before we proceed with this \Rclass{TranscriptDb} object, we would need
to make sure that it's compatible with our BAM files, that is, we would
need to check that the \Rcode{"hsapiens\_gene\_ensembl"} dataset was based
on GRCh37 human at the time the \Rclass{TranscriptDb} object was made.

Because our goal is to use the counts to perform a \emph{differential
analysis} at the gene level, we will need to feed \Rcode{summarizeOverlaps()}
with a \Rclass{GRangesList} object containing the exon ranges grouped by
gene. This can be extracted from the \Rclass{TranscriptDb} object with
the \Rcode{exonsBy()} function:
<<exonsBy, eval=FALSE>>=
ex_by_gene <- exonsBy(txdb, by="gene")  # GRangesList object
@

For the purpose of this practical, we'll use a subset of the Ensembl genes.
This subset is stored in the \Rpackage{parathyroidSE} package and is based
on the GRCh37 human reference genome.

\begin{Exercise}
  In this exercise, we have a quick look at the \Rcode{exonsByGene} data set
  included in the \Rpackage{parathyroidSE} package.
  \begin{enumerate}
    \item Load the \Rcode{exonsByGene} data set from the
          \Rpackage{parathyroidSE} package. What is it?

    \item How many genes are represented in this object?
  \end{enumerate}
\end{Exercise}

\begin{Solution}
  \begin{enumerate}
    \item
<<>>=
data(exonsByGene)
exonsByGene
@
\medskip

    \item Number of genes in this object:
<<>>=
length(exonsByGene)
@
  \end{enumerate}
\end{Solution}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Count the reads}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


To count the reads, we use the \Rcode{summarizeOverlaps()} function
defined and documented in the \Biocpkg{GenomicRanges} package.

The aligned reads must be passed to the \Rcode{reads} argument of the
function (the 2nd argument). They can be represented in different ways,
including as a \Rclass{BamFile}, a \Rclass{GAlignments},
a \Rclass{GAlignmentPairs}, or a \Rclass{BamFileList} object.
The first 3 types of objects only allow passing the reads from a single
sequencing run at a time. Using a \Rclass{BamFileList} object allows us
to pass the reads from all the sequencing runs at once.
To create such an object, we use the \Rcode{BamFileList()} constructor
function from the \Rpackage{Rsamtools} package:
<<BamFileList>>=
library(Rsamtools)
bamfile_list <- BamFileList(bampaths, index=character())
@
Note that we need to use \Rcode{index=character()} here because there are
no BAM index files (\Rcode{.bam.bai} extension) associated with our BAM files.

\begin{Exercise}
  Let's do the read counting.
  \begin{enumerate}
    \item Use \Rcode{summarizeOverlaps()} on \Rcode{exonsByGene} and
          \Rcode{bamfile\_list} to count the reads. Check the man page for
          the details. Note that because the RNA-seq protocol was not strand
          specific, you need to specify \Rcode{ignore.strand=TRUE}.
          Also because the reads are \emph{paired-end}, you need to specify
          \Rcode{singleEnd=FALSE}. This will tell \Rcode{summarizeOverlaps()}
          to use \Rcode{readGAlignment\-Pairs\-From\-Bam()} instead of
          \Rcode{readGAlignmentsFromBam()} internally to read the BAM files.

    \item When \Rcode{summarizeOverlaps()} calls the reading function
          internally on each BAM file, it does so without specifying any
          particular \Rcode{param} value, so, by default, PCR or optical
          duplicates and secondary alignments are loaded. However, if a
          \Rcode{param} argument is passed to \Rcode{summarizeOverlaps()},
          it will be passed along to the reading function.

          Count the reads again but discard PCR or optical duplicates as well
          as secondary alignments.
  \end{enumerate}
\end{Exercise}

\begin{Solution}
  \begin{enumerate}
    \item
<<>>=
read_count0 <- summarizeOverlaps(exonsByGene, bamfile_list,
                                 ignore.strand=TRUE,
                                 singleEnd=FALSE)
read_count0
@
\medskip

    \item To discard PCR or optical duplicates as well as secondary alignments,
          we re-use the \Rclass{ScanBamParam} object we prepared earlier:
<<>>=
read_count <- summarizeOverlaps(exonsByGene, bamfile_list,
                                ignore.strand=TRUE,
                                singleEnd=FALSE,
                                param=param)
read_count
@
          Let's do a quick comparison between the 2 counts:
<<>>=
colSums(assay(read_count0))
colSums(assay(read_count))
@
          No difference here in the final count. This means the reads we
          discarded didn't get assigned to any gene the first time we counted
          (but this wouldn't necessarily be the case with a bigger data set).
  \end{enumerate}
\end{Solution}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Now that we have our read counts, we're ready to perform a \emph{differential
analysis} with the \Rpackage{DESeq2} package.

\medskip
\medskip
\medskip

\centerline{THANKS!}


\appendix

\bibliography{SummerX}

\end{document}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{How to get rid of reads with low mapping quality?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


In a BAM file, the mapping quality of the alignments is stored in the
MAPQ field (MAPping Quality). For more information about this field (and
about the SAM format in general) please refer to the SAM Spec document
hosted on the SAMtools website (\url{http://samtools.sourceforge.net/}).

By default \Rcode{readGAlignmentsFromBam()} and
\Rcode{readGAlignmentPairsFromBam()} don't load the MAPQ field from the file.
In fact, they completely ignore this field and the alignments are loaded
independently of their mapping quality. Using a \Rclass{ScanBamParam} object
when calling these functions allows to choose what BAM fields to load.

\begin{Exercise}
  The goal of this exercise is to get rid of reads with low mapping
  quality before the counting.
  \begin{enumerate}
    \item Construct a \Rclass{ScanBamParam} object that will filter out
          PCR or optical duplicates as well as secondary alignments, and
          that will load the MAPQ field.

    \item Use this \Rclass{ScanBamParam} object to load the pairs in
          SRR479052.bam

    \item Extract the first mates and last mates separately. Check their
          metadata columns.

    \item Subset the \Rclass{GAlignmentPairs} object to keep only the
          pairs in which the 2 mates have a mapping quality greater than
          some threshold.

    \item In addition to a \Rclass{BamFileList}, the \Rcode{reads}
          argument of \Rcode{summarizeOverlaps()} can be a \Rclass{BamFile},
          a \Rclass{GAlignments}, or a \Rclass{GAlignmentPairs} object.

          Call \Rcode{summarizeOverlaps()} on \Rcode{exonsByGene} and
          the above \Rclass{GAlignmentPairs} object, and with
          \Rcode{ignore.strand=TRUE}.

    \item To obtain the full count like in the previous section, we would
          need to repeat this for all the BAM files. This would give us
          one \Rclass{SummarizedExperiment} object per file that we would
          then need to combine with \Rcode{cbind()}.
  \end{enumerate}
\end{Exercise}