# (PART) Preamble {-} # Introduction [Bioconductor](https://bioconductor.org) is an open source, open development software project to provide tools for the analysis and comprehension of high-throughput genomic data. It is based primarily on the [R](http://www.r-project.org/) programming language. ## What you will learn The goal of this book is to provide a solid foundation in the usage of Bioconductor tools for single-cell RNA-seq analysis by walking through various steps of typical workflows using example datasets. We strive to tackle key concepts covered in the manuscript, __"Orchestrating Single-Cell Analysis with Bioconductor"__, with each workflow covering these in varying detail, as well as essential preliminaries that are important for following along with the workflows on your own. ### Preliminaries For those unfamiliar with R (and those looking to learn more), we recommend reading the [_Learning R and More_](#learning-r-and-more) chapter, which first and foremost covers how to get started with R. We point to many great online resources for learning R, as well as related tools that are nice to know for bioinformatic analysis. For advanced users, we also point to some extra resources that go beyond the basics. While we provide an extensive list of learning resources for the interested audience in this chapter, we only ask for [some familiarity with R](#getting-started-with-r) before going to the next section. We then briefly cover getting started with [_Using R and Bioconductor_](#using-r-and-bioconductor). Bioconductor, being its own repository, has a unique set of tools, documentation, resources, and practices that benefit from some extra explanation. [_Data Infrastructure_](#data-infrastructure) merits a separate chapter. The reason for this is that common data containers are an essential part of Bioconductor workflows because they enable interoperability across packages, allowing for "plug and play" usage of cutting-edge tools. Specifically, here we cover the _SingleCellExperiment_ class in depth, as it has become the working standard for Bioconductor based single-cell analysis packages. Finally, before diving into the various workflows, armed with knowledge about the _SingleCellExperiment_ class, we briefly discuss the datasets that will be used throughout the book in [_About the Data_](#about-the-data). ### Workflows All workflows begin with data import and subsequent _quality control and normalization_, going from a raw (count) expression matrix to a clean one. This includes adjusting for experimental factors and possibly even latent factors. Using the clean expression matrix, _feature selection_ strategies can be applied to select the features (genes) driving heterogeneity. Furthermore, these features can then be used to perform _dimensionality reduction_, which enables downstream analysis that would not otherwise be possible and visualization in 2 or 3 dimensions. From there, the workflows largely focus on differing downstream analyses. _Clustering_ details how to segment a scRNA-seq dataset, and _differential expression_ provides a means to determine what drives the differences between different groups of cells. _Integrating datasets_ walks through merging scRNA-seq datasets, an area of need as the number of scRNA-seq datasets continues to grow and comparisons between datasets must be done. Finally, we touch upon how to work with _large scale data_, specifically where it becomes impractical or impossible to work with data solely in-memory. As an added bonus, we dedicate a chapter to _interactive visualization_, which focuses on using the _iSEE_ package to enable active exploration of a single cell experiment's data. ## What you won't learn The field of bioinformatic analysis is large and filled with many potential trajectories depending on the biological system being studied and technology being deployed. Here, we only briefly survey some of the many tools available for the analysis of scRNA-seq, focusing on Bioconductor packages. It is impossible to thoroughly review the plethora of tools available through R and Bioconductor for biological analysis in one book, but we hope to provide the means for further exploration on your own. Thus, it goes without saying that you may not learn the optimal workflow for your own data from our examples - while we strive to provide high quality templates, they should be treated as just that - a template from which to extend upon for your own analyses. ## Who we wrote this for We've written this book with the interested experimental biologist in mind, and do our best to make few assumptions on previous programming or statistical experience. Likewise, we also welcome more seasoned bioinformaticians who are looking for a starting point from which to dive into single-cell RNA-seq analysis. As such, we welcome any and all feedback for improving this book to help increase accessibility and refine technical details. ## Why we wrote this This book was conceived in the fall of 2018, as single-cell RNA-seq analysis continued its rise in prominence in the field of biology. With its rapid growth, and the ongoing developments within Bioconductor tailored specifically for scRNA-seq, it became apparent that an update to the [Orchestrating high-throughput genomic analysis with Bioconductor](https://www.nature.com/articles/nmeth.3252) paper was necessary for the age of single-cell studies. We strive to highlight the fantastic software by people who call Bioconductor home for their tools, and in the process hope to showcase the Bioconductor community at large in continually pushing forward the field of biological analysis. ## Acknowledgements We would like to thank all Bioconductor contributors for their efforts in creating the definitive leading-edge repository of software for biological analysis. It is truly extraordinary to chart the growth of Bioconductor over the years. We are thankful for the wonderful community of scientists and developers alike that together make the Bioconductor community special. We would first and foremost like to thank the Bioconductor core team and the emerging targets subcommittee for commissioning this work, Stephanie Hicks and Raphael Gottardo for their continuous mentorship, and all our contributors to the companion manuscript of this book. We'd also like to thank Garret Grolemund and Hadley Wickham for their book, [R for Data Science](https://r4ds.had.co.nz/index.html), from which we drew stylistic and teaching inspiration. We also thank Levi Waldron and Aaron Lun for advice on the code-related aspects of managing the online version of this book.