This page is the main information point for Bioconductors participation in the Google summer of Code project this year (2014).
Each selected student (mentee) will be paid USD $5500 to work on a Bioconductor project for 3 months during the summer.
Students should look at the list of projects and see if any project interests them. Email the project mentors to express your interest, and describe any prior experience.
Students with ideas for Bioconductor projects not listed below are encouraged to email any of the mentors listed below with project ideas.
Students will submit project applications directly to Google.
Google will award a certain number of student slots to the Bioconductor project.
The Bioconductor administrators and mentors will rank projects in order of importance to the project, and the top projects will be funded.
Any selected students will be expected to register with the bioconductor and bioc-devel mailing lists.
There is a timeline posted at Google explaining how this works. Students are encouraged to look at this and make sure that they can commit to this. There is also a FAQ in case people have other questions that are not addressed here.
Background/Motivation: As very large genomic data sets become more and more common, computational biologists are spending inordinate time transforming data from the format of the original resource to a format amenable to computation in their programming language of choice. The R / Bioconductor community needs programmatic access to cloud-based experimental data resources that can be readily incorporated into their own work flows.
AnnotationHub and its supporting packages are primed to support such a project. AnnotationHub provides infrastructure to make well-curated resources available to R software clients, but it needs the addition of a GUI interface to allow addition of user-supplied resources, including transformation of data into formats amenable to direct use by R clients.
Work with us to create a GUI interface that does the following:
1) allows the user to add large genomic resources that have been
transformed into a GenomicRanges::GRanges object along with their associated metadata to a
NOSQL back-end database.
2) provides an intuitive front end using a shiny method that allows the user
to upload the object that was passed in to the method up to the DB
3) checks and validates that all the metadata has been filled in
appropriately when the shiny GUI is being run and then uploads that to the DB.
4) on the back end, enable a new instance of the AnnotationHubServer
that knows how to listen for requests from the GUI and can add the
data when appropriate.
5) Once the method and back end are both working for GenomicRanges::GRanges objects, you should also write methods for other popular Bioconductor objects such as:
Biobase::ExpressionSet,
GenomicRanges::SummarizedExperiment,
GenomicRanges::GrangesList.
Familiarity with R S4 methods and with shiny.
Subscribe to the mailing lists for bioconductor and bioc-devel.
Create a basic S4 method similar to what you can see in the interactiveDisplay package. Focus on the method for data.frames since it is the simplest to understand. Your method will be conceptually similar to this. Except in this case, instead of the focus being on displaying the object, this method should be intended to allow the user to collect metadata from the user about the Title, Species, TaxonomyId and Genome and then send a message back to the user R session that contains this information along with the object itself. The method should take a GRanges object as an argument and then should launch a simple GUI to extract information from the user (or even better-from the object itself), and then return that data to the user.
Marc Carlson mcarlson@fhcrc.org
Dan Tenenbaum dtenenba@fhcrc.org Martin Morgan mtmorgan@fhcrc.org
Every scientific analysis should result in a reproducible report. The ReportingTools Bioconductor package provides multiple means of report generation, including an imperative API driven by an R script, as well as a declarative interface through knitr. ReportingTools converts common R/Bioconductor data structures like data.frames and ExpressionSets into report elements, such as tables and plots, according to user-definable mappings. It supports multiple backends, with the HTML backend being the most developed. The HTML report elements have some limited interactivity (such as sortable tables). Additional interactivity is enabled through integration with the shiny package.
Our goal is to add some simple interactive HTML report elements, where the interactivity is implemented in the front-end, independent of shiny or other R instance. These would include basic plots of summaries, as well as simple genomic plots, which would display alignments, per-position summaries, and annotations along the genome. Summarized data could be directly embedded in the report, and the client would access large datasets by querying standard biological data sources like DAS, AnnotationHub, ExperimentHub and web-accessible BAM/VCF/BigWig files. This work would occur in a new or at least separate package that could be used directly, or through its integration with ReportingTools. We may be able to leverage existing work like clickme and Rcharts for the low-level plotting. The genomic plotting might rely on and even drive the development of the pViz javascript library.
The goal is for simple, lightweight plots in redistributable reports. There is no intent for this to replace the more sophisticated shiny-based solutions, nor applications like epiviz(R).
Michael Lawrence lawrence.michael@gene.com
Martin Morgan mtmorgan@fhcrc.org Marc Carlson mcarlson@fhcrc.org
The mzR R/Bioconductor package provides a unified application programming interface to the common open and community-driven file formats and parsers available for mass spectrometry data, namely mzXML, mzML and mzData (see current vignette for details and references). It relies on C and C++ code from other third party open-source projects and the Rcpp package to, notably, provide a direct mapping from R to C++ infrastructure.
Currently, mzR provides two back-ends to read mass spectrometry raw data:
More details about the project can be found on the official package page.
The goal is to extend current useful, yet limited capabilities of mzR by adding support for the state-of-the-art proteowizard project.
We will provide example data in all formats and support on the domain. This project will use the mzR github page as main collaboration and communication hub.
Subscribe to the Bioconductor and Bioc-devel mailing lists.
Give us your github account to be added as a mzR collaborator on the github page.
Install and explore the mzR and Rcpp packages and the proteowizard project. We will provide simple tasks to be coded in C++ using the proteowizard code base to familiarise yourself.
Expand mzR on raw data and meta-data accession using the proteowizard 'msdata' object type. This will allow read and write support for the widely used Proteomics Standards Initiative XML-based raw data formats and their latest definitions.
Implement support for additional open formats, in particular the MSMS identification (mzIdentML) using the 'identdata' object type.
Ensure successful compilation of the updated mzR package on the respective architectures.
The candidate will have to familiarise herself proteowizard code base.
Laurent Gatto lg390@cam.ac.uk
Steffen Neuman sneumann@ipb-halle.de Dirk Eddelbuettel edd@debian.org
Source Code & Build Reports »
Source code is stored in
svn
(user: readonly
, pass: readonly
).
Software packages are built and checked nightly. Build reports:
Development Version»
Bioconductor packages under development:
Developer Resources: