%\VignetteIndexEntry{Using the inSilicoDb v2 package}
%\VignetteDepends{RCurl, rjson}
\documentclass{article}
\usepackage{url}
\usepackage{color}
\newcommand{\todo}[1]{\textcolor{red}{\textbf{#1}}}

\begin{document}

\title{Using inSilicoDb 2.0}
\author{Quentin De Clerck$\footnote{\texttt{qdeclerc@vub.ac.be}}$}
\maketitle

\section{Functions from inSilicoDb 2.0}

This new version of the package provides all the functionality of the previous inSilicoDb packages developed by \emph{Jonatan Taminau}. The existing functions did not change, so we refer to the documentation of the previous versions for their description.

\subsection{Access to InSilico MySafe}

One of the new features of this package is that users can access their private data stored on InSilico MySafe. This feature is implemented by the following functions:

\begin{description}
\item[InSilicoLogin(login, password)] Logs the user in with the given login and password. There is no secure way to log in to a web service in R. Therefore the password given to the function has to be the md5 hash of the real password. After logging in, the user can use the normal functions as described earlier with their private data.
\item[InSilicoLogout()] Logs the current user out of the InSilico DB web service.
\item[getInSilicoUserDetails()] Returns information (id, name, email) about the user that is currently logged in.
\end{description}

\subsection{Check the accessibility of data}

Two helper functions were added to provide information about the availability of data: getDatasetInfo and getPlatformList.

The purpose of getDatasetInfo is two-fold. First, it returns the default values of all optional parameters and the title of the study. Second, it reports the availability of the requested dataset for the specified parameters. It returns an error if the data is not available for download.

You need to log in to access datasets and dataset information. Use your InSilicoDB login and an md5 hash of your password. For this example we are using a restricted test account.
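
The md5 hash expected by InSilicoLogin can be computed in R itself. The following is a minimal sketch, assuming the \texttt{digest} package from CRAN is installed (it is not a dependency of inSilicoDb); the login and password shown are placeholders, not a real account, and the chunk is not evaluated.

<<eval=FALSE>>=
library("digest")
# serialize = FALSE hashes the characters of the string itself,
# not R's serialized representation of the object
hashed = digest("my-real-password", algo = "md5", serialize = FALSE)
# placeholder credentials, for illustration only
InSilicoLogin("user@example.com", hashed)
@

The resulting 32-character hexadecimal string is what is passed as the second argument of InSilicoLogin, as in the test-account example that follows.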
<<>>=
library("inSilicoDb");
InSilicoLogin("rpackage_tester@insilicodb.com", "5c4d0b231e5cba4a0bc54783b385cc9a");
eset = getDatasetInfo("GSE781", "GPL96");
print(eset);
## Check the availability of the following normalizations
## for series GSE781 on platform GPL97
norms = c("FRMA", "ORIGINAL")
output = sapply(norms, function(n) {
  tryCatch({
    eset <- getDatasetInfo("GSE781", "GPL97", norm = n);
    eset$norm;
  }, error = function(e) {
    # getDatasetInfo raises an error when the data is not available
    "Unavailable"
  });
});
print(output);
# We can thus conclude that the series GSE781 on platform GPL97 does not support FRMA
@

The getPlatformList function returns all the platforms supported by the package. It can also be used to list the platforms for which a given normalization is available.

<<>>=
platforms = getPlatformList();
print(platforms);
FRMAplatforms = getPlatformList(norm = "FRMA");
print(FRMAplatforms);
@

\subsection{Curated Data}

The main functional change with respect to the previous package is the better integration of curated data. The user can now choose whether or not to retrieve curated data by specifying the format. There are two possible formats: ESET and CURESET. The default format is CURESET, which includes the curated data. It is also possible to specify which curation is wanted for the requested expression set.

<<>>=
# without curated data
eset = getDataset("GSE4635", "GPL96", format = "ESET");
# with curated data
cureset = getDataset("GSE4635", "GPL96", format = "CURESET", curation = 9016);
print(phenoData(eset));
print(phenoData(cureset));
@

The getDefaultCuration function returns information about the default curation (id, time, curator) of a particular dataset. The getCurationInfo function returns a more detailed overview of all the available curations.

<<>>=
default = getDefaultCuration("GSE4635");
print(default);
getCurationInfo("GSE4635");
@

\section{SCAN and UPC normalizations}

Two normalizations were added to the package.
It is now possible to request data in Single Channel Array Normalization (SCAN) \cite{piccolo2013scan} or Universal exPression Codes (UPC) \cite{piccolo2013upc}. SCAN is a possible alternative to fRMA, while UPC outputs the probability that each probe or gene is expressed.

<<>>=
scan = getDataset("GSE7670", "GPL96", norm = "SCAN", features = "gene");
# example of values in SCAN normalization
print(exprs(scan)[1:10, 1:5]);
upc = getDataset("GSE7670", "GPL96", norm = "UPC", features = "gene");
# example of values in UPC normalization
print(exprs(upc)[1:10, 1:5]);
@

\section{Session Info}

<<>>=
sessionInfo()
@

\bibliographystyle{plain}
\bibliography{inSilicoDb2}

\end{document}