\documentclass{article}
\usepackage{mathptmx}
\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{float}
\usepackage{epsfig}
%% \usepackage{semcolor}
\usepackage[pdftex,colorlinks=false,pdfborder={0 0 0},pageanchor=false,bookmarks=false,urlcolor=black]{hyperref}

\pdfpagewidth=11truein
\pdfpageheight=8.5truein
\pdfhorigin=1truein % default value(?), but doesn't work without
\pdfvorigin=1truein % default value(?), but doesn't work without

%% \newcommand{\fs}{\fromSlide}
%% \newcommand{\bs}{\begin{slide}}
%% \newcommand{\es}{\end{slide}}
%% \slideframe{none}

\setcounter{secnumdepth}{0}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textsf{#1}}}
\newcommand{\Rmethod}[1]{{\texttt{#1}}}
\newcommand{\Rfunarg}[1]{{\texttt{#1}}}
\newcommand{\Rclass}[1]{{\textit{#1}}}
\newcommand{\R}{\textsf{R}}
\newcommand{\term}[1]{\emph{#1}}
\newcommand{\mref}[2]{\htmladdnormallinkfoot{#2}{#1}}

\begin{document}

<<>>=
options(width=70)
@

\author{M. T. Morgan\footnote{\url{mtmorgan@fhcrc.org}} \\
  Fred Hutchinson Cancer Research Center \\
  Seattle, WA}
\title{Parallel R Practical}
\date{4 August, 2006}
\maketitle

\section{Part 1: getting going}

We have two objectives in the first part of this practical: to orient
you on using our `cluster', and to expose you to some of the basic
functionality of the \Rpackage{Rmpi} package.

\subsection{Connecting to our `cluster'}

We've set up a small `cluster' of computers to use as a parallel
environment. The cluster is very minimalist in some senses (just three
nodes), and the Linux environment might be unfamiliar to some of you.
We'll try to walk through the very basics here.

If you're familiar with \verb|ssh|, then feel free to connect
immediately to the IP address available at the front of the room.
Otherwise, follow the instructions available at the start of the lab.
All of us will use the same username \verb|knoppix| and password
\verb|bioc|.

%% The steps you'll follow are:
%% \begin{enumerate}
%% \item Connect to the cluster.
%% \item Change to your working directory.
%% \item Perform preliminary operations to make sure \R{} is available
%% and functioning properly on a single node.
%% \end{enumerate}
%% \subsubsection{Connecting to the cluster}
%% You'll need to connect to the cluster using \verb|ssh| or
%% \verb|telnet|. The cluster nodes are at the ip addresses
%% \begin{verbatim}
%% 140.107.169.186
%% 140.107.169.28
%% 140.107.169.31
%% \end{verbatim}
%% To use \verb|telnet| on Windows:
%% \begin{enumerate}
%% \item Go to the `Start' and then `Run...' menu items.
%% \item In the field labeled 'Open:', enter \verb|telnet 140.107.169.186|.
%% \item When prompted for a username, enter \verb|knoppix| and then the
%% return key. For a password, enter \verb|bioc|.
%% \end{enumerate}
%% To use \verb|telnet| on a Mac:
%% \begin{enumerate}
%% \item Open a console window.
%% \item Type \verb|telnet 140.107.169.186| at the prompt.
%% \item Enter username \verb|knoppix| and password \verb|bioc|
%% \end{enumerate}

\subsection{Once connected}

Congratulations! You are now at a Linux command line. You've all logged
in as the same user. To avoid stepping on each other's toes too much,
please change to your assigned directory immediately. Do this by typing
\begin{verbatim}
[% user0 %] cd ~/user0
\end{verbatim}
where \verb|user0| is the user id assigned to you at the start of the
practical. This will be the directory to use for storing any files you
might create, and will be the home directory for your \R{} session.
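To confirm that you are where you think you are, try
\begin{verbatim}
[% user0 %] pwd
[% user0 %] ls
\end{verbatim}
\verb|pwd| should report a path ending in your assigned user id, and
\verb|ls| will probably show an empty (or nearly empty) directory at
this point.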
An (initially identical) directory exists on each of the other nodes in
our cluster.

Here are some Linux commands that you might find useful:
\begin{description}
\item[ls] list the contents of a directory; \verb|ls| lists the current
  directory contents, \verb|ls mydir| lists the contents of the
  directory \verb|mydir| located in the current directory, and
  \verb|ls ~/mydir| lists the contents of the directory \verb|mydir| in
  the `home' (\verb|~|) directory.
\item[cd] change directory; \verb|cd ..| changes up a level,
  \verb|cd mydir| changes to the directory \verb|mydir| in the current
  directory, and \verb|cd| changes to the `home' directory.
\item[pwd] list the path to the current directory.
\item[R] launch \R{}, with the `working directory' set to the current
  directory.
\item[logout] log out of the session.
\end{description}
You are actually in a fairly full-featured Linux environment, so if
you're familiar with this feel free to launch the editor of your
choice, etc. Since we're all sharing the same computer, please don't be
too resource greedy, and please do not modify files such as
\verb|.bashrc| that will influence other users.

As a simple test that things are going well, try launching the `command
line' version of \R. Do this by typing
\begin{verbatim}
[% user0 %] R
\end{verbatim}
at the prompt. You should be greeted by the familiar welcome
information screen, and be presented with the \verb|>| prompt. Try
typing some of your favorite \R{} commands, and loading a few
packages. Everything should work as expected.
<<>>=
x <- 10:1
y <- 1:10
x + y
hist(runif(1000))
sessionInfo()
library(Biobase)
search()
@
If there are problems, let's try to work them out now before carrying
on.

A couple of quick things about \R{} in command-line mode. You might
find it useful to use the keys \verb|ctrl-p| and \verb|ctrl-n| to
recall commands that you've already entered. Some commands make use of
other programs that can be confusing; in particular, the command
<<>>=
f <- function(x) x
fix(f)
@
will open the \verb|vi| editor to edit the function \verb|f|. Unless
you're familiar with \verb|vi|, probably the best thing to do is to
exit immediately by typing \verb|:q| followed by the return key. We'll
try to stick with simple input and output, to avoid having to make
complex editing changes.

Another potential source of confusion results from searching for help:
\begin{verbatim}
> library(help = Rmpi)
> ?mpi.bcast.cmd
> help.search("bcast")
\end{verbatim}
Each of these commands opens the \verb|less| `pager'. You can scroll
through the resulting display by typing \verb|f| and \verb|b| to move
forward and backward a page at a time, and type \verb|q| to stop
displaying the help information.

\subsection{Launching and exploring \Rpackage{Rmpi}}

The objective of this section is to familiarize you with the basic
functionality of the \Rpackage{Rmpi} package. We won't really do any
useful work, but hopefully this will spur some thoughts about how to
use parallel programming.

Start by loading the \Rpackage{Rmpi} package, starting and stopping a
collection of nodes, and executing a few simple commands:
<<>>=
library(Rmpi)
mpi.spawn.Rslaves()
mpi.remote.exec(search())
x <- 1:5
mpi.bcast.Robj2slave(x)
rm(x)
mpi.remote.exec(x^2)
mpi.remote.exec(x <- x^2)
mpi.remote.exec(x <- x*mpi.comm.rank())
mpi.close.Rslaves()
@
Now let's get a more comprehensive feeling for the functionality of the
package.
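Before turning to the documentation, one small experiment helps make
the layout of the cluster concrete (a brief aside, using only functions
that appear in the chunk above plus \Rfunction{mpi.comm.size}):
\begin{verbatim}
> mpi.spawn.Rslaves(nslaves=2)
> mpi.remote.exec(mpi.comm.rank())
> mpi.comm.size()
> mpi.close.Rslaves()
\end{verbatim}
Each slave reports a different rank, the manager is rank 0, and
\Rfunction{mpi.comm.size} counts the manager plus all of the slaves.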
Look at the overview of functions provided by
<<>>=
library(help=Rmpi)
@
Remember that you can scroll forward and backward with the commands
\verb|f| and \verb|b|, and that you can stop reading with \verb|q|.

There are three groups of functions. We'll find those toward the end,
in the section `MPI Extensions specifically to slavedaemon.R', most
useful (\verb|slavedaemon.R| refers to a script in the \Rpackage{Rmpi}
package used to launch nodes when we use the command
\Rfunction{mpi.spawn.Rslaves}). Explore the help pages for specific
functions that look particularly relevant, e.g.,
\begin{verbatim}
> ?mpi.remote.exec
> ?mpi.parLapply
> ?mpi.setup.rngstream
> ?mpi.close.Rslaves
\end{verbatim}
Think about how these functions might be helpful in parallel analysis.

The functions in the section `MPI Extensions in R Environment' are
useful in conjunction with the \Rfunction{mpi.spawn.Rslaves} function,
as well as in more general parallel programming contexts. Scan the help
pages for a few of these
\begin{verbatim}
> ?mpi.spawn.Rslaves
> ?mpi.bcast.Robj
\end{verbatim}
How would you launch a cluster with, say, 6 slaves? Any ideas about
when you would want a cluster with more or fewer slaves than there are
compute nodes in the cluster?

The functions in the section `MPI APIs' are the core of
\Rpackage{Rmpi}. If you have used MPI in other programming languages,
then you'll recognize that \Rpackage{Rmpi} has an interface to some but
not all of the functionality of MPI. \Rpackage{Rmpi} provides the
common point-to-point (e.g., \verb|send| and \verb|recv| data between
two nodes), collective (e.g., \verb|bcast|, \verb|scatter| and
\verb|gather| data efficiently to collections of nodes), and
non-blocking operations. We will not emphasize these functions, though
they are actually used in many of the functions we have already
executed.

\subsection{A first example: mad about Golub}

Many Bioconductor packages summarize gene expression data in
\Rclass{ExpressionSet} objects. Type the following
<<>>=
library(golubEsets)
data(golubMerge)
golubMerge
@
and view the help page (\verb|?golubMerge|) about this data set to get
a feel for what it represents.

The \R{} function \Rfunction{mad} calculates the \emph{median absolute
deviation} of a sample. The \Rfunction{mad} is a non-parametric
analogue of the standard deviation, describing variability in the
sample. As an exploratory step, we might be interested in identifying
genes with large variability, thinking perhaps that these are going to
be most useful in distinguishing between samples (this is naive!). Here
is how we might extract the matrix of expression values from
\verb|golubMerge|, calculate the \Rfunction{mad} for each gene, and
view the top 10 most variable genes in \verb|golubMerge|:
<<>>=
exprSet <- exprs(golubMerge)
res <- apply(exprSet, 1, mad)
sort(res, decreasing=TRUE)[1:10]
@
See the help pages (\verb|?exprs|, \verb|?apply|, etc.) for information
on \R{} functions you do not recognize, or to learn more about
arguments provided to functions.

We'll now walk through a couple of different ways in which we could use
a parallel environment to perform these calculations. The first method
is to replace \Rfunction{apply} with \Rfunction{mpi.parApply}.
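The two functions are designed to be near drop-in substitutes. If you
want to compare their argument lists before running anything (a quick
aside; the extra arguments of \Rfunction{mpi.parApply}, which control
how jobs are scheduled on the slaves, may vary between \Rpackage{Rmpi}
versions), try
\begin{verbatim}
> args(apply)
> ?mpi.parApply
\end{verbatim}
In both cases the first arguments are the object, the margin to apply
over, and the function to apply.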
To do this, evaluate commands like the following:
<<>>=
library(Rmpi)
mpi.spawn.Rslaves(nslaves=3)
res <- mpi.parApply(exprSet, 1, mad)
sort(res, decreasing=TRUE)[1:10]
mpi.close.Rslaves()
@
(It is not necessary to use \Rfunction{mpi.spawn.Rslaves} and
\Rfunction{mpi.close.Rslaves} for every calculation, just once for each
\R{} session.)
\begin{itemize}
\item Hopefully the results of the two different ways of calculating
  \Robject{res} are the same. Use \Rfunction{identical} to confirm
  this. Are there situations when results might not be exactly
  identical, but `close enough'?
\end{itemize}

Use \Rfunction{system.time} to get a sense for how long calculations
are taking:
<<>>=
mpi.spawn.Rslaves(nslaves=3)
system.time(res1 <- apply(exprSet, 1, mad))
system.time(res2 <- mpi.parApply(exprSet, 1, mad))
identical(res1, res2)
@
This is very approximate, especially with the setup in the lab!
\begin{itemize}
\item How does the system time for parallel evaluation compare with
  that for evaluation on a single node (pay particular attention to the
  first and third numbers reported by \Rfunction{system.time}, and use
  the help page for \Rfunction{system.time} to help you interpret the
  results)?
\item Does \Rfunction{system.time} offer any evidence that calculations
  are actually occurring on remote nodes?
\end{itemize}
In my trials, this command took about as long on one node as on several
nodes, and was sometimes even faster on a single node.
\begin{itemize}
\item What sorts of factors might influence parallel evaluation time?
\item If evaluating this type of command really is about as
  time-efficient on one as on several nodes, does this mean that
  parallelization is pointless? Would it be better to dissect the
  calculation into smaller pieces (e.g., figuring out where
  \Rfunction{mad} spends most of its time) and try to parallelize those
  parts, or to look for ways of combining the \Rfunction{mad}
  calculation with additional steps of analysis? How would you go about
  each of these steps (dissection, or combining analyses)?
\end{itemize}

Let's explore a second way of performing our calculation in parallel.
The previous example sends portions of the \Robject{exprSet} `over the
wire' to each node. But actually, each node has access to a copy of
\verb|golubMerge| on its own disk. So let's just send the
\emph{commands} over the wire, and see what happens. The single node
code we will try to emulate uses \Rfunction{sapply}:
<<>>=
ff <- function(i) mad(exprSet[i,])
res1 <- sapply(1:nrow(exprSet), ff)
@
To execute in parallel, we have to load the required data on each node:
<<>>=
mpi.bcast.cmd(library(golubEsets))
mpi.bcast.cmd(data(golubMerge))
mpi.bcast.cmd(exprSet <- exprs(golubMerge))
@
We can then do our parallel evaluation sending only our function
\Rfunction{ff} to the remote nodes:
<<>>=
system.time(res1 <- sapply(1:nrow(exprSet), ff))
system.time(res2 <- mpi.parSapply(1:nrow(exprSet), ff))
identical(res1, res2)
@
On a real cluster, this calculation seems to scale very well, with
compute time almost inversely proportional to the number of nodes
performing calculations.
\begin{itemize}
\item Is it `fair' to only include the time spent in
  \Rfunction{mpi.parSapply}? Or should we also include the time for
  sending the additional commands? Or even for distributing the
  \Rpackage{golubEsets} library to the nodes in the first place?
\item Suppose the calculation really does scale inversely with the
  number of nodes. Sketch a graph, with the x-axis indicating the
  number of nodes in a cluster and the y-axis indicating time. Sketch a
  curve reflecting the total `computation' time as a function of node
  number.
  How much do I gain by adding a constant number of nodes, say 1, to a
  small cluster, compared to a large cluster? Sketch a second curve
  reflecting communication time. If there were no network issues,
  communication time might increase close to logarithmically with the
  number of nodes. Combine the computation and communication times into
  a single, overall time. What does this say about limits to
  parallelization?
\end{itemize}

\subsection{Reproducibility}

As a final introductory exploration of \Rpackage{Rmpi}, and as an
introduction to a thorny issue in parallel programming, let's explore
generating random numbers. Here are some things to try:
\begin{itemize}
\item Start a cluster of slaves.
\item Use \Rfunction{runif} and \Rfunction{mpi.remote.exec} to generate
  15 random numbers on each node. Why does every node generate the same
  sequence of supposedly random numbers?
\item Figure out how to use \Rfunction{mpi.setup.rngstream} to generate
  streams of random numbers that are different on each node.
\item Figure out how to use \Rfunction{mpi.setup.rngstream} so that
  each node produces a different stream of random numbers (different on
  each node -- call the streams `A'), but so that two separate
  invocations of the same random number call again produce the streams
  `A'.
\item Now arrange to start the cluster, generate a sequence of random
  numbers (stream `A'), stop the cluster, then start the cluster again
  (with the same number of nodes) and generate the sequence of random
  numbers `A' again.
\item Try stopping your cluster, and starting with a \emph{different}
  number of nodes (remember that each slave starts up a new instance of
  \R{}, and that we only have three actual nodes available, so don't be
  too ambitious about your cluster size). Can you generate the sequence
  of random numbers `A'? If you wanted to be able to repeat exactly the
  same sequence of random numbers regardless of cluster size, what
  might you do?
\end{itemize}

\section{Part 2: cross-validation, bootstrap, and `SPMD' processing}

This portion of the lab tackles some useful work, in the form of an
analysis (\Rfunction{xval}) that can require substantial computational
time. We also explore a model of parallel computation in \R{}, where
each node executes the same program, performing some redundant
calculation before or after a parallel section where work is divided
between nodes.

\subsection{Cross-validation}

The serial version of the analysis we want to do is
<<>>=
library(MLInterfaces)
library(golubEsets)
data(golubMerge)
smallG <- golubMerge[200:250, ]
lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod = "LOO",
            group = as.integer(0))
table(lk1, smallG$ALL.AML)
@
This takes a subset of the \Robject{golubMerge} data set and devises an
algorithm to classify gene expression profiles into tumor types (acute
lymphoblastic leukemia, ALL, or acute myeloid leukemia, AML) using the
$k$-nearest neighbor machine learning algorithm and leave-one-out
cross-validation. The table summarizes how the cross-validated
classifications performed: diagonal elements represent correct
classifications.

If the next paragraph becomes too complicated, use the command
<<>>=
source("~/rsrcs/xval-setup.R")
@
to read the required set-up into your \R{} session.

The \Rfunction{xval} function provides a `hook' that allows users to
reach into the function and parallelize the computationally intensive
section of code.
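Before defining anything ourselves, it is worth checking that the hook
really exists in your installed version (a small aside; I'm assuming
\Rfunction{xvalLoop} is exported by \Rpackage{MLInterfaces} as an
ordinary S4 generic, as the \Rfunction{setMethod} call below requires):
\begin{verbatim}
> ?xval
> showMethods("xvalLoop")
\end{verbatim}
If the generic is there, \Rfunction{showMethods} lists the methods
currently defined for it; the default presumably just returns
\Rfunction{lapply}, and our job is to supply a method that returns a
parallel replacement.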
We'll access the hook by creating a class that serves as a `predicate'
to influence code execution, and then creating an object of that class:
<<>>=
setClass("RmpiXval", representation("list"))
cluster <- new("RmpiXval")
@
The documentation for \Rfunction{xval} suggests that we can define a
method for \Rfunction{xvalLoop} whose return value is used in place of
the \Rfunction{lapply} loop that performs the computationally important
and parallelizable parts of the calculation. The following method makes
it possible to use \Rfunction{mpi.parLapply} instead of
\Rfunction{lapply}.
<<>>=
setMethod("xvalLoop", signature(cluster="RmpiXval"),
          function(cluster, ...) mpi.parLapply)
@
With these preliminaries, we're ready for our first attempt at
parallelizing \Rfunction{xval}:
<<>>=
mpi.bcast.cmd(library(MLInterfaces))
lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod = "LOO",
            group = as.integer(0), cluster=cluster)
table(lk1, smallG$ALL.AML)
@
\begin{itemize}
\item Hopefully the single-processor and parallel results agree. Do
  they?
\item We discussed \Rfunction{mpi.parLapply} above. When it is used in
  \Rfunction{xval}, the effect is to forward \Robject{exprs(smallG)} to
  each of the nodes in the cluster. How would you expect this solution
  to scale? Any ideas about how to make this more efficient?
\end{itemize}

\subsection{`Single program, multiple data' model}

Before starting this section, execute the command
<<>>=
source("~/rsrsc/batch.R")
@
<<wrap, echo=FALSE>>=
wrap <- function(func, pseudo, comm=1) {
    iter <- -1
    sz <- mpi.comm.size(comm)-1
    rank <- mpi.comm.rank(comm)
    tag <- list(result=1)
    if (rank==0) function(...) {
        iter <<- iter + 1
        mpi.recv.Robj(iter %% sz + 1, tag$result, comm)
    } else function(...) {
        iter <<- iter + 1
        if (iter %% sz + 1 == rank) {
            result <- func(...)
            mpi.send.Robj(result, 0, tag$result, comm)
        } else result <- pseudo
        result
    }
}
mpi.bcast.Robj2slave(wrap)
@
<<bcastRcmd, echo=FALSE>>=
bcast.Rcmd <- function (cmd = NULL, rank = 0, comm = 1) {
    if (mpi.comm.rank(comm) == rank) {
        cmdp <- deparse(substitute(cmd), width.cutoff = 500)
        cmdp <- paste(cmdp, collapse = "\"\"/")
        mpi.bcast(x = nchar(cmdp), type = 1, rank = rank, comm = comm)
        mpi.bcast(x = cmdp, type = 3, rank = rank, comm = comm)
        eval(cmd, parent.frame())
    } else {
        charlen <- mpi.bcast(x = integer(1), type = 1, rank = rank,
                             comm = comm)
        if (is.character(charlen)) {
            parse(text = "break")
        } else {
            out <- mpi.bcast(x = .Call("mkstr", as.integer(charlen),
                                       PACKAGE = "Rmpi"),
                             type = 3, rank = rank, comm = comm)
            parse(text = unlist(strsplit(out, "\"\"/")))
        }
    }
}
@
The program flow in the previous example is that the manager starts to
evaluate \Rfunction{xval}. The manager `massages' input data, and at a
critical point forwards data and work to other nodes in the cluster. A
different model is to start \Rfunction{xval} on all nodes, and for all
nodes to input and massage data. At the critical section of code marked
by \Rfunction{xvalLoop}, each node performs only its own work, and
arranges to communicate results between all nodes. This model of
computation is potentially very efficient, avoiding communication of
large data between nodes. We will now explore how to implement this
solution.

Consider the following commands:
<<>>=
bcast.Rcmd(ff <- function() {
    f <- function(i) mpi.comm.rank()
    sapply(1:20, f)
})
bcast.Rcmd(ff())
@
So far not very interesting! One way to think of the function
\Rfunction{ff} is as a small program. \Rfunction{bcast.Rcmd} is a
function I wrote for this practical, available in the file
\verb|~/rsrsc/batch.R|.
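As a very small sanity check (a sketch; it assumes a set of slaves is
already running and that the definitions in \verb|batch.R| have been
loaded), you can ask each node to evaluate something trivial through
\Rfunction{bcast.Rcmd}:
\begin{verbatim}
> bcast.Rcmd(mpi.comm.rank())
\end{verbatim}
The value printed at your prompt is the manager's answer (rank 0), but
the same expression is also broadcast to, and evaluated by, every
slave.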
\Rfunction{bcast.Rcmd} evaluates the single program on all nodes,
including the manager node. The first \Rfunction{bcast.Rcmd} assigns
the function \Robject{ff} on every node of the cluster. The second
\Rfunction{bcast.Rcmd} evaluates \Robject{ff} on all nodes.
\Rfunction{bcast.Rcmd} returns the value of the program as run on the
manager node.

We now introduce a `wrapper' that changes the way evaluation of
\Rfunction{f} works.
<<>>=
bcast.Rcmd(ff <- function() {
    f <- function(i) mpi.comm.rank()
    ff <- wrap(f, -1)
    sapply(1:20, ff)
})
bcast.Rcmd(ff())
@
The \Rfunction{wrap} function (also written for this practical) takes a
function and `wraps' it so that successive calls to the function only
sometimes compute a new value. At other times, it returns a
previously-stored pseudo-value. Computations occur in a round-robin
fashion -- first node 1, then node 2, etc. The manager node never
evaluates the function, but instead collects and coordinates the
results of the worker nodes. The end result is that the manager node
collects the round-robin evaluations from all nodes, returning
something approaching a useful result.

Here is a slightly more extensive example, where a bootstrap
calculation is distributed across nodes in a cluster.
<<>>=
bcast.Rcmd(ff <- function() {
    library(boot)
    ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
    ratiow <- wrap(ratio, -1)
    boot(city, ratiow, R = 999, stype = "w")
})
bcast.Rcmd(ff())
@
Notice that this is changed only very little from the single-processor
evaluation.

Here is a sketch of the \Rfunction{wrap} function:
<<>>=
<<wrap>>
@
This uses several features of the \R{} language, and as an exercise it
would be useful to discuss these with others in the lab.
\begin{itemize}
\item In \R{}, a function can be treated just like any other object.
  What is the return value of \Rfunction{wrap}?
\item \R{} is \emph{lexically scoped}. This means that the return
  values of \Rfunction{wrap} `remember' the environment in which they
  were defined. So the return value of \Rfunction{wrap} can access the
  variables \Robject{iter}, \Robject{sz}, \Robject{rank} and
  \Robject{tag}.
\item A line near the end of \Rfunction{wrap} reads
  \verb|iter <<- iter + 1|. The \R{} operator \verb|<<-| causes
  assignment, but unlike the usual assignment operator \Rfunction{<-}
  it looks for a variable (in this case named \Robject{iter}) to assign
  to in the \emph{environment} where the function is defined. So the
  effect is to change the value of the \Robject{iter} that is defined
  in the very first line of \Rfunction{wrap}.
\item \Rfunction{wrap} is really quite incomplete. What issues can you
  see with it, and how might it be improved?
\end{itemize}

\subsection{A parallel \Rfunction{lapply}}

\Rfunction{lapply} provides a very useful framework for embarrassingly
parallel problems.
<<>>=
args(lapply)
@
The argument \Robject{X} represents the `work' to be done,
\Robject{FUN} the program that needs to be evaluated on each piece of
work, and \Robject{...} the data on which the program is to be
evaluated.

Here is a batch program that uses \Rfunction{lapplys}, an
\Rfunction{lapply}-like alternative, to perform a simple parallel
computation:
<<lapplys, echo=FALSE>>=
lapplys <- function(X, FUN, ..., comm=1) {
    rank <- mpi.comm.rank(comm) + 1
    n <- mpi.comm.size(comm)
    tasks <- 1:length(X)
    mywork <- X[split(tasks, cut(tasks, n))[[rank]]]
    result <- lapply(mywork, FUN, ...)
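    ## exchange partial results, so that every node ends up with the
    ## complete list of results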
    allgather.Robj(result, comm)
}
mpi.bcast.Robj2slave(lapplys)
@
<<allgatherRobj, echo=FALSE>>=
allgather.Robj <- function(obj=NULL, comm=1) {
    obj <- as.integer(charToRaw(serialize(obj, NULL)))
    sz <- mpi.allgather(length(obj), 1, integer(mpi.comm.size(comm)),
                        comm)
    objs <- mpi.allgatherv(obj, 1, integer(sum(sz)), sz, comm)
    as.list(unlist(lapply(split(as.raw(objs), rep(1:length(sz)-1, sz)),
                          unserialize),
                   recursive=FALSE, use.names=FALSE))
}
mpi.bcast.Robj2slave(allgather.Robj)
@
<<>>=
res <- bcast.Rcmd(lapplys(1:20, function(i) mpi.comm.rank()))
unlist(res)
@
The `work' is the sequence of numbers from 1 to 20. The function to be
performed by each node is to determine the rank of the node.
\Rfunction{lapplys} ensures that work gets distributed approximately
evenly between nodes, and collates results from each node into a single
list (\Rfunction{unlist} simplifies the result for compact
presentation).

As a more complicated example, here is a way to perform
cross-validation in batch mode.
<<>>=
bcast.Rcmd(ff <- function() {
    library(MLInterfaces)
    setClass("BatchXval", representation("list"))
    setMethod("xvalLoop", signature(cluster="BatchXval"),
              function(cluster, ...) lapplys)
    cluster <- new("BatchXval")
    data(golubMerge)
    smallG <- golubMerge[200:250, ]
    lk1 <- xval(smallG, "ALL.AML", knnB, xvalMethod = "LOO",
                group = as.integer(0), cluster=cluster)
    table(lk1, smallG$ALL.AML)
})
bcast.Rcmd(ff())
@
\begin{itemize}
\item Describe the flow of execution of this program. Which node(s)
  read data from disk? How much communication is likely occurring
  between nodes?
\item Based on experience earlier in the lab, how do you think this
  program will perform as more nodes are used in the computation?
\end{itemize}

Let's take a closer look at \Rfunction{lapplys}.
<<>>=
<<lapplys>>
@
The function has nearly the same signature as \Rfunction{lapply}. The
first several lines make \Rfunction{lapplys} behave slightly
differently on each node that it runs on. The line \verb|mywork <- ...|
divides \Robject{X} into approximately evenly sized pieces, and selects
a different piece for each node. The next line evaluates \Robject{FUN}
for a subset of the original \Robject{X}, with the subset differing
depending on the node. The final line calls an \Rpackage{Rmpi}
extension that collates results from the different nodes.

\Rfunction{lapplys} makes use of an extension to \Rpackage{Rmpi}:
<<>>=
<<allgatherRobj>>
@
Allgather is an MPI concept, where objects from all nodes are collated
and redistributed to all nodes. \Rfunction{allgather.Robj} is an
extension of this idea, and is included here to illustrate how MPI
routines can be incorporated in \R{} functions for easy evaluation.

Finally, \Rfunction{bcast.Rcmd} is defined on the manager node to
broadcast commands to all workers, and to evaluate the same commands on
the manager.
<<>>=
<<bcastRcmd>>
@
This function makes use of several \R{} language concepts, including
the ability to deparse and serialize \R{} objects (including
functions), and to evaluate \R{} objects in environments different from
the calling environment.

\section{Conclusions}

We have covered a lot of ground in the lab. Here is a brief summary of
the highlights:
\begin{enumerate}
\item There are at present few `out of the box' solutions for parallel
  programming in \R{}; you'll have to grapple with at least some
  details of parallelization.
\item Not all parts of a script can be usefully parallelized.
\item Communication costs can be important, and for large data sets can
  outweigh much of the benefit of faster computation.
\item Even without communication costs, the benefit of each additional
  node diminishes: adding 1 node to a 1-node cluster doubles potential
  computational throughput, but adding 1 node to a 10-node cluster
  matters hardly at all.
\item Repeatability is essential for scientific research, and can be a
  thorny issue in a parallel environment.
\item \Rpackage{Rmpi} provides some functionality for parallel
  programming, including an interface to lower-level MPI functions.
\item Useful new parallel techniques introduced in the lab rely on
  executing a series of commands as a single program on the remote and
  manager nodes.
\item This technique is useful in conjunction with a `wrapper' that
  determines which nodes engage in expensive computations, and
  \Rfunction{lapplys}, an \Rfunction{lapply}-like function for
  distributing work evenly between nodes.
\end{enumerate}

Needless to say, we have barely scratched the surface of what is
possible. The algorithms presented here are only a fraction of those
useful in parallel computation. They lack any error checking or
recovery, and make no attempt to check-point or otherwise ensure
against system failure. These are natural directions for further
exploration.

\end{document}