% % NOTE -- ONLY EDIT THE .Rnw FILE!!! The .tex file is % likely to be overwritten. % %\VignetteIndexEntry{RBioinf Introduction} %\VignettePackage{RBioinf} \documentclass[12pt]{article} \usepackage{amsmath} \usepackage[authoryear,round]{natbib} \usepackage{hyperref} \textwidth=6.2in \textheight=8.5in %\parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\Rmethod}[1]{{\texttt{#1}}} \newcommand{\Rfunarg}[1]{{\texttt{#1}}} \newcommand{\Rclass}[1]{{\textit{#1}}} \bibliographystyle{plainnat} \begin{document} \title{RBioinf Introduction} \maketitle \section*{Introduction} The \Rpackage{RBioinf} package has a handful of functions that support the monograph, \textit{R Programming for Bioinformatics}. It also has a few functions that were developed for a more in depth study of S4 OOP that were never included in that monograph, but for which the discussion may be of use to some (and which unfortunately may no longer be correct as S4 has evolved a lot in recent times). \section*{S4 Classes} Details on S4 classes are given in a number of other locations. This is not a tutorial on their use, but rather moves quickly to a discussion of the code for computing linearizations. \subsection{Class Linearization} For many different computations, especially method dispatch, an algorithm for specifying a linear order of the class inheritance tree is needed. All object oriented programming languages support the computation of a linearization. Here we provide a succinct description of the procedure and give some simple examples. The methods used to compute a linearization are different for different languages. For languages that only support single inheritance the precedence rule that is used is that a class is more specific than any of its superclasses. Methods defined for subclasses will override methods defined for superclasses. When multiple inheritance is supported then it may not be clear which one of two superclasses is more specific. The issue arises when there are two superclasses, and these classes do not themselves have a subclass -- superclass relationship. In S4 the style of linearization named C3 \citep{DylanOOPSLA} has been adopted and will be used. An implementation of this linearization is provided by the \Rfunction{computeClassLinearization} function with the \Rfunarg{C3} argument set to \Robject{TRUE}. In languages with multiple inheritance some mechanism must be provided to resolve the appropriate order in which to inherit properties. There can be conflicts or multiple methods which might be applicable and it is important these issues be resolved. Ideally they should be resolved at class definition time, as there will be obvious efficiency losses if this is done at dispatch time. C++ and Eiffel provide programming constructs that allow/require the programmer to specify the class precedence hierarchy. In languages such as Common Lisp (and the CLOS object system) and Dylan the approach that is taken is usually to have an automatic method of computing a linearization. A linearization of a class hierarchy is a linear ordering of the class labels contained in the hierarchy, such that each class appears once. There has been a reasonable amount of research in these automatic methods and in this section we discuss some of those and propose the one that was recommended in \cite{DylanOOPSLA}. In most object-oriented languages (especially those with single inheritance) the precedence rule that is used is that a class is more specific than any of its superclasses. And methods defined for sub-classes will override methods defined for superclasses. When multiple inheritance is supported then it may not be clear which one of two superclasses is more specific. The issue arises when there are two superclasses, and these classes do not themselves have a subclass -- superclass relationship. Both CLOS and Dylan use the local precedence order (LPO) the order of the direct superclasses given in the class definition in computing the linearization, with earlier superclasses considered more specific than later ones. If there are no duplicate class labels in the hierarchy then this is then simply a bread-first search of the superclass definitions. But when one or more classes are inherited from different superclasses this definition becomes more complicated, and can in fact not be satisfied, as we will see below. We begin with an example from \cite{DylanOOPSLA} and note that our examples will mainly be from this reference. We also note that we will make use of the \Rpackage{graph}, \Rpackage{RBGL} and \Rpackage{Rgraphviz} packages from the Bioconductor project to provide us with the necessary infrastructure for representing, computing on and drawing the class hierarchies. <>= library(RBioinf) setClass("object") setClass("grid-layout", contains="object") setClass("horizontal-grid", contains="grid-layout") setClass("vertical-grid", contains="grid-layout") setClass("hv-grid", contains=c("horizontal-grid", "vertical-grid")) LPO("hv-grid") @ <>= computeClassLinearization("object") computeClassLinearization("grid-layout") computeClassLinearization("vertical-grid") @ An inheritance graph can be inconsistent under a given linearization mechanism. This means that the linearization is over-constrained and thus does not exist for the given inheritance structure. In the next code segment we create a class, \Rclass{confused}, for which the LPO will attempt to place \Rclass{horizontal-grid} before \Rclass{vertical-grid} because this is their order in \Rclass{hv-grid}, and it will also attempt to place \Rclass{vertical-grid} ahead of \Rclass{horizontal-grid} since this is their order in \Rclass{vh-grid}; but clearly both of these properties cannot be satisfied simultaneously. We use the function \Rfunction{LPO} to compute the local precedence order. In the code below, we see that there is no linearization of the classes possible for \Rclass{confused}, which is why it is wrapped in a call to \Rfunction{tryCatch}. <>= setClass("vh-grid", contains=c("vertical-grid", "horizontal-grid")) setClass("confused", contains=c("hv-grid", "vh-grid")) LPO("vh-grid") tryCatch(LPO("confused"), error=function(x) "this one failed") @ A plot of the class hierarchy graph for the \Rclass{confused} class is given here. <>= library(Rgraphviz) confG = class2Graph("confused") cGa = makeNodeAttrs(confG, shape="ellipse", fill="grey", width=4) plot(confG, nodeAttrs=cGa) @ \begin{figure}[tp] \begin{center} \includegraphics[width=\textwidth]{RBioinf-plotconfG} \end{center} \caption{\label{fig:confG} A class hierarchy that cannot be resolved.} \end{figure} In order to fully describe and understand the mechanism used to determine which method should be applied, for a given argument list, and more fully to specify an ordering of methods, we must have a linearization of the class hierarchy. In many regards determining this hierarchy quickly and according to some widely recognized design principles will be essential for efficiency and for writing programs that work as their authors intended. The order in which one class precedes another is often called the class precedence list. In languages with single inheritance it is quite easy to compute but when multiple inheritance is used some problems can arise. Some of the history, and a detailed discussion of the issues is given in \cite{DylanOOPSLA}. The strategy used in Dylan as well as an implementation are described in \citealp[pg.~55]{Shalit96}. We begin by considering only the class hierarchy that comes from using the \Robject{contains} argument when defining classes. Later, we will return to some of the more complex issues that arise if the \Rfunction{setIs} function is used to further affect the class hierarchy. A class definition specifies a local ordering on that class and its direct superclasses using the following two rules: \begin{itemize} \item the class precedes all direct superclasses \item the superclasses have precedence in the order in which they are provided in the \Rfunction{contains} argument in the call to \Rfunction{setClass}, the ordering is right to left (highest to lowest). \end{itemize} The class precedence list for a class is total ordering on that class and all of its superclasses that is consistent with the local ordering and with the class precedence lists of all of its superclasses. Shalit indicates that two lists are consistent if for every \textit{A} and \textit{B} that are in both lists then either \textit{A} precedes \textit{B} in both or \textit{B} precedes \textit{A} in both. There may be several orderings that are consistent -- some one of these must be chosen. (this an implementation detail). There may be no possible orderings and in that case an error is given on the call to \Rfunction{setClass}. The first, and most basic rule, is that the ordering on direct superclasses of a class is defined by the order in which they are given in the \Robject{contains} argument to \Rfunction{setClass}. One way of determining the class precedence list is via the function \Rfunction{extends} which if called with one argument returns the class precedence list. We first introduce a set of classes that will be used for our initial explorations. In the code chunk below, we see that the only real difference between the class \Rclass{c} and \Rclass{d} is the order in which they include \Rclass{a} and \Rclass{b} and that this difference is reflected in the output from \Rfunction{extends}. <>= setClass("a") setClass("b") setClass("c", contains = c("a", "b")) setClass("d", contains = c("b", "a")) extends("c") extends("d") setClass("e", contains=c("c", "d")) @ There are some functions that allow the user to understand the superclass relationships. Function from the \Rpackage{methods} package include \Rfunction{getAllSuperClasses} and \Rfunction{superClassDepth}. In addition we have added another helper function in the \Rpackage{RBioinf} package called \Rfunction{superClasses} which finds and reports only the direct, or most immediate superclasses. This last function will be used to build a graph of the class hierarchy for visual inspection. <>= getAllSuperClasses(getClass("e")) cD = superClassDepth(getClass("e")) cD$label cD$depth superClasses(getClass("e")) @ We now turn our attention to a more complicated situation, and one in which the ordering of the classes in the class precedence list might not be so obvious. <>= cH = class2Graph("e") @ <>= setClass("pane", contains="object") setClass("editing-mixin", contains="object") setClass("scrolling-mixin", contains="object") setClass("scrollable-pane", contains=c("pane", "scrolling-mixin")) setClass("editable-pane", contains=c("pane", "editing-mixin")) setClass("editable-scrollable-pane", contains=c("scrollable-pane", "editable-pane")) @ For the \Rclass{editable-scrollable-pane} class the two linearizations, Dylan-style and C3 provide different answers. It can be argued that the C3 linearization is less surprising since it puts \Rclass{scrolling-mixin} ahead of \Rclass{editing-mixin} and that is somewhat sensible since this would be the same ordering as their direct superclasses. <>= LPO("editable-scrollable-pane") LPO("editable-scrollable-pane", C3=TRUE) @ In the figure below we see the class graph for the \Rclass{editable-scrollable-pane} class. <>= eWG = class2Graph("editable-scrollable-pane") eWGattrs = makeNodeAttrs(eWG, shape="ellipse", fill="grey", width=4) plot(eWG, nodeAttrs=eWGattrs) @ \section*{Session Information} The version number of R and packages loaded for generating the vignette were: \begin{verbatim} <>= sessionInfo() @ \end{verbatim} \bibliography{RBioinf} \end{document}