"
WF <- createParam(cmd, overwrite = TRUE, writeParamFiles = TRUE, confirm = TRUE)
targetspath <- system.file("extdata", "targets.txt", package = "systemPipeR")
WF_test <- loadWorkflow(targets = targetspath, wf_file="hisat2.cwl",
input_file="hisat2.yml", dir_path = "param/cwl/hisat2/")
WF_test <- renderWF(WF_test, inputvars = c(FileName = "_FASTQ_PATH1_"))
WF_test
cmdlist(WF_test)[1:2]
```
# Visualize Workflow - Full details {#plotWF}
## Color and text
Colors and numbers on the plot indicate the status of each step. The same information
is also given in the plot legend.
**Shapes:**
- circular steps: pure R code steps
- rounded squares steps: `sysargs` steps, steps that will invoke command-line calls
- blue colored steps and arrows: main branch (see [main branch](#main-branch) section below)
**Step colors**
- black: pending steps
- Green: successful steps
- Red: failed steps
**Number and colors**
Each step shows four numbers in its second row, separated by `/`:
- First No.: number of passed samples
- Second No.: number of samples with warnings
- Third No.: number of samples with errors
- Fourth No.: total number of samples
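As a minimal sketch of how to read such a label, the base R snippet below splits a made-up status string into its four counts. The string `"3/0/1/4"` and the variable names are hypothetical and are not taken from actual `plotWF` output.

```{r eval=FALSE}
# Hypothetical step label in the order passed/warning/error/total
status_label <- "3/0/1/4"

# Split on "/" and convert the four pieces to integers
counts <- as.integer(strsplit(status_label, "/", fixed = TRUE)[[1]])
names(counts) <- c("passed", "warning", "error", "total")
counts
```

In this made-up example, one of the four samples ended with an error.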
**Duration**
The time it took to run the step is shown after the sample information.
The unit is seconds (**s**), minutes (**m**), or hours (**h**).
## On hover
Hovering the mouse over a step displays detailed information about it.
## Logging
The workflow steps also become clickable if `in_log = TRUE`. This creates a link
for each step that navigates to the corresponding section of the SPR
[workflow log file](change to page that introduce the log file). Normally this option
is set by the SPR log-generating function, which places the plot at the top of the log file,
so that clicking a step jumps to its detailed section further down the page.
The example below only demonstrates that the plot can be made clickable (the links will not
navigate anywhere). Visit [this page](link) to see a real example.
```{r}
plotWF(sal, in_log = TRUE)
```
## Plot Method
The default plotting method is `svg`, meaning the plot is embedded as an `svg` image.
Some browsers may not display `svg` correctly. In that case, the other option
is to embed the plot as `png`. However, you will **lose hovering, clicking and some
responsiveness** (the plot's ability to resize automatically).
```{r}
plotWF(sal, plot_method = "png")
```
## RStudio
By default, the plot is **not displayed in the RStudio viewer**, even if you are working inside RStudio.
The workflow steps would be too small in the viewer to
see the details, so by default the plot opens in your web browser, where there is
more space. Use `rstudio = TRUE` to display it in the RStudio viewer instead.
```{r}
plotWF(sal, rstudio = TRUE)
```
## Responsiveness
This is a term often used in web development. It describes whether the plot resizes itself
when the user resizes the document window. By default, `plotWF` is responsive:
it fits the size of the current window container and adjusts when the window
size changes. To always display the plot at full size, use `responsive = FALSE`,
which is useful for embedding the plot in full-screen mode.
```{r}
plotWF(sal, responsive = FALSE)
```
The plot above is shown at full size, so you need to scroll to see all of it.
## Layout
There are a few different layouts to choose from. There is no best layout; it all depends
on the structure of your workflow. The default is `compact`, but we recommend
trying different layouts to find the one that fits best.
- `compact`: plots steps as close together as possible.
- `vertical`: the main branch is placed vertically; side branches are placed
on the same horizontal level, and sub-steps of side branches are placed
vertically.
- `horizontal`: the main branch is placed horizontally; side branches and sub-steps
are placed vertically.
- `execution`: a linear plot showing the execution order of all steps.
**vertical**
```{r}
plotWF(sal, layout = "vertical", height = "600px")
```
The plot is very long; use `height` to make it smaller.
**horizontal**
```{r}
plotWF(sal, layout = "horizontal")
```
**execution**
```{r}
plotWF(sal, layout = "execution", height = "600px", responsive = FALSE)
```
This plot is very long, but if we only use `height` to limit it to a smaller size, the details
become hard to see. In that case, use `height` and `responsive = FALSE` together.
## Main branch
From the plots above, you can see that there are many steps which do not lead to any
further steps. These dead ends are called ending steps. Connecting the first step,
the steps in between and one of these ending steps forms a branch. Imagine the workflow as
an upside-down tree whose root is the first step. There are therefore
many possible ways to connect the workflow. For the convenience of plotting, we
introduce the concept of a _"main branch"_: the one connecting
strategy that is placed at the center of the plot. Steps that are not
in the main branch are arranged around it.
The main branch has little impact on the `compact` layout but a large
effect on the `horizontal` and `vertical` layouts.
By default, the plotting function uses an algorithm to choose a suitable branch
automatically. In simple terms, it favors branches that (a) connect the first and last step
and (b) are as long as possible.
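The two preferences can be illustrated with a small base R sketch. This is not the algorithm used inside `plotWF`; the step names, the edge table and the helper `find_branches` are hypothetical and only show the idea of enumerating branches and ranking them.

```{r eval=FALSE}
# Hypothetical workflow: steps in execution order and dependency edges (from -> to)
steps <- c("load", "trim", "align", "qc_report", "count", "deg")
edges <- data.frame(
  from = c("load", "trim",  "trim",      "align", "count"),
  to   = c("trim", "align", "qc_report", "count", "deg"),
  stringsAsFactors = FALSE
)

# Recursively enumerate every path from a step down to an ending step
find_branches <- function(step, edges) {
  children <- edges$to[edges$from == step]
  if (length(children) == 0) return(list(step))   # ending step reached
  paths <- list()
  for (child in children) {
    for (p in find_branches(child, edges)) {
      paths[[length(paths) + 1]] <- c(step, p)
    }
  }
  paths
}

branches <- find_branches(steps[1], edges)

# Rank: (a) branches that reach the last step first, (b) longer branches first
reaches_last <- vapply(branches, function(b) tail(b, 1) == steps[length(steps)], logical(1))
branch_len <- lengths(branches)
best <- branches[[order(!reaches_last, -branch_len)[1]]]
best
```

In this toy workflow there are two branches, one ending in `qc_report` and one ending in `deg`; the sketch picks the longer one that also reaches the last step.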
You can also choose a branch yourself with `branch_method = "choose"`. It first
lists all possible branches and then prompts you to pick one.
Because a prompt is not possible while rendering the R Markdown document, we combine it with
a second argument, `branch_no = x`, to choose a branch directly and skip the prompt. We also
use `verbose = TRUE` to imitate the branch listing in the console. In a real case,
you only need `branch_method = "choose"`.
Watch closely how the plot changes when different branches are chosen. Here we use the `vertical`
layout for demonstration. Remember, the main branch is marked in blue.
```{r collapse=TRUE}
plotWF(sal, layout = "vertical", branch_method = "choose", branch_no = 1, verbose = TRUE)
```
### Unmark main branch
The _main branch_ concept may not represent the main workflow. It is introduced
for the convenience of plotting. Automatic detection usually finds the
major steps of a workflow, but sometimes it does not; this depends on how the
workflow is designed. If you think it is not a good representation, you can turn it off
with `mark_main_branch = FALSE`. The blue-colored steps then no longer appear in the
plot or in the legend.
```{r}
plotWF(sal, mark_main_branch = FALSE, height = "500px")
```
## Legends
The legend can be removed with `show_legend = FALSE`.
```{r}
plotWF(sal, show_legend = FALSE, height = "500px")
```
## Output formats
There are currently three output formats: `"html"`, `"dot"` and `"dot_print"`. If one of the first
two is chosen, you also need to provide a path in `out_path` to save the file.
- html: a single html file containing the plot.
- dot: a DOT script file with the code to reproduce the plot in a [Graphviz](https://graphviz.org/)
DOT engine.
- dot_print: directly `cat` the DOT script to the console.
```{r}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```
```{r}
plotWF(sal, out_format = "dot", out_path = "example_out.dot")
cat(readLines("example_out.dot")[1:5], sep = "\n")
```
```{r eval=FALSE}
plotWF(sal, out_format = "dot_print") # prints the DOT script to the console
```
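For readers unfamiliar with DOT, the following base R sketch writes a tiny Graphviz `digraph` by hand. It only illustrates the general shape of a DOT script; the script produced by `plotWF` is considerably more detailed, and the step names here are hypothetical.

```{r eval=FALSE}
# Hypothetical dependency edges between three steps
edges <- data.frame(
  from = c("load", "trim"),
  to   = c("trim", "align"),
  stringsAsFactors = FALSE
)

# A DOT digraph is a list of "a" -> "b"; statements wrapped in braces
dot <- c(
  "digraph workflow {",
  sprintf('  "%s" -> "%s";', edges$from, edges$to),
  "}"
)
cat(dot, sep = "\n")
```

Any Graphviz DOT engine can render such a script, which is why the `"dot"` output format is convenient for reproducing the plot outside of R.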
### Save to a static image file
Some users may want to save the plot as a static image, such as a `.png` file. This
requires some extra work. The plot cannot be saved directly as
a png file because it is generated in real time by a browser JavaScript engine. A
JavaScript engine, such as the one in Chrome, MS Edge or the RStudio viewer,
has to render the plot before we can see it.
#### Interactive
- If you are working in RStudio, you can use the `export` button in the viewer to save
an image file.
- If you are working from the command line, use `plot_method = 'png'` to have the browser
generate a png, then right-click the image to save it.
#### Non-interactive
If you cannot have an interactive session, for example when submitting a job to a cluster,
but still want the png, we recommend using the {[webshot2](https://github.com/wch/webshot)}
package to take a screenshot of the plot. It runs headless Chrome in the back end (which has a JavaScript engine).
Install the package:
```{r eval=FALSE}
remotes::install_github("rstudio/webshot2")
```
Save the plot to html first:
```{r eval=FALSE}
plotWF(sal, out_format = "html", out_path = "example_out.html")
file.exists("example_out.html")
```
Use `webshot2` to save the image:
```{r eval=FALSE}
webshot2::webshot("example_out.html", "example_out.png")
```
# Inner Classes
`SYSargsList` steps can be defined with two inner classes, `SYSargs2` and
`LineWise`. Both classes are described in more detail below.
## `SYSargs2` Class {#sysargs2}
The *`SYSargs2`* workflow control class, an S4 class, is a list-like container where
each instance stores all the input/output paths and parameter components
required for a particular data analysis step. *`SYSargs2`* instances are
generated by two constructor functions, *loadWF* and *renderWF*, using as data
input *targets* or *yaml* files as well as two *cwl* parameter files (for
details see below).
In CWL, files with the extension *`.cwl`* define the parameters of a chosen
command-line step or workflow, while files with the extension *`.yml`* define
the input variables of command-line steps. Note, input variables provided by a
*targets* file can be passed on to a *`SYSargs2`* instance via the *inputvars*
argument of the *renderWF* function.
The following imports a *`.cwl`* file (here *`hisat2-mapping-se.cwl`*) for
running the short read aligner HISAT2 [@Kim2015-ve]. For more details about the
file structure and how to design or customize your own software tools, please
check the `systemPipeR and CWL` pipeline.
```{r sysargs2_cwl_structure, echo = FALSE, eval=FALSE}
hisat2.cwl <- system.file("extdata", "cwl/hisat2/hisat2-mapping-se.cwl", package = "systemPipeR")
yaml::read_yaml(hisat2.cwl)
```
```{r sysargs2_yaml_structure, echo = FALSE, eval=FALSE}
hisat2.yml <- system.file("extdata", "cwl/hisat2/hisat2-mapping-se.yml", package = "systemPipeR")
yaml::read_yaml(hisat2.yml)
```
The *loadWF* and *renderWF* functions render the proper command-line strings for
each sample and software tool.
```{r SYSargs2_structure, eval=TRUE}
library(systemPipeR)
targetspath <- system.file("extdata", "targets.txt", package = "systemPipeR")
dir_path <- system.file("extdata/cwl", package = "systemPipeR")
WF <- loadWF(targets = targetspath, wf_file = "hisat2/hisat2-mapping-se.cwl",
input_file = "hisat2/hisat2-mapping-se.yml",
dir_path = dir_path)
WF <- renderWF(WF, inputvars = c(FileName = "_FASTQ_PATH1_",
SampleName = "_SampleName_"))
```
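The role of `inputvars` can be pictured as plain string substitution: each placeholder (such as `_FASTQ_PATH1_`) is replaced by the value of the matching *targets* column, once per sample. The base R sketch below is a simplified illustration under that assumption, not the implementation used by *renderWF*; the command template, the file paths and the helper `render_cmd` are hypothetical.

```{r eval=FALSE}
# Hypothetical command template containing two placeholders
cmd_template <- "hisat2 -U _FASTQ_PATH1_ -S results/_SampleName_.sam"

# Hypothetical targets table with one row per sample
demo_targets <- data.frame(
  FileName   = c("data/SRR1.fastq.gz", "data/SRR2.fastq.gz"),
  SampleName = c("M1A", "M1B"),
  stringsAsFactors = FALSE
)

# Same mapping style as in renderWF(): targets column name = placeholder
demo_inputvars <- c(FileName = "_FASTQ_PATH1_", SampleName = "_SampleName_")

# Replace every placeholder with the value from the given targets row
render_cmd <- function(row, template, inputvars) {
  for (col in names(inputvars)) {
    template <- gsub(inputvars[[col]], row[[col]], template, fixed = TRUE)
  }
  template
}

demo_cmds <- vapply(
  seq_len(nrow(demo_targets)),
  function(i) render_cmd(demo_targets[i, ], cmd_template, demo_inputvars),
  character(1)
)
demo_cmds
```

Each element of `demo_cmds` is one command-line string per sample, which is the same kind of per-sample listing that `cmdlist()` returns for a real `SYSargs2` object.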
Several accessor methods are available that are named after the slot names of
the *`SYSargs2`* object.
```{r names_WF, eval=TRUE}
names(WF)
```
Of particular interest is the *`cmdlist()`* method. It constructs the system
commands for running command-line software as specified by a given *`.cwl`* file
combined with the paths to the input samples (*e.g.* FASTQ files) provided by a
*`targets`* file. The example below shows the *`cmdlist()`* output for running
HISAT2 on the first SE read sample. Evaluating the output of *`cmdlist()`* can
be very helpful for designing and debugging *`.cwl`* files of new command-line
software or changing the parameter settings of existing ones.
```{r cmdlist, eval=TRUE}
cmdlist(WF)[1]
```
The output components of *`SYSargs2`* define the expected output files for each
step in the workflow, some of which are the input for the next workflow step,
here the next *`SYSargs2`* instance.
```{r output_WF, eval=TRUE}
output(WF)[1]
```
The targets components of a `SYSargs2` object can be accessed with the `targets`
method. Here, for single-end (SE) samples, the structure of the targets file is
defined by:
- `FileName`: path to the FASTQ file;
- `SampleName`: unique ID for each sample;
- `Factor`: ID for each treatment or condition.
```{r, targets_WF, eval=TRUE}
targets(WF)[1]
as(WF, "DataFrame")
```
Please note, to work with custom data, users need to generate a *`targets`* file
containing the paths to their own FASTQ files and then provide under
*`targetspath`* the path to the corresponding *`targets`* file.
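A custom single-end *`targets`* file with the three columns described above can be written with base R, as in the minimal sketch below. The FASTQ paths, sample names and the output location in `tempdir()` are hypothetical; the file is written tab-delimited, which is assumed here to match the format of the example `targets.txt`.

```{r eval=FALSE}
# Hypothetical samples; replace the paths with your own FASTQ files
my_targets <- data.frame(
  FileName   = c("./data/sampleA.fastq.gz", "./data/sampleB.fastq.gz"),
  SampleName = c("A1", "B1"),
  Factor     = c("ctrl", "treat"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited text file without quotes or row names
my_targetspath <- file.path(tempdir(), "my_targets.txt")
write.table(my_targets, my_targetspath, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check the structure
my_check <- read.delim(my_targetspath, stringsAsFactors = FALSE)
my_check
```

The resulting path (`my_targetspath` in this sketch) is what would be passed as the `targets` argument when loading the workflow.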
In addition, if [Environment Modules](http://modules.sourceforge.net/) is
available, it is possible to define which module should be loaded, as shown
here:
```{r, module_WF, eval=TRUE}
modules(WF)
```
Additional information can be accessed, such as the location of the parameter files and the
`inputvars` provided to generate the object.
```{r, other_WF, eval=FALSE}
files(WF)
inputvars(WF)
```
## LineWise Class {#linewise}
`LineWise` was designed to store all the R code chunks when an R Markdown file is
imported as a workflow.
```{r lw, eval=TRUE}
rmd <- system.file("extdata", "spr_simple_lw.Rmd", package = "systemPipeR")
sal_lw <- SPRproject(overwrite = TRUE)
sal_lw <- importWF(sal_lw, rmd, verbose = FALSE)
codeLine(sal_lw)
```
- Coerce methods available:
```{r, lw_coerce, eval=TRUE}
lw <- stepsWF(sal_lw)[[2]]
## Coerce
ll <- as(lw, "list")
class(ll)
lw <- as(ll, "LineWise")
lw
```
- Access details
```{r, lw_access, eval=TRUE}
length(lw)
names(lw)
codeLine(lw)
codeChunkStart(lw)
rmdPath(lw)
```
- Subsetting
```{r, lw_sub, eval=TRUE}
l <- lw[2]
codeLine(l)
l_sub <- lw[-2]
codeLine(l_sub)
```
- Replacement methods
```{r, lw_rep, eval=TRUE}
replaceCodeLine(lw, line = 2) <- "5+5"
codeLine(lw)
appendCodeLine(lw, after = 0) <- "6+7"
codeLine(lw)
```
- Replacement methods for `SYSargsList`
```{r, sal_rep_append, eval=FALSE}
replaceCodeLine(sal_lw, step = 2, line = 2) <- LineWise(code={
"5+5"
})
codeLine(sal_lw, step = 2)
appendCodeLine(sal_lw, step = 2) <- "66+55"
codeLine(sal_lw, step = 2)
appendCodeLine(sal_lw, step = 1, after = 1) <- "66+55"
codeLine(sal_lw, step = 1)
```
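One way to picture what the replacement and append methods above do is to think of a step's code as a vector of unevaluated R expressions that can be edited before anything is run. The base R sketch below illustrates that idea only; it does not use the `LineWise` class, and the variable names are hypothetical.

```{r eval=FALSE}
# Two lines of code held as unevaluated expressions (nothing runs yet)
code <- expression(x <- 1 + 1, y <- x * 2)

# Replace the second line, similar in spirit to replaceCodeLine()
code[2] <- expression(y <- 5 + 5)

# Add a line at the top, similar in spirit to appendCodeLine(after = 0)
code <- append(code, expression(z <- 6 + 7), after = 0)
print(code)

# Only now evaluate the lines in order, inside a separate environment
run_env <- new.env()
for (i in seq_along(code)) eval(code[[i]], envir = run_env)
mget(c("x", "y", "z"), envir = run_env)
```

Keeping code unevaluated until run time is what makes it possible to edit individual lines of a step after the workflow has been imported.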
## Workflow design structure using *`SYSargs`*: Previous version
Instances of this S4 object class are constructed by the *`systemArgs`* function
from two simple tabular files: a *`targets`* file and a *`param`* file. The
latter is optional for workflow steps lacking command-line software. Typically,
a *`SYSargs`* instance stores all sample-level inputs as well as the paths to
the corresponding outputs generated by command-line- or R-based software
generating sample-level output files, such as read preprocessors
(trimmed/filtered FASTQ files), aligners (SAM/BAM files), variant callers
(VCF/BCF files) or peak callers (BED/WIG files). Each sample level input/output
operation uses its own *`SYSargs`* instance. The outpaths of *`SYSargs`* usually
define the sample inputs for the next *`SYSargs`* instance. This connectivity is
established by writing the outpaths with the *`writeTargetsout`* function to a
new *`targets`* file that serves as input to the next *`systemArgs`* call.
Typically, the user has to provide only the initial *`targets`* file. All
downstream *`targets`* files are generated automatically. By chaining several
*`SYSargs`* steps together one can construct complex workflows involving many
sample-level input/output file operations with any combination of command-line
or R-based software.
```{r, eval=TRUE, echo=FALSE, out.width="100%", fig.align = "center", fig.cap= "Workflow design structure of *`systemPipeR`* using previous version of *`SYSargs`*"}
knitr::include_graphics(system.file("extdata/images", "SystemPipeR_Workflow.png", package = "systemPipeR"))
```
# Third-party software tools {#tools}
Currently, *systemPipeR* provides _`param`_ file templates for the third-party
software tools listed below.
```{r table_tools, echo=FALSE, message=FALSE}
library(magrittr)
SPR_software <- system.file("extdata", "SPR_software.csv", package = "systemPipeR")
software <- read.delim(SPR_software, sep = ",", comment.char = "#")
colors <- colorRampPalette((c("darkseagreen", "indianred1")))(length(unique(software$Category)))
id <- as.numeric(c((unique(software$Category))))
software %>%
dplyr::mutate(Step = kableExtra::cell_spec(Step, color = "white", bold = TRUE,
background = factor(Category, id, colors)
)) %>%
dplyr::select(Tool, Description, Step) %>%
dplyr::arrange(Tool) %>%
kableExtra::kable(escape = FALSE, align = "c", col.names = c("Tool Name", "Description", "Step")) %>%
kableExtra::kable_styling(c("striped", "hover", "condensed"), full_width = TRUE) %>%
kableExtra::scroll_box(width = "80%", height = "500px")
```
Remember, if you want to run any of these tools, make sure the
respective software is installed on your system and configured in the `PATH`.
You can check as follows:
```{r test_tool_path, eval=FALSE}
tryCMD(command="gzip")
```
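As a complementary check that does not depend on *systemPipeR*, base R's `Sys.which()` reports where an executable is found on the `PATH`. The sketch below checks several tools at once; the tool names are only examples.

```{r eval=FALSE}
# Example tool names; adjust to the software your workflow needs
tools <- c("gzip", "hisat2")

# Sys.which() returns the full path, or an empty string if the tool is not found
tool_paths <- Sys.which(tools)
tool_found <- nzchar(tool_paths)

data.frame(tool = tools, found = tool_found, path = unname(tool_paths))
```

A `FALSE` in the `found` column means the tool is either not installed or its location has not been added to the `PATH`.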
# Version information
```{r sessionInfo}
sessionInfo()
```
# Funding
This project is funded by NSF award [ABI-1661152](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1661152).
# References