You may download a PDF version of this manual here
Authors: Sahil Seth [aut, cre]
Version: 0.9.10.9016
License: MIT + file LICENSE
This framework allows you to design and implement complex pipelines, and deploy them on your institution’s computing cluster. This has been built keeping in mind the needs of bioinformatics workflows. However, it is easily extendable to any field where a series of steps (shell commands) are to be executed in a (work)flow.
Depends: R (>= 3.0.2), methods, params (>= 0.3), utils
Imports: diagram, whisker, tools
Suggests: reshape2, knitr, ggplot2, openxlsx, testthat, funr
Enhances: (none)
Use a set of shell commands (a flowmat) and a flow definition to create a flow object.
to_flow(x, ...)
is.flow(x)
## method for class 'character'
to_flow(x, def, grp_col, jobname_col, cmd_col, ...)
## method for class 'flowmat'
to_flow(x, def, flowname, grp_col, jobname_col, cmd_col, submit = FALSE, execute = FALSE, containerize = TRUE, platform, flow_run_path, qobj, verbose = opts_flow$get("verbose"), ...)
## method for class 'data.frame'
to_flow(x, ...)
## method for class 'list'
to_flow(x, def, flowname, flow_run_path, desc, qobj, module_cmds = opts_flow$get("module_cmds"), verbose = opts_flow$get("verbose"), ...)
flow_run_path: defaults to opts_flow$get("flow_run_path").
qobj: Deprecated, modify cluster templates as explained on docs.flowr.space (http://docs.flowr.space/install.html#hpcc_support_overview). An object of class queue.
verbose: A numeric value indicating the amount of messages to produce. Values are integers varying from 0, 1, 2, 3, .... Please refer to the verbose page for more details. Defaults to opts_flow$get("verbose").
module_cmds: defaults to opts_flow$get("module_cmds").
The parameter x can be a path to a flow_mat, or a data.frame (as read by read_sheet). This is a table with a minimum of three columns: samplename, jobname and cmd. See to_flowmat for details.
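For instance, a minimal sketch of such a three-column table (the sample name and commands are illustrative):
## a tiny flowmat-style table with the required columns
mat = data.frame(samplename = "samp1",
                 jobname = c("sleep", "sleep"),
                 cmd = c("sleep 1", "sleep 2"),
                 stringsAsFactors = FALSE)
## confirm that the required columns are present
as.flowmat(mat)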
Behaviour: What goes in, and what to expect in return?
## Use this link for a few elaborate examples:
## http://docs.flowr.space/flowr/tutorial.html#define_modules
ex = file.path(system.file(package = "flowr"), "pipelines")
flowmat = as.flowmat(file.path(ex, "sleep_pipe.tsv"))
flowdef = as.flowdef(file.path(ex, "sleep_pipe.def"))
fobj = to_flow(x = flowmat, def = flowdef, flowname = "sleep_pipe", platform = "lsf")
## create a vector of shell commands
cmds = c("sleep 1", "sleep 2")
## create a named list
lst = list("sleep" = cmds)
## create a flowmat
flowmat = to_flowmat(lst, samplename = "samp")
## Use flowmat to create a skeleton flowdef
flowdef = to_flowdef(flowmat)
## use both (flowmat and flowdef) to create a flow
fobj = to_flow(flowmat, flowdef)
## submit the flow to the cluster (execute=TRUE) or do a dry-run (execute=FALSE)
fobj2 = submit_flow(fobj, execute=FALSE)
fobj3 = submit_flow(fobj, execute=TRUE)
## Get the status or kill all the jobs
status(fobj3)
kill(fobj3)
to_flowmat, to_flowdef, to_flowdet, flowopts and submit_flow
Create a flowmat (data.frame) using a named list of commands. as.flowmat(): reads a file and checks for required columns; if x is a data.frame, it checks for the required columns.
to_flowmat(x, ...)
## method for class 'list'
to_flowmat(x, samplename, ...)
## method for class 'data.frame'
to_flowmat(x, ...)
## method for class 'flow'
to_flowmat(x, ...)
as.flowmat(x, grp_col, jobname_col, cmd_col, ...)
is.flowmat(x)
# Use this link for a few examples:
# http://docs.flowr.space/tutorial.html#define_modules
# create a flow mat, starting with a list of commands.
cmd_sleep = c("sleep 1", "sleep 2")
cmd_echo = c("echo 'hello'", "echo 'hello'")
# create a named list
lst = list("sleep" = cmd_sleep, "echo" = cmd_echo)
flowmat = to_flowmat(lst, samplename = "samp")
# read in a tsv; check and confirm format
ex = file.path(system.file(package = "flowr"), "pipelines")
flowmat = as.flowmat(file.path(ex, "sleep_pipe.tsv"))
# if your column names are different than defaults, explicitly specify them.
flowmat = as.flowmat(file.path(ex, "sleep_pipe.tsv"), jobname_col = "jobname")
# check if a object is a flowmat
is.flowmat(flowmat)
# create a flowdef, from this flowmat
flowdef = to_flowdef(flowmat)
# create a flow object using flowmat and flowdef
fobj = to_flow(flowmat, flowdef)
# extract a flowmat from a flow (here the samplename also contains the name of the flow)
flowmat2 = to_flowmat(fobj)
## submit the flow to the cluster (execute=TRUE) or do a dry-run (execute=FALSE)
fobj2 = submit_flow(fobj, execute=FALSE)
fobj3 = submit_flow(fobj, execute=TRUE)
## Get the status or kill all the jobs
status(fobj3)
kill(fobj3)
This function enables creation of a skeleton flow definition with several default values, using a flowmat. To customize the flowdef, one may supply parameters such as sub_type and dep_type upfront. These params must be of the same length as the number of unique jobs in the flowmat. Each row in this table refers to one step of the pipeline; it describes the resources used by the step and also its relationship with other steps, especially the step immediately prior to it.
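For example, a hedged sketch of supplying these upfront, assuming flowmat contains two unique jobs named sleep and echo (as in the examples elsewhere in this manual):
## one value per unique jobname in the flowmat (here: sleep, echo)
def = to_flowdef(flowmat,
                 sub_type = c("scatter", "serial"),
                 prev_jobs = c("none", "sleep"),
                 dep_type = c("none", "gather"))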
Submission types: This refers to the sub_type column in flow definition.
Consider an example with three steps A, B and C. A has 10 commands from A1 to A10; similarly, B has 10 commands B1 through B10, and C has a single command, C1. Consider another step D (with D1-D3), which comes after C.
step (number of sub-processes): A (10) --> B (10) --> C (1) --> D (3)
scatter
: submit all commands as parallel, independent jobs. - Submit A1 through A10 as independent jobs.
serial
: run these commands sequentially, one after the other. - Wrap A1 through A10 into a single job.
Dependency types: This refers to the dep_type column in the flow definition.
none
: an independent job, which does not wait for the previous step.
serial
: one-to-one relationship with the previous job. - B1 starts as soon as A1 finishes, B2 after A2, and so on.
gather
: many-to-one; wait for all commands in the previous job to finish, then start the current step. - C1 starts only after all of B1 through B10 have finished.
burst
: one-to-many; wait for the previous step, which has one job, and then start processing all commands in the current step. - D1 to D3 are started as soon as C1 finishes.
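Putting the example together, a sketch of how these choices would appear as sub_type/dep_type columns in a flow definition (jobnames and values follow the A to D example above; this is illustrative, not a complete flowdef):
## sub_type and dep_type columns for the A --> B --> C --> D example
data.frame(
  jobname   = c("A", "B", "C", "D"),
  prev_jobs = c("none", "A", "B", "C"),
  sub_type  = c("scatter", "scatter", "serial", "scatter"),
  dep_type  = c("none", "serial", "gather", "burst"),
  stringsAsFactors = FALSE
)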
to_flowdef(x, ...)
## method for class 'flowmat'
to_flowdef(x, sub_type, dep_type, prev_jobs, queue = "short", platform = "torque", memory_reserved = "2000", cpu_reserved = "1", nodes = "1", walltime = "1:00", guess = FALSE, verbose = opts_flow$get("verbose"), ...)
## method for class 'flow'
to_flowdef(x, ...)
## method for class 'character'
to_flowdef(x, ...)
as.flowdef(x, ...)
is.flowdef(x)
opts_flow$get("verbose")
This is a tab separated file, with a minimum of 4 columns:
required columns:
jobname
: Name of the step
sub_type
: Short for submission type; refers to how multiple commands of this step should be submitted. Possible values are serial or scatter.
prev_jobs
: Short for previous jobs; this would be the jobname of the previous job. This can be NA/./none if this is an independent/initial step, and no previous step is required for this one to start. Additionally, one may use comma(s) to define multiple previous jobs (A,B).
dep_type
: Short for dependency type; refers to the relationship of this job with the one defined in prev_jobs. This can take values none, gather, serial or burst.
resource columns (recommended):
Additionally, one may customize resource requirements used by each step. The format used varies and depends on the computing platform; thus it is best to refer to your institution's guide when specifying these. A brief sketch follows the list below.
cpu_reserved
integer, specifying number of cores to reserve [1]
memory_reserved
Usually in KB [2000]
nodes
number of server nodes to reserve, most tools can only use multiple cores on a single node [1]
walltime
maximum time allowed for a step, usually in a HH:MM or HH:MM:SS format. [1:00]
queue
the queue to use for job submission [short]
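As a sketch, a single row of such a tab-separated definition might look like the following (all values are illustrative, and the columns are separated by tab characters):
jobname   sub_type   prev_jobs   dep_type   queue   memory_reserved   walltime   cpu_reserved   platform
sleep     scatter    none        none       short   2000              1:00       1              torque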
NOTE: Guessing is an experimental feature; please check the definition carefully. It is provided to help, not to replace, your best judgement.
Optionally, one may provide only the previous jobs, and flowr will try to guess the appropriate submission and dependency types. If a step has multiple commands, the default is to submit them as scatter, else as serial. Further, if the previous job has multiple commands and the current job has a single one, it is assumed that all of the previous commands need to complete first, suggesting a gather type dependency.
# see ?to_flow for more examples
# read in a tsv; check and confirm format
ex = file.path(system.file(package = "flowr"), "pipelines")
# read in a flowdef from file
flowdef = as.flowdef(file.path(ex, "sleep_pipe.def"))
# check if this a flowdef
is.flowdef(flowdef)
# use a flowmat, to create a sample flowdef
flowmat = as.flowmat(file.path(ex, "sleep_pipe.tsv"))
to_flowdef(flowmat)
# change the platform
to_flowdef(flowmat, platform = "lsf")
# change the queue name
def = to_flowdef(flowmat,
platform = "lsf",
queue = "long")
plot_flow(def)
# guess submission and dependency types
def2 = to_flowdef(flowmat,
platform = "lsf",
queue = "long",
guess = TRUE)
plot_flow(def2)
Check consistency of flowdef and flowmat, using various rules.
check(x, ...)
## method for class 'flowmat'
check(x, ...)
## method for class 'flowdef'
check(x, verbose = opts_flow$get("verbose"), ...)
...: additional arguments, passed onto the check.flowdef or check.flowmat functions.
opts_flow$get("verbose")
A typical output from flowdef with verbose level: 2
checking if required columns are present...
checking if resources columns are present...
checking if dependency column has valid names...
checking if submission column has valid names...
checking for missing rows in def...
checking for extra rows in def...
checking submission and dependency types...
jobname prev.sub_type --> dep_type --> sub_type: relationship
1: aln1_a none --> none --> scatter
2: aln2_a scatter --> none --> scatter
3: sampe_a scatter --> serial --> scatter rel: complex one:one
4: fixrg_a scatter --> serial --> scatter rel: complex one:one
5: merge_a scatter --> gather --> serial rel: many:one
6: markdup_a serial --> serial --> serial rel: simple one:one
7: target_a serial --> serial --> serial rel: simple one:one
8: realign_a serial --> burst --> scatter rel: one:many
9: baserecalib_a scatter --> serial --> scatter rel: complex one:one
10: printreads_a scatter --> serial --> scatter rel: complex one:one
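A minimal usage sketch that produces this kind of output, using the example definition shipped with the package:
ex = file.path(system.file(package = "flowr"), "pipelines")
def = as.flowdef(file.path(ex, "sleep_pipe.def"))
## re-check the definition with extra verbosity
check(def, verbose = 2)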
Submit a flow to the cluster or perform a dry-run to check and debug issues.
submit_flow(x, verbose = opts_flow$get("verbose"), ...)
## method for class 'list'
submit_flow(x, verbose = opts_flow$get("verbose"), ...)
## method for class 'flow'
submit_flow(x, verbose = opts_flow$get("verbose"), execute = FALSE, uuid, plot = TRUE, dump = TRUE, .start_jid = 1, ...)
x
: object of class flow.
execute
: logical, whether or not to submit the jobs.
uuid
: character. Advanced use. This is the final path used for flow execution; especially useful in case of re-running a flow.
plot
: logical, whether to make a pdf flow plot (saves it in the flow working directory).
NOTE: Even if you want to kill the flow, it is best to let submit_flow do its job; when done, simply use kill(flow_wd). If submit_flow is interrupted, files like flow_details.rds etc. are not created, thus flowr loses the association of jobs with the flow instance and cannot monitor, kill or re-run the flow.
submit_flow(fobj = fobj, ... = ...)
Plot a flowchart using a flow object or flowdef
plot_flow(x, ...)
## method for class 'flow'
plot_flow(x, ...)
## method for class 'list'
plot_flow(x, ...)
## method for class 'character'
plot_flow(x, ...)
## method for class 'flowdef'
plot_flow(x, detailed = TRUE, type = c("1", "2"), pdf = FALSE, pdffile, ...)
## method for class 'flowdef'
plot(x, detailed = TRUE, type = c("1", "2"), pdf = FALSE, pdffile, ...)
## method for class 'flow'
plot(x, ...)
x
: a flow object, a list of flow objects, or a flowdef.
pdffile
: path to the output pdf file [flow_path/flow_details.pdf].
qobj = queue(type="lsf")
cmds = rep("sleep 5", 10)
jobj1 <- job(q_obj=qobj, cmd = cmds, submission_type = "scatter", name = "job1")
jobj2 <- job(q_obj=qobj, name = "job2", cmd = cmds, submission_type = "scatter",
dependency_type = "serial", previous_job = "job1")
fobj <- flow(jobs = list(jobj1, jobj2))
plot_flow(fobj)
### Gather: many to one relationship
jobj1 <- job(q_obj=qobj, cmd = cmds, submission_type = "scatter", name = "job1")
jobj2 <- job(q_obj=qobj, name = "job2", cmd = cmds, submission_type = "scatter",
dependency_type = "gather", previous_job = "job1")
fobj <- flow(jobs = list(jobj1, jobj2))
plot_flow(fobj)
### Burst: one to many relationship
jobj1 <- job(q_obj=qobj, cmd = cmds, submission_type = "serial", name = "job1")
jobj2 <- job(q_obj=qobj, name = "job2", cmd = cmds, submission_type = "scatter",
dependency_type = "burst", previous_job = "job1")
fobj <- flow(jobs = list(jobj1, jobj2))
plot_flow(fobj)
Summarize status of a flow OR multiple flows OR a high-level summary of all flows in a folder.
status(x, use_cache = FALSE, verbose = opts_flow$get("verbose"), out_format = "markdown", ...)
get_status(x, ...)
## method for class 'flow'
get_status(x, verbose, use_cache, out_format, ...)
## method for class 'character'
get_status(x, verbose, use_cache, out_format, ...)
## method for class 'data.frame'
get_status(x, verbose, use_cache, progress = TRUE, ...)
opts_flow$get("verbose")
basename(x) is used in a wild card search.
flowr status x=~/flowr/runs/sleep_pipe*
flowr status x=~/flowr/runs
Use use_cache=TRUE to speed up checking the status. This assumes that no new jobs have been submitted and skips (re-)checking the status of completed jobs. Once all the jobs have been submitted to the cluster, you may always use use_cache=TRUE.
status(x = "~/flowr/runs/sleep_pipe*")
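To skip re-checking jobs that are already complete, one might add the use_cache flag (a usage sketch based on the description above):
status(x = "~/flowr/runs/sleep_pipe*", use_cache = TRUE)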
## an example for running from terminal
flowr status x=path_to_flow_directory
NOTE: This requires files which are created at the end of the submit_flow command. Even if you want to kill the flow, it is best to let submit_flow do its job; when done, simply use kill(flow_wd). If submit_flow is interrupted, files like flow_details.rds etc. are not created, thus flowr loses the association of jobs with the flow instance and cannot monitor, kill or re-run the flow.
kill(x, ...)
## method for class 'character'
kill(x, force = FALSE, ...)
## method for class 'flow'
kill(x, kill_cmd, verbose = opts_flow$get("verbose"), jobid_col = "job_sub_id", ...)
opts_flow$get("verbose")
## example for terminal
## flowr kill_flow x=path_to_flow_directory
## In case path matches multiple folders, flowr asks before killing
kill(x='fastq_haplotyper*')
Flowr: streamlining workflows
found multiple wds:
/fastq_haplotyper-MS132-20150825-16-24-04-0Lv1PbpI
/fastq_haplotyper-MS132-20150825-17-47-52-5vFIkrMD
Really kill all of them ? kill again with force=TRUE
## submitting again with force=TRUE will kill them:
kill(x='fastq_haplotyper*', force = TRUE)
rerun(x=flow_wd)
rerun(x=flow_wd, mat = new_flowmat, def = new_flowdef)
rerun(x, ...)
## method for class 'character'
rerun(x, ...)
## method for class 'flow'
rerun(x, mat, def, start_from, samplename, execute = TRUE, kill = TRUE, select, ignore, verbose = opts_flow$get("verbose"), ...)
opts_flow$get("verbose")
This function fetches details regarding the previous execution from the flow working directory (flow_wd).
It reads the flow object from the flow_details.rds file, and extracts flowdef and flowmat from it using to_flowmat and to_flowdef functions.
Using new flowmat OR flowdef for re-run:
Optionally, if either a flowmat or a flowdef is supplied, the supplied one is used instead of the one extracted from the previous submission.
This function efficiently updates job details of the latest submission into the previous file; thus information regarding previous job ids and their status is not lost.
#
rerun(x = flow_wd, execute = TRUE, kill = TRUE)
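A hedged sketch of re-running from an intermediate step onwards (the step name used here is hypothetical):
## rerun everything from the step named "merge" onwards
rerun(x = flow_wd, start_from = "merge", execute = TRUE)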
There are three helper functions which attempt to manage parameters used by flowr and ngsflows:
opts_flow$get()
: show all default options
opts_flow$set()
: set default options
opts_flow$load()
: load options specified in a tab-separated text file
flowopts
get_opts(...)
set_opts(...)
load_opts(...)
opts_flow
By default, flowr loads ~/flowr/conf/flowr.conf and ~/flowr/conf/ngsflows.conf.
Below is a list of default flowr options, retrieved via opts_flow$get():
|name |value |
|:-----------------|:------------------------|
|default_regex |(.*) |
|flow_base_path |~/flowr |
|flow_conf_path |~/flowr/conf |
|flow_parse_lsf |.*(\<[0-9]*\>).* |
|flow_parse_moab |(.*) |
|flow_parse_sge |(.*) |
|flow_parse_slurm |(.*) |
|flow_parse_torque |(.?)\..* |
|flow_pipe_paths |~/flowr/pipelines |
|flow_pipe_urls |~/flowr/pipelines |
|flow_platform |local |
|flow_run_path |~/flowr/runs |
|my_conf_path |~/flowr/conf |
|my_dir |path/to/a/folder |
|my_path |~/flowr |
|my_tool_exe |/usr/bin/ls |
|time_format |%a %b %e %H:%M:%S CDT %Y |
|verbose |FALSE |
## Set options: opts_flow$set()
opts = opts_flow$set(flow_run_path = "~/mypath")
## OR if you would like to supply a long list of options:
opts = opts_flow$set(.dots = list(flow_run_path = "~/mypath"))
## load options from a configuration file: opts_flow$load()
conffile = fetch_conf("flowr.conf")
opts_flow$load(conffile)
## Fetch options: get_opts()
opts_flow$get("flow_run_path")
opts_flow$get()
This function creates a directory structure in the user's home directory. Additionally, it creates a shortcut to the flowr helper script in ~/bin.
setup(bin = "~/bin", flow_base_path = opts_flow$get("flow_base_path"), flow_run_path = opts_flow$get("flow_run_path"), flow_conf_path = opts_flow$get("flow_conf_path"), flow_pipe_path = opts_flow$get("flow_pipe_paths"))
opts_flow$get("flow_run_path").
More functionality will be added here over time, to identify the cluster and aid in other setup tasks.
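A minimal usage sketch, accepting all the defaults shown in the usage above:
## creates the flowr directory structure under ~/flowr and a shortcut to the flowr script in ~/bin
setup()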
There are several levels of verboseness one can choose from. Set the desired level using opts_flow$set(verbose=2); this value is then used across the flowr and ngsflows packages. Additionally, one may set this value in the configuration files: ~/flowr/conf/flowr.conf OR ~/flowr/conf/ngsflows.conf.
verbose
An object of class NULL
of length 0.
fl = system.file("pipelines/abcd.def", package = "flowr")
def = as.flowdef(fl, verbose = 0)
# def seems to be a file, reading it...
def = as.flowdef(fl, verbose = 1)
# def seems to be a file, reading it...
# checking if required columns are present...
# checking if resources columns are present...
# checking if dependency column has valid names...
# checking if submission column has valid names...
# checking for missing rows in def...
# checking for extra rows in def...
# checking submission and dependency types...
def = as.flowdef(fl, verbose = 2)
# def seems to be a file, reading it...
# checking if required columns are present...
# checking if resources columns are present...
# checking if dependency column has valid names...
# checking if submission column has valid names...
# checking for missing rows in def...
# checking for extra rows in def...
# checking submission and dependency types...
# jobname prev.sub_type --> dep_type --> sub_type: relationship
# 1: A none --> none --> scatter
# 2: B scatter --> serial --> scatter rel: complex one:one
# 3: C scatter --> gather --> serial rel: many:one
# 4: D serial --> burst --> scatter rel: one:many
These functions help in searching for specific files in the user’s space. fetch_pipes()
: Fetches pipelines in the following places, in this specific order:
~/flowr/pipelines
./
fetch_conf()
: Fetches configuration files in ALL of the following places: the conf folders in the flowr and ngsflows packages, the ~/flowr/conf folder, and ./. By default it fetches flowr.conf and ngsflows.conf. See the details section for more explanation on this.
fetch(x, places, urls, verbose = opts_flow$get("verbose"))
fetch_pipes(x, places, last_only = FALSE, urls = opts_flow$get("flowr_pipe_urls"), silent = FALSE, verbose = opts_flow$get("verbose"), ask = TRUE)
fetch_conf(x = "flowr.conf", places, ...)
Pipeline and configuration files are expected to have the extensions .R and .conf, respectively.
opts_flow$get("verbose")
For example flowr has a variable flow_run_path
where it puts all the execution logs etc. The default value is picked up from the internal flowr.conf file. To redefine this value, one could create a new file called ~/flowr/conf/flowr.conf and add a line:
flow_run_path TAB my_awesome_path
where TAB is a tab character, since these are tab-separated files.
Also, at any time you can run opts_flow$load() to load custom options.
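A hedged sketch of redefining such an option in a custom tab-separated file and loading it (the path value is illustrative):
## write a small, tab-separated configuration file and load it
myconf = tempfile(fileext = ".conf")
writeLines("flow_run_path\t~/my_awesome_path", myconf)
opts_flow$load(myconf)
opts_flow$get("flow_run_path")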
## let us find a default conf file
conf = fetch_conf("flowr.conf");conf
## load this
opts_flow$load(conf)
## this returns a list, which prints pretty
pip = fetch_pipes("sleep_pipe")
pip$name
pip$pipe
pip$def
Run complete pipelines by wrapping several steps into one convenient function. NOTE: please use flowr version 0.9.8.9010 or higher. In summary, this function performs the following steps:
x
: defines the name of the pipeline, say for example sleep_pipe.
the pipeline definition files are fetched (the sleep_pipe.R, sleep_pipe.def and sleep_pipe.conf files)
sleep_pipe(...)
: creates all the required commands (a flowmat)
the flowmat and sleep_pipe.def are then used to create a flow object, which is submitted to the cluster if execute = TRUE.
run(x, platform, def, conf, wd = opts_flow$get("flow_run_path"), flow_run_path = wd, rerun_wd, start_from, execute = FALSE, ...)
run_pipe(x, platform, def, conf, wd = opts_flow$get("flow_run_path"), flow_run_path = wd, rerun_wd, start_from, execute = FALSE, ...)
## Run a short pipeline (dry run)
run("sleep_pipe")
## Run a short pipeline on the local machine
run("sleep_pipe", platform = "local", execute = TRUE)
## Run a short pipeline on a torque cluster (qsub)
run("sleep_pipe", platform = "torque", execute = TRUE)
## Run a short pipeline on a MOAB cluster (msub)
run("sleep_pipe", platform = "moab", execute = TRUE)
## Run a short pipeline on an IBM (LSF) cluster (bsub)
run("sleep_pipe", platform = "lsf", execute = TRUE)
## change parameters of the pipeline
## All extra parameters are passed on to the pipeline function.
run("sleep_pipe", platform = "lsf", execute = TRUE, x = 5)
A queue object defines details regarding how a job is submitted.
Internal function (used by to_flow), to define the format used to submit a job.
queue(object, platform = c("local", "lsf", "torque", "sge", "moab"), format = "", queue = "long", walltime, memory, cpu = 1, extra_opts = "", submit_exe, nodes = "1", jobname = "name", email = Sys.getenv("USER"), dependency = list(), server = "localhost", verbose = FALSE, cwd = "", stderr = "", stdout = "", ...)
Resources: Can be defined once using a queue object and recycled across all the jobs in a flow. If resources (like memory, cpu, walltime, queue) are supplied at the job level, they overwrite those supplied in the queue object.
Nodes: Can be supplied to extend a job across multiple nodes. This is purely experimental and not supported.
Server: This is a hook which may be implemented in the future.
Submission script: The platform variable defines the format, along with submit_exe; these two are available for someone to create a custom submission command.
qobj <- queue(platform='lsf')
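For instance, a sketch of defining resources once on the queue and overriding one of them at the job level (all values are illustrative):
qobj <- queue(platform = "lsf", queue = "short", memory = "4000", walltime = "2:00", cpu = 1)
## the job-level memory value overrides the queue default
jobj <- job(q_obj = qobj, cmd = "sleep 2", memory = "8000")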
Internal function (used by to_flow), which aids in creating a job object.
job(cmds = "", name = "myjob", q_obj = new("queue"), previous_job = "", cpu = 1, memory, walltime, submission_type = c("scatter", "serial"), dependency_type = c("none", "gather", "serial", "burst"), ...)
qobj <- queue(platform="torque")
## torque job with 1 CPU running command 'sleep 2'
jobj <- job(q_obj=qobj, cmd = "sleep 2", cpu=1)
## multiple commands
cmds = rep("sleep 5", 10)
## run the 10 commands in parallel
jobj1 <- job(q_obj=qobj, cmd = cmds, submission_type = "scatter", name = "job1")
## run the 10 commands sequentially, but WAIT for the previous job to complete
jobj2 <- job(q_obj=qobj, cmd = cmds, submission_type = "serial",
dependency_type = "gather", previous_job = "job1")
fobj <- flow(jobs = list(jobj1, jobj2))
## plot the flow
plot_flow(fobj)
## dry run, only create the structure without submitting jobs
submit_flow(fobj)
## execute the jobs: ONLY works on computing cluster, would fail otherwise
submit_flow(fobj, execute = TRUE)
Internal function (used by to_flow), which aids in creating a flow object.
flow(jobs = list(new("job")), name = "newflow", desc = "my_super_flow", mode = c("scheduler", "trigger", "R"), flow_run_path = opts_flow$get("flow_run_path"), trigger_path = "", flow_path = "", version = "0.0", status = "created", module_cmds = opts_flow$get("module_cmds"), execute = "")
jobs
: list: A list of jobs to be included in this flow.
name
: character: Name of the flow [newflow].
desc
: character: Description of the flow, used to uniquely identify a flow instance [my_super_flow].
mode
: character: Mode of submission of the flow (deprecated) [scheduler].
trigger_path
: character: [~/flows/trigger].
flow_path
: character: A unique path identifying a flow instance, populated by submit_flow.
status
: character: Status of the flow.
module_cmds
: for example, module load mytool1;module load mytool2
cmds = rep("sleep 5", 10)
qobj <- queue(platform='torque')
## run the 10 commands in parallel
jobj1 <- job(q_obj=qobj, cmd = cmds, submission_type = "scatter", name = "job1")
## run the 10 commands sequentially, but WAIT for the previous job to complete
## Many-To-One
jobj2 <- job(q_obj=qobj, cmd = cmds, submission_type = "serial",
dependency_type = "gather", previous_job = "job1", name = "job2")
## As soon as first job on 'job1' is complete
## One-To-One
jobj3 <- job(q_obj=qobj, cmd = cmds, submission_type = "scatter",
dependency_type = "serial", previous_job = "job1", name = "job3")
fobj <- flow(jobs = list(jobj1, jobj2, jobj3))
## plot the flow
plot_flow(fobj)
## dry run, only create the structure without submitting jobs
submit_flow(fobj)
## execute the jobs: ONLY works on computing cluster, would fail otherwise
submit_flow(fobj, execute = TRUE)
get_unique_id(prefix = "id", suffix = "", random_length = 8)
prefix
: Character string to be added in the front [id].
get_unique_id(prefix = "id", random_length = 8)
Create a file describing details regarding jobs ids, submission scripts etc.
to_flowdet(x, ...)
## method for class 'rootdir'
to_flowdet(x, ...)
## method for class 'character'
to_flowdet(x, ...)
## method for class 'flow'
to_flowdet(x, ...)
The path provided should contain a flow_details.rds file (which is used to extract all the information).
In case a parent folder with multiple flows is provided, information regarding jobids is omitted.
If x is a character, it is assumed to be a path; flowr checks whether a flow object exists in it and reads it. If there is no flow object, it falls back to a simpler function.
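A minimal usage sketch (fobj is a flow object as created in the earlier examples; flow_wd is a hypothetical flow working directory):
## from a flow object
flowdet = to_flowdet(fobj)
## or from a flow working directory containing flow_details.rds
flowdet = to_flowdet(flow_wd)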
Checks all the arguments in the parent function and makes sure that none of them are NULL
check_args(ignore, select)
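A short sketch of how this is typically used inside a function (the function and its arguments here are hypothetical):
myfun <- function(a, b = 2){
  check_args()  # ensures none of the arguments of myfun are NULL
  a + b
}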
This function has now been moved to the params package.
get_wds(x)