scriptcwl.workflow module

class scriptcwl.workflow.WorkflowGenerator(steps_dir=None, working_dir=None)

Bases: object

Class for creating a CWL workflow.

The WorkflowGenerator class allows users to tie together inputs and outputs of the steps that need to be executed to perform a data processing task. The steps (i.e., command line tools and subworkflows) must be added to the steps library of the WorkflowGenerator object before they can be added to the workflow. To add steps to the steps library, the load method can be called with either a path to a directory containing CWL files:

from scriptcwl import WorkflowGenerator

with WorkflowGenerator() as wf:
    wf.load(steps_dir='/path/to/dir/with/cwl/steps/')

Or a single CWL file:

with WorkflowGenerator() as wf:
    wf.load(step_file='/path/to/cwl/step/file')

wf.load() can be called multiple times. Step files are added to the steps library one after the other. For every step that is added to the steps library, a method with the same name is added to the WorkflowGenerator object. To add a step to the workflow, this method must be called (examples below).

Next, the user should add one or more workflow inputs:

txt_dir = wf.add_input(txt_dir='Directory')

The add_input() method expects a name=type pair as input parameter. The pair connects an input name (txt_dir in the example) to a CWL type ('Directory'). Optionally, a default value can be specified using default=value.

The add_input() method returns a string containing the name that can be used to connect this input parameter to step input parameter names.

Next, workflow steps can be added. To add a workflow step, its method must be called on the WorkflowGenerator object. This method expects the step's inputs as keyword arguments (name=value pairs). To find out which inputs a step needs, call wf.inputs(<step name>); this method prints all the inputs and their types. The step method returns a list of strings containing output names that can be used as input for later steps, or that can be connected to workflow outputs.

For example, to add a step called frog-dir to the workflow, the following method must be called:

frogout = wf.frog_dir(dir_in=txt_dir)

In a next step, frogout can be used as input:

saf = wf.frog_to_saf(in_files=frogout)
txt = wf.saf_to_txt(in_files=saf)

Etcetera.
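The chaining above can be illustrated with a toy stand-in. This is only a sketch and not scriptcwl's implementation: MiniWorkflow, its library argument, and the step data are hypothetical. It assumes that hyphens in step names become underscores in method names (frog-dir becomes frog_dir), that output references have the form step-name/output-name, and that a step with a single output returns one string rather than a list.

```python
class MiniWorkflow:
    """Toy stand-in for a workflow builder; NOT scriptcwl's implementation."""

    def __init__(self, library):
        # Hypothetical steps library: step name -> list of output names.
        self.library = library
        self.steps = []

    def __getattr__(self, python_name):
        # Called only when normal attribute lookup fails.
        # Assumption: hyphens in step names map to underscores in method names.
        for step_name, outputs in self.__dict__.get('library', {}).items():
            if step_name.replace('-', '_') == python_name:
                def add_step(**inputs):
                    # Record the step and hand back references to its outputs.
                    self.steps.append((step_name, inputs))
                    refs = ['{}/{}'.format(step_name, out) for out in outputs]
                    # Assumption: a single output is returned as a bare string.
                    return refs[0] if len(refs) == 1 else refs
                return add_step
        raise AttributeError(python_name)


wf = MiniWorkflow({'frog-dir': ['frogout'], 'frog-to-saf': ['saf']})
frogout = wf.frog_dir(dir_in='txt_dir')
saf = wf.frog_to_saf(in_files=frogout)
print(saf)  # prints frog-to-saf/saf
```

The point of the sketch is that each step call returns plain strings, so connecting steps amounts to passing those strings as keyword arguments to later calls.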

When all steps of the workflow have been added, the user can specify workflow outputs:

wf.add_outputs(txt=txt)

Finally, the workflow can be saved to file:

wf.save('workflow.cwl')

To list steps and signatures available in the steps library, call:

wf.list_steps()
add_input(**kwargs)

Add workflow input.

Args:
kwargs (dict): A dict with a name: type item and optionally a default: value item, where name is the name (id) of the workflow input (e.g., dir_in) and type is the type of the input (e.g., 'Directory'). The type of input parameter can be learned from step.inputs(step_name=input_name).
Returns:
inputname
Raises:
ValueError: No or multiple parameter(s) have been specified.
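A minimal sketch of the argument rule described here (exactly one name=type pair, plus an optional default) could look as follows. parse_input is a hypothetical helper written for illustration only; it is not part of scriptcwl and does not claim to mirror how add_input is implemented.

```python
def parse_input(**kwargs):
    """Split add_input-style keyword arguments into (name, cwl_type, default).

    Hypothetical helper for illustration; not part of scriptcwl.
    """
    params = dict(kwargs)
    # 'default' is the only keyword that is not an input name.
    default = params.pop('default', None)
    if len(params) != 1:
        raise ValueError(
            'Expected exactly one name=type pair, got {}'.format(len(params)))
    (name, cwl_type), = params.items()
    return name, cwl_type, default


print(parse_input(txt_dir='Directory'))
print(parse_input(threshold='int', default=3))
```

Calling parse_input() with no arguments, or with two name=type pairs, raises ValueError in this sketch, which corresponds to the "No or multiple parameter(s)" condition listed under Raises.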
add_inputs(**kwargs)

Deprecated function; use add_input(**kwargs) instead. Add workflow input.

Args:
kwargs (dict): A dict with a name: type item and optionally a default: value item, where name is the name (id) of the workflow input (e.g., dir_in) and type is the type of the input (e.g., 'Directory'). The type of input parameter can be learned from step.inputs(step_name=input_name).
Returns:
inputname
Raises:
ValueError: No or multiple parameter(s) have been specified.
add_outputs(**kwargs)

Add workflow outputs.

The output type is added automatically, based on the steps in the steps library.

Args:
kwargs (dict): A dict containing name=source name pairs.
name is the name of the workflow output (e.g., txt_files) and source name is the name of the step that produced this output plus the output name (e.g., saf-to-txt/out_files).
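To make the name=source convention concrete, the sketch below turns such pairs into CWL-style output entries and looks up each type from a small table. build_outputs and the step_output_types table are hypothetical and only illustrate the idea of deriving the type from the producing step; outputSource is the CWL field that links a workflow output to a step output.

```python
def build_outputs(step_output_types, **kwargs):
    """Map name=source pairs to CWL-style workflow output entries.

    Hypothetical helper for illustration; not part of scriptcwl.
    """
    outputs = {}
    for name, source in kwargs.items():
        # A source looks like 'step-name/output-name'.
        step_name, output_name = source.split('/', 1)
        outputs[name] = {
            # The type is looked up from the step that produces the output.
            'type': step_output_types[step_name][output_name],
            'outputSource': source,
        }
    return outputs


# Hypothetical library data: step name -> {output name: CWL type}.
library = {'saf-to-txt': {'out_files': 'File[]'}}
outputs = build_outputs(library, txt_files='saf-to-txt/out_files')
print(outputs)
```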
get_working_dir()
inputs(name)

List input names and types of a step in the steps library.

Args:
name (str): name of a step in the steps library.
list_steps()

Return a string with the signatures of all steps in the steps library.

load(steps_dir=None, step_file=None, step_list=None)

Load CWL steps into the WorkflowGenerator’s steps library.

Adds steps (command line tools and workflows) to the WorkflowGenerator’s steps library. These steps can be used to create workflows.

Args:
steps_dir (str): path to directory containing CWL files. All CWL files in the directory are loaded.
step_file (str): path to a file containing a CWL step that will be added to the steps library.
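The difference between the two arguments can be sketched with the standard library alone. collect_step_files is a hypothetical helper, not scriptcwl code; it assumes that CWL files carry a .cwl extension and that a step is named after its file name without the extension.

```python
import glob
import os


def collect_step_files(steps_dir=None, step_file=None):
    """Return {step name: path} for a directory and/or a single file.

    Hypothetical helper for illustration; not part of scriptcwl.
    """
    paths = []
    if steps_dir is not None:
        # Every .cwl file in the directory is picked up.
        paths.extend(sorted(glob.glob(os.path.join(steps_dir, '*.cwl'))))
    if step_file is not None:
        paths.append(step_file)
    # Assumption: the step name is the file name without its extension.
    return {os.path.splitext(os.path.basename(p))[0]: p for p in paths}


# Results of repeated calls can be merged, one after the other.
steps = {}
steps.update(collect_step_files(step_file='/path/to/cwl/step/file'))
print(sorted(steps))  # prints ['file']
```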
save(fname, validate=True, wd=False, inline=False, relative=False, pack=False, encoding='utf-8')

Save the workflow to file.

Save the workflow to a CWL file that can be run with a CWL runner.

Args:
fname (str): file to save the workflow to.
encoding (str): file encoding to use (default: utf-8).
set_documentation(doc)

Set workflow documentation.

Args:
doc (str): documentation string.
set_label(label)

Set workflow label.

Args:
label (str): short description of workflow.
to_obj(wd=False, inline=False, pack=False, relpath=None)

Return the created workflow as a dict.

The dict can be written to a yaml file.

Returns:
A yaml-compatible dict representing the workflow.
to_script(wf_name='wf')

Generate and print the scriptcwl script for the current workflow.

Args:
wf_name (str): string used for the WorkflowGenerator object in the generated script (default: wf).
validate(inline=False)

Validate workflow object.

This method currently validates the workflow object using cwltool. It writes the workflow to a temporary CWL file, reads it, validates it, and then removes the temporary file. By default, the workflow is written to file using absolute paths to the steps. Optionally, the steps can be saved inline.
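The write, validate, remove cycle described above can be sketched generically. validate_via_tempfile and the validator callable are hypothetical and stand in for the actual cwltool-based check; the workflow is serialized as JSON here only to keep the sketch free of third-party dependencies.

```python
import json
import os
import tempfile


def validate_via_tempfile(workflow_obj, validator):
    """Write workflow_obj to a temporary file, validate it, then remove it.

    Hypothetical helper for illustration; `validator` is any callable that
    takes a file path and raises an exception if the file is invalid.
    """
    fd, tmp_path = tempfile.mkstemp(suffix='.cwl')
    try:
        with os.fdopen(fd, 'w') as handle:
            json.dump(workflow_obj, handle)
        validator(tmp_path)
    finally:
        # The temporary file is removed even when validation fails.
        os.remove(tmp_path)
    return tmp_path


def check_is_workflow(path):
    # Stand-in validator: only checks the 'class' field.
    with open(path) as handle:
        if json.load(handle).get('class') != 'Workflow':
            raise ValueError('not a Workflow')


validate_via_tempfile({'class': 'Workflow', 'steps': {}}, check_is_workflow)
```

Putting the removal in a finally clause means a failed validation still leaves no stray file behind.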