In this chapter we will go more in-depth into workflow semantics and address more complex workflow examples. We will also show the syntax and semantic differences between various scripting languages such as Groovy and JRuby.
1 Variables
Job variables are workflow variables which can either be defined statically at
the workflow level or dynamically inside a task.
They behave like a dictionary (HashMap).
When they are defined statically, they are accessible in any task.
When they are defined dynamically, they are only accessible in child tasks.
Statically-defined job variables can also be modified dynamically inside a task.
In that case, the modification will be visible in child tasks only.
Some variables, such as PA_JOB_ID, PA_TASK_ID, and PA_USER, are automatically set by the
system.
The following diagram illustrates this behavior:
When used inside tasks, Job variables are of type String.
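For example, a Groovy task can read and redefine job variables through the variables binding exposed to script tasks (a minimal sketch; the variable names other than the PA_* ones are illustrative):

    // Read a statically-defined job variable (inside a task it is a String)
    String input = variables.get("inputPath")

    // Read a variable set automatically by the system
    String jobId = variables.get("PA_JOB_ID")

    // Redefine a variable dynamically: the new value is visible in child tasks only
    variables.put("inputPath", "/tmp/data")

    println "Job " + jobId + " uses input " + variables.get("inputPath")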
It is possible to control the syntax of a variable through the Model
attribute.
When the Model attribute is used, the variable is still of type String, but its
value is validated against the model.
This helps ensure that a user does not enter an invalid value that could make the
workflow fail.
For example, create a new workflow with a Groovy task and define the following job
variable (unselect the task first):
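For instance (an illustrative definition; the name and value are not from the original), you could define:

    Name: counter    Value: 10    Model: PA:INTEGER

With this model, submitting the workflow with a non-numeric value for counter is rejected before the job starts.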
Task Variables are similar to Job Variables, with the following differences:
Another way to transfer information between tasks is by using the task result
variables.
Inside each task, it is possible to set a result by assigning a value to a
variable called result.
The direct child task will be able to access this result through another variable called
results (with an "s", since a task can have multiple parents).
The exact type of the results variable is language-dependent, but it is always an
aggregate type such as an array or a list,
as it aggregates the results of several parent tasks.
Let’s illustrate this with an example.
Create a new job and call it Job results.
Create two Groovy tasks, position them on the same line, and replace their scripts with
the following:
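A minimal sketch of what the two scripts could look like, assuming the second task is connected as a child of the first (the messages are illustrative):

First task:

    // The value assigned to the result variable becomes this task's result
    result = "Hello from Task1"

Second task:

    // In Groovy, each entry of results is a TaskResult; value() returns the actual result
    println "Received: " + results[0].value()
    result = results[0].value() + " -> processed by Task2"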
2 Resource selection
Resource selection allows you to choose specific ProActive Nodes to execute tasks. It is
useful when the resource manager controls different machine groups,
with different libraries installed, or even different operating systems. It can be
especially useful when heterogeneous machines are connected to the
scheduler. Selection is done by writing selection scripts that determine whether a task
can be executed on a given ProActive Node.
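A selection script runs on each candidate node and accepts or rejects it by setting the boolean selected binding. A minimal Groovy sketch (the hostname is illustrative):

    import java.net.InetAddress

    // Accept this node only if it runs on the expected host
    selected = InetAddress.getLocalHost().getHostName().equals("compute-01")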
Let’s show with an example how we can select a specific machine for execution.
Create a new job in the Studio named Selection job with a single Groovy
task.
Open the Node Selection panel and click on Add
This will open the following dialog:
3 Data management
When we create a file in a task, the file is located in the working directory of
the task. In ProActive terminology, this directory is called the Local
Space. This directory is volatile and is deleted after the task finishes, so any
output file produced must be transferred out of it.
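Inside a script task, the path of this directory is exposed through the localspace binding (a minimal Groovy sketch; the file name is illustrative):

    // Write a file into the task's Local Space and show where it lives
    File out = new File(localspace, "output.txt")
    out.text = "produced by this task"
    println "Local Space is: " + localspace

The actual transfer of output files is configured on the task itself, for example towards the User or Global Space.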
To illustrate this, let’s create a new job called LocalSpace job with a single
Windows cmd task.
Replace the script content with the following and execute the job:
4 Control structures
As we already saw with the replicate example, control structures allow building
dynamic workflows with control-flow decisions.
There are three kinds of control structures: branching (If/Else), looping (Loop), and replication (Replicate).
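Each control structure is driven by a small flow script attached to a task. For example, a replication flow script decides how many copies of the replicated block to run by setting the runs binding (a minimal Groovy sketch; the count is illustrative):

    // Replication flow script: run 4 parallel copies of the replicated block
    runs = 4
    // (Similarly, a loop flow script sets 'loop', and a branch flow script sets 'branch'.)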
5 Multi-Node Tasks
We already saw briefly that a ProActive Task can reserve more than one ProActive
Node for its execution. The reason behind this feature is that not all tasks simply
execute a basic script; often a task will call an external program, and that
program can be multi-threaded, using multiple cores on
the machine. In that scenario, it is important to precisely match the number of
ProActive Nodes used by our task with the machine resources actually used by the program.
Otherwise, the scheduler could dispatch more tasks to the same machine than its
resources can handle.
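Inside a multi-node task, the additional nodes reserved for the task are typically exposed to the script through the nodesurl binding (an assumption based on the standard script bindings; a minimal Groovy sketch):

    // nodesurl lists the URLs of the extra nodes reserved for this task
    println "Nodes reserved in total: " + (nodesurl.size() + 1)
    nodesurl.each { url -> println "Extra node: " + url }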
Open the Multi node job again, click on the task, and open the Multi-Node
Execution panel. Open the Topology list; there are many choices in it,
but we are going to focus on the two most useful ones:
6 On-Error policies
Throughout this course, we ran many failing jobs, and each time we observed that the
scheduler tries to execute a failing task several times, then continues the job
execution with the other tasks. This is the standard behavior for failing tasks, but
each workflow can define its
own failover policy.
Let’s open the Result job again, which was producing an error.
Click on the desktop, outside the task, to open the job parameters panel.
Click on the Error Handling panel.
Here you can see the Maximum Number of Execution Attempts (2 by default),
and
two other settings:
7 Fork Environment
When a ProActive Task is executed on a ProActive Node, a dedicated Java Virtual
Machine is started to execute the Task.
The forked JVM parameters are automatically configured by the ProActive Node, but
sometimes it may be necessary to provide additional configuration to the JVM. This
configuration can be performed thanks to a Fork Environment or a Fork
Environment Script.
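A fork environment script receives a forkEnvironment binding that can be used to tune the forked JVM (a minimal Groovy sketch; the values are illustrative):

    // Give the forked JVM more heap and pass a custom system property
    forkEnvironment.addJVMArgument("-Xmx1024m")
    forkEnvironment.addJVMArgument("-Dmy.custom.property=value")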
Let’s demonstrate this with an example. Create a bash task containing the pwd command
to display the current directory.
Execute this task; you should see something like the following in the output:
8 Containers
ProActive natively supports containers (Docker, Kubernetes, ...). As a first approach,
let’s run a simple bash command from a basic Linux
container.
From the Studio, drag and drop a Dockerfile task
(Tasks->Containers->Dockerfile).
Open the Task Implementation view to see the script: it simply prints “Hello”
from a freshly started Ubuntu container.
Execute the workflow. If the Docker image (here Ubuntu 18.04) is not present
locally, you will see image-pull logs in the job output.
9 Generic Information
Generic Information key features:
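Among them: generic information entries are key/value pairs attached to a job or a task, and they can influence how the scheduler handles them. Inside a script task they can typically be read through the genericInformation binding (an assumption based on the standard script bindings; the key is illustrative):

    // Read a generic information entry attached to this task
    println "NODE_ACCESS_TOKEN = " + genericInformation.get("NODE_ACCESS_TOKEN")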
10 Third-Party Credentials
Third-Party credentials key features:
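Among them: third-party credentials are key/value secrets stored server-side per user, so passwords and keys never appear in the workflow definition itself. Inside a script task they can typically be read through the credentials binding (an assumption based on the standard script bindings; the key name is illustrative):

    // Read a third-party credential stored for the current user
    String dbPassword = credentials.get("database_password")
    println "Credential length: " + dbPassword.length()   // avoid printing the secret itself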