As shown in the Quickstart tutorial, a job is a workflow of tasks to be executed. Workflows provide the ability to replicate tasks and transfer data between tasks.

In this tutorial, we will first upload an image to the user space, then use the Studio to create a workflow with task replication that processes the image by applying an edge detection algorithm in parallel.

1 Upload the Image File to User Space


Using the Pydio web interface, upload this image file to the user space:

  1. Log in here. Use the login and password you received by e-mail when you first signed up.
  2. Navigate to the ProActive User Space, using the Workspaces menu at the top left.
  3. Click on the Upload button at the top right.
  4. In the Upload box, click Select files on your computer. Choose the image file.

Your file will be stored in the user space:


You can also use the ProActive Scheduler REST interface to upload a file to the user space on the server. Below is an example set of commands that uploads the file through the REST interface with cURL. Use the login and password you received by e-mail when you first signed up.

# first we login and retrieve a session id
$ sessionid=$(curl -d "username=LOGIN&password=PASSWORD" https://try.activeeon.com/rest/scheduler/login)

# then we push the image file into the USERSPACE
$ curl -H "sessionid:$sessionid" -F "fileName=neptune_triton_01_3000x3000.jpg" -F "fileContent=@neptune_triton_01_3000x3000.jpg;type=image/jpg" https://try.activeeon.com/rest/scheduler/dataspace/USERSPACE/

2 Create the Image Processing Workflow


We want to apply a Canny Edge Detector algorithm to neptune_triton_01_3000x3000.jpg, which is too large to be processed on a single machine. So we will cut it into a number of equal parts and process each part separately, in parallel, on a different node.

For that we will use Groovy script tasks and the task replication mechanism. A first task splits the image. The following task is replicated for each part of the image and produces a processed part. Finally, the last task merges all processed parts into a final image.
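The split and merge stages described above can be sketched in plain Java as follows. This is a minimal illustration, not the content of the tutorial's split-image.groovy or merge-parts.groovy: the class ImageTiles and its methods are hypothetical names, and the sketch assumes that the number of parts is a perfect square (a square grid of tiles) and that the image dimensions are multiples of the grid side.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the tutorial's Groovy scripts.
class ImageTiles {

    // Side length of the tile grid; assumes nbParts is a perfect square (4, 9, 16, ...).
    static int gridSide(int nbParts) {
        int side = (int) Math.round(Math.sqrt(nbParts));
        if (side < 1 || side * side != nbParts) {
            throw new IllegalArgumentException("nbParts must be a perfect square, got " + nbParts);
        }
        return side;
    }

    // Split: cut the image into nbParts tiles, listed row by row.
    // Assumes the width and height are multiples of the grid side.
    static List<BufferedImage> split(BufferedImage img, int nbParts) {
        int side = gridSide(nbParts);
        int tileW = img.getWidth() / side;
        int tileH = img.getHeight() / side;
        List<BufferedImage> parts = new ArrayList<>();
        for (int row = 0; row < side; row++) {
            for (int col = 0; col < side; col++) {
                // Copy the pixels so that each part is independent of the source image.
                int[] pixels = img.getRGB(col * tileW, row * tileH, tileW, tileH, null, 0, tileW);
                BufferedImage part = new BufferedImage(tileW, tileH, BufferedImage.TYPE_INT_RGB);
                part.setRGB(0, 0, tileW, tileH, pixels, 0, tileW);
                parts.add(part);
            }
        }
        return parts;
    }

    // Merge: paste each tile back at the grid position it was cut from.
    static BufferedImage merge(List<BufferedImage> parts) {
        int side = gridSide(parts.size());
        int tileW = parts.get(0).getWidth();
        int tileH = parts.get(0).getHeight();
        BufferedImage merged = new BufferedImage(tileW * side, tileH * side, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < parts.size(); i++) {
            int[] pixels = parts.get(i).getRGB(0, 0, tileW, tileH, null, 0, tileW);
            merged.setRGB((i % side) * tileW, (i / side) * tileH, tileW, tileH, pixels, 0, tileW);
        }
        return merged;
    }

    public static void main(String[] args) {
        // A small blank image stands in for the 3000x3000 picture.
        BufferedImage img = new BufferedImage(600, 600, BufferedImage.TYPE_INT_RGB);
        List<BufferedImage> parts = split(img, 4);
        // The per-part edge detection would run here, one replicated task per part.
        BufferedImage merged = merge(parts);
        System.out.println(parts.size() + " parts of " + parts.get(0).getWidth() + "x"
                + parts.get(0).getHeight() + " merged into " + merged.getWidth() + "x" + merged.getHeight());
    }
}
```

Keeping the tiles in row-by-row order means the merge stage only needs each tile's position in the list to know where to paste it, which is why the same index can drive both the split and the merge.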

Follow these steps to create the workflow:

  1. Open the ProActive Studio and log in using the login and password you received by e-mail when you first signed up.

  2. Fill in the name of the job in the left panel; call it image-processing.

  3. Define the following job variables in the Job Variables section of the left panel by clicking the Add button of Local Variables:

    • inputFilename with neptune_triton_01_3000x3000.jpg as value, the name of the image to split
    • outputFilename with processed.jpg as value, the name of the final processed image
    • nbParts with 4 as value, the number of parts
    Note that the variables are accessible from scripts using the variables map.

  4. Create a new replicate block by dragging and dropping the Replicate block from the Controls into the workspace. This creates a workflow of 3 tasks.

  5. Click on Task1 and rename it to split-image by filling in the Task Name field in the left panel.

  6. To make the image file accessible to the task, go to the Data Management section in the left panel, click the Add button of Input Files, then specify neptune_triton_01_3000x3000.jpg as Includes and set the Access Mode to transferFromUserSpace.

  7. Then, in the Execution section in the left panel, select groovy as Script Engine, click the button that opens the Script Editor, and paste the content of split-image.groovy.

    Note that the image file is transferred from the user space into the task's local space, whose path is given by the built-in variable localspace. To load the image from the local space, the script contains the following code:

    // Load the image from local space
    File localspaceDir = new File(localspace)
    File imgFile = new File(localspaceDir, imgFilename)
    BufferedImage img = ImageIO.read(imgFile)

    Also note that the result variable is a java.util.ArrayList containing the split parts to transmit to the next tasks.

  8. Rename Task2 to process-part. This task will be executed only after the split-image task has finished.

  9. For this task we will use a Java implementation of the Canny Edge Detector written by Tom Gibara. To make the class definition available to our script task, the JAR first has to be available on the machine that hosts the node. We then need to add it to the task's classpath: in the Fork Environment section, put /home/cperMaster/opt/tutorials/tutorials/canny-edge-detector.jar as additional classpath.

  10. As for the previous task, select groovy as Script Engine and paste the contents of process-part.groovy.

    Note that the replication index is provided through the PA_TASK_REPLICATION variable; it is used to select the part of the image to process:

    int partIndex = variables.get("PA_TASK_REPLICATION")

    The results variable is an array of org.ow2.proactive.scheduler.common.task.TaskResult that contains the results of the parent tasks. Since split-image is the only parent task, the array contains a single element: the ArrayList of split parts. The index of the image part to process is given by the partIndex variable.

    The following code is used to process the image part:

    CannyEdgeDetector detector = new com.CannyEdgeDetector()
    detector.setLowThreshold(0.5)
    detector.setHighThreshold(1)
    detector.setSourceImage(partImage)
    detector.process()

  11. To specify how many times the process-part task will be replicated, click on the replicate control in the workspace and paste the following code into the script section in the left panel:

    runs=variables.get("nbParts")

  12. Rename Task3 to merge-parts. This task will merge the processed parts into a final image once all replicated parent tasks have finished.

  13. Paste the contents of merge-parts.groovy into the task's script.

    Note that this time the size of the results array is equal to nbParts (4 here), the number of replicated tasks. The processed parts are merged into the final image in the same way the image was split in the split-image task.

  14. The merge-parts task will produce the final processed.jpg image in its local space, so the image needs to be transferred back into the user space. To do so, go to the Data Management section in the left panel, click the Add button of Output Files, then specify processed.jpg as Includes and set the Access Mode to transferToUserSpace.

  15. Once your workflow is ready and you are logged in, click the submit button to submit your workflow as a job to the Scheduler.

  16. Log in to the Scheduler portal using the login and password you received by e-mail when you first signed up. Your job should appear in the job list panel.

  17. Depending on the available nodes, you can try executing the workflow with the nbParts variable set to 9, 16, 25 or 36 instead of 4. This splits the image into smaller parts, produces more tasks and reduces the computation time of each task.
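To get a feel for what the replicate control does with these tasks, the pattern can be mimicked locally with a thread pool. The sketch below is only an analogy under stated assumptions, not the ProActive API: ReplicationSketch and runReplicated are hypothetical names, threads stand in for nodes, and plain strings stand in for image parts. Each replica receives an index and handles only the part at that index, and the caller gets back one result per replica, in index order, much like the results array seen by merge-parts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical local analogy of task replication; it does not call the ProActive API.
class ReplicationSketch {

    // Runs one replica per part. Replica i only looks at parts.get(i),
    // the way process-part uses its replication index.
    static <P, R> List<R> runReplicated(List<P> parts, Function<P, R> processPart) {
        int runs = parts.size(); // plays the role of runs = nbParts
        ExecutorService pool = Executors.newFixedThreadPool(runs);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                final int partIndex = i;
                Callable<R> replica = () -> processPart.apply(parts.get(partIndex));
                futures.add(pool.submit(replica));
            }
            // Collect the results in replica order, one entry per replica.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a replica failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> parts = List.of("part-0", "part-1", "part-2", "part-3");
        List<String> results = runReplicated(parts, part -> part + "-processed");
        System.out.println(results);
        // prints [part-0-processed, part-1-processed, part-2-processed, part-3-processed]
    }
}
```

With more parts, each replica has less work to do, but the gain only materializes while there are enough free workers (nodes, in the Scheduler's case) to run the replicas at the same time.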

When done with this tutorial, you can check out the User Guide for more details.

3 Download the Resulting Image File from the User Space


Once the job is finished, check your user space with Pydio; it should contain the processed image named processed.jpg.


You can also use the ProActive Scheduler REST interface to download the resulting file from the user space with cURL. Log in with the login and password you received by e-mail when you first signed up, then request the processed.jpg file:

# first we login and retrieve a session id
$ sessionid=$(curl -d "username=LOGIN&password=PASSWORD" https://try.activeeon.com/rest/scheduler/login)

# then we download the processed image from the USERSPACE
$ curl -k -H "sessionid:$sessionid" https://try.activeeon.com/rest/scheduler/dataspace/USERSPACE/processed.jpg > processed.jpg
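
For readers who would rather script the download from the JVM, the same two requests can be sketched with the standard java.net.http client (Java 11 or later). This is a hedged illustration: the class UserSpaceDownload and its methods are hypothetical names, the URLs and the sessionid header are simply copied from the cURL commands above, and the sketch does not reproduce cURL's -k option, so it assumes the server certificate is trusted.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper mirroring the two cURL commands; names are illustrative.
class UserSpaceDownload {

    static final String BASE = "https://try.activeeon.com/rest/scheduler";

    // Equivalent of: curl -d "username=LOGIN&password=PASSWORD" .../login
    static HttpRequest loginRequest(String login, String password) {
        String form = "username=" + URLEncoder.encode(login, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "/login"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    // Equivalent of: curl -H "sessionid:$sessionid" .../dataspace/USERSPACE/<fileName>
    static HttpRequest downloadRequest(String sessionId, String fileName) {
        return HttpRequest.newBuilder(URI.create(BASE + "/dataspace/USERSPACE/" + fileName))
                .header("sessionid", sessionId)
                .GET()
                .build();
    }

    // Usage: java UserSpaceDownload LOGIN PASSWORD
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // The login response body is used as the session id, as in the cURL example.
        String sessionId = client.send(loginRequest(args[0], args[1]),
                HttpResponse.BodyHandlers.ofString()).body().trim();
        client.send(downloadRequest(sessionId, "processed.jpg"),
                HttpResponse.BodyHandlers.ofFile(Path.of("processed.jpg")));
    }
}
```

Building the requests in separate methods keeps them easy to inspect without contacting the server, for instance to check the target URL or the session header before sending anything.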