R is a free and widely used programming language for statistical computation and graphics. ProActive PARConnector provides an API which makes it possible to write distributed R applications which execute over networks of machines.
In this tutorial, we will use the ProActive PARConnector API to write a simple R application that executes on four different machines (a.k.a. ProActive Nodes) at the same time. We will use RGui as the software development environment.
1 Install and configure R environment
This tutorial was designed for and is tested with Ubuntu 16.04.
Install and configure the R software environment as follows.
Add the R repository.
$ sudo echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
Add R repository keys.
$ gpg --keyserver keyserver.ubuntu.com --recv-key 51716619E084DAB9
$ gpg -a --export 51716619E084DAB9 | sudo apt-key add -
Update packages.
$ sudo apt-get update
Install R and rJava.
$ sudo apt-get install r-base r-cran-rjava
Install Java JDK 8:
$ sudo apt-get install openjdk-8-jdk
Download the PARConnector:
$ wget https://s3.amazonaws.com/par-connector-tutorial/par-connector-tutorial-R-x86_64-pc-linux-gnu.tar.gz
Start your R environment:
$ R
Install the additional R packages gtools, codetools and stringr.
For example, you can type the following command in the R console to install them:
> install.packages(c('gtools', 'codetools', 'stringr'), Sys.getenv('R_LIBS_USER'), repos='http://cran.case.edu')
Attention:
Check the command output to verify that all packages were installed successfully. If an installation fails, searching the web for the exact error message is usually the quickest way to find a fix.
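Rather than reading the installer output by eye, the check can also be scripted. The following is a minimal sketch using only base R; the vector `required` and the helper `missing_packages` are illustrative names and are not part of PARConnector.

```r
# Packages this tutorial relies on (rJava is installed through apt above).
required <- c("rJava", "gtools", "codetools", "stringr")

# Hypothetical helper: return the names of the packages that cannot be
# found in any library listed by .libPaths().
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

not_installed <- missing_packages(required)
if (length(not_installed) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
}
```

If any package is reported as missing, rerun the corresponding install step and read its output for the underlying error.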
Install the PARConnector package:
For example, you can type the following R command in the R console to install it.
> install.packages('<PATH-TO-par-connector-tutorial-R-x86_64-pc-linux-gnu.tar.gz>', Sys.getenv('R_LIBS_USER'), repos = NULL)
Install Docker:
Install curl.
$ sudo apt-get install curl
Add the GPG key for the official Docker repository to your system.
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Add the Docker repository to APT sources.
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt-get update
Install Docker.
$ sudo apt-get install -y docker-ce
Check that Docker is correctly installed.
$ sudo docker -v
Start the ActiveEon par-connector-tutorial Docker container:
$ docker run -ti activeeon/par-connector-tutorial
2 Hello World job.
Start the R environment.
Type the following commands to load PARConnector and connect to the scheduler.
Replace 'login' and 'pwd' with the login and password you received when subscribing to the tutorial.
> library("PARConnector");
Loading required package: rJava
Loading required package: gtools
Loading required package: codetools
Loading required package: stringr
> PAConnect(url='https://try.activeeon.com/rest', login='login', pwd='pwd', insecure=TRUE);
Connected to Scheduler at https://try.activeeon.com/rest
[1] "Java-Object{org.ow2.proactive.scheduler.rest.SchedulerClient@108dacb}"
In this first example, we'll remotely execute a simple Hello World function on a single machine.
We define the function hellow1, which prints Hello followed by the function argument:
> hellow1 <- function(x) print(paste('Hello',x))
We submit this function to the Scheduler using the function PASolve. PASolve returns an object which we store in the variable job. When displayed, this object describes the status of the job.
> job <- PASolve( hellow1, 'World')
Job submitted (id : 2725) with tasks : t1
> job
PARJob1 (id: 2725) (status: Running)
t1 : Running at pacagrid.cloudapp.net (SSH-slice1-2) (0%)
We wait for the job to complete by calling the function PAWaitFor. This function returns the result of the job and prints the remote output.
> val <- PAWaitFor(job)
t1 : [1630000@try.activeeon.com;13:37:12] [1] [1630000@try.activeeon.com;13:37:12] "Hello World"
> val
$t1
[1] "Hello World"
In this second example, we'll remotely execute the hellow1 function across several machines.
To do so, we pass a list parameter as shown below. The syntax of PASolve is similar to that of the R function mapply: it produces as many executions as there are elements in its list parameters:
> res <- PASolve( hellow1, list('World1','World2','World3'))
Job submitted (id : 2726) with tasks : t1, t2, t3
> val <- PAWaitFor(res)
t1 : [1640000@try.activeeon.com;13:40:52] [1] [1640000@try.activeeon.com;13:40:52] "Hello World1"
t2 : [1640001@try.activeeon.com;13:40:52] [1] [1640001@try.activeeon.com;13:40:52] "Hello World2"
t3 : [1640002@try.activeeon.com;13:40:52] [1] [1640002@try.activeeon.com;13:40:52] "Hello World3"
Explanation:
In this second example, instead of a single string parameter, we pass a list of three strings.
PASolve interprets this list as multiple evaluations, just as mapply does.
It evaluates the following calls in the cloud:
print(paste('Hello', 'World1'))
print(paste('Hello', 'World2'))
print(paste('Hello', 'World3'))
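To see the mapply analogy without a scheduler, the same fan-out can be tried locally in base R. This is only a sketch of the calling convention and says nothing about how PASolve dispatches tasks. `hello_local` is a hypothetical variant of hellow1 that returns the string instead of printing it, so that the result is easy to inspect.

```r
# Hypothetical local stand-in for hellow1: returns the greeting instead of printing it.
hello_local <- function(x) paste('Hello', x)

# mapply calls the function once per list element, in the same way that
# PASolve creates one task per list element.
local_res <- mapply(hello_local, list('World1', 'World2', 'World3'), SIMPLIFY = FALSE)

length(local_res)   # one result per list element
local_res[[2]]      # result of the second call
```

Passing SIMPLIFY = FALSE keeps the result as a list, which is closer in shape to the per-task list returned by PAWaitFor.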
In this third example, we'll remotely execute a new Hello World function with two parameters on a single machine.
We define the function hellow3, which prints its two arguments:
> hellow3 <- function(x,y) print(paste(x,y))
As the function takes two parameters instead of one, the corresponding PASolve call contains one additional parameter:
> job <- PASolve( hellow3, 'Hello', 'World')
Job submitted (id : 2802) with tasks : t1
> val <- PAWaitFor(job)
t1 : [1650000@try.activeeon.com;13:43:38] [1] [1650000@try.activeeon.com;13:43:38] "Hello World"
The execution scheme is similar to that of the first example, just with two parameters instead of one.
In this last example, we'll remotely execute the hellow3 function across several machines.
To do so, we pass list parameters as shown below:
> res <- PASolve( hellow3, list('Hello1', 'Hello2', 'Hello3'), list('World1','World2','World3'))
Job submitted (id : 2727) with tasks : t1, t2, t3
> val <- PAWaitFor(res)
t1 : [1660000@try.activeeon.com;13:45:21] [1] [1660000@try.activeeon.com;13:45:21] "Hello1 World1"
t2 : [1660001@try.activeeon.com;13:45:20] [1] [1660001@try.activeeon.com;13:45:20] "Hello2 World2"
t3 : [1660002@try.activeeon.com;13:45:21] [1] [1660002@try.activeeon.com;13:45:21] "Hello3 World3"
Explanation:
PASolve matches each element of the first list with the corresponding element of the second list.
In doing so, it evaluates the following calls in the cloud:
print(paste('Hello1', 'World1'))
print(paste('Hello2', 'World2'))
print(paste('Hello3', 'World3'))
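The element-wise pairing can be checked locally with mapply, which follows the same positional matching. The sketch below uses only base R. `hello_pair` is a hypothetical, non-printing variant of hellow3, and the task-style names t1 to t3 are assigned by hand purely to mirror the job output shown earlier.

```r
# Hypothetical local stand-in for hellow3: returns the string instead of printing it.
hello_pair <- function(x, y) paste(x, y)

greetings <- list('Hello1', 'Hello2', 'Hello3')
targets   <- list('World1', 'World2', 'World3')

# The i-th element of greetings is paired with the i-th element of targets.
paired <- mapply(hello_pair, greetings, targets, SIMPLIFY = FALSE)

# Label the results like scheduler tasks (assigned here by hand).
names(paired) <- paste0('t', seq_along(paired))

paired$t3
```

Base R's mapply recycles shorter arguments. This tutorial does not state whether PASolve does the same, so passing lists of equal length is the safer choice.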
In this chapter, we will demonstrate how to submit simple jobs that use input and output files.
In this example, we will show a file transfer to a single machine.
We define a function mycopy which copies one file to another. It uses its parameter to determine the file names:
> mycopy <- function(i) file.copy(paste0("in",i,".txt"), paste0("out",i,".txt"))
Check the current directory of your R session by using the command getwd:
> getwd()
[1] "H:/Users/demo/Documents"
Create a file in1.txt in this directory. You can put some text content in this file if you want.
> job <- PASolve( mycopy, 1, input.files="in1.txt", output.files="out1.txt")
Job submitted (id : 2805) with tasks : t1
> val <- PAWaitFor(job)
> val
$t1
[1] TRUE
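Before submitting, it can help to create the input file from R and to dry-run mycopy locally. The sketch below works in a temporary directory so that it does not touch your documents. The file content is arbitrary, and nothing here calls PARConnector.

```r
# Same definition as in the tutorial.
mycopy <- function(i) file.copy(paste0("in",i,".txt"), paste0("out",i,".txt"))

# Work in a scratch directory and remember the previous one.
old_wd <- setwd(tempdir())

# Create the input file with some arbitrary text content.
writeLines("some text content", "in1.txt")

# Remove any stale output, since file.copy does not overwrite by default.
unlink("out1.txt")

copied   <- mycopy(1)               # TRUE when the copy succeeded
out_text <- readLines("out1.txt")   # content of the copied file

# Return to the original working directory.
setwd(old_wd)
```

For the actual submission, create in1.txt in the directory reported by getwd() rather than in the temporary directory.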
In this example, we will show a file transfer to multiple machines.
> job <- PASolve( mycopy, 1:3, input.files="in%1%.txt", output.files="out%1%.txt")
Job submitted (id : 2804) with tasks : t1, t2, t3
> val <- PAWaitFor(job)
> val
$t1
[1] TRUE
$t2
[1] TRUE
$t3
[1] TRUE
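The tutorial does not spell out the `%1%` placeholder. Judging from the example, it appears to stand for the value of the first PASolve parameter in each task, so that task i would use in<i>.txt and out<i>.txt. Under that assumption, the file names can be previewed locally. `expand_pattern` is a hypothetical helper written for illustration; it is not a PARConnector function.

```r
# Hypothetical helper: substitute each value of the first parameter for "%1%".
# This mirrors the assumed behaviour of the placeholder, not PARConnector's code.
expand_pattern <- function(pattern, values) {
  vapply(values,
         function(v) gsub("%1%", as.character(v), pattern, fixed = TRUE),
         character(1), USE.NAMES = FALSE)
}

input_names  <- expand_pattern("in%1%.txt", 1:3)
output_names <- expand_pattern("out%1%.txt", 1:3)

# Input files that would have to exist in getwd() before submitting the job.
input_names
```

If this reading is right, the files listed in input_names need to be present in the working directory before the job is submitted, just as in1.txt was for the single-machine example.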