mcparallel
Evaluate an R Expression Asynchronously in a Separate Process
Description
These functions are based on forking and so are not available on Windows.
mcparallel starts a parallel R process which evaluates the given expression.
mccollect collects results from one or more parallel processes.
Usage
mcparallel(expr, name, mc.set.seed = TRUE, silent = FALSE, mc.affinity = NULL, mc.interactive = FALSE, detached = FALSE)
mccollect(jobs, wait = TRUE, timeout = 0, intermediate = FALSE)
Arguments
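expr | expression to evaluate (do not use any on-screen devices or GUI elements in this code; see mcfork for why this is inadvisable with GUI front-ends and multi-threaded libraries). |
name | an optional name (character vector of length one) that can be associated with the job. |
mc.set.seed | logical: see section ‘Random numbers’. |
silent | if set to TRUE then all output on stdout will be suppressed (stderr is not affected). |
mc.affinity | either a numeric vector specifying CPUs to restrict the child process to (1-based) or NULL to leave the CPU affinity unchanged. |
mc.interactive | logical, if TRUE or FALSE then the child process will be set as interactive or non-interactive, respectively. If NA then the child process will inherit the interactive flag from the parent. |
detached | logical, if TRUE then the job is detached from the current session, so it cannot deliver any results back; it is used for the side effects of the code only. |
jobs | list of jobs (or a single job) to collect results for. Alternatively jobs can also be an integer vector of process IDs. If omitted, mccollect will wait for all currently existing children. |
wait | if set to FALSE, mccollect checks for any results that are available within timeout seconds from now; otherwise it waits for all specified jobs to finish. |
timeout | timeout (in seconds) to check for job results; applies only if wait is FALSE. |
intermediate | FALSE or a function which will be called while mccollect waits for results. The function will be called with one parameter, which is the list of results received so far. |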
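To illustrate the intermediate argument, the sketch below (assuming a Unix-alike, since forking is unavailable on Windows) passes a progress callback to mccollect. The callback progress, the counter calls and the job names "fast" and "slow" are hypothetical and exist only for this example; the callback counts how often it is invoked and reports which named results have arrived so far.

```r
library(parallel)

# Two jobs that finish at different times; the names label the results.
jobs <- list(mcparallel({ Sys.sleep(0.2); "a" }, name = "fast"),
             mcparallel({ Sys.sleep(1.0); "b" }, name = "slow"))

calls <- 0
# Hypothetical progress callback: mccollect() passes it the list of results
# received so far; this sketch treats NULL entries as "not reported yet".
progress <- function(res) {
  calls <<- calls + 1
  done <- names(res)[!vapply(res, is.null, logical(1))]
  message("finished so far: ", paste(done, collapse = ", "))
}

# Blocks until both jobs are done, calling progress() along the way.
out <- mccollect(jobs, intermediate = progress)
out$fast  # "a"
out$slow  # "b"
```

Because the callback runs in the main process, it can safely update variables there, unlike code inside the jobs themselves.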
Details
mcparallel evaluates the expr expression in parallel to the current R process. Everything is shared read-only (or in fact copy-on-write) between the parallel process and the current process, i.e. no side-effects of the expression affect the main process. The result of the parallel execution can be collected using the mccollect function.
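A minimal sketch of the copy-on-write behaviour on a Unix-alike: the child changes its own copy of a variable, and the main process never sees the change. The variable name counter is illustrative only.

```r
library(parallel)

counter <- 1
# The child works on its own (copy-on-write) copy of `counter`.
job <- mcparallel({
  counter <- counter + 100
  counter
})
child_value <- mccollect(job)[[1]]

child_value  # 101: the value computed in the child
counter      # still 1: the child's assignment did not reach the main process
```

The only way to get information back from the child is through the value of the expression (or, for low-level code, explicit messages), which is what mccollect retrieves.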
The mccollect function collects any available results from parallel jobs (or in fact any child process). If wait is TRUE then mccollect waits for all specified jobs to finish before returning a list containing the last reported result for each job. If wait is FALSE then mccollect merely checks for any results available at the moment and will not wait for jobs to finish. If jobs is specified, jobs not listed there will not be affected or acted upon.
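The difference between wait = FALSE and wait = TRUE can be sketched with a job that deliberately sleeps for two seconds. This sketch assumes a Unix-alike and assumes that a non-blocking check which finds no result yet returns NULL; that matches our reading of the behaviour described above but is not stated explicitly on this page.

```r
library(parallel)

# A job that deliberately takes about two seconds.
job <- mcparallel({ Sys.sleep(2); "finished" })

# wait = FALSE with the default timeout = 0 checks once and returns at once.
# Assumption: with no result available yet, the return value is NULL.
early <- mccollect(job, wait = FALSE)

# wait = TRUE (the default) blocks until the job has finished.
final <- mccollect(job)

is.null(early)
final[[1]]  # "finished"
```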
Note: If expr uses low-level multicore functions such as sendMaster, a single job can deliver results multiple times, and it is the responsibility of the user to interpret them correctly. mccollect will return NULL for a terminating job that has already sent its results, after which the job is no longer available.
Jobs are identified by process IDs (even when referred to as job objects), and the operating system reuses process IDs. Detached jobs created by mcparallel can thus never be safely referred to by their process IDs or job objects. Non-detached jobs are guaranteed to exist until collected by mccollect, even if they crashed or were terminated by a signal. Once collected by mccollect, a job is regarded as detached, and thus must no longer be referred to by its process ID or its job object. With wait = TRUE, all jobs passed to mccollect are collected. With wait = FALSE, the collected jobs are given as names of the result list, and thus they must be excluded from subsequent calls to mccollect. Job objects should be used in preference to process IDs whenever the API accepts them.
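The exclusion rule for wait = FALSE can be turned into a simple polling loop. In this sketch (Unix-alike assumed; the 0.1-second timeout, the 100-iteration cap and the variable names pending and results are arbitrary choices), the jobs are left unnamed so that the names of each partial result are process IDs, and every job that has been collected is dropped from pending before the next call.

```r
library(parallel)

# Two unnamed jobs of different length, so results are labelled by PID.
jobs <- list(mcparallel({ Sys.sleep(0.5); "short" }),
             mcparallel({ Sys.sleep(1.0); "long" }))
pending <- jobs
results <- list()

for (i in 1:100) {                       # give up after roughly 10 seconds
  if (length(pending) == 0) break
  got <- mccollect(pending, wait = FALSE, timeout = 0.1)
  if (is.null(got)) next                 # nothing arrived within the timeout
  results[names(got)] <- got             # names(got) are the collected PIDs
  # Drop collected jobs: they must not be passed to mccollect again.
  pids <- vapply(pending, function(j) as.character(j$pid), "")
  pending <- pending[!(pids %in% names(got))]
}
```

Between calls the main process is free to do other work, which is the main reason to prefer this pattern over a single blocking mccollect.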
The mc.affinity parameter can be used to try to restrict the child process to specific CPUs. The availability and extent of this feature are system-dependent (e.g., some systems will only consider the CPU count, others will ignore it completely).
Value
mcparallel returns an object of the class "parallelJob" which inherits from "childProcess" (see the ‘Value’ section of the help for mcfork). If argument name was supplied, this will have an additional component name.
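A quick inspection of the returned job object on a Unix-alike. The pid component used here comes from the childProcess description in the help for mcfork; the job name "total" is arbitrary.

```r
library(parallel)

job <- mcparallel(sum(1:100), name = "total")

inherits(job, "parallelJob")   # TRUE
inherits(job, "childProcess")  # TRUE
job$name                       # "total", the name supplied above
job$pid                        # process ID of the child (varies per run)

total <- mccollect(job)[[1]]   # 5050
```

After the mccollect call the job counts as collected, so job should not be passed to mccollect again.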
mccollect returns any results that are available in a list. The results will have the same order as the specified jobs. If there are multiple jobs and a job has a name, it will be used to name the result; otherwise its process ID will be used. If none of the specified children are still running, it returns NULL.
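The naming rule can be seen by collecting several named and several unnamed jobs (Unix-alike assumed; the names "two" and "six" are arbitrary).

```r
library(parallel)

# Several named jobs: the job names label the results, in job order.
named <- mccollect(list(mcparallel(1 + 1, name = "two"),
                        mcparallel(2 * 3, name = "six")))
names(named)    # "two" "six"

# Several unnamed jobs: the results are labelled by process ID instead.
unnamed <- mccollect(list(mcparallel("x"), mcparallel("y")))
names(unnamed)  # two process IDs, as character strings
```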
Random numbers
If mc.set.seed = FALSE, the child process has the same initial random number generator (RNG) state as the current R session. If the RNG has been used (or .Random.seed was restored from a saved workspace), the child will start drawing random numbers at the same point as the current session. If the RNG has not yet been used, the child will set a seed based on the time and process ID when it first uses the RNG: this is pretty much guaranteed to give a different random-number stream from the current session and any other child process.
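A small sketch of the first case on a Unix-alike: after set.seed the RNG state exists, so a child started with mc.set.seed = FALSE draws exactly the numbers the main process draws next. The seed 42 is arbitrary.

```r
library(parallel)

set.seed(42)  # the RNG has now been used, so .Random.seed exists

# With mc.set.seed = FALSE the child inherits the session's RNG state ...
job <- mcparallel(runif(3), mc.set.seed = FALSE)
child_draws <- mccollect(job)[[1]]

# ... so it draws the same numbers the main process draws next.
parent_draws <- runif(3)
identical(child_draws, parent_draws)  # TRUE
```

This is usually not what a simulation wants, since every child started this way would repeat the same random numbers.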
The behaviour with mc.set.seed = TRUE is different only if RNGkind("L'Ecuyer-CMRG") has been selected. Then each time a child is forked it is given the next stream (see nextRNGStream). So if you select that generator, set a seed and call mc.reset.stream just before the first use of mcparallel, the results of simulations will be reproducible provided the same tasks are given to the first, second, ... forked process.
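The reproducibility recipe above can be sketched as follows on a Unix-alike. The helper run_once is hypothetical: it seeds the generator, calls mc.reset.stream, and forks two jobs in the same order each time, so the i-th child receives the same stream on every run. The seed 123 is arbitrary.

```r
library(parallel)

old_kind <- RNGkind()[1]
RNGkind("L'Ecuyer-CMRG")

# Hypothetical helper: seed, reset the stream, then fork two jobs in order.
run_once <- function() {
  set.seed(123)
  mc.reset.stream()
  jobs <- list(mcparallel(runif(1)), mcparallel(runif(1)))
  unlist(mccollect(jobs), use.names = FALSE)
}

first <- run_once()
second <- run_once()
identical(first, second)  # TRUE: the i-th child gets the same stream each run

RNGkind(old_kind)  # restore the previously selected generator
```

The two children still draw different numbers within one run, because each fork advances to the next stream.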
Note
Prior to R 3.4.0 and on a 32-bit platform, the serialized result from each forked process is limited to 2^31 - 1 bytes. (Returning very large results via serialization is inefficient and should be avoided.)
Author(s)
Simon Urbanek and R Core.
Derived from the multicore package formerly on CRAN (but with different handling of the RNG stream).
See Also
Examples
library(parallel)

p <- mcparallel(1:10)
q <- mcparallel(1:20)
# wait for both jobs to finish and collect all results
res <- mccollect(list(p, q))

## IGNORE_RDIFF_BEGIN
## reports process ids, so not reproducible
p <- mcparallel(1:10)
mccollect(p, wait = FALSE, timeout = 10) # will retrieve the result (since it's fast)
mccollect(p, wait = FALSE)               # will signal the job as terminating
mccollect(p, wait = FALSE)               # there is no longer such a job
## IGNORE_RDIFF_END

# a naive parallel lapply can be created using mcparallel alone:
jobs <- lapply(1:10, function(x) mcparallel(rnorm(x), name = x))
mccollect(jobs)
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.