Saturday, April 16, 2011

Parallel Processing - introduction for R users

Here is a good introduction to parallel processing for R users:

http://www.stat.umn.edu/~charlie/parallel/

I found the batch processing section of this introduction especially valuable.

Here are some lines copied from it:
(1)
This is really old stuff (from 1975), but not everyone knows it.
If you do the following at a Unix prompt
nohup nice -n 19 some job &
where "some job" is replaced by an actual job, then
the job will run in the background (because of &),
the job will not be killed when you log out (because of nohup), and
the job will have low priority (because of nice -n 19).
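A minimal sketch for checking that a job started this way really is in the background at low priority. Here `sleep 30` is a stand-in for "some job" and job.out is a hypothetical log file name; `$!` holds the process ID of the most recent background job, and the POSIX `ps -o nice=` prints just that process's niceness without a header.

```shell
#!/bin/sh
# "sleep 30" stands in for "some job"; job.out is a hypothetical log file.
nohup nice -n 19 sleep 30 > job.out 2>&1 &
pid=$!            # PID of the most recently started background job

sleep 1           # give nohup and nice a moment to exec the real job

# Print only the "nice" column for that PID; '=' suppresses the header.
niceness=$(ps -o nice= -p "$pid" | tr -d ' ')
echo "job $pid is running with niceness $niceness"

kill "$pid"       # clean up the demo job
```

Starting from the usual niceness of 0, the job ends up at 19, the lowest priority; nohup and nice both replace themselves with the command they run, so `$!` is the PID of the job itself.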
(2)
For example, if foo.R is a plain text file containing R commands,
then
nohup nice -n 19 R CMD BATCH --vanilla foo.R &
executes the commands and puts the printout in the file foo.Rout.
And
nohup nice -n 19 R CMD BATCH --no-restore foo.R &
executes the commands, puts the printout in the file foo.Rout,
and saves all created R objects in the file .RData.
(3)
nohup nice -n 19 R CMD BATCH foo.R &
is a really bad idea! It reads in all the objects in the file .RData (if
one is present) at the beginning, so you have no idea whether
the results are reproducible.
Always use --vanilla or --no-restore except when debugging.
(4)
This idiom has nothing to do with R. If foo is a compiled C,
C++, or Fortran main program that doesn't take command-line
arguments (or a shell, Perl, Python, or Ruby script), then
nohup nice -n 19 foo &
runs it. And
nohup nice -n 19 foo < foo.in > foo.out &
runs it, taking input from the file foo.in and placing output in the
file foo.out. Regular output and error messages are interspersed
there (nohup sends standard error to the same place as standard output
when standard error is a terminal) and not necessarily in order.
nohup nice -n 19 foo < foo.in > foo.out 2> foo.err &
puts the error messages in a separate file.
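Here is a self-contained sketch of the three redirections. The program foo is replaced by a hypothetical shell script that doubles each integer read on standard input and reports anything else on standard error. It is run as ./foo because the current directory is usually not on the PATH.

```shell
#!/bin/sh
# Hypothetical stand-in for "foo": doubles integers read on stdin and
# complains on stderr about any line that is not a non-negative integer.
cat > foo <<'EOF'
#!/bin/sh
while read -r line; do
  case $line in
    ''|*[!0-9]*) echo "bad input: $line" >&2 ;;
    *)           echo $((line * 2)) ;;
  esac
done
EOF
chmod +x foo

printf '1\n2\nabc\n3\n' > foo.in

# stdin from foo.in, stdout to foo.out, stderr to foo.err.
nohup nice -n 19 ./foo < foo.in > foo.out 2> foo.err &
wait                 # only so the results can be shown below

cat foo.out          # 2, 4 and 6, one per line
cat foo.err          # bad input: abc
```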
(5)
Don't omit the nice -n 19. If you omit it and we notice, you'll
be in trouble. Or, if we got up on the wrong side of the bed that
morning, we'll just kill your jobs.
(6)
We've got lots of computers, and each one has eight processors
(so eight jobs can run simultaneously).
That allows a lot of parallel processing without knowing anything
more than how to background a job.
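Running several jobs at once is just a loop over the same idiom. In this sketch, job.sh is a hypothetical placeholder that sleeps for 2 seconds and prints a result; eight copies start together (matching the eight processors mentioned above), each with its own output and error files, and `wait` returns once all of them have finished.

```shell
#!/bin/sh
# Hypothetical job: takes a task number, "works", and prints a result.
cat > job.sh <<'EOF'
#!/bin/sh
sleep 2                              # stand-in for a long computation
echo "task $1 done: $(( $1 * $1 ))"
EOF
chmod +x job.sh

# Start eight jobs at once, each in the background, hangup-proof,
# at low priority, with its own output and error files.
for i in 1 2 3 4 5 6 7 8; do
  nohup nice -n 19 ./job.sh "$i" > "job$i.out" 2> "job$i.err" &
done

wait                                 # returns when all eight have finished
cat job*.out
```

Here the sleeps overlap, so the script finishes in about 2 seconds rather than 16; with CPU-bound jobs, the speedup is limited by the number of free processors.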
