Large dataset problem
This question came to me via email. I don't have a good answer, so I'm posting it here in case others can help:
I have been using R for some time now and am trying to push it in my organization as a viable alternative or parallel analytical workbench to SAS. I have created a process that uses Sweave to produce PDF files of univariate exploratory analyses of datasets: every column is graphed, summary statistics are collected, etc., and all of this is put into a PDF document.
My problem is that the datasets I deal with currently are small (a few hundred MB), whereas I have been asked to run this process on datasets that are at times upwards of 10 GB.
I know R is limited by the RAM in the system, and there are a few packages (filehash and bigmemory, to name two) that provide solutions to this problem, but they are not extensively tested. Setting up a SQL server and having R connect to it is also an option, but I am not sure whether such a solution can be used to gather summary statistics by pulling each column from the database.
I have also looked at the possibility of using R on a 64-bit Linux box with 8 GB of RAM, but the cost is prohibitive, and it still would not handle datasets upwards of 10 GB in size.
Is there a method by which I can create a swap space or a temporary disk space like SAS does during execution? Are you aware of possible solutions to this problem?
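
One option that needs no extra packages is to stream the file through R in chunks and keep only running totals, so that peak memory is bounded by the chunk size rather than the file size. The following is a minimal sketch, not a tested solution: it assumes a comma-separated file with a header row and purely numeric columns with no missing values, and the names summarise_in_chunks and chunk_rows are made up for illustration.

```r
# Stream a CSV through R in fixed-size chunks, keeping only running totals.
# Assumes a header row and numeric columns without missing values.
# `summarise_in_chunks` and `chunk_rows` are illustrative names.
summarise_in_chunks <- function(path, chunk_rows = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first chunk carries the header, which fixes the column names.
  chunk <- read.csv(con, nrows = chunk_rows, header = TRUE)
  col_names <- names(chunk)

  n <- 0
  total <- setNames(numeric(length(col_names)), col_names)
  lo <- setNames(rep(Inf, length(col_names)), col_names)
  hi <- setNames(rep(-Inf, length(col_names)), col_names)

  while (!is.null(chunk) && nrow(chunk) > 0) {
    n <- n + nrow(chunk)
    total <- total + colSums(chunk)
    lo <- pmin(lo, sapply(chunk, min))
    hi <- pmax(hi, sapply(chunk, max))

    # read.csv() signals an error once the connection is exhausted;
    # treat that as the end of the file.
    chunk <- tryCatch(
      read.csv(con, nrows = chunk_rows, header = FALSE, col.names = col_names),
      error = function(e) NULL
    )
  }

  data.frame(column = col_names, n = n, mean = total / n,
             min = lo, max = hi, row.names = NULL)
}
```

Counts, sums, minima and maxima can be accumulated this way; exact medians and other quantiles cannot, so those would need a different approach (a database, or a second pass over the data).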

I am building a related package that can read a 3.5 GB file into R on a laptop with 2 GB of RAM and process the columns individually. It is still in beta, though.
Please write to me at cgb@datanalytics.com and I will send you further information.
Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
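
The column-at-a-time idea can also be tried with base R alone. This sketch is not the package mentioned in the comment above, whose internals are not described here; it only relies on the documented behaviour of read.csv(), where a colClasses entry of "NULL" makes the parser skip that column. The helper names read_one_column and summarise_by_column are hypothetical, and a comma-separated file with a header row is assumed.

```r
# Read a single column of a CSV by marking every other column as "NULL",
# which tells read.csv() to skip it while parsing.
# `read_one_column` and `summarise_by_column` are illustrative names.
read_one_column <- function(path, column) {
  header <- names(read.csv(path, nrows = 1))
  stopifnot(column %in% header)

  classes <- rep("NULL", length(header))
  classes[header == column] <- NA  # NA lets read.csv() detect the type
  read.csv(path, colClasses = classes)[[column]]
}

# Summarise every column in turn, holding only one column in memory at a time.
summarise_by_column <- function(path) {
  header <- names(read.csv(path, nrows = 1))
  lapply(setNames(header, header),
         function(col) summary(read_one_column(path, col)))
}
```

Each call rescans the whole file, so this trades running time for memory: a file with k columns is read k + 1 times.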