Large dataset problem - ProgrammingR


Viewing 3 posts - 1 through 3 (of 3 total)

Author
Posts
April 12, 2009 at 7:20 pm #393
bryan
Participant
This question came to me via email. I don’t have a good answer so I’m posting it here in case others can help:
I have been using R for some time now and am trying to promote it in my organization as a viable alternative or parallel analytical workbench to SAS. I have created a process using Sweave to produce PDF files of univariate exploratory analysis of datasets, in which every column is graphed, summary statistics are collected, and so on, and all of this is put into a PDF document.
My problem is that the datasets I currently deal with are small (a few hundred MB), whereas I have been asked to run this process on datasets that are at times upwards of 10 GB.
I know R is limited by the RAM in the system, and a few packages (filehash and bigmemory, to name two) provide solutions to this problem, but they are not extensively tested. Setting up a SQL server and having R connect to it is also an option, but I am not sure whether such a solution can be used to gather summary statistics by querying each column from the database.
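On the database question above: SQL aggregate functions run inside the database, so only one small result row per column has to reach the client. The sketch below is a minimal illustration in Python with the built-in sqlite3 module, not a tested solution for the poster's setup. The table name `measurements`, its columns, and the helper `column_summaries` are hypothetical and exist only for this example; the same queries could in principle be sent from R through a database interface package.

```python
import sqlite3


def column_summaries(conn, table, columns):
    """Ask the database for summary statistics, one column at a time.

    Only a single row of aggregates per column is returned to the client,
    so client memory use does not depend on the size of the table.
    """
    summaries = {}
    for col in columns:
        # Identifiers cannot be bound as query parameters, so they are
        # quoted directly. Pass only trusted table and column names.
        query = (
            f'SELECT COUNT("{col}"), MIN("{col}"), MAX("{col}"), AVG("{col}") '
            f'FROM "{table}"'
        )
        n, lo, hi, mean = conn.execute(query).fetchone()
        summaries[col] = {"n": n, "min": lo, "max": hi, "mean": mean}
    return summaries


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (height REAL, weight REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1.5, 50.0), (1.7, 70.0), (1.9, None)],
)
demo = column_summaries(conn, "measurements", ["height", "weight"])
```

Note that `COUNT(column)` and the other aggregates skip NULL values, so the missing weight in the demo data is ignored rather than counted.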
I have also looked at the possibility of running R on a 64-bit Linux box with 8 GB of RAM, but the cost involved is prohibitive, and it still would not handle datasets upwards of 10 GB in size.
Is there a method by which I can create a swap space or a temporary disk space like SAS does during execution? Are you aware of possible solutions to this problem?
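One general way to stay within RAM, independent of any particular package, is to read the file one row at a time and keep only running totals per column. The sketch below is a minimal illustration of that idea in Python, using Welford's one-pass update for the mean and variance; it is an assumption about what might help, not the method used by any package mentioned in this thread. The names `RunningStats` and `summarize_csv` are hypothetical. Order statistics such as the median are not covered and need a different approach.

```python
import csv
import io
import math


class RunningStats:
    """One-pass count, mean, variance (Welford's update), min and max."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize_csv(lines):
    """Stream rows from an open CSV file (or any iterable of lines)."""
    stats = {}
    for row in csv.DictReader(lines):
        for name, value in row.items():
            try:
                x = float(value)
            except (TypeError, ValueError):
                continue  # skip blank, missing, and non-numeric cells
            if math.isnan(x):
                continue  # skip literal "nan" entries
            stats.setdefault(name, RunningStats()).update(x)
    return stats


# Small in-memory example; column b has one blank cell.
demo = summarize_csv(io.StringIO("a,b\n1,10\n2,\n3,30\n"))
```

For a real file, the same function can be called as `summarize_csv(open(path, newline=""))`. Memory use then depends on the number of columns rather than the number of rows.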
May 13, 2009 at 7:22 pm #395
user111
Member
I am building a related package that can read a 3.5 GB file into R on a laptop with 2 GB of RAM and process the columns individually. It is still in beta, though.
Please write to me at cgb@datanalytics.com and I will send you further information.
Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
September 17, 2010 at 7:24 pm #397
chrisadam2
Member
I am working on a Windows-based application that uses SQL Server 2000 as its database. There are a few tables (referred to as parent tables) in the application which are uploaded by a separate application.