Large dataset problem
April 12, 2009 at 7:20 pm #393
bryan
Participant
This question came to me via email. I don’t have a good answer so I’m posting it here in case others can help:
I have been using R for some time now and have been trying to promote it in my organization as a viable alternative or parallel analytical workbench to SAS. I have created a process that uses Sweave to produce PDF files of univariate exploratory analyses of datasets: every column is graphed, summary statistics are collected, and so on, and all of this goes into a PDF document.
My problem is that the datasets I currently deal with are small (a few hundred MB), whereas I have been asked to run this process on datasets that are at times upwards of 10 GB.
I know R is limited by the RAM in the system, and a few packages (filehash and bigmemory, to name two) provide solutions to this problem, but they are not extensively tested. Setting up a SQL server and having R connect to it is also an option, but I am not sure whether such a solution can be used to gather summary statistics by pulling each column from the DB.
I have also looked at the possibility of running R on a 64-bit Linux box with 8 GB of RAM, but the cost is prohibitive, and it still would not handle datasets upwards of 10 GB in size.
Is there a method by which I can create a swap space or a temporary disk space like SAS does during execution? Are you aware of possible solutions to this problem?
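For the per-column summary statistics in particular, one option that needs no extra packages is to stream the file in chunks and merge the statistics as you go, so only one chunk is in memory at a time. The following is a minimal base R sketch, not the package mentioned in the replies and not an equivalent of SAS's temporary disk space. The function name `chunked_summary` is hypothetical, and the sketch assumes a comma-separated file with a header row and column types that stay consistent from chunk to chunk.

```r
# Stream a CSV in fixed-size chunks and accumulate count, mean, standard
# deviation, min and max for every numeric column.
# Assumptions: comma-separated input, one header row, and each column has
# the same type in every chunk.
chunked_summary <- function(path, chunk_rows = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header once; later chunks reuse these column names.
  header <- scan(text = readLines(con, n = 1L), what = "character",
                 sep = ",", quiet = TRUE)
  stats <- list()

  repeat {
    # read.csv() raises an error when the connection is exhausted, so an
    # error is treated as end of input here (it would also hide a
    # malformed chunk, which a real implementation should report).
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_rows,
               col.names = header, stringsAsFactors = FALSE),
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break

    for (col in names(chunk)) {
      x <- chunk[[col]]
      if (!is.numeric(x)) next          # skip character/logical columns
      x <- x[!is.na(x)]
      if (length(x) == 0L) next

      nb  <- as.numeric(length(x))      # double, to avoid integer overflow
      mb  <- mean(x)
      m2b <- sum((x - mb)^2)            # sum of squared deviations in chunk

      s <- stats[[col]]
      if (is.null(s)) {
        stats[[col]] <- list(n = nb, mean = mb, m2 = m2b,
                             min = min(x), max = max(x))
      } else {
        # Merge the running statistics with this chunk's statistics
        # (pairwise update of the mean and sum of squared deviations).
        n     <- s$n + nb
        delta <- mb - s$mean
        stats[[col]] <- list(
          n    = n,
          mean = s$mean + delta * nb / n,
          m2   = s$m2 + m2b + delta^2 * s$n * nb / n,
          min  = min(s$min, x),
          max  = max(s$max, x)
        )
      }
    }
    if (nrow(chunk) < chunk_rows) break  # short chunk: end of file reached
  }

  # One row per numeric column.
  do.call(rbind, lapply(names(stats), function(col) {
    s <- stats[[col]]
    data.frame(column = col, n = s$n, mean = s$mean,
               sd = sqrt(s$m2 / (s$n - 1)), min = s$min, max = s$max,
               stringsAsFactors = FALSE)
  }))
}
```

Calling `chunked_summary("big.csv")` (a hypothetical file name) would return a data frame with one row per numeric column, which could then be printed from a Sweave chunk. Statistics that need the whole column at once, such as exact quantiles, do not fit this pattern without further work.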
May 13, 2009 at 7:22 pm #395
user111
Member
I am building a related package that can read a 3.5 GB file into R on a 2 GB laptop and process the columns individually. It is still in beta, though.
Please write to me at cgb@datanalytics.com and I will send you further information.
Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
September 17, 2010 at 7:24 pm #397
chrisadam2
Member
I am working on a Windows-based application that uses SQL Server 2000 as its database. There are a few tables (call them parent tables) in the application that are uploaded by a separate application.