Large dataset problem

Viewing 3 posts - 1 through 3 (of 3 total)
  • #393
    bryan
    Participant

    This question came to me via email. I don’t have a good answer so I’m posting it here in case others can help:

    I have been using R for some time now and am trying to promote it in my organization as a viable alternative, or complement, to SAS as an analytical workbench. I have created a process that uses Sweave to produce PDF reports of univariate exploratory analysis: every column of a dataset is graphed, its summary statistics are collected, and everything is placed in a PDF document.

    My problem is that the datasets I currently deal with are small (a few hundred MB), whereas I have been asked to run this process on datasets that are at times upwards of 10 GB.

    I know R is limited by the RAM in the system, and that a few packages (filehash and bigmemory, to name two) offer solutions to this problem, but they are not extensively tested. Setting up a SQL server and having R connect to it is also an option, but I am not sure whether such a solution can be used to gather summary statistics by pulling each column from the database.

    I have also looked at the possibility of running R on a 64-bit Linux box with 8 GB of RAM, but the cost is prohibitive, and it still would not handle datasets upwards of 10 GB.

    Is there a way to create swap space or temporary disk space, as SAS does during execution? Are you aware of possible solutions to this problem?
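
On the swap-space question: one way to stay within RAM using only base R is to leave the data on disk, read it in fixed-size chunks, and keep running totals per column. The sketch below is a minimal illustration, not a tested solution. It assumes a comma-separated file with a header row, all-numeric columns, and no line breaks inside quoted fields. The function name `summarise_in_chunks` and the `chunk_rows` parameter are made up for this example.

```r
# Streaming per-column summaries: at most `chunk_rows` rows are held in memory.
summarise_in_chunks <- function(path, chunk_rows = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once; scan() also strips quotes around the names.
  header <- readLines(con, n = 1L)
  col_names <- scan(text = header, what = "", sep = ",", quiet = TRUE)
  k <- length(col_names)

  n <- numeric(k)        # non-missing count per column
  s <- numeric(k)        # running sum
  ss <- numeric(k)       # running sum of squares
  lo <- rep(Inf, k)      # running minimum
  hi <- rep(-Inf, k)     # running maximum

  repeat {
    # readLines() on an open connection continues where the last call stopped.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break

    chunk <- as.matrix(read.csv(text = lines, header = FALSE,
                                col.names = col_names,
                                colClasses = "numeric"))
    n <- n + colSums(!is.na(chunk))
    s <- s + colSums(chunk, na.rm = TRUE)
    ss <- ss + colSums(chunk^2, na.rm = TRUE)
    lo <- pmin(lo, apply(chunk, 2, min, na.rm = TRUE))
    hi <- pmax(hi, apply(chunk, 2, max, na.rm = TRUE))
  }

  data.frame(
    column = col_names,
    n = unname(n),
    mean = unname(s / n),
    sd = unname(sqrt((ss - s^2 / n) / (n - 1))),
    min = unname(lo),
    max = unname(hi)
  )
}

# Usage on a small temporary file; the tiny chunk size forces several passes.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:10, y = (1:10)^2), tmp, row.names = FALSE)
res <- summarise_in_chunks(tmp, chunk_rows = 3L)
print(res)
```

The sum-of-squares formula for the standard deviation is simple but can lose precision on large, badly scaled columns, so an incremental update such as Welford's method would be a safer choice in practice. Exact medians, quantiles, and plots cannot be accumulated this way and would need a different strategy, such as sampling or binning.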

    #395
    user111
    Member

    I am building a related package that can read a 3.5 GB file into R on a laptop with 2 GB of RAM and process the columns individually. It is still in beta, though.

    Please write to me at cgb@datanalytics.com and I will send you further information.

    Best regards,

    Carlos J. Gil Bellosta
    http://www.datanalytics.com
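
The column-at-a-time idea can also be approximated in plain base R, independently of the package mentioned above (whose internals are not described in this thread). A rough sketch, assuming a comma-separated file with a header row: setting an entry of `colClasses` to `"NULL"` makes `read.csv` skip that column, so only the requested column is kept in memory. The helper name `read_one_column` is hypothetical.

```r
# Load a single named column from a CSV file, skipping all the others.
read_one_column <- function(path, column) {
  # Read only the header line to learn the column names.
  header <- scan(path, what = "", sep = ",", nlines = 1L, quiet = TRUE)
  stopifnot(column %in% header)

  # "NULL" means "skip this column"; NA means "let read.csv guess the type".
  classes <- rep("NULL", length(header))
  classes[header == column] <- NA

  read.csv(path, colClasses = classes)[[1]]
}

# Usage on a small temporary file.
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5,
             height = c(1.5, 1.6, 1.7, 1.8, 1.9),
             group = letters[1:5]),
  tmp, row.names = FALSE
)
heights <- read_one_column(tmp, "height")
print(summary(heights))
```

Each call still scans the whole file, so this trades extra reading time for lower memory use; looping over every column name would give one summary and one plot per column, at the cost of one pass over the file per column.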

    #397
    chrisadam2
    Member

    I am working on a Windows-based application that uses SQL Server 2000 as
    its database. There are a few tables (the parent tables) in the
    application which are populated by a separate application.
