Large scale file analysis


Viewing 3 posts - 1 through 3 (of 3 total)
  • Author
    Posts
  • #495
    pepsimax
    Member

    Hello everyone,

    I’m relatively new to R and programming in general, as this question will show. I have a dataset of 250 files that I need to analyse. I need to divide them into subsets of ~12 files each and compare each file against the set of the other 11. By “compare”, I mean run a particular script that does a few things and hopefully outputs a CSV file. I have this script working, but only at a low level: one file against a defined set.

    I don’t mind writing a script that just does it the slower way, analysing a set of 12 at a time (instead of working on all 250 at once), if that makes things much easier.

    So, my question is: how do I write a loop to do this? If I write something like

        x <- 1:12
        filename <- paste("first bit", x, "second bit.filetype")
        while {
            test x against the reference set (which I want to be everything but file x in this set of 12 files)
            series of functions comparing test against ref (this is the bit that works already)
            write.table "filename.csv"
        }

    Is that anywhere near right?
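    In R this kind of leave-one-out pass is usually written with a `for` loop rather than `while`. A minimal sketch, where `compare_against_reference()` is a hypothetical stand-in for the already-working comparison script and the file names are made up for illustration:

    ```r
    # Hypothetical stand-in for the comparison script that already works;
    # the real version would run the series of functions and return a data.frame.
    compare_against_reference <- function(test_file, reference_set) {
      data.frame(test = test_file, n_reference = length(reference_set))
    }

    files <- paste0("file_", 1:12, ".bam")   # placeholder names for one set of 12

    results <- vector("list", length(files))
    for (i in seq_along(files)) {
      reference_set <- files[-i]             # everything except file i
      results[[i]] <- compare_against_reference(files[i], reference_set)
      write.csv(results[[i]],
                file = file.path(tempdir(), paste0("comparison_", i, ".csv")),
                row.names = FALSE)
    }
    ```

    The key idiom is negative indexing: `files[-i]` drops the i-th element, giving the other 11 files as the reference set on each iteration.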

    #497
    bryan
    Participant

    You’re on the right track, but a couple of questions…

    Should the 11 other files be combined before they are compared to x? And will each file in the 12 file set be an x-file at some point?

    #499
    pepsimax
    Member

    Yes. Once I have them in sets of 12, I use a function in the package I’m using (ExomeDepth) to combine all 12 into a GenomicRanges object by parsing the BAM files (a binary alignment format). Next, I choose the X I want to test and compare it against the other 11. And you’re right that every file will be an X in a separate round, i.e. loop one: file 1 compared to files 2–12; loop two: file 2 against files 1 and 3–12; etc. Comparing them involves running a few functions, with the eventual output being a data.frame and a plot. What I’d like to do is write a loop to automate this, including the write.csv part, preferably for all 250 files.
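    The full automation can be sketched as two nested loops: an outer loop over the groups of ~12, and an inner leave-one-out loop within each group. Here `groups` and `analyse_sample()` are placeholders invented for illustration; the real `analyse_sample()` would wrap the ExomeDepth steps (BAM parsing, reference selection, comparison) and return the data.frame described above:

    ```r
    # `groups` stands in for the real list of ~12-file sets; the file
    # names here are made up for illustration.
    groups <- split(sprintf("sample_%02d.bam", 1:24), rep(1:2, each = 12))

    # Hypothetical wrapper for the ExomeDepth comparison.
    analyse_sample <- function(test_file, reference_files) {
      data.frame(test = test_file, n_reference = length(reference_files))
    }

    for (g in seq_along(groups)) {
      set <- groups[[g]]
      for (i in seq_along(set)) {
        res <- analyse_sample(set[i], set[-i])   # leave-one-out within the group
        out <- file.path(tempdir(), sub("\\.bam$", ".csv", set[i]))
        write.csv(res, out, row.names = FALSE)
      }
    }
    ```

    Deriving each output file name from the test file name (rather than a loop counter) keeps the 250 CSVs traceable back to their samples.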

    The files are all within a directory, but dispersed in various subdirectories. I’ve created a variable that lists all the files, so my next step is to split that into subgroups of ~12.
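    That listing-and-splitting step can be done entirely in base R: `list.files()` with `recursive = TRUE` walks the subdirectories, and `split()` chops the resulting vector into groups of ~12. A sketch, with `"path/to/data"` as a placeholder path and a fabricated 250-file vector so the grouping logic is visible:

    ```r
    # Walk all subdirectories for BAM files ("path/to/data" is a placeholder):
    bam_files <- list.files("path/to/data", pattern = "\\.bam$",
                            recursive = TRUE, full.names = TRUE)

    # For illustration, pretend 250 files were found:
    bam_files <- sprintf("sample_%03d.bam", 1:250)

    # Assign each file a group index 1, 1, ..., 2, 2, ... and split on it:
    groups <- split(bam_files, ceiling(seq_along(bam_files) / 12))
    length(groups)   # 21 groups: twenty of 12 files and one of 10
    ```

    Note that 250 does not divide evenly by 12, so the last group only has 10 files; whether that remainder group is large enough for a usable reference set is worth checking before running the analysis.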

    I really appreciate your help!
