Large scale file analysis
This topic contains 2 replies, has 2 voices, and was last updated by pepsimax 1 year ago.
May 22, 2012 at 9:31 pm #495
Hello everyone,
I’m relatively new to R and programming in general, as this question will show. I have a dataset of 250 files that I need to analyse. I need to divide it into subsets of ~12 files each and compare each file against the other 11 in its subset. By compare, I mean run a particular script that does a few things and hopefully outputs a CSV file. I have this script working, but only for a single file against a manually defined reference set.
I don’t mind writing a script that does it the slower way, analysing one set of 12 at a time (instead of working on all 250 at once), if that makes it much easier.
So, my question is, how do I write a loop to do this? If I write something like
x <- 1:12
filename <- paste("first bit", x, "second bit".filetype)
while {
test(x) against (reference set, which I want to be everything but file x in this set of 12 files)
series of functions comparing test against ref (this is the bit that works already)
write.table "filename.csv"
}
Is that anywhere near right?
May 22, 2012 at 9:32 pm #497
You’re on the right track, but a couple of questions…
Should the 11 other files be combined before they are compared to x? And will each file in the 12 file set be an x-file at some point?
May 23, 2012 at 9:35 pm #499
Yes, once I have them in sets of 12, I use a function in the package I’m using (ExomeDepth) to combine all 12 into a GenomicRanges object by parsing the BAM files (a binary file type). Next, I choose the X I want to test and compare it against the other 11. And you’re right that every file will be an X in a separate round, i.e. loop one: file 1 compared to files 2-12; loop two: file 2 compared to files 1 and 3-12; and so on.
Comparing them involves running a few functions, with the eventual output being a data.frame and a plot. What I’d like to do is write a loop to automate this, including the write.csv part, preferably for all 250 files.
The files are all within a directory, but dispersed in various subdirectories. I’ve created a variable that lists all the files, so my next step is to split that into subgroups of ~12.
I really appreciate your help!
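For anyone reading along, the grouping and the "each file against the other 11" loop described above could be sketched in base R roughly as follows. This is only a minimal sketch under some assumptions: `compare_to_reference()` is a hypothetical placeholder for the ExomeDepth-based comparison that already works (none of that package's functions are shown here), `split_into_groups()` and `run_leave_one_out()` are made-up helper names, and the file names are invented. Since 250 is not a multiple of 12, the last group simply ends up with 10 files.

```r
# Hypothetical stand-in for the comparison step that already works.
# It takes one test file plus its reference files and returns a data.frame.
compare_to_reference <- function(test_file, reference_files) {
  data.frame(test = basename(test_file),
             n_reference = length(reference_files),
             stringsAsFactors = FALSE)
}

# Split a character vector of file paths into consecutive groups
# of at most `size` files each.
split_into_groups <- function(files, size = 12) {
  group_id <- ceiling(seq_along(files) / size)
  split(files, group_id)
}

# For every file in every group, compare it against the rest of its group
# and write the result to a CSV named after the test file.
run_leave_one_out <- function(groups, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  written <- character(0)
  for (group in groups) {
    for (i in seq_along(group)) {
      test_file <- group[i]
      reference_files <- group[-i]  # everything in the group except file i
      result <- compare_to_reference(test_file, reference_files)
      out_name <- paste0(tools::file_path_sans_ext(basename(test_file)), ".csv")
      out_path <- file.path(out_dir, out_name)
      write.csv(result, out_path, row.names = FALSE)
      written <- c(written, out_path)
    }
  }
  written
}

# Made-up file names for illustration. For files scattered across
# subdirectories, something like
#   list.files("path/to/data", pattern = "\\.bam$",
#              recursive = TRUE, full.names = TRUE)
# would build the full list.
bam_files <- sprintf("sample_%03d.bam", 1:250)
groups <- split_into_groups(bam_files, size = 12)
out_files <- run_leave_one_out(groups, file.path(tempdir(), "results"))
```

A `for` loop over `seq_along(group)` with `group[-i]` as the reference set replaces the `while` idea from the first post: negative indexing drops the test file, so every file gets its turn as X. One caveat of this sketch is that the CSV name is built from the file's base name, so files with the same name in different subdirectories would overwrite each other's output.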
