Large scale file analysis
May 22, 2012 at 9:31 pm #495
pepsimax (Member)

Hello everyone,
I’m relatively new to R and programming in general, as this question will show. I have a dataset of 250 files that I need to analyse. What I need to do is divide it into subsets of ~12 files each and compare each file against the set of the other 11. By compare, I mean run a particular script that does a few things and hopefully outputs a CSV file. I have this script working, but only at the level of a single file against a predefined set.
I don’t mind writing a script that does it the slower way, analysing one set of 12 at a time (instead of working on all 250 at once), if that makes it much easier.
So, my question is, how do I write a loop to do this? If I write something like
x <- 1:12
filename <- paste("first bit", x, "second bit".filetype)
while {
    test(x) against (reference set, which I want to be everything but file X in this set of 12 files)
    series of functions comparing test against ref (this is the bit that works already)
    write.table "filename.csv"
}

Is that anywhere near right?
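In real R syntax, I suppose I’m picturing something roughly like the sketch below (compare_to_reference() is just a made-up placeholder for the script I already have working, and the file names are invented):

    # Sketch only: compare_to_reference() stands in for my existing, working
    # comparison script; the file naming is just an example.
    files <- paste0("first_bit_", 1:12, "_second_bit.bam")  # the 12 files in this set

    for (i in seq_along(files)) {
        test.file <- files[i]    # the file being tested this round
        ref.files <- files[-i]   # everything in the set except file i
        result <- compare_to_reference(test.file, ref.files)
        write.csv(result, file = paste0("result_", i, ".csv"), row.names = FALSE)
    }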
May 22, 2012 at 9:32 pm #497
bryan (Participant)

You’re on the right track, but a couple of questions…
Should the 11 other files be combined before they are compared to x? And will each file in the 12-file set be an x-file at some point?
May 23, 2012 at 9:35 pm #499
pepsimax (Member)

Yes, once I have them in sets of 12, I use a function in the package I’m using (ExomeDepth) to combine all 12 into a GenomicRanges object by parsing the BAM files (a binary file type). Next, I choose the X I want to test and compare it against the other 11. And you’re right that every file will be an X in a separate round, i.e. loop one: file 1 compared to files 2–12; loop two: file 2 against files 1 and 3–12; and so on. Comparing them involves running a few functions, with the eventual output being a data.frame and a plot. What I’d like to do is write a loop to automate this, including the write.csv part, preferably for all 250 files.
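Concretely, per set of 12 I’m imagining something like the sketch below. run_exomedepth_comparison() is only a placeholder for the ExomeDepth steps I already have working (the ones that produce the data.frame and the plot), and the output naming is made up:

    # Sketch of one round over a single set of ~12 BAM files.
    # run_exomedepth_comparison() is a placeholder for my existing working code,
    # i.e. the ExomeDepth calls that return a data.frame of results.
    analyse_set <- function(bam.set, out.dir = ".") {
        for (i in seq_along(bam.set)) {
            test.bam <- bam.set[i]   # file X for this round
            ref.bams <- bam.set[-i]  # the other ~11 files
            res <- run_exomedepth_comparison(test.bam, ref.bams)

            base <- tools::file_path_sans_ext(basename(test.bam))
            write.csv(res, file = file.path(out.dir, paste0(base, "_results.csv")),
                      row.names = FALSE)
            # The plot could be saved the same way, e.g. wrapping the plotting call
            # in pdf(file.path(out.dir, paste0(base, ".pdf"))) ... dev.off()
        }
    }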
The files are all within a directory, but dispersed in various subdirectories. I’ve created a variable that lists all the files, so my next step is to split that into subgroups of ~12.
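For the listing and splitting step, I’m thinking along these lines (the directory path is a placeholder, and analyse_set() is the per-set sketch above):

    # List every BAM file under the top directory, including subdirectories
    # ("path/to/data" is a placeholder).
    bam.files <- list.files("path/to/data", pattern = "\\.bam$",
                            recursive = TRUE, full.names = TRUE)

    # Split the ~250 files into consecutive groups of ~12.
    groups <- split(bam.files, ceiling(seq_along(bam.files) / 12))

    # Run the per-set analysis (sketched above) on each group,
    # writing each group's output into its own folder.
    for (g in seq_along(groups)) {
        out.dir <- paste0("set_", g)
        dir.create(out.dir, showWarnings = FALSE)
        analyse_set(groups[[g]], out.dir = out.dir)
    }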
I really appreciate your help!