For many enterprise users, R is a programming language we acquired through serendipity. We were assigned a project that required graphing or statistical analysis, and our manager suggested that we use R to deliver the goods. Or we were assigned to a working group that had already adopted R as its preferred tool. Our first contact with the language usually came while performing a specific task – data aggregation or exploration – and focused on delivering a specific result. Formal training was rarely offered. Here’s a problem, here’s a book (or internet tutorial), the remainder is left as an exercise for the student…
This guide is intended to help the novice R user quickly get “up and running” with the sort of practical R programming tasks a commercial analyst is likely to encounter. It is neither a formal R tutorial nor a reference manual on the language – there are many well-written books on these topics available elsewhere. It is rather a focused cookbook of the basic data manipulation tasks a typical commercial analyst is asked to perform, the tasks that make up roughly 80% of the code they write and use on a daily basis. It is intended to help you quickly “get up the curve” and become proficient in the basic operations of your job.
In approaching this topic, we assume you already have some proficiency in two tools commonly used by corporate data analysts: Excel spreadsheets (for calculations and data presentation) and SQL queries (for extracting data from your corporate systems). Excel users are advised to learn how to save a spreadsheet as a comma-separated values file (a CSV) and how to import a CSV into a spreadsheet. This simplifies moving data between R (where you can script manipulations) and Excel (where you do manual calculations and presentation formatting). We also recommend picking up a basic knowledge of SQL. This will help you set up direct connections between your corporate database and your R programming environment, which lets you script common data pulls and will vastly simplify your work. For a good short course in basic SQL syntax, including an online SQL testing tool, we recommend the materials at W3Schools. For a database to test your skills, we recommend DB Browser for SQLite (free download) or Microsoft Access (likely loaded on your work PC).
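As a rough illustration of the CSV round trip described above, here is a minimal sketch in base R. The `sales` data frame and its column names are hypothetical stand-ins for a sheet you might export from Excel, and the temporary file path is only there to keep the example self-contained; in practice you would point at a file Excel can open.

```r
# Hypothetical sample data standing in for a worksheet saved from Excel.
sales <- data.frame(
  region = c("East", "West", "East"),
  units  = c(10, 25, 7),
  stringsAsFactors = FALSE
)

# Write the data to a CSV file. A temporary path is an assumption made
# for this sketch; normally this would be a folder you share with Excel.
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)  # row.names = FALSE avoids an extra index column

# Read the CSV back into R, as you would with a file exported from Excel.
sales_in <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure of what was imported.
str(sales_in)
```

Setting `row.names = FALSE` on export is a common choice because it keeps R's row labels from showing up as an unnamed first column when the file is opened in a spreadsheet.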
Here’s a summary of the topics we want to address in this series:
- Moving data between Excel and R using CSV files
- SQL and R – using RODBC to automate moving data to R via SQL
- Merging datasets
- Setting Flags and Buckets
- Aggregating Data Using R
- Calculating Basic Statistics
- Exporting Data as a CSV file
- R Functions – organizing commands into basic scripts
And that’s it… the “dirty dozen” of R programming operations that will get you up and running in a hurry.
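To give a feel for what several of these operations look like in practice, here is a small preview using only base R. It is a sketch under stated assumptions rather than the approach the later articles necessarily take: the `orders` and `customers` tables, their column names, and the bucket thresholds are all hypothetical.

```r
# Hypothetical sample tables; names and values are illustrative only.
orders <- data.frame(
  customer_id = c(1, 2, 2, 3),
  amount      = c(120, 40, 310, 75)
)
customers <- data.frame(
  customer_id = c(1, 2, 3),
  region      = c("East", "West", "East"),
  stringsAsFactors = FALSE
)

# Merging datasets: join the two tables on their shared key.
combined <- merge(orders, customers, by = "customer_id")

# Setting a flag: TRUE for orders at or above an assumed threshold of 100.
combined$large_order <- combined$amount >= 100

# Setting buckets: group amounts into assumed size bands.
combined$size_bucket <- cut(
  combined$amount,
  breaks = c(0, 50, 200, Inf),
  labels = c("small", "medium", "large")
)

# Aggregating data: total order amount per region.
totals <- aggregate(amount ~ region, data = combined, FUN = sum)

# Calculating basic statistics on the order amounts.
avg_amount <- mean(combined$amount)
summary(combined$amount)

print(combined)
print(totals)
```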
Once you master these basic operations, we also recommend getting a good cookbook of R procedures. We use the R Cookbook from O’Reilly Media (available on Amazon).