Open Government Data
There is a wealth of public data available on the web, much of which is well suited to data science projects. Here are a few sources that caught my eye as promising material for analysis projects. You should also check out our tutorial on using SPARQL to access open government data sets on data.gov.
- City of Chicago Public Employee Data: As part of its transparency effort, the City of Chicago has published a large body of data about employee compensation and behavior. You could probably glean some interesting insights from it. Some relevant files:
Current Employee Names, Salaries, and Position Titles
- US Treasury News Feed: Treasury’s official blog, featuring posts from Treasury’s senior officials and staff sharing news, announcements, and information about the work done at the Treasury Department.
- Lost, Found, and Adoptable Pets: Data feed from King County, Washington (Seattle) about lost, found, and adoptable pets. You could pull it a couple of times and compare the versions to spot trends in how these cases are resolved.
King County Lost, Found, Adoptable Pets
- Sunlight Foundation: A collection of open APIs and datasets focused on making government and politicians more accountable and transparent.
- College Scorecard: A large Department of Education dataset that tracks how the students and alumni of each school fare after they graduate.
College Scorecard – Detailed Data
- Bureau of Labor Statistics – Occupational Information: Survey data on wages and employment across occupations. The entire BLS site is well worth exploring if this type of analysis interests you.
- Financial Services Consumer Complaints: A large dataset of consumer complaints, available in multiple formats, with ample opportunity for data mining. Warning: this one will test the size limits of your computer.
- Healthcare – Timely & Effective Care:
- California State Government Open Data Portal: One of many states making various data assets public. Its collection includes purchasing, transportation, and environmental data.
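For the King County pets feed, the "pull it twice and compare" idea can be sketched as a snapshot diff: index each download by a record ID, then see which records appeared, disappeared, or changed status. This is a minimal sketch assuming CSV exports; the column names `animal_id` and `record_type` and the status values in the sample data are hypothetical placeholders, so check the actual feed's schema before using it.

```python
import csv
import io


def load_snapshot(csv_text, key="animal_id"):
    """Index one CSV snapshot by its record ID column ('animal_id' is a hypothetical name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_snapshots(old, new, status_field="record_type"):
    """Compare two indexed snapshots ('record_type' is a hypothetical column name).

    Returns (added, removed, changed) as sorted lists of record IDs:
    - added: IDs present only in the newer snapshot
    - removed: IDs that dropped out (candidates for resolved cases)
    - changed: IDs present in both whose status field differs
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        k for k in old.keys() & new.keys()
        if old[k].get(status_field) != new[k].get(status_field)
    )
    return added, removed, changed


if __name__ == "__main__":
    # Made-up sample snapshots for illustration only.
    first = load_snapshot("animal_id,record_type\nA1,LOST\nA2,FOUND\nA3,ADOPTABLE\n")
    second = load_snapshot("animal_id,record_type\nA2,FOUND\nA3,ADOPTED\nA4,LOST\n")
    print(diff_snapshots(first, second))  # (['A4'], ['A1'], ['A3'])
```

Records that vanish between pulls do not prove a case was resolved (a listing could simply expire), so treat the `removed` list as candidates to investigate rather than confirmed outcomes.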
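One way to keep a file as large as the consumer complaints dataset from exhausting memory is to stream it row by row instead of loading it all at once. The sketch below assumes a CSV export and tallies how often each value appears in one column; the column name `Product` used in the example is an assumption, not a confirmed field name.

```python
import csv
from collections import Counter


def count_by_column(lines, column):
    """Tally values of one column while reading a CSV one row at a time.

    `lines` can be an open file object or any iterable of CSV text lines,
    so only the running counts are held in memory, not the whole file.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row.get(column, "")] += 1
    return counts
```

Usage against a downloaded file (the filename is hypothetical): `with open("complaints.csv", newline="", encoding="utf-8") as f: print(count_by_column(f, "Product").most_common(10))`.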