Building a Retrosheet Database for the 2016 Season, Part 1

Baseball season is almost upon us. Soon, people will flood to ballparks in cities all over our great nation in search of entertainment and meaning, while baseball bloggers will continue their search for relevance and the mysterious Full Time Gig. If you fall into the latter camp (or if you just like having this kind of data handy), then it’s time to get your Retrosheet database installed/updated.

For those not in the know, Retrosheet is a magnificent project that essentially looks to turn box scores into computer records. And they’ve done a great job of it. They have all box scores from games since 1914, and play-by-play data since around 1940. What we’ll want to do is convert their records into an easily searchable database that we can query for fun and profit.

Below is a video walking you through how to get your machine set up. We won’t actually be loading the data yet (that will come in Part 2), but we’ll make sure your computer is prepped and has all the files and utilities it needs.

If you already installed a Retrosheet database using our instructions from last year, most of this won’t apply to you, but feel free to follow along. You’ll certainly need the links to the new packages that are now up on our GitHub page, but most of what you’ll need is in Part 2.

(Mac people: as I mentioned in the video, your instructions are coming.)

Links mentioned in the video:

TechGraphs GitHub: https://github.com/techgraphs/2016Ret…

MySQL Server: https://dev.mysql.com/downloads/mysql/

Wget: http://gnuwin32.sourceforge.net/packa…

7-Zip: http://www.7-zip.org/

SQLyog: https://github.com/webyog/sqlyog-comm…
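Once everything is installed, a quick sanity check can save some head-scratching before Part 2. Below is a minimal sketch, not part of the video, that looks for the command-line tools on your PATH. The executable names (`mysql`, `wget`, `7z`) are assumptions and may differ depending on how and where you installed each package. SQLyog is a GUI application, so it isn't checked here.

```python
import shutil

# Assumed executable names; adjust these to match your own installation.
REQUIRED_TOOLS = {
    "MySQL client": "mysql",
    "Wget": "wget",
    "7-Zip": "7z",
}


def find_tools(tools, which=shutil.which):
    """Map each tool label to its resolved path, or None if it is not on PATH."""
    return {label: which(exe) for label, exe in tools.items()}


def missing_tools(found):
    """Return the labels whose executable could not be located."""
    return [label for label, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_tools(REQUIRED_TOOLS)
    for label, path in found.items():
        print(f"{label}: {path or 'NOT FOUND on PATH'}")
    if missing_tools(found):
        print("Install the missing tools (or add them to PATH) before Part 2.")
```

If a tool shows up as not found even though you installed it, the likely culprit is that its install folder was never added to your PATH.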





David G. Temple is the Managing Editor of TechGraphs and a contributor to FanGraphs, NotGraphs and The Hardball Times. He hosts the award-eligible podcast Stealing Home. Dayn Perry once called him a "Bible Made of Lasers." Follow him on Twitter @davidgtemple.

4 Comments
Shane Tourtellotte
9 years ago

Retrosheet’s box scores actually go back to 1913 now. They are that much more amazing, David, and only getting amazinger.

Gotta run now. Have an instructional video to watch.

Matt
9 years ago

I followed your series last year and set this up. After that I found it a bit difficult to understand the schema, and how to extract data. Any chance there’d be a followup video this year giving some quick examples on querying and merging tables to find useful information?

Cory
9 years ago

@Matt – I think that’s a great idea, perhaps David & Co. could set up some sort of forum where users could post code / ideas / findings so that others could replicate and/or add color to it as well?