If written words aren’t really your jam, then check out this fast-paced, edge-of-the-seat Tableau Public high-speed thriller of a film I made. For bonus points, listen for my cat in the background:
• Do I want to use data as evidence for a claim?
• Do I want to present my data in a way other than a table?
• Do I want the user to be able to interact with my data?
If the answers to these questions are “yes,” then I cannot more highly recommend using the free program Tableau Public for creating a visualization for your data. Of course, the answer to these questions should not always be yes. Sometimes a contingency table is sufficient, and sometimes all you need is an Excel graph (which does not allow for user interaction). The temptation we have to resist is thinking: Well, Tableau makes it pretty and interactive, so let’s use Tableau every time.
I have used tables, GIFs, word diagrams, and interactive tables to communicate data-driven ideas, but nothing has been more fantastic — or easier to learn — than Tableau Public. The first thing you’ll need is not the program, but a question. If you don’t have a question, then you don’t need Tableau. Good questions lead to good articles and good Tableaus, but if you just want to dump data into a program, then the interest in your product will be limited.
Let’s, together, ask this question: What is the relationship between a team’s hitting and its win-loss record?
This is a great and basic baseball question, and it is one we can transmute into other sports easily, if’n baseball isn’t your thing (e.g. What is the relationship between field goal percentage / points per drive / shooting percentage and an NBA / NFL / NHL team’s W-L record?). We start by getting some data. For my baseball questions, I usually grab data from the FanGraphs SQL server (which is proprietary) as well as the FanGraphs Leaderboard.
The FanGraphs Leaderboards are magnificent and David Appelman (hi, boss!) has done a double-bang-up job making them highly functional. Sometimes the Sports-Reference Play Index tools answer some of my more unusual questions (like: What is the average team record in 1-run games?), but about 60 to 80 percent of my questions can be answered via the FanGraphs leaderboard (which has wOBA, WAR, wRC+, and PITCHf/x data, making it especially useful for advanced analytics questions). There are a lot of great things you can do with drop-menus and filters on the FG leaderboards, but for now, let’s just use this handy Export Data link here:
Midway down the page, we find the magic Make a CSV button!
Your browser will now download a .csv file (a comma-separated values file, basically a text document version of an Excel spreadsheet). This is our data.
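If you are curious what is actually inside that download, here is a minimal Python sketch of reading comma-separated text. The column names (Team, PA, SB, ISO) and every value are made-up placeholders of mine, not a real FanGraphs export, so treat it as an illustration of the file format only.

```python
import csv
import io

# Made-up sample in the general shape of a leaderboard export. The column
# names and every number here are placeholders, not real stats.
SAMPLE_CSV = """Team,PA,SB,ISO
Team A,6000,120,0.150
Team B,6100,90,0.170
Team C,5900,60,0.130
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Each line of the file is one row and each comma marks a column break, which is why a spreadsheet program can open it directly. Note that every value comes back as text until you convert it to a number.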
Now we have a question and we have data. Next we need Tableau Public. Download the appropriate file (Windows or Mac), and then install it. If you are using Linux, you’ll want to download the .exe file and install it using Wine. In order to save and to publish any of your work with Tableau, you will need to set up an account with their cloud server. They will probably send you an email from time to time, but in my experience, it has been the good kind of email: y’know, actual people asking you what you think of their product.
Let me know in the comments if you are having any trouble at this stage, and I will go into more detail with regard to how to install and set up Tableau Public. I’m assuming most people intimidated by this program are not blocked by the installation process, but by the data manipulation side of things, so I’ll focus there.
Once you install and open Tableau, you will land on a screen with a big orange button that says “Open Data.” That will take you to this screen, whereupon you can choose your data source:
I typically use Excel files because I will usually want to alter the data a bit before creating a visualization.
We can select our CSV file from our downloads folder, or we can open an Excel file from this screen. Personally, I like dropping the data in Excel first (as seen in the video above) to be sure I’ve got all the right data. Also, Tableau Public does not always love working with CSV files for whatever reason.
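I do this kind of pre-Tableau tweaking in Excel, but the same small alteration can be scripted. Below is a rough Python sketch that adds a stolen-base rate column computed as SB divided by PA (the same SB% calculation this walkthrough uses for its scatter plot) and writes the result back out as CSV text. The function names and sample rows are hypothetical, and the numbers are invented for illustration.

```python
import csv
import io

def add_sb_rate(rows):
    """Return copies of the rows with an added SB% column (SB / PA)."""
    out = []
    for row in rows:
        pa = float(row["PA"])
        sb = float(row["SB"])
        new_row = dict(row)
        # Guard against a zero-PA row rather than dividing by zero.
        new_row["SB%"] = round(sb / pa, 4) if pa else 0.0
        out.append(new_row)
    return out

def to_csv_text(rows):
    """Serialize a list of dicts to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Invented sample rows; values arrive as strings, as they would from a CSV.
sample = [
    {"Team": "Team A", "PA": "6000", "SB": "120", "ISO": "0.150"},
    {"Team": "Team B", "PA": "6100", "SB": "90", "ISO": "0.170"},
]
cleaned = add_sb_rate(sample)
print(to_csv_text(cleaned))
```

Whether you use a script or a spreadsheet, the point is the same: get the columns you want to plot into the file before you hand it to Tableau.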
Anyway, once we select our data source, a window pops up asking us some specifics about the data. I’d suggest reading the options in here, but for the most part, we can just hit okay and go on living our lives.
With the data loaded, we finally reach Tableau’s sheet view. This is where we will construct charts and graphics, as well as embeddable HTML for blog posts and the like — this, in other words, is where the magic happens.
The Tableau sheet view has a great drag and drop interface.
Our main three areas, at least at first, will be the (1) Measures and Dimensions panel on the left, (2) the Marks panel in the middle, and (3) the Columns and Rows panels across the top. Just dragging and dropping items between these three areas, we can make a whole Tableau document.
Let’s start by dragging two measures into the columns and rows sections. When we do that, we — disappointingly — get this:
With the data types set to incorrect formats, we can end up with disappointing results. Trial and error is your friend here.
So we obviously didn’t want just a single point in our scatter plot. This is a side effect of wrong data types. Tableau is treating my two inputs (ISO and SB%, which I calculated in Excel as SB/PA) as measures to be aggregated. That means it is summing up all the ISOs and SB-rates in the league, but I want each team to have its own individual point in the plot.
By clicking that little green arrow inside my variable icons, I can play around with the data types until I finally have a scatter plot that is scatter plottish.
The green arrow inside the variable icon allows us to tweak the data types.
Once we have both variables switched to, in this case, “dimension,” we can then see a proper scatter plot forming:
Scatter plots tend to be my favorite form of data representation, and with Tableau, we can cleanly add more than just two dimensions of information into a scatter plot.
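To see why the first attempt collapsed into a single dot, it can help to mimic the two behaviors outside of Tableau. This Python sketch is only an analogy, not a description of Tableau’s internals: one function sums both fields across every row (one mark), and the other keeps one mark per team. The team names and numbers are invented.

```python
# Invented rows: one per team, with the two fields we want to plot.
rows = [
    {"Team": "Team A", "ISO": 0.150, "SB%": 0.020},
    {"Team": "Team B", "ISO": 0.170, "SB%": 0.015},
    {"Team": "Team C", "ISO": 0.130, "SB%": 0.010},
]

def aggregated_point(rows, x, y):
    """Sum both fields over all rows, yielding a single (x, y) mark."""
    return (sum(r[x] for r in rows), sum(r[y] for r in rows))

def per_team_points(rows, x, y):
    """Keep every row separate, yielding one (team, x, y) mark per team."""
    return [(r["Team"], r[x], r[y]) for r in rows]

single = aggregated_point(rows, "ISO", "SB%")
points = per_team_points(rows, "ISO", "SB%")

print("aggregated:", single)
print("per team:", points)
```

The aggregated version is the lonely dot from the first attempt; the per-team version is the scatter plot we were after.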
Now we can play around with the presentation — and this is where Tableau really separates itself from Excel. In Excel, if I add labels or colors to my icon, I cannot do so with a third information element. In other words, if I have a plot comparing SB-rate and ISO and then ask Excel to add labels, it’s going to use the Y-axis to automatically populate the label names. That’s no good if I want my dots to represent specific teams.
With Tableau, I can just drag the Teams dimension into the Label square and then Presto-Magnifico, I’ve got my dots labeled appropriately:
Another nifty thing: Tableau does a great job of arranging labels to avoid annoying overlap.
I cannot recommend highly enough the value of playing around with the Marks section. Just drag and drop different Measures and Dimensions into those little rounded squares. As you get more comfortable with these tools, you will start to see the great depth of Tableau’s functionality.
When you have finished getting your plot to where you want it, you’ll want to create a new dashboard. The dashboards are where you can combine multiple graphics (say, a scatter plot and bar graph) as well as organize your keys and color scales and whatnots. To create a new dashboard, click on the new tab icon on the bottom that so happens to look like the Chinese character for field or farm, 田:
Click this little icon to create a new dashboard. You’ll probably want a dashboard if you’re planning on embedding your plot into an HTML blog post.
Like before, everything is click-and-drag in the dashboard view, and if you want any extra formatting options, just right-click something. When you have arranged your dashboard how you like it (and the video above goes into greater detail about this), then you will want to save your project. (Yikes! We waited this long to save?!)
When you save, you are not saving to your hard drive, but to Tableau’s cloud. This is both a blessing and a bummer. It means you can access your Tableau biz from all manner of computers (which has proved handy for that last-minute correction), but it also means Tableau kinda controls and distributes your work as it so pleases. So, in other words, don’t go around making charts of your friend’s personal cycles on Tableau, lest that kind of info go accidentally public.
Of course, if you’re using Tableau for just sports research, as I do, then you will probably like the extra exposure your hard work gets from, say, appearing on their occasional list of most popular Tableaus (a list I have appeared on a few times, thanks to FanGraphs readership, but would not have otherwise known about had someone not congratulated me). Moreover, the people at Tableau seem genuinely interested in improving their product and have in the past contacted me about questions I had. I imagine if I had serious concerns about my data going public (which, again, why would I be using Tableau Public?) then the people at Tableau would work with me to find a better fit.
Anyway, once you save your work, a new window will pop up. I usually click the “Open in a Web Browser” button at the bottom of the screen and then grab the embed code from the bottom of the page that opens up. I can go into more detail on the embedding process in later articles.
I hope this was helpful! Let me know if you would like more of these or if you feel like I’ve just crushed your soul, wasted your life, or skipped too many steps.