Archive for How-To

How To Use R For Sports Stats, Part 2: Visualization and Analysis

Welcome back! In Part 1 of this series, we went over the bare bones of using R–loading data, pulling out different subsets, and doing basic statistical tests. This is all cool enough, but if you’re going to take the time to learn R, you’re probably looking for something… more out of your investment.

One of R’s greatest strengths as a programming language is how it’s both powerful and easy-to-use when it comes to data visualization and statistical analysis. Fortunately, both of these are things we’re fairly interested in. In this post, we’ll work through some of the basic ways of visualizing and analyzing data in R–and point you towards where you can learn more.

(Before we start, one commenter reminded me that it can be very helpful to use an IDE when coding. Integrated development environments, like RStudio, work similarly to the basic R console, but provide helpful features like code autocompletion, better-integrated documentation, etc. I’ll keep taking screenshots in the R console for consistency, but feel free to try out an IDE and see if it works for you.)

Look At That Data

We’ll be using the same set of 2013-14 batter data that we did last time, so download that (if you haven’t already) and load it back up in R:

fgdata = read.csv("FGdat.csv")

Possibly my favorite thing about R is how, often, all it takes is a very short function to create something pretty cool. Let’s say you want to make a histogram–a chart that plots the frequency counts of a given variable. You might think you have to run a bunch of different commands to name the type of chart, load your data into the chart, plot all the points, and so on? Nope:

hist(fgdata$wRC)

Basic R histogramThis Instant Histogram(™) displays how many players have a wRC+ in the range a given bar takes up in the x-axis. This histogram looks like a pretty normal, bell-curveish distribution, with an average a bit over 100–which makes sense, since the players with a below-average wRC+ won’t get enough playing time to qualify for our data set.

(You can confirm this quantitatively by using a function like summary(fgdata$wRC).)

The hist() function, right out of the box, displays the data and does it quickly–but it doesn’t look that great. You can spend endless amounts of time customizing charts in R, but let’s add a few parameters to make this look nicer.

hist(fgdata$wRC, breaks=25, main="Distribution of wRC+, 2013 - 2014", xlab="wRC+", ylab= NULL, col="darkorange2")

In this command, ‘breaks’ is the number of bars in the chart, ‘main’ is the chart title, ‘xlab’ and ‘ylab’ are the axis titles, and ‘col’ is the color. R recognizes a pretty wide range of colors, though you can use RGB, hex, etc. if you’re more familiar with them.

Anyway, here’s the result:

Visually appealing R histogramA bit better, right? The distribution doesn’t look quite as normal now, but it’s still pretty close–we can actually add a bell curve to eyeball far off it is.

hist(fgdata$wRC, breaks=25, freq = FALSE, main="Distribution of wRC+, 2013 - 2014", xlab="wRC+", ylab= NULL, col="darkorange2")
curve(dnorm(x, mean=mean(fgdata$wRC), sd=sd(fgdata$wRC)), add=TRUE, col="darkblue", lwd=2)

Visually appealing R histogram with curve

(In the first line above, “freq = FALSE” indicates that the y-axis will be a probability density rather than a frequency count; the second line creates a normal curve with the same mean and standard deviation as your data set. Also, it’s blue.)

You can also plot multiple charts at the same time–use the par(mfrow) function with the preferred number of rows and columns:

par(mfrow=c(2,2)) 
hist(fgdata$wOBA, breaks=25) 
hist(fgdata$wRC, breaks=25) 
hist(fgdata$Off, breaks=25) 
hist(fgdata$BABIP, breaks=25)

2x2 grid of R histogramsWhen you want to save your plots, you can copy them to your clipboard–or create and save an image file directly from R:

png(file="whatisitgoodfor.png",width=400,height=350)
hist(fgdata$WAR, breaks=25)
dev.off()

(It’ll show up in the same directory you’re loading your data set from.)

So that covers histograms. You can create bar charts, pie charts, and all of that, but you’re probably more interested in everyone’s favorite, the scatterplot.

At its most basic, the plot function is literally plot() with the two variables you want to compare:

plot(fgdata$SLG, fgdata$ISO)
Basic R scatterplot
Unsurprisingly, slugging percentage and ISO are fairly well-correlated. Results-wise, we’re starting to push against the limits of our data set–too many of these stats are directly connected to find anything interesting.

So let’s take a different tack and look at year-over-year trends. There are several ways you could do this in R, but we’ll use a fairly straightforward one. Subset your data into 2013 and 2014 sets,

fg13 = subset(fgdata, Season == "2013")
fg14 = subset(fgdata, Season == "2014")

then merge() the two by name. This will create one large dataset with two sets of columns: one with a player’s 2013 stats and one with their 2014 stats. (Players who only appeared in one season will be omitted automatically.)

yby= merge(fg13, fg14, by=("Name"))
head(yby)

Year-by-year dataAs you can see, 2013 stats have an .x after them and 2014 stats have a .y. So instead of comparing ISO to SLG, let’s see how ISO holds up year-to-year:

plot(yby$ISO.x, yby$ISO.y, pch=20, col="red", main="ISO year-over-year trends", xlab="ISO 2013", ylab="ISO 2014")

Visually appealing R scatterplot(The ‘pch’ argument sets the shape of the data points; ‘xlim’ and ‘ylim’ set the extremes of each axis.)

Again, a decent correlation–but just *how* decent? Let’s turn to the numbers.

Relations and Correlations

If you’re a frequent FanGraphs reader, you’re probably familiar with at least one statistical metric: r², the square of the correlation coefficient. An r² near 1 indicates that two variables are highly-correlated; an r² near 0 indicates they aren’t.

As a refresher without getting too deep into the stats: when you’re ‘finding the r²’ of a plot like the one above, what you’re usually doing is saying there’s a linear relationship between the two variables, that could be described in a y = mx + b equation with an intercept and slope; the r² is then basically measuring how accurately the data fits that equation.

So to find the r² that we all know and love, you want R to create a linear model between the two variables you’re interested in. You can access this by getting a summary of the lm() function:

summary(lm(yby$ISO.x ~ yby$ISO.y))

R linear model summaryThe coefficients, p-values, etc., are interesting and would be worth examining in a more theory-focused post, but you’re looking for the “Multiple R-squared” value near the bottom–turns out to be .4715 here, which is fairly good if not incredible. How does this compare to other stats?

summary(lm(yby$BsR.x ~ yby$BsR.y))
> Multiple R-squared:  0.4306
summary(lm(yby$WAR.x ~ yby$WAR.y))
> Multiple R-squared:  0.1568
summary(lm(yby$BABIP.x ~ yby$BABIP.y))
> Multiple R-squared:  0.2302

BsR is about as consistent as ISO, but WAR has a smaller year-to-year correlation than you might expect. BABIP, less surprisingly, is even less correlated.

Let’s do one more basic statistical test: the t-test, which is often used to see if two sets of numeric data are significantly different from one another. This isn’t as commonly seen in sports analysis (because it doesn’t often tell us much for the data we most often work with), but just to run through how it works in R, let’s compare the ISO of low-K versus high-K hitters. First, we need to convert the percentages in the K% column to actual numbers:

fgdata$K. = as.numeric(sub("%","",fgdata$K.))/100

then subset out the low-K% and high-K% hitters:

lowk = subset(fgdata, K. < .15)
highk = subset(fgdata, K. > .2)

Then, finally, run the t-test:

t.test(lowk$ISO, highk$ISO)

R T-test resultsThe “p-value” here is about 4.5 x 10^-11 (or 0.000000000045); a p-value less than .05 is generally considered significant, so we can consider this evidence that the ISO of high-K% hitters is significantly different than that of low-K% hitters. We can check this out visually with a boxplot–and you thought we were done with visualization, didn’t you?

boxplot(lowk$ISO, highk$ISO, names=c("K% < 15%","K% > 20%"), ylab="ISO", main="Comparing ISO of low-K% vs. high-K% batters", col="goldenrod1")

Visually appealing R boxplotSo now you can do some standard statistical tests in R–but be careful. It’s incredibly tempting to just start testing every variable you can get your hands on, but doing so makes it much more likely that you’ll run into a false positive or a random correlation. So if you’re testing something, try to have a good reason for it.

…And Beyond

We’ve covered a fair amount, but again, this only begins to cover the potential R provides for visual and statistical analysis. For one example of what’s possible in both these areas, check out this analysis of an online trivia league that was done entirely within R.

If you want to replicate his findings, though (which you can, since he’s posted the code and data online!), you’ll need to install packages, extensions for R that give you even more functionality. The ggplot2 package, for example, is incredibly popular for people who want to create especially cool-looking charts. You can install it with the command

install.packages("ggplot2")

and visit http://ggplot2.org/ to learn more. If R doesn’t do something you want it to out of the box, odds are there’s a package out there that will help you.

That’s probably enough for this week; here’s the script with all of this week’s code. In our next (last?) part of this series, we’ll look at taking one more step: using R to create (very) basic projections.


How To Use R For Sports Stats, Part 1: The Absolute Basics

If you’ve spent a sufficient amount of time messing around with sports statistics, there’s a good chance the following two things have happened, in order:

  1. You probably started off with Excel, because Excel does a lot of stuff pretty easily and everyone has Microsoft Office.
  2. At some point, you mentioned to someone that you use Excel to do statistical analysis and got a response along the lines of, “Oh, that’s cool, but you should really be using R.”

Politeness issues aside, they might well be right.

R is a programming language and software platform commonly used, particularly in research and academia, for data analysis and visualization. Because it’s a programming language, the learning curve is a bit steeper than it is for something like Excel–but if you dig into it, you’ll find that R makes it possible to do a wider variety of tasks more quickly. If you’re interested in finding interesting insights with just a few lines of code, if you want to easily work with large sets of data, or if you’re interested in using most any statistical test known to man, you should take a look at R.

Also, R is totally free, both as in “open-source” and as in “costs no money”. So that’s nice.

In this series, we’ll learn the basics of working in R with the goal of exploring sports data—baseball, in particular. I’m going to presume that you have no background whatsoever in coding or programming, but to keep things moving, I’ll try not to get too bogged down in the details (like how “=” does something different from “==”) unless absolutely necessary. This guide was made using R on Windows 7, but most everything should be the same on whatever OS you use.

Okay, let’s do this.

Getting Started

You can download R from https://cran.rstudio.com/.

You’ll have to click on a few links (you want the ‘base’ install) and actually install R, but once that’s done you should have a screen that looks like:

Screenshot #1: R consoleThe “R console” is where your code is soon going to run–but first, we need some data. Let’s take FanGraphs’ standard dashboard data for qualifying MLB batters in 2013 and 2014. Save it as something short, like “FGdat.csv”. (If you have a custom FG dashboard or just want to take a shortcut, you can just download the data we’ll be using here.)

In R, we’ll be focusing mostly on functions (that look like, say, function(arg1, arg2)), which are what actually do things, and naming the output of these functions so we can refer back to it later. For example, a line of R code might look like this:

fgdata = read.csv("FGdat.csv")

The function here is the read.csv(), which basically means “read this CSV file into R”, and the argument inside is the file that we want to read. The left part (fgdata =) is us saying that we want to take the data we’re reading and name it “fgdata”.

This is, in fact, the first line we want to run in R to load our data, so type/paste it in and hit Enter to execute it.

(You may get an error like cannot open file ‘FGdat.csv’: No such file or directory; if you do, you likely need to change the directory that R is trying to read files from. Go to “File” -> “Change dir”, and change the working directory to the folder you saved the CSV in, or just move the CSV to the folder R has listed as the working directory.)

If you didn’t get an error and R simply moves on to the next line, you should be good to go!

Basic Stats

The head() function returns the first 6 rows of data; since our data set is named “fgdata”, we can try this out with the line of code:

> head(fgdata)

R Screenshot #2: head(fgdata)And to get a basic overview of the entire data set, there’s the summary() function:

> summary(fgdata)

R Screenshot #3: summary(fgdata)See! Already, data on 20 variables in the blink of an eye.

“1st Qu.” and “3rd Qu.” are the first and third quartiles; the mean, median, minimum and maximum should be self-explanatory. So we can see that the average player in this data set had roughly a .270 average with 17 dingers and 10 steals in 146 games–not far from Alex Gordon’s 2014, basically.

Want to compare how the 2013 and 2014 stats stack up? R makes it pretty easy to pick out subsets of data. It’s called, reasonably, the “subset” function, and all you need to include is the data set you’re taking a subset of and the criteria the subset data should conform to.

Since we have “Season” as a field in the table, we just need to say “Season == “2013”” to get the 2013 players and “Season == “2014”” to get the 2014 players. We’ll name these new data sets ‘fg13’ and ‘fg14’:

> fg13 = subset(fgdata, Season == "2013")
> fg14 = subset(fgdata, Season == "2014")

A quick check should confirm that, yes, the data did subset correctly:

> summary(fg13)

R Screenshot #4: summary(fg13)and now we can do some basic statistical comparisons, like comparing the mean BABIPs between 2013 and 2014. (To single out a specific column in a data set, use the $ symbol.)

> mean(fg13$BABIP)
> mean(fg14$BABIP)

You can do whatever basic statistical tests you like–sd() for the standard deviation, et cetera–and pull out different subsets of the data based on whatever criteria you like. So “HR > 20” for all players who hit more than 20 home runs, or “Player == “Mike Trout”” to get data for all players named Mike Trout:

> fgtrout = subset(fgdata, Name == "Mike Trout")
> fgtrout

R Screenshot #5: fgtroutLastly, it’s not too common to need to reorder your data in R, but if you do, you can do so with the order() function. This line sorts the data by wRC+, ascending order:

> fgdata = fgdata[order(fgdata$wRC.),]

then returns the top 10 rows:

> head(fgdata, n = 10)

You can sort in descending order by placing a minus sign before the column:

> fgdata = fgdata[order(-fgdata$wRC.),]

R Screenshot #6: head(fgdata, n = 10)And, as you’ve probably noticed, most of these functions can be tweaked or expanded depending on the different arguments you use–adding “n = 10” to head(), for example, to view 10 rows instead of 6. One of the more fascinating and infuriating things about R is that pretty much every function is like that–but at least they’re all documented!

And, of course, you can access the documentation through a function. Use help() (help(head), help(summary), etc.) and a page will pop up with the arguments, and more additional details than you probably ever wanted.

Wrap-up

One final note: typing code directly into the console is fine, but it gets a bit annoying if you want to write more than a line or two. Instead, you can create a new window within R to load, edit and run scripts. In Windows, use “Ctrl+N” to open a new script window. Type some code; to run it, highlight the lines you want to run and hit “Ctrl+R”.

You can also use these windows to save your R script in R files–as I’ve done here for all the code used in this article. Feel free to download and start tinkering.

So those are the basics of R; not enough to really show its potential, but enough to start experimenting and exploring as you wish. For Part 2, we’ll start some data plotting and correlation tests, and in Part 3 we’ll try to recreate some basic baseball projection models. I actually haven’t done this before in R, so it should be interesting. Stay tuned!

(Thanks to Jim Hohmann for helping test this article.)


How to Make a Poor Man’s Heat Map

Presumably you’ve got some data. Maybe it’s PITCHf/x data, maybe it’s just a bunch of data points, and you want those represented in a heat map. How do you make this happen? Well, in the spirit of leading with the lede:

Just make a scatterplot and reduce the opacity of the dots.

Yeah, it’s not elegant and it’s not truly a heat map — not algorithmically calculated like the fancy stuff Jeff makes at Baseball Heatmaps or the complex zone-grids over at FanGraphs. But, hey, for many people, it should be enough.

Let’s go ahead and make a faux heat map together!

Let’s start by ripping out some Expanded Tabled Data from Brooks Baseball and pasting that data in Excel.

For the sake of this post, we’re looking at Chris Heston’s no-hitter from June 9, 2015. Once we plop that data into Excel, we’ll want to cruise on down to the columns titled “px” and “pz.” Highlight these columns and plop in a scatterplot:

It's hideous right now, but we'll make it beautiful.
It’s hideous right now, but we’ll make it beautiful.

Right click on those blue diamonds and choose “Format Data Series.” Then, in the ensuing popup, browse to the “Marker Options” tab and change the markers to circles:

Circle icons are not only best for faux heat maps, they are also just more appropriate (especially for baseball) in general.
Circle icons are not only best for faux heat maps, they are also just more appropriate (especially for baseball) in general.

Then go to the “Marker Fill” section and choose “Solid fill.” We can then crank that transparency down:

You will want the dots pretty close to fully transparent, though this will depend on the amount of data points you have. Here, we only have about 100 and they are pretty widely dispersed, so we'll go with 60% transparent.
You will want the dots pretty close to fully transparent, though this will depend on the amount of data points you have. Here, we only have about 100 and they are pretty widely dispersed, so we’ll go with 60% transparent.

Now we need to get rid of the pesky blue outline on each of the data points. Head to “Marker Line Color” and change that to “No line.”

Without the marker outlines, the chart looks a lot more heat map-ish.
Without the marker outlines, the chart looks a lot more heat map-ish.

So we’ve effectively got a heat map now. With a little bit of tweaking (moving the Y-axis to the left, changing the background color, making the markers bigger, and adding a quick little strikezone box — pretty much all of which you can accomplish through the Format Data Series window), we can get something like this:

The transparency causes overlapping data points to appear darker, which is the key component of a heat map.
The transparency causes overlapping data points to appear darker, which is the key component of a heat map.

I should mention the strike zone is just a rectangle shape I inserted, and is not necessarily accurate. I set the fill to No Fill and made the border a hashed line. For specific guidance on where to put the strike zone, I suggest using Mike Fast’s article here, and then just eyeballing a shape like I did here.

Heat maps are really most useful when you have many hundreds of data points, not just one hundred. For instance, here’s a little heat map (took less then three minutes to make) that shows the correlation between HR totals and wRC+ (offensive production) for individual hitters’ seasons from 2000 through 2014. I threw in a reference line for 100 wRC+ (league average) and a linear regression trendline (in the Chart Tools > Layout section):

The heat map style here let's us see how the majority of hitters are clustered 6 and 32 homers, and between 74 and 148 wRC+.
The heat map style here let’s us see how the majority of hitters are clustered 6 and 32 homers, and between 74 and 148 wRC+.

Here’s another one. This one took more time to cook up, but it’s 1494 records — every NFL quarterback season since 1920 who’s thrown at least 10 touchdowns. So that’s a line for every QB for every season, thus Drew Brees — for his 2009 and 2011 seasons — occupies the top two spots in the completion percentage column.

Anyway, a heatmap here really helps us see the clustering around the middle:

The trend line show a weak correlation, but the heat map tells us where the data is clustered -- and how small sample sizes and random data points might be throwing  off the calculation.
The trend line show a weak correlation, but the heat map tells us where the data is clustered — and how small sample sizes and random data points might be throwing off the calculation.

The transparency on the markers is at 91% (you’ll want higher transparencies for certain colors and for more data).

It’s not technically a heat map, but if you need to throw together some data to make or display a point, the built-in tools in Excel will certainly work in a pinch.


All the Things You Wanted to Know About Conditional Formatting in Excel

What are conditional formats? Simply put: They make a wall of data more readable. Tell me which dataset can you more quickly identify the best player:

Not only do the conditional formats catch the reader's attention, they also help us see the outlier data more easily.
Not only do the conditional formats catch the reader’s attention, they also help us see the outlier data more easily.

Here’s a great example of a time where conditional formating helps a lot. We’re looking at the 2005 NFL Draft. It makes sense to organize it by the order the players were picked, but our emphasis is on Career Average Value (CarAV) and Drafted Team Average Value (DrAV) — two simple, but useful stats that Pro-Reference-Football.com provides to estiamte a player’s total worth.

The conditional formatting immediately draws our attention to DeMarcus Ware, Aaron Rodgers, and Logan Mankins — the three most valuable players from the first round. And the conditional formatting draws our attention to these players while also helping us notice — immediately — that neither Rodgers nor Mankins were top picks.

The NFL Draft is a great example of when conditional formatting helps most. If I just wanted to compare the Career AV numbers of all the players who entered the league in 2005, irrespective of draft location, then I’d probably make a simple list and sort it large to small. But when we want to preserve a specific order of the data — or want to represent multiple components at once — conditional formatting does a stalwart job.

Here’s another instance, this one unrelated to sports — my spreadsheet for looking for a second car on Craigslist:

Conditional formatting can also stay within a boundary, so to speak, so unlike denominations can be compared side-by-side.
Conditional formatting can also stay within a boundary, so to speak, so unlike denominations can be compared side-by-side.

Here, I care mostly about the Kelley Blue Book value of the vehicle (“KBB Value”), but I also want to know about gas mileage and the other facts of the car. I’ve set up the color coding, though, so that I don’t have to worry about 28 MPG throwing off a $4,000 asking price. Also, the higher the value of the car the more green it is, but the more expensive the asking price, the more red it is.

In other words, I just look for as much green as possible, and that’s my best bet.

So let’s talk about setting up our own table with conditional formatting. First, we need data pertinent for conditional formatting. Let’s go with 2014 NFL Team Efficiency stats from Football Outsiders.

First, I scrape the data with a little copy/paste action. Just highlight the table area in the middle and paste it into your Excel document. I recommend pasting without formatting. To do this, just right click on the spreadsheet and choose the special paste icon:

Pasting without formatting keeps the spreadsheet simple and readable.
Pasting without formatting keeps the spreadsheet simple and readable.

Now, after a little cleaning up — deleting those extra headers in the middle of the data, combining the two-line headers into a single line, and moving those headers above the appropriate column — we have something like this:

Now our data is neater, but it's still too much to digest in one glance.
Now our data is neater, but it’s still too much to digest in one glance.

This is another ordinal setup — except we’re not looking at draft positions, but DVOA* rankings.

*DVOA stands for Defense-adjusted Value Over Average. It’s Football Outsider’s total value measurement, much like WAR is for Baseball-Reference and FanGraphs.

But one of the big problems with an ordinal ranking is that the space between No. 1 and No. 2 may not be the same as between No. 2 and No. 3. So conditional formatting helps us see tiers and groupings much more easily.

In order to add a conditional format to this data, we just need to highlight the C column and choose Conditional Formatting > Color Scales > the appropriate color scale.

Color scales are the most typical conditional format -- and they tend to be the most useful.
Color scales are the most typical conditional format — and they tend to be the most useful.

Even though we highlighted the entire column, only the rows with data in them show the conditional format.

So that’s how you set up a basic conditional format! You can play around with the different format types and see which ones you like. There’s no harm in slapping a conditional format on top of another conditional format — as long as you have the same cells selected, it will just delete the old format and apply the new one.

But let’s say you have an issue like we have in Column I:

A negative DVOA on defense is actually a good thing. So I'd rather have those bars appear green.
A negative DVOA on defense is actually a good thing. So I’d rather have those bars appear green.

If you ever have a format that’s not working quite right, just click wherever the format is looking weird, and choose “Manage Rules…” from the Conditional Formatting drop down:

If a format is giving you guff, head to the "Manage Rules..." area.
If a format is giving you guff, head to the “Manage Rules…” area.

If you have the delinquent cells selected (and your Conditional Formatting Rules Manager is set to show “Current Selection” rules), you should see the formatting rules in the window:

This is kind of the go-to place for adjusting conditional formats and making really fun and unique formats.
This is kind of the go-to place for adjusting conditional formats and making really fun and unique formats.

If you double-click on the format name (“Data Bar” in this instance), you will open the “Edit Formatting Rule” window. This window (and the windows nested inside it) allow us to do a lot of fun stuff.

I’m presently happy with most of what’s going on in Column I, so I really only want to change two things: The positive color and the negative color. So in the Edit Formatting Rule window, I will change the bar color to red:

This allows me to change the default bar color. I want red because a positive defensive DVOA is a bad thing.
This allows me to change the default bar color. I want red because a positive defensive DVOA is a bad thing.

And then, right beneath that, I’m going to click the “Negative Value and Axis…” in order to change the red bars to green:

After you apply these changes on the Conditional Formatting Rules Manager window, the column should update.
After you apply these changes on the Conditional Formatting Rules Manager window, the column should update.

But let’s say I want to do something even more complicated. Let’s say I want to highlight the teams with bad special teams — and I don’t want to just highlight the special teams column. I want the whole row to broadcast the shame of their punters and kickers.

So I’m going to create a special conditional format. First, I’ll highlight the entire table other than the titles (from A2 to L33). Then, I’ll click Conditional Formatting > “New Rule…” to open the New Formatting Rule window.

More complicated conditional formatting will often require formulas.
More complicated conditional formatting will often require formulas.

So, because we want everything to look at the K column and change its format based on what’s in the K column, I’m going to write a formula that says:

=IF($K<0,1,0)

All this formula says is:

  • =IF: This creates an IF formula. The syntax asks for (1) a formula that can be proved true or false, (2) a value for if the formula proves true, and (3) a value for if the formula proves false.
  • $K2<0: I’m telling Excel to stay in Column K — that’s what the $ in front of $K means. So if the K columns is negative (<0), then the formula is true. The conditional formatting is going to start with the highest row in the selected area, so since our selection begins with A2, we’ll reference $K2 because that’s on the same row. (If we put $K3, it would look at the row beneath the current row.)
  • 1,0: If the formula is true, then 1. If false, then 0. This tells Excel to apply the conditional format (1) if the formula is true (if Column K is negative), and to not format (0) if the formula is false.

After I enter the desired formula, I will set the formatting. Do whatever you want here. Change the fill. Change the font color. Make it bold. The pop up in the “Format…” window is just a typical Excel formatting window, so it should be easy to navigate.

I went ahead and set the format to a dark red fill and a bold white font. This will make the formatted rows very obvious, but also make the the table really busy visually — but I’m doing this for the learning, not for the beauty.

Anyway, we get this:

NOTE: You will need to hit "OK" and then "Apply Changes" or "OK" before the new format appears.
NOTE: You will need to hit “OK” and then “Apply Changes” or “OK” before the new format appears.

Let’s do one final this: Change format priorities. Notice how our conditional formatting in Column C is getting smushed by our new special teams formatting? Well, that’s no good.

So let’s open the “Manage Rules…” dialogue. then, look at the formatting rules for “This Worksheet”:

Regardless of what cells you have selected, looking at "This Worksheet" will show all the conditional formats on the present tab.
Regardless of what cells you have selected, looking at “This Worksheet” will show all the conditional formats on the present tab.

Then, with the special teams format select, let’s press the down button (note the second red arrow above) until it’s at the bottom of the list. Hit Apply or OK and you should get something like this:

Uh oh! The special teams format changed the fonts in Column C too!
Uh oh! The special teams format changed the fonts in Column C too!

This is a good cautionary tale about changing font colors and formats. Since we’re using a simple conditional format for Column C (as in, not a formula-based format), we can’t edit the fonts in that column. So the only real solution is to change the special teams font — or to selectively apply that format.

The second option is simple enough. Just open the “Manage Rules…” window and change the selection area for the format:

We can type in the selected areas and separate the selections with a comma, or we can click and drag to select the first area, then -- hold CTRL -- click and drag to select the second area.
We can type in the selected areas and separate the selections with a comma, or we can click and drag to select the first area, then — hold CTRL — click and drag to select the second area.

All we need to do is change the “Applies to” textbox. Once we select the A2 to B33 area and the D2 to L33 area, the weird formatting disappears from Column C.

The final step to using conditional formatting is presenting that data. Microsoft has done a good job catching up with Google Drive and the sort, allowing us to post table and the sort online. Using File > “Save & Send” should direct you to the SkyDrive services that will enable you to post the spreadsheet online. If there’s enough interest, I can walk through the process for this as well.

Anyway, I hope this has been interesting and useful. Happy Exceling!


Presenting Three or More Dimensions Using Tableau

Whoa! What’s a Tableau, you ask? Well, I have an even more basic Tableau Public tutorial for the likes of questioning minds such as yours. Although, this article is pretty basic too, so either should be decent starting places.

Tableau is a powerful, unique visualization tool. The fact it’s also free is a little unbelieveable. One of the reasons I love Tableau so much is that it allows me to present multiple dimensions of data in a single chart — and do so without ungodly 3D charts.

What do I mean? Well, there’s a great example in the newest addition to the FanGraphs suite of data — the contact rate (Soft%, Medium%, and Hard%) numbers. So let’s say I want to present this data* for the Rays hitters. I’m mostly interested in the contact rates, so I could put together a scatterplot of Hard% versus Medium%.

*If you want to play around with the exact same data that I’m using, download this CSV. Otherwise, you data will be different than mine because you, sir or madame, live in the future.

What’s neat about this chart is that, since Soft%, Medium%, and Hard% are mutually exclusive (a batted ball can’t be both hard and medium) and they are collectively exhaustive (there’s no other hit type, only these three; combined, we called this data MECE, mutually exclusive and collectively exhaustive), we can essentially present the three dimensions in a single scatterplot:

With Excel, we can kind of represent three data dimensions (Soft%, Medium% and Hard%), but that's only a happenstance of MECE data.
With Excel, we can kind of represent three data dimensions (Soft%, Medium% and Hard%), but that’s only a happenstance of MECE data.

So let’s say we wanted to add another component of information to this graph. Let’s say I want you to know which dot is which player. Or, perhaps, the amount of plate appearances corresponding to each of these players. In Excel, we could add a data label, but we would need to go through, by hand, and add each player’s name to the corresponding dot. Excel only inherently gives three label options: X labels, Y labels, or Series labels — and none of those are really useful here.

Let’s try this same chart in Tableau. First, though, we’ll need to work on the CSV a little. I’ll show you what I mean.

Open Tableau Public (preferably the latest version; I think that’s version 9), and you’ll be prompted to open your data source. Choose “Text File” and then navigate to your CSV file.

If you have a CSV or TXT file, this is the option you want.
If you have a CSV or TXT file, this is the option you want.

I should mention at some point that, in Tableau Public, we rarely create data. It’s more about manipulating and presenting what’s already made. So there’s no option for “Blank Spreadsheet” like there is Excel.

Anyway, after connecting to our CSV, Tableau is going to confirm our data has the right settings. And thank goodness for that, because something’s awry:

A space between the number and the percent sign caused Tableau to think this was a string (that is, like a word or something). So we need to fix that.
A space between the number and the percent sign caused Tableau to think this was a string (that is, like a word or something). So we need to fix that.

The system sees the space between “33.3” and “%” and thinks it’s a word (because spaces can’t fit into data). That’s what the blue “ABC” icon means.

This problem is easily fixed a variety of ways. One way: You could open the CSV in Excel and save it as an Excel file. That’s a pretty simple fix. Another alternative is just to scrape all those pesky spaces out of there. I prefer to do this with Notepad (or any similar stripped down word processor).

For that method (which is handy if you’re on a computer without Excel), all we need to do is open the file with notepad, hit CTRL+H (to open the “Replace” dialogue) and then choose to replace a space with nothing.

Open Notepad, then open a file and set the file types too "All Files."
Open Notepad, then open a file and set the file types too “All Files.”
Then I type space (" ") then percent sign ("%") and choose to replace all.
Then I type space (” “) then percent sign (“%”) and choose to replace all.

Save it, then bing, bang, bongo, the file is ready to do work. Head back into Tableau, and then ensure the data is showing up correctly. Once again, our data is not defaulting to decimal, so we’ll quickly change these items to decimal numbers (just click the blue “ABC” and choose “Decimal” from the drop down menu).

You will still need to go through and make sure the pertinent columns are being treated as decimal numbers.
You will still need to go through and make sure the pertinent columns are being treated as decimal numbers.

After you’ve got your data looking correct, head on over to Sheet 1 (the automatically generated tab in the lower left of the screen). You will now be in the basic worksheet interface.

For our purposes, go ahead and drag Med% to the Columns section and Hard% to the Rows. Then, pull the Names dimension onto the Detail section.

From here out, it's pretty much click and drag.
From here out, it’s pretty much click and drag.

NOTE: You may need to click on the “Show Me” button on the top right to change to a scatter plot.

The resulting scatter plot looks pretty similar to — and essentially has the same pieces as — the previous Excel chart we made:

It takes only a few actions to recreate the basic scatter plot we made in Excel.
It takes only a few actions to recreate the basic scatter plot we made in Excel.

But now let’s expand it with more information. For one, I want the users to know the sample size of each of these dots. I’m looking at all position players on the Rays roster, but that includes even Curt Casali who — at the time of pulling this data — had only 2 PA. To express these differences, we need merely drag the “PA” measure to the “Size” button.

Likewise, I can present how well each of these players is hitting by dropping the “wRC+” measure into the color section. And for even more clarity, I can name each dot with the corresponding player it represents:

Adding different visual manifestations of the data is a simple process in Tableau Public.
Adding different visual manifestations of the data is a simple process in Tableau Public.

None of these dimensions are feasible with an Excel plot, chart, or graph. We would need to make these size, color, and label changes by hand in Excel. But in Tableau, it’s a flick of the wrist.

What’s more, we can clean up this data with the addition of a filter — and a quick filter to allow users to manipulate the filter too:

Adding a filter allows not only the Tableau creator, but also the end product user to adjust the featured data.
Adding a filter allows not only the Tableau creator, but also the end product user to adjust the featured data.

When we get the chart to basically where we want it, we can then throw it into a dashboard. A dashboard is the final shape the worksheet will take. Sometimes I combine multiple worksheets into a single dashboard to present a single idea. Other times I use a single worksheet for a single dashboard. We’ll do the latter in this instance:

Putting the chart into a dashboard will ultimately give us something to embed into a blog post or website. It also gives us keys for the sizes and colors.
Putting the chart into a dashboard will ultimately give us something to embed into a blog post or website. It also gives us keys for the sizes and colors.

The most beautiful thing about using Tableau, of course, is that the end product doesn’t have to be a static image. This allows us to embed even more information into the system — for instance, anything we add to the Detail section will appear in the popup when users hover their mouse over given data points.

After a little spicing up with the formats (such as fixing the dimensions for the X and Y axes, adding a linear regression line, and adding a few text boxes to indicate the general Soft% areas), we get a final version like this:

When we combine all these data points together, we can see interesting oddities in the data. For instance: Rookie Joey Butler is having a great year, hitting a 156 wRC+. But looking at his placement on the graph, we see he has a lot of non-hard contact for a guy with such a high wRC+. Likewise, Tim Beckham — the light blue dot in the top right — has crushed the ball this season, but is not showing a strong wRC+.

I should note the limited forecast value of this kind of data. While fascinating (and a great sample for Tableau to flex some muscles), this data does a much worse job predicting future results than a simple glance at these player’s ZiPS or Steamer projections.

That said: How fun is this chart? I think it’s a blast, and I hope it inspires you to present more dimensions of data — in a neat and understandable way — in your next visualization.

Happy Tableauing!


How to Create a Great GIF (By Not Actually Doing So)

Facebook recently made headlines by announcing that they would start supporting the embedding of animated version of the Graphics Interchange Format, more widely known as the GIF. This was a big deal, apparently, because people still care about GIFs. But they shouldn’t — at least not in a way specific to the actual file format.

To me, the GIF has become something bigger than itself. Its name has become ubiquitous with the idea of a short video clip that people find entertaining. Just like how we refer to all facial tissues as Kleenex or all large trash bins as Dumpsters (yes, it’s a registered trademark), a GIF doesn’t really need to be a .gif file per se, it just needs to be a short and shareable video. The problem is, the actual GIF format is pretty outdated and inefficient when it comes to embedding video clips. It’s a dinosaur. It’s Windows 95. We don’t need it anymore. We can make GIFs better. We have the technology.

Back in the day, the main pull of the GIF format was that it was supported on all browsers (e.g. Internet Explorer and Netscape Navigator). You didn’t need any special plugins or software to see a GIF on a web page because it was an image file, like a .PNG or .JPG. GIFs were special in that one could layer multiple images on top of one another to create an animation effect. As expected, this still lead to large files and long load times for web pages. But it’s all we had.

(Not so) long ago, creating GIFs was a pretty big pain. You had to finagle some software to record the video from your screen, then split up the video into frames with a program like Photoshop or GIMP. The results were rarely the same, and there was usually some sort of stuttering or ghosting or another weird issue that made the thing look off a bit. They were usable, but not always pretty.

Today, Flash is pretty much dead and every major browser supports the most common video codecs. HTML5 ushered in a new way to display video — a way that was baked in to the HTML standard. Long story short — if your computer (or mobile device) has any kind of modern browser installed, it can support HTML5 video. It’s leaner and loads quicker, and we all should be adopting it.

We here at TechGraphs have gotten requests to give a how-to on creating GIFs. But I’m not going to do that. Because, as Managing Editor, I’m making a statement on behalf of the site. We’ve pussyfooted around the issue before, but we’re making a stand. The GIF is dead. It’s time to put in on an ice flow floe and let it float on into the Great Beyond. It’s been supplanted by younger and stronger technology. No, I will not be showing you how to create a GIF. I will be showing you how to create short video clips that are easily shareable on Facebook, Twitter, your Tumblr page or your blog. We’ll be utilizing two different sites for this tutorial — Gfycat and something we’ve featured here before, Streamable.

To create these videos, we’ll first need, well, some video. There are a couple ways to go about this. Both Streamable and Gfycat allow you to simply input the URL of your video and let them do the converting. However, Gfycat only allows 15 seconds worth of upload. Streamable does allow larger uploads and the ability to trim time off, but long videos still lead to long upload and conversion times. It’s probably best to get the video on your computer and work with it from there. To do so, I’ve had good success with a Chrome Extension called Video Downloader Professional. This extension will search the code of the page you are on and will find the source file for any video used in that page. You just click the little green arrow next to your address bar and it will give you the option to download the video. A couple notes: you may have to start playing the video for the extension to pick it up. Also, if the video has an ad, it might show up in the list of available downloads. Just make sure you are grabbing the right video. The file sizes are displayed on this list. You almost always want the longer one.

So, let’s say I want to make a clip of Joey Gallo’s first career home run from last night. (I’m using an MLB clip here, but this will work in many sites, including YouTube). I use the extension to download the video into my Downloads folder. Now, it’s fairly long, so I probably should edit it down a little. How you do this depends on your OS. If you have a Mac, you can use the built-in Quicktime Player to quickly trim down a video. Windows users can download the free Windows Movie Maker and use that to edit the clip. Read the tutorials and get comfortable with the software. You’re not re-making Jaws here. You’re just cutting a single video clip. Remember, Gfycat only allows 15-second clips, so edit accordingly if you’re planning to use that service. Once the clip is edited, save it somewhere. It is now time to upload.

Uploading to Gfycat

Go to Gfycat and click the Upload button in the upper right. Click Browse and find the video on your computer. It will begin uploading and converting.

gfycat1

Click the newly-created link to go the clip page. Here, when you move your mouse to the right, you’ll see a couple buttons. One is to create an actual .gif file (IF YOU MUST), and the other is to grab links.

gfycat2

You can get a straight link, and some code to embed the clip in a webpage. Like this! You can tweak that embed link a little to change the height and width of the clip if your blog has certain size constraints. This creates your standard GIF-looking clip. This is good enough for most applications, but what if you want more an actual video experience? What if you want a little longer clip? What if you want sound? Sure you could use your phone to take a Vine of your computer screen, but we’re better than that. Let’s use Streamable to create a quality clip that’s still shareable.

Uploading to Streamable

Go to Streamable and click the Upload button right in the middle. Again, find your video and Streamable will start uploading and converting. streamable1 Much like Gfycat, you get options to add a title and share the link.

streamable2

If you click Share, you again get options to copy a regular link or an embed code. Streamable offers some options to change the height and width of the video as well as enabling autoplay and muting. Clicking both will give you a GIF-like experience, but leaving them off will make it feel more like a video. The choice is yours, but it’s certainly nice to have the choice.

These clips will load faster and won’t bog down the browser. They won’t take up space on your servers. They aren’t quite GIFs, but that’s OK. They’re better. You can have the best of both worlds. Short, shareable clips will be around for a long time, but not in the aging format we’ve been using. You now have the knowlege. The tools are free and easy to use. Be a part of the future.

(Header image via Eric Norris)

Streamable Has a Simple and Quality Clip Maker

My love of GIFs stemmed from the early reaction shot days, but has since morphed into a variety of topics ranging from bat flips to goals, to, okay, still some reaction shots. The GIF is being phased out across the board as HTML5 and GIFV services are better equipped to handle a longer clip in higher quality, and recently the site Streamable has been popping up on my radar. As an inquisitive nerd, I dove right in.

Right off the bat you’re given the option upload from your computer or via URL. According to Streamable, the clips utilize HTML5, or in the rare case where that is unavailable, Flash. You’re not forced to create an account to view a clip or even to upload one, though in order to browse your own uploads it is encouraged to make a free account.

Unfortunately at this time there is no search or index option for other clips, so if you come across something you really like, you’ll need to favorite it. Say, maybe a clip of Leo Messi stealing Jerome Boateng’s soul, then chipping Manuel Neuer in the opening leg of the UEFA semi-finals.

(Editor’s note: Some Ad-Blocking extensions seem to block these embeds. If you see nothing, disable ad blocking.)

Where Gfycat doesn’t have a size limit, Imgur limits animated uploads to 200 MB. Streamable also has a limit, but it’s a 2 GB and 10 minute clip, which at 10 minutes, is longer than an average trailer and some short movies.

The interface is concise and uncluttered. If you choose to parse down a YouTube clip of a go-ahead shot from Game 7 of an NBA playoff series, it’s very easy. Once you have the URL, simply click “Paste from URL” and allow the clip to upload. As soon as it is in the Streamable editing system, feel free to trim it down from either side as the picture below shows.

streamable

After settling on the desired starting and ending points, click create clip and voila, you’re all set.

Streamable offers a fast and simple service and if you’re looking for a GIF-type clip but with sound — it’s the best option short of a full YouTube upload. While I’d certainly love to see a searchable index of clips like Imgur and Gfycat offer, I’ve already began to use Streamable as my go-to option for highlights, even if the highlight is against my favorite baseball team:


How to Split Strings of Sports Data into Individual Cells in Excel

This is a really simple trick that I think a lot of people will never need, but a few people will love forever. It’s this problem:

Year   Age  Tm  ERA   G     IP  FIP
1906    29 CHC 1.04  36  277.1 2.08
1907    30 CHC 1.39  34  233.0 1.97
1908    31 CHC 1.47  44  312.1 1.87

Here’s Mordecai Brown’s 1906 through 1908 stats from Baseball-Reference. But some nimcompoop has pasted them as pre-formatted values. Paste this biz into Excel and you get something like this:

This is very unhelpful data.
This is very unhelpful data.

Well, you’re in luck. Because there’s a super easy way to make this into legit data. Go to the appropriately-named “Data” tab and you’ll find the “Text to Columns” function:

Hello there, handsome!
Hello there, handsome!

With the data selected, click on this “Text to Columns” button. A pop up window will then prompt you whether you want to treat the text as “Fixed width” or “Delimited.” These concepts are weird-sounding, but actually quite simple:

  • Fixed-width: The data has spaces inserted into it so that it visually lines up. When you’re pasting from pre-formatted text, this is almost always the right choice.
  • Delimited: This is text that has special characters within it to help signify the differences between columns. So, if you’re pasting something from a CSV file (a file that uses commas between each data point), then delimited would be the right choice. You can also use delimited if there are spaces between data points — and this can be very helpful when translating a list not intended for spreadsheet use into a spreadsheet.

For this instance, let’s use Delimited to see what happens:

Highlight the text and then open the conversion wizard.
Highlight the text and then open the conversion wizard.

Step 2 asks us what our delimiters are. We can choose one of the usual delimiters (like tabs and commas) or we can use spaces or even insert our own characters (this can be usual for when, say, we scrape data off a Korean baseball site and special Korean characters divide the columns).

With “Tab” selected, you’ll probably notice the data in the preview box doesn’t change — as in, there are no vertical lines indicating the creation of new columns. If you click the “Space” box though, you should see something like this:

Hy! Look! New columns!
Hey! Look! New columns!

Note the check next to “Treat consecutive delimiters as one.” That is important, because without it, we end up with a lot of columns we don’t need:

Yup, that's too many columns.
Yup, that’s too many columns.

In the third step, we have the option to change the data formats and choose an alternative location for the new separated data to appear — both of which are handy — but in general, our work his here is done. Hit “Finish.”

That should give us something like:

Now the data is separated into unique columns. Note how Excel automatically identified the numbers as numerical despite originating within a text string.
Now the data is separated into unique columns. Note how Excel automatically identified the numbers as numerical despite originating within a text string.

So, the alternative to delimited data is fixed-width. Let’s go Pro on this thing and used the fixed-width setting. But this time, I’m going to use data from my Scoresheet league. I encourage any other Scoresheet users to play along with their own data.

I really like Scoresheet. It is one of the very few fantasy baseball sites to include defense. In fact, it might be the only one. But it has problem. The site’s interface was designed by DARPA to fend off the Soviets, or at least that’s what it feels like:

The 1980s called, and they want to warn us about a time-fissure in the land-line system of the 1980s that allows them to communicate with the future. They also browsed our internet, saw the Scoresheet interface, and they want their design principles back.
The 1980s called, and they want to warn us about a time-fissure in the land-line system of the 1980s that allows them to communicate with the future. They also browsed our internet, saw the Scoresheet interface, and they want their design principles back.

So, here’s my undrafted players list (and before you league-mates start thinking you’ve got a peak at my undrafted players list, just know: I’m no longer in the market for a second catcher or extra bat, so ogle all you want). In the middle, big frame, I have a list of players organized first by position and then by playing time accrued (i.e. their real world stats). But what I want to do is combine this data with, say, some projection data like the FanGraphs Depth Charts leaderboard.

But in order to match these undrafted player with the FanGraphs DC projections, I need to first suss out the names of the players. I can achieve that through scraping the numbers of available players here in this undrafted screen and then combine them with the Scoresheet master roster (downloadable here).

For those keeping track at home, I’m combining data like this:

The undrafted list will connect to the master roster which will connect to the FanGraphs  projections which will connect to my  league championship.
The undrafted list will connect to the master roster which will connect to the FanGraphs projections which will connect to my league championship.

We need to scrape the data from the undrafted list. This amounts to clicking in the middle frame of that previous screen, pressing CTRL+A (select all) and then CTRL+C (copy).

Remember, you only need the stuff in the middle frame.
Remember, you only need the stuff in the middle frame.

Pasting that into Excel gets something like this (note, I deliberately pasted using source formatting):

It's so beautiful!
It’s so beautiful!

First, we can get rid of those top two rows. Really don’t need them. BUT DON’T DELETE THAT THIRD ROW. We need that.

Let’s go ahead and split our data. Select the entirety of Column A (just click the “A” at the top). Then run our handy converter (Data > Text to Columns). This time, let’s go with Fixed Width (though, once again, we could probably make it work with either).

We see reach a screen like this:

We can click and drag the arrows to change where the columns start. We can also click on spaces in the ruler section to create a new column break.
We can click and drag the arrows to change where the columns start. We can also click on spaces in the ruler section to create a new column break.

The key here is that the numbers on the far left are isolated into their own column. It’s fine that the word “Pitchers” is getting sliced into multiple columns; we don’t really need it anyway.

Scrolling down the preview panel, I can confirm that all the numbers are indeed isolated into the far left column. Now I’m going to actually use the Step 3 of the conversion wizard.

In Step 3, I can choose an alternative destination for the conversion. I want to do this. So I’m going to change the “Destination” section to the next available column — in other words, change it from $A$1 to $B$1:

We can change the destination by either typing in a new one or just clicking on where we want the first row and first column to start.
We can change the destination by either typing in a new one or just clicking on where we want the first row and first column to start.

Great! Now I’ve got my data separated out:

Now I've got my original data (on the left) sitting next to my split data (on the right).
Now I’ve got my original data (on the left) sitting next to my split data (on the right).

Now I want to get rid of all the blank or white-font lines. These are leftovers from the column titles on the Scoresheet website. First I select all the data (CTRL+A and then CTRL+A again to select everything), and then I sort by the first column.

Sorting Z to A, I find — midway down the data — this blank spot where white-font titles have gather together:

Let's get rid of these now useless titles.
Let’s get rid of these now useless titles.

After I purge those rows, the data should be clean. Row B should be nothing but numbers. Beautiful, delicious numbers.

So let’s go back to the top and add a title row. I really only need the first two columns for this, so I’m going to name them Original and Number and then delete the rest. Then I’m going — once again in the “Data” tab — add filters (Data > Filter). If you had any one of the data cells selected (i.e. not a blank cell selected), then Excel should add a neat little pair of gray arrows at the top like such:

We can now sort this data in more fun ways.
We can now sort this data in more fun ways.

You may also notice my data is already sorted in the preceding image (note the little up arrow inside the gray box). But this data is clearly not sorted numerically or alphabetically — No. 444 Greg Bird is near the top. It is sorted, however, by color.

See how the items with gray fonts are now all in the top together? These are players already drafted or selected for my queue. This is why we need the original formatting in the first pasting job.

So now I’ve got my numbers isolated. Using VLOOKUP (or INDEX and MATCH), matching these players to their full names should be a cinch (also, a CONCATENATE formula will help join the first and last names of the players). And with first and last names, we can do one more VLOOKUP / INDEX-MATCH to join them to their projections.

The net result, with some formatting and careful data pollination, can look something like this:

WONDERFUL, SEXY DATA!
WONDERFUL, SEXY DATA!

Now the real trick will be for me to find a way to hide this article from the guys in my league.


How to Make a Sports Podcast That Doesn’t Suck

Podcasts – like blogs, I suppose – are one of those things that get harder to make well the easier they are to produce. Yes, technology has made it MUCH easier to produce serialized audio content for the Internet, but it’s made it that way for everybody (well, everybody with a couple hundred bucks to spend). In the matter of a few years, the amount of people dipping toes into this pool has exploded. Businesses have been built around podcast hosting. There are now advertising houses that cater specifically to podcasters. There are podcasts about other podcasts. There are podcasts about making podcasts. Earlier this month, I paid $30 to watch three dudes tape a podcast live. Slowly yet surely, this medium is penetrating popular culture.

Recently, a true crime podcast called Serial breached the news cycle. I won’t make my opinions about that program known here, because it’s honestly too late to do that. We’ve all moved on to … I don’t know, something else. But a combination of podcasting’s obscurity and general lazy reporting lead many people to believe that Serial was not only telling an interesting story, but telling it on an entirely new medium. This, of course, is silly. Tell this to Jimmy Pardo or Leo LaPorte or Adam Curry or Jesse Thorn. Hell, most of the people who worked on Serial came from WBEZ’s This American Life, a public radio show that had also been a very successful podcast for years.

Podcasting has served different purposes for different content creators. Popular radio shows will often just release recordings of an episode as a podcast so that people can listen any time they want. Tech journalists got it on podcasting early to enhance their cred by talking nerdy things with other nerds over a nerdy distribution channel. Comedians have used it as an alternative means of creation and self-promotion – sort of a SoundCloud for comedy before SoundCloud existed. And then, there were the sports podcasts.

Podcasting lends itself to sports well for the same reason blogging does – new stuff always happens and everyone has opinions about it. There is never a want for content. There are certainly #HotTakes to be considered, but there is so many other things that can fall within the spectrum. Podcasts can cover a whole sport, a specific team, a specific league, or can attempt to cover all major sports if they so choose. The best thing about sports podcasting is the best thing about podcasting in general – you can kind of do whatever the hell you want.

But just because you can doesn’t mean you should, or, more specifically, doesn’t mean you should without giving it some thought. Yes, podcasts are pretty easy to make now, but anything worth doing is worth a little extra effort. To be honest, there are some pretty sucky sports podcasts out there. Here’s how to make one that doesn’t suck.

A note to begin: While I’m not a famous podcaster by any means, I do kind of know what I’m talking about. I have worked at radio stations. I have created material for national air. I have my own sports podcast that, while episodes come far too sporadically, still gets well-reviewed on iTunes. I’m not an authority by any means. But I listen to a lot of podcasts and make my own and think about it quite a bit. These are suggestions, but they’re still suggestions from a somewhat qualified source.

1. Come Up With a Good Idea

Sports podcasts, not unlike comedy podcasts, are chock full of “two white guys talking”-styled offerings. This is not to say that if you are Caucasian or male that you should just give up, but it’s important to consider what’s out there and what you can do to differentiate. It could be a simple fact of the only current podcasts about Local Sports Team aren’t very smart or well produced or whatever. There’s nothing wrong with just making a better mouse trap. But if you want to discuss or cover a popular topic, you might want to try to break from norms a bit.

Comedy podcasts already saw this coming, and many new ones centralize (even if loosely) around some sort of bit or structure. Professor Blastoff mixes intellectuals and comedians to answer some of life’s biggest questions. Who Charted deals with pop culture charts and a funny and irreverent way. The Adventure Zone features three brothers playing Dungeons and Dragons with their dad.

None of these are really applicable to sports podcasting, but it’s something to think about. Maybe you want to feature reoccurring segments or regular guests or something I haven’t even thought of. It’s about information, but it’s also about entertainment. Never forget that.

Segments and pre-determined topics also help eliminate the awkward “so … yeah”s and “um … what else?”s. Those are show killers. There’s nothing wrong with some natural dead air, but it’s heartbreaking to hear two people dance around the fact that they don’t know what they should be talking about. A little research and a little planning goes a long way.

That being said, if you do something that you later realize isn’t really working, don’t be afraid to jump ship. There’s nothing wrong with ditching a segment or bit that wasn’t really jiving in the first place. Change the format little by little until you find something that works.

2. Make Sure You Really Want to Do This

So you’ve come up with an idea and maybe found a cohost. The plan is to have a one-hour podcast every week. Here’s how a one-hour podcast breaks down.

  1. The recorded conversation will be at least 90 minutes.
  2. There will be at least 10 minutes of technical difficulties that need to get ironed out before recording (at least for the first two dozen episodes).
  3. It will take time to lock down a guest. Between all the emails back and forth, let’s say that counts for 30 minutes of work.
  4. You’ll need to listen to the whole 90 minutes to make sure no audio weirdness (mic dropouts, lawn mowers, loud cars, babies crying) happens and to find where and when to cut.
  5. Add 30 minutes to edit, add the intro and outro music, and compress to MP3 form.
  6. Another 20 minutes to upload and write the description.

Your simple one-hour podcast now takes four and a half hours a week to make, and I didn’t even count the time that goes into research and finding topics. This pushes it to around six hours. My podcast, due to its style and format, takes about three hours just to record and edit a 20-minute episode, and about 10 hours total since there’s a lot of research involved.

This is not meant to be discouraging. It’s meant to show you how much effort this project will take. I’m sure some people don’t spend this amount of time on their sports podcasts, and it honestly shows in most cases. Good content takes time, whether it’s graphic art or writing or podcasting. Be prepared to put in the work. If you’re not prepared to do so, maybe think about doing something else.

3. Have a Decent “Studio”

If this list weren’t in a quasi-chronological order, this section would go first. Even if your format is just two-dudes-rambling-about-Local-Sports-Team, you still have a chance if your show sounds decent.

Microphones

The first thing you need is a decent microphone. You don’t need to break the bank, but a good mic is a MUST if you want your thing to be listenable at all. There are two ways to go about this, and they both deal with analog-to-digital conversion.

The first way is to get a good analog mic and an digital audio interface. The mic goes into the interface via mic cable, the interface goes into your computer via USB. I like this method because it gives you a little more flexibility. For up to two people recording at once, the M-Audio M-Track Plus is a great and affordable option for an interface. You don’t need to splurge on a mic either, especially if you’re just starting out. The SM-57 is a good starting point. I started with (and still occasionally use) the MXL 990 which is even cheaper. I’m not going to get into a whole lecture about microphones, but it’s worth doing a little research. People have tested lots of mics for podcasting and even included audio samples. Google “Best Podcast Mic” and look around.

For a simpler (and most likely cheaper) option, you can also consider USB microphones. These do all the converting in one unit, so all you need to do is plug the mic right into your computer via USB. USB mics have gotten much better as far as sound goes, and it’s hard to go wrong even with a mid-priced one. Again, Googling is recommended, but I’ve had success with the Blue Yeti as I know some other people have.

DO NOT use a cheap gaming headset. DO NOT use the headphone/mic cable that came with your iPhone. DO NOT RECORD USING YOUR WEBCAM SPEAKER. This sounds like garbage.

If it’s hard to hear you, or your mic blows out, people will get tired of listening to you and stop downloading. It’s that simple. I’m not trying to be harsh, but if you don’t invest in marginally decent audio equipment, don’t even bother.

Studio Space

No, you don’t need to rent space in a radio studio or anything, but be aware of your surroundings. You probably will be recording at home, so pick the room in your house with the best chance of giving you good audio. If you are using Skype to record conversations (more on that later) picking a room with hard-line access to your router or at least a strong WiFi signal is encouraged. Dealing with garbled audio and Skype dropouts is annoying. If the neighbor’s dog barks all the time, try and pick a room where that doesn’t leak in too much. Finished basements tend to have a nice sound-deadening quality to them. If the room has windows, thicker curtains are better. This is little stuff and sometimes unavoidable, but try and consider it if you can.

Record Skype Calls the Right Way

Skype has its flaws, but it’s one of the better options for recording podcasts if the hosts aren’t in the same city. (If you are in the same city, please record together in person). The technology to record these calls is fairly straightforward. I use a program called Piezo. It’s easy and affordable. It’s Mac-only, but there are certainly options for PC. It may sound complicated at first, but it’s not too bad. Read the instructions. Look for how-tos on YouTube. Get a friend to help you test it so that you have the optimal settings for good sound quality. Whatever you choose, make sure it has the capability to split the recorded call into two tracks, one for you and one for the guest or cohost. This will allow you to edit both people independently. It’s most useful for cutting out the hiss from the guest’s side when you’re talking. It also makes it easy to get rid of coughs, sneezes, dogs, sirens, etc.

If you have your druthers, have the guest use a decent mic as well. You probably can’t force people to buy stuff, but hopefully they have something that isn’t the webcam microphone. Anything is better than the webcam microphone. A gaming/VoIP headset or even (gulp) iPhone earbuds are a step up.

Depending on the guest, the phone might be the only option. Phones don’t sound great, but a little EQ can make them usable (don’t worry about EQ in recording, it can be done in post). The easiest way is to use Skype to call a phone. It costs a little money ($2.99/month), but it will save you a lot of headaches. If you are interested in the free option, consider signing up for a Google Voice account. You can dial right from Gmail, and still use a program like Piezo to record. It’s slightly less elegant, but it’s pretty workable.

If you are going to have two people talk every episode, it’s a really great idea to have both hosts record separately on their own (nice) mics, then combine the two tracks in post.

Say you and Johnny are cohosts of the podcast. You record both ends of the call (you and him), and Johnny uses simple recording software to record just his voice. When you’re done recording, he plops his track into a Dropbox folder or something, and you copy it to your computer. Now, you have a good-sounding you, an OK-sounding Johnny (from the other end of the Skype call), and a good-sounding Johnny. Use Skype Johnny to match up the tracks, but insert Good Johnny instead. It’s easier than I’m making it out to be, and in the end you’ll have a recording that sounds like both of you were in the same room even though you were talking over Skype. This is what professional podcasts do. This is the poor-man’s version of what public radio does. Otherwise, it always sounds like one host is in the room and the other host is talking through a plastic jug.

Learn Basic Audio Editing

You don’t need a $10,000 ProTools rig to make a decent podcast. If you have a Mac, you already have GarageBand, which actually works pretty nicely for basic stuff. I use a program called Reaper. It works on Mac or PC, comes in at $60 for recreational use (so long as your podcast doesn’t gross $20,000), and has a GREAT support section and user forum for helping you figure everything out. There’s also ProTools, Adobe Audition, and a slew of other options. Many have trials you can play around with.

Whatever you end up with, give yourself time to learn it properly. Learn the keyboard shortcuts to save you time in editing. Take a look at the built-in processing options (EQ, compression, etc.) and which settings make your podcast sound best. EQ will help balance the highs and lows, compression helps handle high volumes and even everything out. These are important things to know, don’t disregard them. One solitary hour of playing around with these will help you a great deal.

Learn the proper way to create fades – fade ins, fade outs, crossfades, etc. Learn how to properly mix in music. Audio program companies spend a ton on R&D to make these programs easy to use. Don’t be scared. Save your files a lot and remember that Ctrl-Z is your friend. Think of ALL the people that do this somewhat successfully. They can’t all be smarter than you, right?

Pick Your Hosting

So you’ve worked out all the bugs, and you have a quality-sounding MP3. Now, you just need to post it somewhere. There are basically two schools of thought on this.

There are companies that will host your podcast for you and give you an RSS feed to use (more on this later). For a fee, you can upload your file and let them do all the rest. Libsyn and PodBean are two popular choices, though SoundCloud just got into the game as well. These products sell you on ease of use, analytics, and reliable uptime. There is cost involved, however. Usually bandwidth is included in any package, you just need to pick the storage space you need. Read the options carefully and start small. You can always upgrade later if you want to post more episodes per week/month.

Your other option is to store the files on your own server and create the RSS feed yourself through a platform like WordPress. PowerPress is a popular plugin for creating podcast RSS feeds through WordPress. It’s what we use at The Hardball Times for my podcast. It’s customizable and reliable. The downside here is that you need to host your own files. If you plan to have a corresponding website to go along with your podcast (you should) then you are probably already paying for hosting. Unless your podcast explodes in popularity, whatever hosting package you have should be fine. But beware, if your hosting provider decides that too many people are pulling down files from your server, you may be in line for additional cost. Again, this would most likely be later down the road, but it’s something to think about.

The elephant in the room here is the question of who owns your RSS feed. Your RSS feed is what iTunes and other podcatchers use to see when there are updates to your show. When the program sees an update in the feed, it downloads the new episode. If you host on Libsyn, for example, your feed will be something along the lines of libsyn.com/feeds/yourawesomesportsshow.xml. If you self-host, it will be something like yourawesomesportsshow.com/feeds/yourawesomefeed.xml. If you use the former and decide to move on from Libsyn to another host or self-hosting option, life may become difficult for you. There are ways to “force” podcast apps to update the feed if you change RSS addresses, but it’s not always reliable, and it’s a pain in the ass.

Do your research and make your best educated decision. This is a big question in the world of podcasting, believe it or not, and there’s really no one answer.

And while we’re talking about it, make sure to add your show to iTunes. There are lots of guides for doing so. iTunes is pretty terrible, but it’s still what most people use. In fact, if you’re starting a show, don’t even share via Facebook or Twitter until it’s on iTunes. Wait until people can download the episode. It takes about 48 hours or less to get into the iTunes store assuming you followed all the directions. It’s worth the wait. Hold off on the reveal until you get your podcast into the biggest podcasting platform around.

Listen and Listen and Listen and Listen

If you’re interested in doing this, then chances are you at least have a cursory knowledge of the medium. You probably have listened to a few sports podcasts here and there or may even be a rabid follower of a handful. But even though you want to do a sports podcast, you shouldn’t limit your listening to only that genre. There are lots of great podcasts out there, and some should inspire you and change the way you think about the platform. Don’t limit yourself. Inspiration can come from all sorts of avenues. This is by no means an exhaustive list, but here are some suggestions for non-sports podcasts you should at least check out. See how the other side lives, and all that.

Serial – My reservations not withstanding, podcasters need to hear to at least be part of the conversation. If you say you have a podcast, this is what most people will think of. You should at least know what it is. One million downloaders can’t be wrong, right?

Radiolab – In my opinion, the absolute best radio put out there today. It’s engaging and gorgeous and interesting. And also gorgeous. Please give a listen. One of the hosts won a MacArthur Genius grant for his work on it.

This American Life – The granddaddy of Serial, this show still produces wonderful journalistic storytelling. It’s on the Mount Rushmore of podcasts.

99% Invisible – A podcast about design. It was also behind one of the most successful Kickstarter campaigns of all time – successful enough that they started their own podcast network. It’s a great source of “did you know…” conversation starters.

Love + Radio – From the same network as 99% Invisible. Just immaculate radio. It’s hard to explain, but give it a listen.

Bullseye with Jesse Thorn – originally called The Sound of Young America, Bullseye went from college radio station show to NPR program. If you want a introduction in how to interview a guest, Jesse Thorn will be your professor. He also has a knack for getting the best and brightest in pop culture.

My Brother, My Brother, and Me – This is a great example of what even a loose premise can do for a show. MBMBaM describes itself as an advice show, but it’s really three brothers (Justin, Travis, and Griffin McElroy) riffing and spoofing and transitioning from one bit to another. They are funny guys, but their chemistry is what makes the show. They are cheating because they share DNA, but that’s the way it goes sometimes.

The Todd Glass Show – A master course on how chemistry can help a podcast. There is nothing buy silly banter to go along with Glass insisting they redo bits until they get them right. It’s frantic and disjointed, but it’s always fun and a it’s a good reminder of how strong personalities can steer an otherwise rudderless show.

Good luck to you, future podcaster. It’s a rough world out there, but with some hard work and attention to detail, you could be climbing up the iTunes charts in no time. Put in the time. It will seem dubious at first, because it is. But the finished product will be so much better.

(Image via curtis.kennington)

A Newbz Guide to Daily Fantasy

I don’t play fantasy baseball. Baseball simulations are more my thing. In 1998 I started a Strat-O-Matic league which grew to a league of 24 of us that play each other online. I’m completely in on Out of the Park Baseball, trying to lead my 2017 Miami Marlins – sans Jose Fernandez, Giancarlo Stanton and Carlos Santana (acquired) following season-ending injuries – in my second season as general manager/manager after starting my career managing the Pawtucket Red Sox. For whatever reasons, fantasy baseball just never caught on with me.

Then came advertising for Draft Kings and FanDuel. Lots and lots and lots and lots of it. My curiosity grew and finally, after a year of brain washing, the mad men won. Yesterday I threw down a whopping $10 and signed up for Draft Kings to see what the big deal was. It was a fact-finding mission, mostly. But a part of me, say my right pinky toe, wanted it to become a source of income for my beer fund. After my first 25 cent game, I realized that beer fund was going to stay dry.

And now I bring the experience to you, the TechGraphs readers. Learn from me, what and what not, to do.

Grab a promo code

Before you sign up for an account, make sure you have a promo code. It’s free money. I Googled and ran with the first bigger-looking site that didn’t seem sketchy. It promised a matching bonus up to $600 for the first deposit. The one I signed up for isn’t free money, though. It’s contingent upon me earning Frequent Player Points. I earned one point for my one game, and only have 99 left points left before my bonus kicks in. Geesh. So, search around a bit, see if you can find a better deal.

PayPal?!? YES

I wanted a royalty-related username, since the site is about kings and such, and discovered that landgrave is a German title. My last name is quite German, so you can find me at LandgraveK on the site. My wallet sits in the glove compartment of my car, because I roll dangerously. Fortunately PayPal is an option to make a deposit, as are major credit cards. Otherwise I would’ve had to walk outside … where there are people … that might want to wave or talk to me.

Take a deep breath

I was anxious to find a game once they had my money. I landed in the lobby, skimmed my options, and felt completely overwhelmed. It was like walking in to a major casino for the first time. Bright, bold colors illuminated a dark background. My eyes fixated on the big ads. I gathered my over-stimulated self and decided to proceed cautiously.

Slugfest, Perfect Game, Moonshot, Gold Glove. What is going on here? Guaranteed, Qualifiers, Head-to-Head, 50/50 Leagues, Multipliers, Steps. And then I found it. My peoples. The Beginner games.

draftkingscapture

The lowest entry fee was a dollar. I’m stupid cheap and looked around for quarter games. In my search, I found some free games that may appeal to those that want to try it out and keep a buck. One I came across sported a $10 prize spread among five players, so $2 winning each.

Bam. Quarter Arcade. This is where I belong. I joined a game that maxed out at 14,100 players. The first place prize was $150, a 500 percent return. With a $3,000 prize pool, those that finished in the top 2,800 would minimally double their entry fee, or better. Optimism warmed my belly.

Prepare, prepare, prepare

Things started off well, actually. Fellow TechGraphs writer David Wiers is a Draft Kings boss and Tweeted a tip.

Dan Haren was my first pick. And then things went totally wrong. I analyzed the pitching match ups, looking for the worst pitchers to draft a lineup against. Trevor Cahill’s a bum, so I drafted Kevin Plawecki in his major league debut and Lucas Duda. For fun I took Mike Trout, because I’m a huge homer, and Jose Bautista. A handy tool is that after each pick, it calculates the cost for average player remaining. For example, if I had five slots left to fill and $24,000 remaining (my funds started at $50,000), then I had an average of $4800 to spend on each player. I filled the rest of my roster utilizing this tool, my own instincts and figuring out best matchups. Without analytics.

Our brethren over at RotoGraphs do a fantastic job providing readers with quality research to help you select players for traditional and daily fantasy sports. Roto Riteup, which Wiers contributes to, and The Daily Grind are daily musts if you want to make educated decisions with your lineup. Or you can trust your guts and guile, like I did, and have Brett Anderson whiff in your face.

Pick your crew

Selecting players to add to your lineup is as easy as clicking a plus button. Each day’s games are listed with starting time and weather. If lineups have been officially announced, a check mark appears next to the player’s name. It’s no fun picking Buster Posey if it’s his off day. Each player’s profile features stats, updates and analysis for easy-access info.

Once you’ve set your lineup, you get the option of joining other games with that same lineup, which is handy if you’re looser with your pocket change than I am, or join other contests with a different lineup.

So how’d I do?

I was dreadful. I finished 11,527 out of those 14,100 players with a final score of 75.9. The aforementioned Anderson went negative on me against a crap Giants offense for -2.4. The Wiers pick, Haren, netted 15.3 points and was only bested by Bautista’s 16. I don’t believe he earned me any extra machismo points, however, after Tuesday night’s game.

I’d be remiss if I didn’t add that I downloaded the DraftKings iOS app, which performed as I’d hoped.

So what say you, TechGraph readers? Do you play? Are you curious? Comment below. All tips/tricks/advice are appreciated as well.