A Roundup Of My Favorite Stats Books

Ryan Day
9 min read · Aug 19, 2021


These are a few books that have helped me understand the role of stats and analytics in life and business. I have ended up reading these books multiple times in different formats, and I often find myself going back to them to think through a data-related topic.

The books on my list:

  • How To Measure Anything: Finding The Value Of Intangibles In Business, Douglas W. Hubbard
  • The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t, Nate Silver
  • Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics, Gary Smith
  • Fooled By Randomness: The Hidden Role of Chance in Life and in the Markets, Nassim Nicholas Taleb
  • Naked Statistics: Stripping the Dread From The Data, Charles Wheelan

These books all share a thread of caution or skepticism about how stats, models, predictions, and evidence are used in modern culture (as you might guess by all the ‘fooled’, ‘flawed’, and ‘failed’ in the titles). However, the authors believe that statistics and probabilistic thinking are still worthwhile when done well.

At their heart, they make an argument to the individual: the way you understand probabilities has a fundamental effect on your life.

These books generally skewer the works of pop-science authors, journalists, and speakers who present unlikely and often invalid claims (think Freakonomics, Malcolm Gladwell, and various TED talkers). They reveal logical fallacies and biases that undermine business best-sellers such as Good To Great and In Search Of Excellence. And they uncover the deceptive or tricky ways that people use numbers and charts to obfuscate facts in politics and business.

Along with their skewering, the books provide practical advice about how to use valid statistical methods to make decisions and predictions in a complex world. Some of these methods have been major influences for me as I try to “think probabilistically”.

Stickiest Ideas From Multiple Books

Reproducibility Crisis John P. A. Ioannidis and others have identified persistent gaps in the validity of many research findings in major peer-reviewed scientific journals. The root causes are broad and include statistical problems (p-hacking, data grubbing) as well as more systemic problems in the business of science (publication bias, researcher degrees of freedom). This problem has major implications for the way we evaluate evidence in public policy and business. And it suggests that we should be especially skeptical when authors or speakers spin tales of surprising and entertaining research results. (Ironically, the crisis implicated some of the books these authors recommend, such as Thinking, Fast and Slow by Daniel Kahneman.)

Limitations of Statistical Significance The p-value is a concept from frequentist statistics that is used to judge whether a finding in data is likely real or could be random noise. To be publishable, a finding often needs a p-value of .05 or lower, which roughly means there is only a 1 in 20 chance that an effect at least that large would occur by chance alone if no real effect exists. If a research finding meets this threshold, it is called “statistically significant” (I’m simplifying somewhat). However, the use of a p-value assumes that a researcher establishes a hypothesis ahead of time, then conducts an experiment and evaluates the results. In that order.

The 1 in 20 chance sounds like a rare occurrence, until you realize that researchers can mine the data after the fact to find any correlation that meets the .05 threshold. If you examine 20 different features (or slice the data into 20 sub-groups), you should expect to find a “statistically significant” correlation in random noise, even when no predictive pattern exists. (This after-the-fact search for a hypothesis is known as “p-hacking”.)
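
To make that concrete, here is a minimal sketch (my own, not from any of the books) that tests 20 purely random features against a purely random outcome. With 20 tests at the .05 level, at least one spurious “significant” correlation turns up roughly two times out of three.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_features = 100, 20

# The outcome and every feature are independent random noise.
outcome = rng.normal(size=n_samples)
features = rng.normal(size=(n_samples, n_features))

significant = []
for j in range(n_features):
    r, p = stats.pearsonr(features[:, j], outcome)
    if p < 0.05:
        significant.append(j)

# With 20 tests at the .05 level, P(at least one false positive) ≈ 1 - 0.95**20 ≈ 0.64.
print(f"'Significant' features found in pure noise: {significant}")
```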

Expected Outcomes On the topic of predictions, both Silver and Taleb propose evaluating future outcomes based upon the expected result of multiple possible paths. It is a way of ‘summing the outcomes’ instead of simply picking a winner.

For instance, Silver gives the example of betting on Team A to win a basketball game:

  • Team A wins — expected probability 25% — payout: $520,000 — expected profit: $130,000
  • Team B wins — expected probability 75% — loss: $80,000 — expected loss: $60,000

Expected outcome (over time) of betting on the less probable outcome: $130,000 - $60,000 = $70,000. (Note: this assumes that the bettor has a different view of the odds than ‘the house’; otherwise this wouldn’t be profitable.)
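
As a quick sketch, here is the same calculation in a few lines of Python, using the numbers from the example above:

```python
# Expected value of the bet: sum of probability × profit across all outcomes.
outcomes = [
    {"name": "Team A wins", "probability": 0.25, "profit": 520_000},
    {"name": "Team B wins", "probability": 0.75, "profit": -80_000},
]

expected_value = sum(o["probability"] * o["profit"] for o in outcomes)
print(f"Expected profit of the bet: ${expected_value:,.0f}")  # $70,000
```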

Taleb uses almost the same language to describe using expected outcome for options trading (insert your own joke about options trading being a form of gambling). He gives a similar formula to Silver’s, with the added nuance that the amount the market goes up or down may affect the outcome (profit) of a decision. Taleb made his money by looking for rare events that would be losers most of the time, but pay off big when they occurred (his black swans). By betting on various market crashes, he lost a small amount on many trades, then won big on a few.

Silver and Taleb argue that this type of probabilistic thinking is valuable in many decision-making situations, and they suggest:

  • Don’t make a decision based only upon the most likely outcome.
  • Instead, calculate the expected result (profit) of all outcomes and decide based on net gain.

A Breakdown Of Each Book

How To Measure Anything: Finding The Value Of Intangibles In Business, Douglas W. Hubbard

In his consulting work, Hubbard found that businesses often made decisions without gathering any additional data because the subject was considered an ‘intangible’, such as product quality. Other times they decided not to gather data because an item was a ‘must have’, such as security. Instead of gathering data, they used another unspoken model for the decision, such as gut feel, status quo, or HIPPO (highest paid person’s opinion). This book argues that decision-makers usually ‘have more information than they think’ and ‘need less information than they realize’ to make a good decision. Measurements (data gathering) make for better decisions.

Biggest Takeaways From How To Measure Anything

  • A measurement only needs to reduce uncertainty, not remove it. And the more uncertain you are about a decision, the more valuable a measurement will be. When you are very uncertain about a subject, it can be very quick and cheap to measure enough to remove a large amount of uncertainty. This is often all you will need to make a decision.
  • Confidence Intervals Instead of trying to predict an exact value for a decision with false precision (e.g. we will make $51 million in sales next year), a more useful goal is to identify a 90% Confidence Interval, the range you are 90% sure the value will fall into (e.g. the 90% CI for next year’s sales is $28 million to $60 million). And human experts can be trained (calibrated) to estimate a 90% confidence interval.
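
One way to put such ranges to work (a rough sketch of my own, not Hubbard’s exact method) is a simple Monte Carlo simulation that treats each 90% CI as a normal distribution, so the standard deviation is the width of the interval divided by 3.29 (i.e. 2 × 1.645). The variables and CI values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_from_90ci(lower, upper, size):
    """Treat a 90% CI as mean ± 1.645 sd, so sd = (upper - lower) / 3.29."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / 3.29
    return rng.normal(mean, sd, size)

n = 100_000
units_sold = normal_from_90ci(20_000, 50_000, n)   # hypothetical expert 90% CI
profit_per_unit = normal_from_90ci(10, 25, n)      # hypothetical expert 90% CI

profit = units_sold * profit_per_unit
print(f"Simulated 90% CI for profit: ${np.percentile(profit, 5):,.0f} "
      f"to ${np.percentile(profit, 95):,.0f}")
```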

My full review of How To Measure Anything

The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t, Nate Silver

This is somewhat of a memoir from Silver, the celebrity prognosticator famous for his FiveThirtyEight blog. He examines professions that attempt to predict complex phenomena like the economy and the weather. He examines some failed predictions and shares his more successful approach to predicting things like political elections. He also includes personal stories from his varied careers as an online poker player and baseball statistician.

Biggest Takeaways from The Signal and the Noise

  • Bayes’s Theorem This was my introduction to Bayes’s theorem, which can be used to calculate the probability of an event or hypothesis, given that another event has occurred. (It is also the basis for some common ML algorithms.) The book provides an entertaining example of calculating the probability that your partner is having an affair, given that you came home and found someone else’s underwear on the floor. With Bayes’s theorem, you can calculate a percentage for this with just a few inputs, one of which is the all-important prior probability of your partner having an affair (a worked sketch follows this list). This introduction of a common-sense sanity check distinguishes it from the frequentist statistics that are commonly taught. Bayesians also progressively refine their probabilities as new evidence is uncovered, getting less and less wrong.
  • Silver recommends making predictions as a range of outcomes (with an associated probability) instead of a single predicted value or win/lose outcome. He calls this ‘thinking probabilistically’ and it communicates the best information you have available along with your confidence in the answer. (Notice the similarity to the 90% confidence intervals from Hubbard.)
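
Here is the minimal sketch of the Bayes calculation mentioned above; the input numbers are illustrative placeholders rather than the exact values Silver uses.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) via Bayes' theorem."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# Illustrative inputs: prior chance of an affair, chance of finding the
# underwear if there is an affair, and the chance if there is not.
p = posterior(prior=0.04, p_evidence_if_true=0.50, p_evidence_if_false=0.05)
print(f"Probability of an affair given the evidence: {p:.0%}")  # ≈ 29%
```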

Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics, Gary Smith

Smith is a good storyteller, and this amusing book covers a variety of ways that statistics, evidence, and analytics are misused in everyday life. It covers a lot of ground in small doses, but with color and clarity. If you’re looking for a starter book about stats in everyday life, this would be a good one.

Biggest Takeaways From Standard Deviations

  • Smith is very critical of data mining, and I suspect he would come down equally hard on some machine learning techniques. He makes a strong case that in the era of big data, it is tempting to torture historical data ‘until it confesses’ a pattern or spectacular finding that may just be random noise. These patterns can often be ‘statistically significant’, with a p-value of .05 or lower. But this kind of ‘data without theory’ can be dangerously misleading. Smith’s solution: 1) Examine whether there is a reasonable or common-sense cause that could lead to the effect found in the historical data; if not, demand strong evidence. 2) Test the theory with new data. If the effect is real, it should be just as powerful in a fresh data set. If the effect disappears with new data (or is much less powerful), it is probably random noise.
  • In a fairly short section, Smith very clearly explains the concept of regression to the mean. He explains that many performances or outcomes in life, such as a football season, a standardized test score, or a mutual fund return, are an imperfect reflection of the ability of the underlying actor. So future performances will ‘fluctuate randomly about the actual ability’. One fascinating example is that a person’s height is an imperfect measure of the underlying genetic ‘ability’: the ‘observed’ height of a person is just one possible random occurrence of their genetics. (It was kind of mind-blowing for me to think of the event that occurred in real life as simply the ‘observed’ incidence, not the real thing. That is an idea that flows through several of these books.)
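
A minimal simulation (my own sketch, not Smith’s) makes the effect visible: model observed performance as true ability plus noise, and this year’s top decile looks noticeably more ordinary next year.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

ability = rng.normal(0, 1, n)             # unobserved "true ability"
season1 = ability + rng.normal(0, 1, n)   # observed performance, year 1
season2 = ability + rng.normal(0, 1, n)   # observed performance, year 2

top = season1 > np.percentile(season1, 90)  # this year's "stars"
print(f"Top decile mean, season 1: {season1[top].mean():.2f}")
print(f"Same group, season 2:      {season2[top].mean():.2f}")  # noticeably lower
```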

Fooled By Randomness: The Hidden Role of Chance in Life and in the Markets, Nassim Nicholas Taleb

Like Silver’s book, this one combines a lot of personal stories with thoughts on probability. Taleb was a successful options trader, profiting big on several economic crashes. He makes a strong argument that we underestimate the role of random events in business and life. We often assume that success (such as profitable options trading) is caused by skill or talent, when it is just as likely to be caused by random chance.

Biggest Takeaways from Fooled By Randomness

  • Taleb uses a familiar story of monkeys typing on typewriters, with one of them ending up producing a famous novel. Is the monkey who typed it special? You don’t know until you count the monkeys that started typing; if the number is sufficiently large, the novel could be the product of chance. Taleb explains that when a large number of starting candidates (traders, companies, athletes, mutual funds) begin a process, a fair number of them will end up successful simply due to random chance, an idea he demonstrates with several convincing thought experiments (a quick simulation in that spirit follows this list). We call these “stars”. But often the most important factor in noteworthy outcomes is not the abilities of the participants, but the number of participants that entered the process. To know if their accomplishment is impressive, we have to count the starting participants. I love his quote “I have rarely seen anyone count the monkeys”.
  • He proposes the idea of ‘alternative accounting’ when valuing success that was accomplished through a very random or risky process. When weighing all the possible ‘invisible histories’ that could have occurred by randomness in the process, a risky success (options trading through risky trades) should be valued less than a predictable success (such as money gained practicing dentistry). Taleb has spent so much time using Monte Carlo simulations to consider alternative paths that he views actual outcomes as only ‘realized histories’ that can’t stand alone without reference to their companion ‘nonrealized histories’. These ideas reminded me a lot of Smith’s use of the terms ‘observed abilities’ (which we might call real-life occurrences) and the ‘true abilities’ that randomness operated on.
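
Here is the simulation promised above, a sketch of my own under made-up assumptions: 10,000 hypothetical traders each have 50/50 years, and a handful of them still rack up ten straight winning years on luck alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n_traders, n_years = 10_000, 10

wins = rng.random((n_traders, n_years)) < 0.5   # True = a winning year, by luck alone
streaks = wins.all(axis=1)                      # traders who won every single year
print(f"Traders with 10 winning years by pure chance: {streaks.sum()}")
# Expected count: 10,000 * 0.5**10 ≈ 10 impressive-looking records, no skill involved.
```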

Naked Statistics: Stripping the Dread From The Data, Charles Wheelan

Of all the books here, this one spends the most time explaining statistical methods. Wheelan has a wry wit, and uses some comical examples to demonstrate how dry-sounding topics like the central limit theorem, null hypotheses, and regression analysis have relevance to everyday life.

Biggest Takeaways from Naked Statistics

  • The book’s explanation of multiple regression analysis (finding the relationship between two variables while controlling for other factors) is useful, along with the cautions about what can go wrong with it.
  • He gives the best explanation of the Monty Hall problem, which is a fun counter-intuitive game show example of probability that finds its way into many stats books.
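
The Monty Hall result is easy to check with a short simulation (a sketch of my own, not Wheelan’s): switching wins about two-thirds of the time, staying only one-third.

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # Host opens a door that is neither the contestant's pick nor the prize.
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            # Switch to the remaining unopened door.
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print(f"Win rate if you stay:   {play(switch=False):.3f}")  # ≈ 0.333
print(f"Win rate if you switch: {play(switch=True):.3f}")   # ≈ 0.667
```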

Full review of Naked Statistics


Ryan Day

Data scientist writing an O'Reilly book titled Hands-On APIs for AI and Data Science.