Archive for February, 2014

Note: After you read the Woody Allen example, please read the note below it, which describes how I screwed up the analysis!

One of the most important concepts in science, and sometimes in life, involves something called Bayesian Probability or Bayes’ Theorem. Since you are reading a sabermetric blog, you are likely at least somewhat familiar with it. Simply put, it has to do with conditional probability. You have probably read or heard about Bayes with respect to the following AIDS testing hypothetical.

Let’s say that you are not in a high risk group for contracting HIV, the virus that causes AIDS, or, alternatively, you are randomly selected from the adult U.S. population at large. And let’s say that in that population, one in 500 persons is HIV positive. You take an initial ELISA test, and it turns out positive for HIV. What are the chances that you actually carry the disease?

The first thing you need to know is the false positive rate for that particular test. It is also around one in 500. We’ll ignore the fact that there are better, more accurate tests available or that your blood specimen would be given another test if it had a positive ELISA. You might be tempted to think that your chances of carrying the virus are 99.8%, or one minus .002, where .002 is the one in 500 false positive rate.

And you would be wrong. Enter Bayes. Since you only had a 1 in 500 chance of being HIV+ going in, there is a prior probability which must be added “to the equation.”

To understand how this works, and to avoid any semi-complex Bayesian formulas, we can frame the analysis like this:

In a population of 500,000 persons, there would be 1,000 carriers, since we specified that the HIV rate was one in 500. All of them would test positive, assuming a zero false-negative rate. Among the 499,000 non-carriers, there would be 998 false positives (a one in 500 chance).

So in our population of 500,000 persons, there are 1,998 positives and only 1,000 of these truly carry the virus. The other 998 positives are false. If you are selected from this population, and have a positive ELISA test, you naturally have a 1,000 in 1,998, or around a 50% chance of having the disease. That is a far cry from 99.8%, and should be somewhat comforting to anyone who fails an initial screening. That is basically how Bayes works, although it can get far more complex than that. It also applies to many, many other important things in life, including the guilt or innocence of a defendant in a criminal or civil prosecution, which I will address next.
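For readers who would rather see the head count as code, here is a minimal Python sketch of the same arithmetic. The function name `posterior_from_counts` and its parameter names are my own labels for illustration, and the default of a zero false-negative rate is the same simplifying assumption made above.

```python
def posterior_from_counts(population, prevalence, false_positive_rate,
                          false_negative_rate=0.0):
    """Return (true positives, false positives, P(carrier | positive test)).

    Uses the head-count framing: count who tests positive in a
    hypothetical population, then see what share of them are carriers.
    """
    carriers = population * prevalence
    non_carriers = population - carriers
    true_positives = carriers * (1.0 - false_negative_rate)
    false_positives = non_carriers * false_positive_rate
    posterior = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, posterior


# The hypothetical from the text: 1 in 500 carriers, 1 in 500 false positives.
tp, fp, p = posterior_from_counts(500_000, 1 / 500, 1 / 500)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}")
print(f"chance of carrying the virus given a positive test: {p:.1%}")
```

Changing `prevalence` while leaving the test alone is the quickest way to see how much the prior matters: the same positive result means something very different in a high-risk population.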

Another famous, but less well-known, illustration of Bayes with respect to the criminal justice system involves a young English woman named Sally Clark, who was convicted in 1999 of killing two of her children. In 1996, her first-born son died, presumably of Sudden Infant Death Syndrome (SIDS). In 1998, she gave birth to another boy, and he too died at home shortly after birth. She and her husband were soon arrested and charged with the murder of the two boys. The charges against her husband were eventually dropped.

Sally was convicted of both murders and sentenced to life in prison. By the way, she and her husband were affluent attorneys in England. At her trial, the following statistical “evidence” was presented by a pediatrician for the prosecution:

He testified that there was about a 1 in 8,500 chance that a baby in that situation would die of SIDS and therefore the chances that both of her children would perish from natural causes related to that syndrome was 1/8500 times 1/8500, or 1 in 73 million. Sally Clark was convicted largely on the “strength” of the statistical “evidence” that the chance of those babies both dying from SIDS, which was the defense’s assertion, was almost zero.

First of all, the 1 in 73 million might not be accurate. It is possible, in fact likely, according to the medical research, that those two probabilities are not independent. If you want to know the chances of two events occurring, multiplying the chances of one event by the other is only proper when the two events are independent – Stats 101. In this case, it was estimated by an expert witness for the defense in an appeal that if one infant in a family dies of SIDS, the chances that another one also dies similarly are 5 to 10 times higher than the initial probability.
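To see how much the independence assumption matters, here is a rough Python sketch. The function names are mine, not anything from the trial record, and the multipliers of 5 and 10 are simply the two ends of the defense expert's range quoted above.

```python
def double_sids_probability(p_single, dependence_multiplier=1.0):
    """Chance of two SIDS deaths in one family.

    dependence_multiplier is how many times more likely a second SIDS
    death is once a first has occurred (1.0 means fully independent).
    """
    p_second = min(1.0, p_single * dependence_multiplier)
    return p_single * p_second


def one_in(probability):
    """Express a probability as the N in '1 in N'."""
    return 1.0 / probability


# Prosecution: 1 in 8,500 per death, deaths treated as independent.
independent = double_sids_probability(1 / 8500)
print(f"independent: 1 in {one_in(independent):,.0f}")

# Defense expert: a second death is 5 to 10 times more likely after a first.
for multiplier in (5, 10):
    dependent = double_sids_probability(1 / 8500, multiplier)
    print(f"x{multiplier} dependence: 1 in {one_in(dependent):,.0f}")
```

Squaring 1 in 8,500 gives about 1 in 72 million, which the testimony presented as 1 in 73 million.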

So that reduces our probability to between one in 15 million and one in 7 million. In addition, the same expert witness, a Professor of Mathematics who studied the historical SIDS data, argued that the 1 in 8,500 was really closer to 1 in 1,300 due to the gender of the Clark babies and other genetic and environmental characteristics. If that number is accurate, that brings us down to 1 in 227,000 for the chances of her two boys both dying of SIDS. While a far cry from 1 in 73 million, that is still some pretty damning evidence, right?

Wrong! That 1 in 227,000 chance of dying of SIDS, or the inverse, a 99.99956% chance of dying from something other than SIDS, like murder, is like our erroneous 99.8% chance of having HIV when our initial AIDS test is positive. In order to calculate the true odds of Mrs. Clark being guilty of murder based solely on the statistical evidence, we need to know, as with the AIDS test, what the chances are, going in, before we know about the deaths, that a woman like Sally Clark would be a double murderer of her own children. That is exactly the same thing as us needing to know the chances that we are an HIV carrier before we are tested, based upon the population we belong to. Remember, that was 1 in 500, which transformed our odds of having HIV from 99.8% to only 50%.

In this case, it is obviously difficult to estimate that a priori probability, the chances that a woman in Sally Clark’s shoes would murder her only two children back to back. The same mathematician estimated that the chances of Sally Clark being a double murderer, knowing nothing about what actually happened, was much rarer than the chances of both of her infants dying of natural causes. In fact, he claimed that it was 4 to 10 times rarer, which means that out of all young, affluent mothers with two new-born sons, maybe 1 in a million or 1 in 2 million would kill both of their children. That does not seem like an unreasonable estimate to me, although I have no way of knowing that off the top of my head.

So, as with the AIDS test, if there were a population of one million similar women with two newly born boys, around 4 of them (1 in 227,000) would suffer the tragedy of back-to-back deaths by SIDS, and only ½ to 1 would commit double infanticide. So the odds, based solely on these statistics, of Sally Clark being guilty as charged were around 10 to 20%, obviously not nearly enough to convict, and just a tad less than the 72,999,999 to 1 that the prosecution implied at her trial.
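The same comparison can be written as a short Python sketch. The function name is hypothetical, and the sketch assumes, as the paragraph above does, that double SIDS and double murder are the only two explanations on the table.

```python
def probability_of_guilt(p_double_sids, p_double_murder):
    """P(murder | both children died), with SIDS and murder as the only causes."""
    return p_double_murder / (p_double_murder + p_double_sids)


p_sids = 1 / 227_000  # the defense expert's estimate quoted above

# Double murder estimated to be 4 to 10 times rarer than double SIDS.
for times_rarer in (4, 10):
    p_murder = p_sids / times_rarer
    guilt = probability_of_guilt(p_sids, p_murder)
    print(f"murder {times_rarer}x rarer than SIDS: P(guilty) = {guilt:.0%}")
```

Note that only the ratio of the two rates matters here: the 1 in 227,000 figure cancels out, so the answer depends entirely on how much rarer double murder is than double SIDS.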

Anyway, after spending more than 3 years in prison, she won her case on appeal and was released. The successful appeal was based not only on the newly presented Bayesian evidence, but on the fact that the prosecution withheld evidence that her second baby had had an infection that may have contributed to his death from natural causes. Unfortunately, Sally Clark, unable to deal with the effects of her children’s deaths, the ensuing trial and incarceration, and public humiliation, died of self-inflicted alcohol poisoning 4 years later.

Which brings us to our final example of how Bayes can greatly affect an accused person’s chances of guilt or innocence, and perhaps more importantly, how it can cloud the judgment of the average person who is not statistically savvy, such as the judge and jurors, and the public, in the Clark case.

Unless you avoid the internet and the print tabloids like the plague, which is unlikely since you’re reading this blog, you no doubt know that Woody Allen was accused around 20 years ago of molesting his adopted 7-year old daughter, Dylan Farrow. The case was investigated back then, and no charges were ever filed. Recently, Dylan brought up the issue again in a NY Times article, and Allen issued a rebuttal and denial in his own NY Times op-ed. Dylan’s mother Mia, Woody Allen’s ex-partner, is firmly on the side of Dylan, and various family members are allied with one or the other. Dylan is unwavering in her memories and claims of abuse, and Mia is equally adamant about her belief that their daughter was indeed molested by Woody.

I am not going to get into any of the so-called evidence one way or another or comment on whether I think Woody is guilty or not. Clearly I am not in a position to do the latter. However, I do want to bring up how Bayes comes into play in this situation, much like with the AIDS and SIDS cases described above, and how, in fact, it comes into play in many “he-said, she-said” claims of sexual and physical abuse, whether the alleged victim is a child or an adult. If you have been following along so far, you probably know where I am going with this.

In cases like this, whether there is corroborating evidence or not, it is often alleged by the prosecution or the plaintiff in civil cases, that there is either no reason for the alleged victim to lie about what happened, or that given the emotional and graphic allegations or testimony of the victim, especially if it is a child, common sense tells us that the chances of the victim lying or being mistaken are extremely low. And that may well be the case. However, as you now know or already knew, according to Bayes, that is often not nearly enough to convict a defendant, even in a civil case where the burden on the plaintiff is based on a “preponderance of the evidence.”

Let’s use the Woody Allen case as an example. Again, we are going to ignore any incriminating or exculpatory evidence other than the allegations of Dylan Farrow, the alleged victim, and perhaps the corroborating testimony of her mother. Clearly, Dylan appears to believe that she was molested by Woody when she was seven, and clearly she seems to have been traumatically affected by her recollection of the experience. Please understand that I am not suggesting one way or another whether Dylan or anyone else is telling the truth or not. I have no idea.

Her mother, Mia, although she did not witness the alleged molestation, claims that, shortly after the incident, Dylan told her what happened and that she wholeheartedly believes her. Many people are predicating Allen’s likely guilt on the fact that Dylan seems to clearly remember what happened and that she is a credible person and has no reason to lie, especially at this point in her life and at this point in the timeline of the events. The statute of limitations precludes any criminal charges against Allen, and likely any civil action as well. I would assume however, that hypothetically, if this case were tried in court, the emotional testimony of Dylan would be quite damaging to Woody, as it often is in a sexual abuse case in which the alleged victim testifies.

Now let’s do the same Bayesian analysis that we did in the above two situations, the AIDS testing, and the murder case, and see if we can come up with any estimate as to the likely guilt or innocence of Woody Allen and perhaps other people accused of sexual abuse where the case hinges to a large extent on the credibility of the alleged victim and his or her testimony. We’ll have to make some very rough assumptions, and again, we are assuming no other evidence, for or against.

First, we’ll assume that the chances of the victim and perhaps other people who were told of the alleged events by the victim, such as Dylan’s mother, Mia Farrow, lying or being delusional are very slim. So we are actually on the hypothetical prosecution or plaintiff’s side. ‘How is it possible that this victim and/or her mother would be lying about something as serious and traumatic as this?’

Now, even common sense tells us that it is possible, but not likely. I have no idea what the statistics or the assumptions in the field are, but surely there are many cases of fabrication by victims, false repressed memories by victims who are treated by so-called clinicians who specialize in repressed-memories of physical or sexual abuse, memories that are “implanted” in children by unscrupulous parents, etc. There are many documented cases of all of the above and more. Again, I am not saying that this case fits into one of these profiles and that Dylan is lying or mistaken, although clearly that is possible.

Let’s put the number at 1 in 100 in a case similar to this. I’m not sure that any reasonable person could quarrel too much with that. I could easily make the case that it is higher than that. The population that we are talking about is this: First we have a 7 year-old child. The chances that the recollections of a young child, including the chances that those recollections were planted or at least influenced by an adult, might be faulty, have to be greater than those of an adult. The fact that Woody and Mia were already having severe relationship problems and in a bitter custody dispute also increases the odds that Dylan might have been “coached” or influenced in some manner by her mother. But I’ll leave the odds at 100-1 against. So, Allen is 99% guilty, right? You already know that the answer to that is, “No, not even close.”

So now we have to bring in Thomas Bayes as our expert witness. What are the chances that a random affluent and famous father like Woody Allen, again, not assuming anything else about the case or about Woody’s character or past or future behavior, would molest his 7-year old daughter? Again, I have no idea what that number is, but we’ll also say that it’s 100-1 against. I think it is lower than that, but I could be wrong.

So now, in order to compute the chances that Allen, or anyone else in a similar situation, where the alleged victim is a very credible witness – like we believe that there is a 99% chance they are telling the truth – is guilty, we can simply take the ratio of the prior probability of guilt, assuming no accusations at all, to the chances of the victim lying or otherwise being mistaken. That gives us the odds that the accused is guilty. In this case, it is .01 divided by .01 or 1, which means that it is “even money” that Woody Allen is guilty as charged, again, not nearly enough to convict in a criminal court. Unfortunately, many, perhaps most, people, including jurors in an actual trial, would assume that if there were a 99% chance that the alleged victim was telling the truth, well, the accused is most likely guilty!

Edit: As James in the comments section, Tango on the Book blog, and probably others, have noted, I screwed up the Woody Allen analysis. The only way that Bayes would come into play as I describe would be if we assumed that 1 out of 100 random daughters in a similar situation would make a false accusation against a father like Woody. That seems like a rather implausible assumption, but maybe not – I don’t really know. In any case, if that were true, then while my Bayesian analysis would be correct and it would make Allen have around a 50% chance of being guilty, the chances that Dylan was not telling the truth would not be 1% as I indicated. It would be a little less than 50%.

So, really, the chances that she is telling the truth are equal to the chances of Allen being guilty, as you might expect. In this case, unlike in the other two examples I gave, the intuitive answer is correct, and Bayes is not really implicated. The only way that Bayes would be implicated in the manner I described would be if a prosecutor or plaintiff’s lawyer pointed out that 99% of all daughters do not make false accusations against a father like Woody, therefore there is a 99% chance that she is telling the truth. That would be wrong, but that was not the point I was making. So, mea culpa, I screwed up, and I thank those people who pointed that out to me, and I apologize to the readers.

I should add this:

The rate of false accusations is probably not significantly related to the rate of true accusations or the actual rate of abuse in any particular population. In other words, if the overall false accusation rate is 5-10% of all accusations, which is what the research suggests, that percentage will not be nearly the same in a population where the actual incidence of abuse is 20% or 5%. The ratio of true to false accusations is probably not constant. What is likely somewhat constant is the percentage of false accusations as compared to the number of potential accusations, although there are surely factors which would make false accusations more or less likely, such as the relationship between the mother and father.

What that means is that the extrinsic (outside of the accusation itself) chance that an accused person is guilty is related to the chances of a false accusation. If in one population the incidence of abuse is 20%, there is probably a much lower chance that a person who makes an accusation is lying, as compared to a population where the incidence of abuse is, say, 5%.

So, if an accused person is otherwise not likely to be guilty but for an accusation, a prosecutor would be misleading the jury if he reported that overall only 5% of all accusations were false therefore the chance that this accusation is false, is also 5%.

If that is hard to understand, imagine a population of persons where the chance of abuse is zero. There will still be some false accusations in that population, and since there will be no real ones, the chances that someone is telling the truth if they accuse someone is zero. The percentage of false accusations is 100%. If the percentage of abuse in a population is very high, then the ratio of false to true accusations will be much lower than the overall 5-10% number.
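Here is a small Python sketch of that idea. Everything numeric in it is made up for illustration: the 1% false accusation rate among potential accusers, the 50% reporting rate among real victims, and the function name are my assumptions, not figures from any research. It also assumes, for simplicity, that false accusations come only from people who were not abused.

```python
def false_share_of_accusations(abuse_rate, false_accusation_rate, report_rate):
    """Fraction of all accusations that are false in a given population.

    abuse_rate: share of the population that was actually abused
    false_accusation_rate: share of non-abused people who accuse anyway
    report_rate: share of abused people who make a (true) accusation
    """
    true_accusations = abuse_rate * report_rate
    false_accusations = (1.0 - abuse_rate) * false_accusation_rate
    total = true_accusations + false_accusations
    if total == 0:
        raise ValueError("no accusations at all in this population")
    return false_accusations / total


# Hypothetical rates, held constant while the incidence of abuse varies.
FALSE_RATE = 0.01
REPORT_RATE = 0.5

for abuse_rate in (0.0, 0.05, 0.20):
    share = false_share_of_accusations(abuse_rate, FALSE_RATE, REPORT_RATE)
    print(f"incidence of abuse {abuse_rate:.0%}: {share:.0%} of accusations are false")
```

With the per-person false accusation rate held fixed, the share of accusations that are false goes from 100% when there is no abuse at all to a much smaller number as the incidence of abuse rises, which is the point of the paragraph above.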

* And why I am getting tired of writers and analysts picking and choosing one or more of a bushel of statistics to make their (often weak) point.

Let’s first get something out of the way:

Let’s say that you know of this very good baseball player. He is well-respected and beloved on and off the field, he played for only one, dynastic, team, he has several World Series rings, double digit All-Star appearances, dozens of awards, including 5 Gold Gloves, 5 Silver Sluggers, and a host of other commendations and accolades. Oh, and he dates supermodels and doesn’t use PEDs (we think).

Does it matter whether he is a 40, 50, 60, 80, or 120 win (WAR) player in terms of his HOF qualifications? I submit that the answer is an easy, “No, it doesn’t.” He is a slam dunk HOF’er whether he is indeed a very good, great, or all-time, inner-circle, great player. If you want to debate his goodness or greatness, fine. But it would be disingenuous to debate that in terms of his HOF qualifications. There are no serious groups of persons, including “stat-nerds,” whose consensus is that this player does not belong in the HOF.

Speaking of strawmen, before I lambaste Mr. Posnanski, which is the crux of this post, let me start by giving him some major props for pointing out that this article, by the “esteemed” and “venerable” writer Allen Barra, is tripe. That is Pos’ word – not mine. Indeed, the article is garbage, and Barra, at least when writing about anything remotely related to sabermetrics, is a hack. Unfortunately, Posnanski’s article is not much further behind in tripeness.

Pos’ thesis, I suppose, can be summarized by this, at the beginning of the article:

[Jeter] was a fantastic baseball player. But you know what? Alan Trammell was just about as good.

Here are Alan Trammell’s and Derek Jeter’s neutralized offensive numbers.

Trammell: .289/.357/.420
Jeter: .307/.375/.439

Jeter was a better hitter. But it was closer than you might think.

He points out several times in the article that, “Trammell was almost as good as Jeter, offensively.”

Let’s examine that proposition.

First though, let me comment on the awful argument, “Closer than you think.” Pos should be ashamed of himself for using that in an assertion or argument. It is a terrible way to couch an argument. First of all, how does he know, “What I think?” And who is he referring to when he says, “You?” The problem with that “argument,” if you want to even call it that, is that it is entirely predicated on what the purveyor decides “You are thinking.” Let’s say a player has a career OPS of .850. I can say, “I will prove that he is better than you think, assuming of course that you think that he is worse than .850, and it is up to me to determine what you think.” Or I can say the opposite. “This player is worse than you think, assuming of course, that you think that he is better than an .850 player. And I am telling you that you are thinking that (or at least implying that)!”

Sometimes it is obvious what, “You think.” Often times it is not. And that’s even assuming that we know who, “You” is. In this case, is it obvious what, “You think of Jeter’s offense compared to Trammell?” I certainly don’t think so, and I know a thing or two about baseball. I am pretty sure that most knowledgeable baseball people think that both players were pretty good hitters overall and very good hitters for a SS. So, really, what is the point of, “It was closer than you think.” That is a throwaway comment and serves no purpose other than to make a strawman argument.

But that is only the beginning of what’s wrong with this premise and this article in general. He goes on to state or imply two things. One, that their “neutralized” career OPS’s are closer than their raw ones. I guess that is what he means by “closer than you think,” although he should have simply said, “Their neutralized offensive stats are closer than their non-neutralized ones,” rather than assuming what, “I think.”

Anyway, it is true that in non-neutralized OPS, they were 60 points apart, whereas once “neutralized,” at least according to the article, the gap is only 37 points, but:

Yeah, it is closer once “neutralized” (I don’t know where he gets his neutralized numbers from or how they were computed), but 37 points is a lot, man! I don’t think too many people would say that a 37 point difference, especially over 20-year careers, is “close.”
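For anyone who wants to check the 37 points, OPS is just on-base percentage plus slugging percentage, so it can be read straight off the slash lines quoted from the article. This is a minimal Python sketch; the function name `ops` is mine.

```python
def ops(slash_line):
    """OPS = OBP + SLG, parsed from an 'AVG/OBP/SLG' slash line."""
    _avg, obp, slg = (float(part) for part in slash_line.split("/"))
    return obp + slg


# "Neutralized" slash lines as quoted from Posnanski's article.
trammell = ops(".289/.357/.420")
jeter = ops(".307/.375/.439")

gap_points = round((jeter - trammell) * 1000)
print(f"Trammell {trammell:.3f}, Jeter {jeter:.3f}, gap {gap_points} points")
```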

More importantly, a big part of that “neutralization” is due to the different offensive environments. Trammell played in a lower run scoring environment than did Jeter, presumably, at least partially, because of rampant PED use in the 90’s and aughts. Well, if that’s true, and Jeter did not use PED’s, then why should we adjust his offensive accomplishments downward just because many other players, the ones who were putting up artificially inflated and gaudy numbers, were using? Not to mention the fact that he had to face juiced-up pitchers and Trammell did not! In other words, you could easily make the argument, and probably should, that if (you were pretty sure that) a player was not using during the steroid era, that his offensive stats should not be neutralized to account for the inflated offense during that era, assuming that that inflation was due to rampant PED use of course.

Finally, with regard to this, somewhat outlandish, proposition that Jeter and Trammell were similar in offensive value (of course, it depends on your definition of “similar” and “close” which is why using words like that creates “weaselly” arguments), let’s look at the (supposedly) context-neutral offensive runs or wins above replacement (or above average – it doesn’t matter what the baseline is when comparing players’ offensive value) from Fangraphs.


Jeter: 369 runs batting, 43 runs base running


Trammell: 124 runs batting, 23 runs base running

Whether you want to include base running on “offense” doesn’t matter. Look at the career batting runs. 369 runs to 124. Seriously, what was Posnanski drinking (aha, that’s it – Russian vodka! – he is in Sochi in case you didn’t know) when he wrote an entire article mostly about how similar Trammell and Jeter were, offensively, throughout their careers? And remember, these are linear weights batting runs, which are presented as “runs above or below average” compared to a league-average player. In other words, they are neutralized with respect to the run-scoring environment of the league. Again, with respect to PED use during Jeter’s era, we can make an argument that the gap between them is even larger than that.

So, Posnanski tries to make the argument that, “They are not so far apart offensively as some people might think (yeah, the people who look at their stats on Fangraphs!),” by presenting some “neutralized” OPS stats. (And again, he is claiming that a 37-point difference is “close,” which is eminently debatable.)

Before he even finishes, I can make the exact opposite claim – that they are worlds apart offensively, by presenting their career (similar length careers, by the way, although Jeter did play in 300 more games), league and park adjusted batting runs. They are 245 runs, or 24 wins, apart!
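The runs-to-wins step can be sketched in Python as follows. The run totals are the Fangraphs numbers quoted above; the conversion of 10 runs per win is a common rule of thumb that I am assuming here, since the exact rate is not stated, and the names in the code are mine.

```python
RUNS_PER_WIN = 10.0  # assumption: rough rule of thumb, not an exact figure

# Career runs above average, as quoted above from Fangraphs.
jeter = {"batting": 369, "base_running": 43}
trammell = {"batting": 124, "base_running": 23}


def run_gap(player_a, player_b, components=("batting",)):
    """Difference in career runs over the chosen components."""
    return sum(player_a[c] - player_b[c] for c in components)


batting_gap = run_gap(jeter, trammell)
offense_gap = run_gap(jeter, trammell, ("batting", "base_running"))

print(f"batting only: {batting_gap} runs, about {batting_gap / RUNS_PER_WIN:.1f} wins")
print(f"with base running: {offense_gap} runs, about {offense_gap / RUNS_PER_WIN:.1f} wins")
```

Whether or not base running is counted, the gap is in the neighborhood of 25 wins under this assumption.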

That, my friends, is why I am sick and tired of credible writers and even some analysts making their point by cherry picking one (or more than one) of scores of legitimate and semi-legitimate sabermetric and not-so-sabermetric statistics.

But, that’s not all! I did say that Posnanski’s article was hacktastic, and I didn’t just mean his sketchy use of one (not-so-great) statistic (“neutralized” OPS) to make an even sketchier point.


By Baseball Reference’s defensive WAR Trammell was 22 wins better than a replacement shortstop. Jeter was nine runs worse.

By Fangraphs, Trammell was 76 runs better than a replacement shortstop. Jeter was 139 runs worse.

That passage is an abomination. First of all, when talking about defense, you should not use the term “replacement” (and you really shouldn’t use it for offense either). Replacement refers to the total package, not to one component of player value. Replacement shortstops could be average or above-average defenders and awful hitters, decent hitters and terrible defenders, or anything in between. In fact, for various reasons, most replacement players are average or so defenders and poor hitters.

And then he conflates wins and runs (don’t use both in the same paragraph – that is sure to confuse some readers), although I know that he knows the difference. In fact, I think he means “nine wins” worse in the first sentence, and not, “nine runs worse.” But, that mistake is on him for trying to use both wins and runs when talking about the same thing (Jeter and Trammell’s defense), for no good reason.

Pos then says:

You can buy those numbers or you can partially agree with them or you can throw them out entirely, but there’s no doubt in my mind that Trammell was a better defensive shortstop.

Yeah, yada, yada, yada. Yeah we know. No credible baseball person doesn’t think that Trammell was much the better defender. Unfortunately we are not very certain of how much better he was in terms of career runs/wins. Again, not that it matters in terms of Jeter’s qualifications for, or his eventually being voted into, the HOF. He will obviously be a first-ballot, near-unanimous selection, and rightfully so.

Yes, it is true that Trammell has not gotten his fair due from the HOF voters, for whatever reasons. But, comparing him to Jeter doesn’t help make his case, in my opinion. Jeter is not going into the HOF because he has X number of career WAR. He is going in because he was clearly a very good or great player, and because of the other dozen or more things he has going for him that the voters (and the fans) include, consciously or not, in terms of their consideration. Even if it could be proven that Jeter and Trammell had the exact same context-neutral statistical value over the course of their careers, Jeter could still be reasonably considered a slam dunk HOF’er and Trammell not worthy of induction (I am not saying that he isn’t worthy). It is still the Hall of Fame (which means many different things to many different people) and not the Hall of WAR or the Hall of Your Context-Neutral Statistical Value.

For the record, I love Posnanski’s work in general, but no one is perfect.