Bayes Theorem Part 1 — Interpreting Unlikely Test Results
Bayes theorem calculates the probability of one thing conditional on the probability of something else. Maybe that doesn’t sound too interesting, but think instead about these very common questions that come up in life:
- What is the chance an unlikely result is actually true?
- What is the chance a future prediction based on the past will actually happen?
Now imagine you have a tool that few other people know how to use, one that can answer these questions. That’s Bayes!
Most explanations of Bayes theorem start with its mathematical expression. I’ll save that for later and instead begin by illustrating a classic application using a spreadsheet!
This example is not related to finances, but I’ll get there in Part 4 by showing how I successfully reduced the downside risk of my portfolio from 2020 to 2022 using Bayes.
Imagine the world is in the early stages of a pandemic (if only this idea seemed far-fetched ☹).
The current rate of asymptomatic cases in the general population is 5%. You have no symptoms, but must take a rapid test to travel on business. The rapid test, carefully evaluated via clinical trials, has a positive result accuracy (sensitivity) of 70% and a negative result accuracy (specificity) of 90%. You test positive. What is your chance of actually being sick? Hint: it’s not 70%.
The correct answer from the spreadsheet above is around 27%. The actual math behind the spreadsheet is quite simple (shaded cells are inputs, other values are calculated as shown in red), but applying this math correctly is a little tricky.
The first step is to pick a number to represent a hypothetical random sample of the population. In this case, I chose a sample size of 1000 (you can make this number anything you want; the answer will be the same). For the chosen sample size and community infection rate, 50 people (5%) are positive. The remaining 950 (95%) are negative. Of the 50 people who are actually positive, only 35 of them (70%) will test positive. Of the 950 people who are actually negative, 95 of them (100%-90% = 10%; 10% of 950 = 95) will falsely test positive. So, the chance of actually being positive if you test positive is (35/(35+95)), or around 27%.
Let’s say you test negative. What is the chance of actually being negative?
The spreadsheet now includes actual negatives that test negative (90% of 950 = 855). Also shown are actual positives that test negative (30% of 50 = 15). So, the chance of actually being negative if you test negative is (855/(855+15)), or around 98%.
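If you would rather check the arithmetic in code than in a spreadsheet, here is a minimal Python sketch of the same counting logic. The bayes_counts name and the printout are my own choices for illustration; any sample size gives the same percentages.

```python
# A minimal sketch of the spreadsheet's counting logic, using the same
# illustrative numbers: 5% prevalence, 70% sensitivity, 90% specificity.

def bayes_counts(sample_size, prevalence, sensitivity, specificity):
    """Split a hypothetical sample into the four test-outcome groups."""
    actual_pos = sample_size * prevalence         # e.g. 50 of 1000
    actual_neg = sample_size - actual_pos         # e.g. 950 of 1000
    true_pos = actual_pos * sensitivity           # positives that test positive (35)
    false_neg = actual_pos - true_pos             # positives that test negative (15)
    true_neg = actual_neg * specificity           # negatives that test negative (855)
    false_pos = actual_neg - true_neg             # negatives that test positive (95)
    return true_pos, false_pos, true_neg, false_neg

tp, fp, tn, fn = bayes_counts(1000, 0.05, 0.70, 0.90)
print(f"Chance actually positive given a positive test: {tp / (tp + fp):.0%}")  # ~27%
print(f"Chance actually negative given a negative test: {tn / (tn + fn):.0%}")  # ~98%
```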
To get a good feel for the basic math behind Bayes, you should take the time to duplicate everything you see and make sure your results match exactly.
Now let’s tweak a few things and see what happens.
What if the current rate of asymptomatic infection in the general population is higher?
With 25% of the community infected, a positive test result indicates a 70% chance of infection. Much higher!
What if the positive accuracy of the test (sensitivity) is 100%?
Because of the low general infection rate (I moved it back to 5%), a positive result is still not very conclusive. A negative result, however, is conclusive. It really and truly means negative given that no positive cases are falsely identified as negative.
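If you are following along in Python rather than a spreadsheet, both of these tweaks are quick checks with the bayes_counts sketch from earlier (the exact 34% figure below is my own calculation, consistent with "still not very conclusive"):

```python
# Reusing the bayes_counts sketch defined above.
tp, fp, tn, fn = bayes_counts(1000, 0.25, 0.70, 0.90)  # 25% community infection
print(f"Positive test -> {tp / (tp + fp):.0%} chance infected")  # 70%

tp, fp, tn, fn = bayes_counts(1000, 0.05, 1.00, 0.90)  # perfect sensitivity
print(f"Positive test -> {tp / (tp + fp):.0%} chance infected")  # ~34%, still weak
print(f"Negative test -> {tn / (tn + fn):.0%} chance healthy")   # 100%, conclusive
```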
What if the sensitivity and specificity are both 50%?
This result is quite important to make note of. A 50% test is by definition random. Our result confirms that this test does nothing to our original probability estimate. There is still a 5% chance of being positive and a 95% chance of being negative.
But the really, really important thing to notice is that 50% test sensitivity and 50% test specificity add up to 100%. Of course, you know that, but I want you to notice it! The math of Bayes theorem works such that any combination of sensitivity and specificity that adds up to 100% will not change the original probability estimate. That test is random. Try it for yourself and see. Make sensitivity/specificity = 60/40. Or 70/30, or 30/70. In each case in our spreadsheet, the new Positive result will still be 5% and the new Negative result will still be 95%. The math of Bayes goes on to conclude that if sensitivity plus specificity is less than 100%, the test is worse than random and therefore misleading. If the sum is 200%, the test is perfect and provides the exact answer regardless of prior probability.
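If you want to see why, a little algebra does it. Write the prior as $p$ and the sensitivity as $s$. If the pair sums to 100%, the specificity is $1-s$, so the false positive rate $(1-\text{specificity})$ equals $s$ and the sensitivity cancels out of the update entirely:

$$\text{New Positive} = \frac{p \cdot s}{p \cdot s + (1-p)(1-\text{specificity})} = \frac{p \cdot s}{p \cdot s + (1-p) \cdot s} = p$$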
Bayes Theorem Part 2 — Developing a Feel for Bayes Math (Without a Spreadsheet!)
A really excellent way to think about Bayes theorem is with a “grains of sand” approach. I learned about this technique from a book by science author Sean Carroll. Here is how it applies to the medical test example above. Imagine starting with 1000 grains of sand. In your mind’s eye, separate these grains into two buckets based on the original positive likelihood estimate of 5%. That means 50 grains in one bucket and 950 grains in the other.
We obtain new information relevant to our concern and estimate its sensitivity (accuracy of a positive result, or chance of confirming the concern) at 70%. This tells us that 35 of the 50 grains of sand in our positive bucket are True Positives. We also estimate a specificity (accuracy of a negative result, or chance of ruling out the concern) of 90%. This tells us that 855 grains (90% of 950) of the sand in our negative bucket are True Negatives. That means 950 – 855, or 95, are False Positives. Now we can calculate a New Positive Estimate from the True and False Positives.
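Written out, the new estimate is just a ratio of those grain counts:

$$\text{New Positive Estimate} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} = \frac{35}{35+95} \approx 27\%$$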
This is the same positive result obtained from the first medical test example in this article. Note that I did not show a new negative estimate. The data is available (True and False Negatives), but if I wanted to use this technique for that calculation, I would do it separately with a rearranged “grains of sand” model.
You can use this technique iteratively when new information is obtained to make math-based educated estimates in many areas. The next time you make a sports bet (or perhaps first time given what you are about to learn), consider this 100 “grains of sand” approach:
- In the last couple of years, my favorite team has won only 33% of their games.
- Their next game is at night. They don’t play at night very often, but when they do, they seem to win around half of the time. I don’t think this is a fluke. There is something about my team and night games! In my mind this doesn’t change their chance of losing a day (non-night) game much. It might go up slightly to make up for fewer losses occurring at night. In Bayes terms, that means a “night game” test sensitivity of 50% and a test specificity (losing a non-night game) of 68%.
- Note: This is where many people get Bayes reasoning wrong…I have seen several Bayes sports-betting examples where specificity is set equal to sensitivity by default…this is often incorrect. Each needs separate reasoning. In this case, for example, we know that setting both to 50% would be a random premise providing no added insight.
Our observation about past night games leads us to predict a 44% chance of winning the upcoming night game. But there is more to consider.
- My team recently acquired a new star player. He is battling injuries and hasn’t played much, but he is playing in the next game. So far, they won 45% of their games with him in the lineup. Even when he is out, the team seems to be playing better (slightly less likely to lose than before). I attribute this to his overall positive effect on the team’s morale. For this Bayes iteration, sensitivity is 45% and specificity is 65%.
- This 2-iteration Bayes assessment predicts your team has a 50/50 chance of winning their next game (even-money); a code sketch of this two-step update follows the notes below. If you find a sports-gambling facility offering more than even-up odds (on either a win or a loss), and you have a little fun-money, take the odds and bet!
Some things to note:
- The order of these two iterative steps does not matter. The result is the same either way.
- You need to consider the sum of sensitivity and specificity related to any premise. If it is 100%, the premise is random and does not add any insight. If less than 100%, it is worse than random and generates meaningless results.
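Here is the promised sketch of the two-step update. The bayes_update name is mine; the percentages come from the bullets above, and the last lines confirm the note about ordering.

```python
# A minimal sketch of the two-iteration sports-bet update described above.

def bayes_update(prior, sensitivity, specificity):
    """One Bayes iteration: return the updated probability estimate."""
    true_pos = prior * sensitivity                # grains confirming the premise
    false_pos = (1 - prior) * (1 - specificity)   # grains falsely confirming it
    return true_pos / (true_pos + false_pos)

prior = 0.33                                        # team wins 33% of games overall

after_night = bayes_update(prior, 0.50, 0.68)       # night-game premise
print(f"After night-game premise: {after_night:.1%}")   # 43.5%, ~44%

after_star = bayes_update(after_night, 0.45, 0.65)  # star-player premise
print(f"After star-player premise: {after_star:.1%}")   # ~50%

# Order does not matter: applying the premises in reverse lands on the
# same final estimate.
reversed_order = bayes_update(bayes_update(prior, 0.45, 0.65), 0.50, 0.68)
print(f"Reversed order: {reversed_order:.1%}")           # ~50%
```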
Bayes Theorem Part 3 — Crazy Claims and Bayes Thinking
“Extraordinary claims require extraordinary evidence”
This quote is attributed to the famous scientist Carl Sagan, though other scientists before him said similar things. This is actually a great way to summarize an important conclusion of Bayes Theorem. We’ve seen how Bayes works. If something is really unlikely, like a rare disease, a test for that condition needs to be extraordinarily accurate to indicate it is more likely than not. Here are some examples:
The last column shows that a positive result from a test that is 99.9% accurate (both sensitivity and specificity) predicts only a 50/50 chance of something that is nominally a 1 in 1000 event.
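As a quick check of that last column, the update works out to exactly even odds:

$$P(\text{true} \mid \text{positive}) = \frac{0.001 \times 0.999}{0.001 \times 0.999 + 0.999 \times 0.001} = 50\%$$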
Let’s say someone claims something extremely unlikely has happened. For example, “The election in our State was stolen!”. You are of course skeptical, but he has evidence!? He directs you to a copy of an affidavit signed by a guy in Italy. This document describes secret CIA software that years ago was used to flip electronic votes in Venezuela. More recently, according to the Italian, this software was modified by a Spanish company to target American elections!!!
You do some research on election procedures in your State. These include paper ballots, multiple levels of independent audits during the counting of those ballots, and automatic hand recounts if the results are close. In addition, vote counting software has been used in your country thousands of times with no known cases of electronically flipping cast votes. Given this knowledge, you estimate there is less than a 1 in 1000 chance of this happening.
Still, the person making the claim seems truly convinced. He has probably not applied Bayes thinking, but you can. Imagine a test for “electronically flipped ballots” that is 99.9% accurate. Think of this test as some sort of evidence presented to 1000 truly unbiased (we are dealing with politics) experts on computer software used in elections. To get a positive test result, 999 of the 1000 experts need to agree the evidence indicates votes were flipped. For the test to be equally accurate on the negative side, 999 of those 1000 experts would also have to correctly reject evidence from an election where votes were not flipped. Even then, given the original low likelihood estimate, agreement from those 999 experts only gets you to a coin-flip.
Bayes Theorem Part 4 — Designing a Useful Test
In actual medical or other scientific applications, tests designed using Bayes techniques are evaluated for usefulness based on their sensitivity and specificity. This can be an extremely complex task, especially if one of these values is less than 50%. Nonetheless, I came up with something that approximates the material I reviewed. So, for entertainment purposes only, here is my highly subjective, unscientific ranking scale for Bayes test usefulness:
- Note that this scale does not apply to Bayes thought (e.g. “grains of sand”) experiments. It only applies to tests developed from collected data. In thought experiments, anything over 100% for the sum of sensitivity and specificity is a good goal to help inform odds estimates.
Of course, to apply this scale, we need to come up with a Bayes test to evaluate. One method is to collect some data you think is conditionally related in a spreadsheet and postulate a true/false premise (a proposed test) for that data. Then you can calculate both the sensitivity and specificity of your proposed test from the results. Here is an example using some historical financial data (I said I would get here).
This test looks at the relationship between the annual returns of the S&P 500 and inflation starting in 1928. Two formulas compare inflation to a test value, separating the S&P returns into different columns based on whether 12-month inflation at the end of that year is above or below the test value. The bottom of the spreadsheet contains Bayes-related statistics derived from the data above. Many of these calculations involve the “COUNT” and “COUNTIF” Excel formulas. The grey box below shows an example. The blue boxes show the equations for calculating sensitivity and specificity. With this knowledge, you can (and should) recreate the results you see.
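For readers who prefer code to COUNTIF, here is a rough Python sketch of the same counting logic. The CSV file name and column names are hypothetical stand-ins for the spreadsheet data, and I have written it for the flipped polarity discussed below, where a negative return year is the test’s “positive condition”:

```python
import pandas as pd

# Rough sketch of the spreadsheet's COUNT/COUNTIF logic. The file and
# column names here are hypothetical stand-ins for the actual data.
df = pd.read_csv("sp500_vs_inflation.csv")   # one row per year, 1928 onward

TEST_VALUE = 0.05                            # inflation threshold under test

# Flipped polarity: a negative-return year is the "positive condition",
# and inflation above the test value is a "positive" test result.
condition = df["sp500_return"] < 0
test = df["inflation"] > TEST_VALUE

base_rate = condition.mean()              # prior chance of a down year
sensitivity = test[condition].mean()      # P(high inflation | down year)
specificity = (~test)[~condition].mean()  # P(low inflation  | up year)

print(f"Base rate {base_rate:.0%}, sensitivity {sensitivity:.0%}, "
      f"specificity {specificity:.0%}, sum {sensitivity + specificity:.0%}")
```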
Changing the Test Value yields different results for these parameters. Unfortunately, for this particular test, no reasonable Test Value provides a combination of sensitivity and specificity deemed “Useful” per my ranking scale.
However, an interesting thing happens by changing the polarity of the test.
Now we have a value for sensitivity which is just below my scale’s threshold of being useful. If your scale is a little more lenient, you might interpret this result as follows: if inflation is likely to be higher than 5% this year, then the chance of a negative stock market return increases from 27% to 44%.
It might seem odd that the sum of sensitivity and specificity is different after changing the polarity of the test, but not the value of the test variable. However, in order to make the sum the same, we also need to change the polarity of positive/negative results in the appropriate formulas. In other words, “negative market returns” become “positive conditions” of the test. It sounds more confusing than it is, and in the end any useful conclusions from the test are the same. It’s worth the effort to generate this modified spreadsheet if you feel so inclined. I won’t do it here. The takeaway, however, is to make sure to use both polarities, like I did, to evaluate Bayes test usefulness. For the comparison of annual S&P 500 returns to inflation, the table below shows the results at different inflation levels.
The red-shaded cells indicate information that is not useful per my rating scale. So, this tells me I can’t draw any conclusions from this test about stock market performance (compared to the baseline sample results) given projected inflation levels less than 6%. Above 6% however, I can reasonably estimate that the chance of a negative stock market return increases from 27% to 40%. Above 7%, the chance of a negative return still increases, but by slightly less, from 27% to 36%.
Of course, there are unlimited ways to test (or analyze) data in a spreadsheet. But with a Bayesian approach, there is also an indication of test quality.
Here is another analysis similar to the one for S&P returns, but instead comparing inflation to real (after inflation) returns from US Intermediate Term Bonds.
This test produces much more useful data per my scale. Though if you understand how bonds work, you probably don’t need a test to tell you that their real returns are likely to be good when inflation is low and poor when inflation is high. But this test helps quantify that conclusion at specific inflation levels.
In these examples, I purposely chose a range of returns that ended in 2020. That year, in response to the COVID pandemic, the US and other governments printed a lot of money. It did not take a genius to figure out that big inflation would result relatively soon. So, what was a Bayesian investor (like me!) to do?
Let’s look at bonds first. The 3% inflation level is an interesting result. Below that level, the chance of a positive return increases from 66% to 76% (this 76% figure implies that the chance of a negative return is 24%). But this result also finds that above 3%, the chance of a negative return is 49%. So, what happens if inflation is predicted to be exactly 3%? In that case, since the sensitivity of the test is higher than the specificity, we would put more credence in the 49% negative return prediction.
But the real usefulness of the bond test results is at the higher inflation levels. Starting with inflation above 5%, the chance of a negative real return from bonds increases from 34% to 78%! As inflation goes higher, the chance of negative bond returns increases further. This test is pretty clear that bonds are not a great place to invest if you see inflation above 5% on the horizon!
Do you move money from bonds to stocks?
As seen earlier, we don’t start getting useful Bayesian results about stocks until inflation is above 6%. At that level, the chance of a negative annual return from stocks increases from 27% to 40%. At 7% inflation the chance of a negative return from stocks is 36%, slightly less than it was at 6% inflation. This pattern indicates that high inflation makes stocks riskier, but unlike the situation for bonds, this effect lessens as inflation goes even higher. Since you don’t know how high above 5% inflation could go, leaving the stock percentage of your portfolio alone seems prudent.
If not bonds to stocks, what then?
Given a decision against increased equity exposure, is there somewhere better than Intermediate Term Bonds to put the “safe” part of our portfolio? I looked back at three possibilities: 1-year Treasuries, 3-month Treasuries and my 401k Stable Value fund. Interestingly, my table of Bayes data for these alternatives looked almost identical to the original bond analysis. So, I went a step further. I averaged the negative real returns in the negative test results column at a 6% inflation level. In the original bond analysis, the average was -7%. For 1-year Treasuries, the average was -6.5%. For both 3-month Treasuries and Stable Value, the average was -5%. These last two choices are definitely better for the safe part of a portfolio when a big inflation threat is looming.
I chose Stable Value. Over the next two years, inflation exceeded 6% per year. Bonds cumulatively lost around 13%. Stable Value gained around 4%. In real terms, Stable Value lost money, but did much better than bonds.
Bayes Theorem Part 5 — The Formula
I said I would get to a mathematical expression for Bayes Theorem and here it is:
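$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$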
If we define the “Probability of a Positive Test Result” as P(B), then the “Chance of Actually Being Positive Given a Positive Test Result” is P(A|B). But learning the specific symbology of this formula is not all that important for us. If you understand the spreadsheet and thought experiments we went through, you know Bayes Theorem!
September 2022