Friday, April 23, 2021

An application of Bayes' Rule to daily life during the modern plague

Disclaimer: I am a mathematician with an interest in public health, not a doctor. Please speak to a medical professional to get medical advice. This website cannot replace medical professionals. If you rely on information here, you do so at your own risk. I will update this information as needed, but I make no guarantee that anything here is correct. There is no warranty of any kind, and I make no guarantee that this website will function for you. (For example, the calculations require JavaScript functions, and some browsers may have difficulty if those functions are no longer supported in the future.)

This is a probability calculator that depends on your input.

When you take a rapid test for COVID-19, what information do you gain? Click the "Calculate" button below to see what happens with some typical values, then try playing with the values for sensitivity and specificity of tests (explained just below) as well as your general risk of having COVID-19, given the situation in your area. The text below the calculate button will change depending on your input.

Mathematics is written in blue, and you can ignore these sections unless you want to understand my methods.

Sensitivity: %
If you have COVID-19, how often will your test detect it? Timing is important, and so is test quality, so the answer is complicated. This research review article* gave a range of sensitivities from 0% to 80% for PCR tests based on days post exposure (0% on the first day post exposure, 67% at 4 days post exposure, 80% at 8 days post exposure, and a gradual decrease thereafter). The danger zone appears to be from days 2 to 5, when the tests often fail to detect the virus and symptoms are absent or just starting, but infected people are likely to be contagious. This article assumed that day 5 was the day of symptom onset, so the timeline may be shifted for people with longer incubation times. Sensitivity is lower for rapid tests than for PCR tests, especially if storage and usage instructions are not followed precisely. A Cochrane review** that ignores the timing of testing gives 58% to 75% as rapid test sensitivities for asymptomatic individuals. Manufacturers of rapid tests regularly claim much higher sensitivity (98% on mine), but it is difficult for me to comprehend why hospitals would still use PCR tests as their standard if the rapid tests were better, cheaper, quicker, and required no special equipment. Some clarity comes from the Lancet article "Buyer Beware: inflated claims of sensitivity for rapid COVID-19 tests" by Fitzpatrick et al., which states that manufacturers aren't reporting the sensitivity at all; they are stating the percent positive agreement and calling it sensitivity. The percent positive agreement tells you what fraction of positive results from a reference test (a PCR test, for example) were also caught by the rapid test.

Specificity: %
If you don't have COVID-19, how often will the test correctly come back negative? This is around 98.8% to 99.9% for all tests. The lowest value I saw anywhere was 97%.

Prevalence: per 100,000

How likely are you to have COVID-19? This will be the incidence in your area, adjusted based on your individual situation if it's likely to be very different. Did you come into contact with someone who tested positive? Has everyone in your household decided to be extremely cautious with contacts?

*Variation in False-Negative Rate of Reverse Transcriptase Polymerase Chain Reaction–Based SARS-CoV-2 Tests by Time Since Exposure by Kucirka et al. in Annals of Internal Medicine

**Rapid, point‐of‐care antigen and molecular‐based tests for diagnosis of SARS‐CoV‐2 infection by Dinnes et al. in the Cochrane database of Systematic Reviews (accessed 27 April, 2021)

Click "Calculate" to display the answers to the questions based on your inputs.

What are the results when 100,000 random people take tests?

To get that, we multiply 100,000 by the probability of each possible event. The probability of each event is given in the parentheses.

people have COVID-19, and their test is positive. (prevalence times sensitivity)
people have COVID-19, and their test is negative. (prevalence times [1 minus sensitivity])
people do not have COVID-19, and their test is positive. ([1 minus prevalence] times [1 minus specificity])
people do not have COVID-19, and their test is negative. ([1 minus prevalence] times specificity)

For those interested in the math: One rule I just used here is that the probability of something not happening is 100% (or 1) minus the probability of it happening. For example, if a test senses COVID-19 20% of the time, then it won't sense it 80% of the time. You just subtract 20% from 100% to get 80%. This rule will come up a lot later as well. Another rule is multiplication of probabilities. If you want the probability of two independent things both happening (e.g. you flip a coin and get heads AND you pick one of four doors and find the prize hidden behind only one of them), you just multiply their probabilities. In this case, that is 50% times 25%, or 12.5% (heads is on one of the two sides of the coin, and one of the four doors hides the prize). If the events aren't independent, you have to use conditional probability. For example, we want the probability that someone has COVID-19 and their test is positive. Since the test detects COVID-19 and has a higher probability of being positive if someone has the infection, these events are not independent. In this case, we multiply two things: (1) the probability that someone has COVID-19 and (2) the probability that a test will be positive IF someone has COVID-19. People who study probability say "given" instead of "if".
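For readers who prefer code to prose, here is a minimal sketch of the four counts in Python. It is an illustration, not the JavaScript behind this page. The function name `outcome_counts` is made up for this example, and the inputs (160 per 100,000 prevalence, 70% sensitivity, 99% specificity) are illustrative values, not recommendations.

```python
def outcome_counts(prevalence_per_100k, sensitivity_pct, specificity_pct,
                   population=100_000):
    """Expected number of people in each of the four test outcomes."""
    p = prevalence_per_100k / 100_000   # probability of having COVID-19
    s = sensitivity_pct / 100           # P(positive test | infected)
    c = specificity_pct / 100           # P(negative test | not infected)
    return {
        "infected, positive test": population * p * s,
        "infected, negative test": population * p * (1 - s),
        "not infected, positive test": population * (1 - p) * (1 - c),
        "not infected, negative test": population * (1 - p) * c,
    }


# Illustrative inputs only (assumed for this sketch).
counts = outcome_counts(160, 70, 99)
for label, n in counts.items():
    print(f"{label}: {n:.1f}")
```

The four counts always add up to the whole population, which is a handy check that no case was forgotten.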

How much does my risk of having COVID-19 change if I take a test?

Your risk of having COVID-19 was the prevalence you listed above before the test. This means that % of people in your area and situation have COVID-19.

Once you get a negative test, it changes to . This is the number of people who had COVID-19 and a negative test divided by the total number of people with a negative test in the calculations above. This means that % of people who got a negative test have COVID-19. With realistic values, the chances for you as an individual aren't much different with or without a test: probably somewhere around one-tenth of a percent unless the hospitals are overflowing.

I've snuck in Bayes' Rule here. Bayes' Rule is a way to trade conditional probabilities you know for ones you don't know. Here, we wanted to know what your risk of having COVID-19 was IF you had a negative test. But from the sensitivity, we only knew what the probability of getting a negative test was IF you had COVID-19. To switch the order, you can use Bayes' formula directly, or, if you are new to the concept, it helps to draw something called a Bayes tree. If you look up how to do that and try it here, you'll notice that the branches of the tree carry the same probabilities we used to calculate how many people out of 100,000 had each of the possible conditions (have COVID-19 and a positive test, etc.). Be sure to put COVID-19/no COVID-19 on the first set of branches and positive/negative tests on the second set. We finished the Bayes tree method in this last step, where we divided the number of COVID-19-infected people with a negative test by the total number of people with a negative test (this is the probability of COVID-19 given a negative test).
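The same switch of conditional probabilities can be written as a short Python sketch. The function name and the example inputs (160 per 100,000 prevalence, 70% sensitivity, 99% specificity) are assumptions chosen for illustration, not output from the calculator.

```python
def prob_infected_given_negative(p, sensitivity, specificity):
    """P(COVID-19 | negative test) via Bayes' Rule.

    All arguments are probabilities between 0 and 1.
    """
    infected_and_negative = p * (1 - sensitivity)    # missed infections
    healthy_and_negative = (1 - p) * specificity     # correct negatives
    all_negative = infected_and_negative + healthy_and_negative
    return infected_and_negative / all_negative


# Illustrative inputs only (assumed for this sketch).
before = 160 / 100_000
after = prob_infected_given_negative(before, 0.70, 0.99)
print(f"Risk before the test: {before:.4%}")
print(f"Risk after a negative test: {after:.4%}")
```

The numerator and denominator are the two "negative test" branches of the Bayes tree, so the function is just the tree method written out.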

If I have COVID-19, what is the likelihood that the test will detect it?

That's just the sensitivity you gave above (%). So out of every 100 people who have COVID-19 will be detected by the test, and won't. This is where things become important from a public health perspective. If the tests detect some COVID-19 cases, but people go out more often or do more high-risk activities due to faith in the negative tests, this can result in more infections. Given the values here, people need to increase their risk of becoming infected/infecting others by a factor of before the tests become useless to them. If they increase their risk by more than that, the tests are causing more cases than they prevent. A more nuanced treatment is given below. You can change the sensitivity to explore how this factor changes.
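A rough sketch of where a break-even factor like this can come from, in Python. This rests on a simplifying assumption that is not spelled out in the text: a test with sensitivity s keeps a fraction s of infected people at home, so the remaining risk is (1 - s) times the untested risk, and multiplying risky contacts by 1/(1 - s) cancels the benefit. The calculator on this page may compute its factor differently, and the function name `break_even_factor` is made up for this example.

```python
def break_even_factor(sensitivity_pct):
    """Factor by which risky behavior can grow before testing stops helping.

    ASSUMPTION (not from the page): the factor is 1 / (1 - sensitivity).
    """
    s = sensitivity_pct / 100
    if not 0 <= s < 1:
        raise ValueError("sensitivity must be at least 0% and below 100%")
    return 1 / (1 - s)


# Illustrative sensitivities only.
for pct in (50, 70, 80):
    print(f"{pct}% sensitivity -> factor {break_even_factor(pct):.2f}")
```

Under this assumption, a perfect test (100% sensitivity) has no finite break-even factor, which is why the sketch rejects that input.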

If I get a positive test, what is the likelihood that I actually have COVID-19?

It changes drastically depending on all three values you entered. It's the number of COVID-19-infected people with a positive test divided by the total number of people with a positive test. In this case, that probability is %.
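To see how strongly this probability depends on the inputs, here is a minimal Python sketch. The function name and the input values are assumptions picked for illustration; they are not results from the calculator.

```python
def prob_infected_given_positive(p, sensitivity, specificity):
    """P(COVID-19 | positive test); all arguments are probabilities (0 to 1)."""
    infected_and_positive = p * sensitivity             # true positives
    healthy_and_positive = (1 - p) * (1 - specificity)  # false positives
    return infected_and_positive / (infected_and_positive + healthy_and_positive)


# Illustrative inputs only: the same test at three different prevalences.
for per_100k in (50, 160, 1000):
    ppv = prob_infected_given_positive(per_100k / 100_000, 0.70, 0.99)
    print(f"{per_100k} per 100,000 -> {ppv:.1%} chance a positive is real")
```

When prevalence is low, false positives from the large uninfected group can outnumber true positives even with a high specificity.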

How much does my risk of spreading COVID-19 change when I take rapid tests?

This is a simple question with an extremely complicated answer. Because the prevalence changes over time and by location, and because different events take place in various conditions (indoors/outdoors, with/without masks, etc.), this is impossible to answer precisely. However, with a few unrealistic simplifications, we can get a ballpark estimate with minimal effort.

Below is a list of the number of people present at five events I might have attended before tests were widely available:
Event 1A:
Event 2A:
Event 3A:
Event 4A:
Event 5A:

But maybe if tests are available, I feel comfortable enough to go to a wedding with 50 people in addition to my usual interactions. So here is a list of the number of people present at five events I might attend if everyone who attended was required to take a test, myself included:
Event 1B:
Event 2B:
Event 3B:
Event 4B:
Event 5B:

Now let's make a few unrealistic assumptions:
(1) The prevalence has been, is, and always will be exactly what we entered in the box at the top of this page. And that is true for every location, including where the gathering takes place and where the participants originate.
(2) If we want some estimate of relative transmissibility risk, then every event occurs under the same conditions (e.g. all of them are outdoors with the same proportion of maskless guests, there is no food served, and people aren't maintaining distance from each other). It doesn't matter what the conditions are. It just matters that the risk of the virus being passed from an infected person to someone else is the same at every event.
(3) Everyone at the second group of events uses a test with the sensitivity entered in the box at the top of this page.
(4) The events occur infrequently, and the guests don't have contact with each other outside of these events.
(5) The people at the events are vaccinated at the same proportions as people in the population used to measure the incidence.

Even with these assumptions, we can't answer the questions "How likely am I to get COVID-19 at these events?" or "How many people am I expected to infect at these events?" I am not equipped to answer those questions, and I doubt even epidemiologists would be confident in any estimates they could give you. But we can answer a related question to get a rough idea: "How likely is it that someone with COVID-19 was present at an event I attended?"

You can also use this calculator to calculate risks for individual events with and without required testing. Just enter the number of guests for Event 1A and 1B, and enter 0 as the number of guests for the rest of the events.

Click "Find out." to get the answers.

For the first set of events, the probability of COVID-19 being present for at least one of the events is %.

For those interested in the math, the formula is based on the prevalence and number of guests. If p is the prevalence (written as a probability) and g is the number of guests at a given event, then the probability that no one has COVID-19 there is (1-p)^g. This follows from an extension of the multiplication rule mentioned above. (1-p) is the probability that any given individual guest is not infected. So, the probability that the first guest does not have COVID-19 is 1-p. The probability that both the first guest and the second guest don't have it is (1-p)(1-p) or (1-p)^2. The probability that the first three don't have it is (1-p)(1-p)(1-p) or (1-p)^3. You can do this as many times as you like, so (1-p)^g is the probability that none of the guests are infected. If r1 is the probability that no one at the first event is infected, (1-p)^g, and we calculate r2, r3, r4, and r5 in the same way for the rest of the events, then we can multiply r1 by r2, r3, r4, and r5 to get the probability that no one is infected at any of the events. Finally, to get the probability that at least one infected person is present at at least one of the events, we just subtract that product from 1.
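The formula above can be sketched in a few lines of Python. This is an illustration of the formula, not the code that runs on this page; the function name and the guest counts are made up for the example.

```python
import math


def prob_covid_present(guest_counts, p):
    """Probability that at least one infected guest attends at least one event.

    guest_counts: number of guests at each event (0 means the event is skipped)
    p: prevalence as a probability, e.g. 160 per 100,000 -> 0.0016
    """
    # r_i = (1 - p)^g_i is the chance that nobody at event i is infected.
    prob_nobody_anywhere = math.prod((1 - p) ** g for g in guest_counts)
    return 1 - prob_nobody_anywhere


# Hypothetical guest counts for five events, at 160 per 100,000.
risk = prob_covid_present([4, 10, 2, 0, 25], 160 / 100_000)
print(f"Probability COVID-19 was present at least once: {risk:.2%}")
```

An event with 0 guests contributes a factor of 1, which is why entering 0 removes it from the calculation.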

For the second set of events, we may be willing to take a chance with more guests, but the tests are filtering out some of the people who had COVID-19 and hopefully offsetting the risk. In this case the probability is %.

This time, if s is the sensitivity, then the formula is (1-p(1-s))^g for the probability of no one being infected at a single event with g guests. In this case, each guest who brings COVID-19 to a gathering would have to be infected and also not be sensed by the test in order to attend. p(1-s) is the probability that any individual guest is both infected and among the infections not sensed by the test.
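Here is the tested version as a Python sketch, again as an illustration of the formula and not the page's own code. The function name and guest counts are assumptions for the example.

```python
import math


def prob_covid_present_with_tests(guest_counts, p, s):
    """Probability that at least one infected guest slips past the test.

    guest_counts: number of guests at each event
    p: prevalence as a probability (e.g. 0.0016)
    s: test sensitivity as a probability (e.g. 0.70)
    """
    p_undetected = p * (1 - s)   # infected AND missed by the test
    prob_nobody = math.prod((1 - p_undetected) ** g for g in guest_counts)
    return 1 - prob_nobody


# Hypothetical comparison: one 50-guest event with and without testing.
p = 160 / 100_000
untested = prob_covid_present_with_tests([50], p, 0.0)   # s = 0 means no test
tested = prob_covid_present_with_tests([50], p, 0.70)
print(f"Without tests: {untested:.2%}, with tests: {tested:.2%}")
```

Setting the sensitivity to 0 reproduces the untested formula (1-p)^g, so one function covers both sets of events.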

What happens if I attend events regularly (e.g. daily work, weekly dinner with a friend)?

This has a big effect. I had friends who were worried about their risk during peaks in new daily infections. With a little bit of mathematical modeling, I was able to show them that their usual activities over the past several months of low prevalence had actually put them at higher risk for catching COVID-19 than the contact they needed to endure over the next few weeks at their jobs. If you are feeling stressed about high prevalence in your area, it may be helpful to assess your risk for previous regular activities to put things in perspective. For a quick and dirty approximation, you can still use the calculator above. Just add up all of the probabilities for every activity you do in a given time frame. Keep in mind that the prevalence may change during that time, and you will need to guess what the average prevalence might be to get a better estimate. This method could give you a result of more than 100%, though, if the activities are particularly high risk (many guests, high prevalence, or many repetitions). This is because adding the probabilities like this ignores the fact that COVID-19 could be present at multiple gatherings. The difference between this approximation and your actual risk of exposure to COVID-19 increases as your risk increases, because this makes it more likely that COVID-19 is present at multiple events.
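A small Python sketch shows how the quick sum drifts away from the exact answer as risk grows. The per-event probabilities below are hypothetical numbers chosen to make the point, and the function names are made up for this example.

```python
import math


def summed_risk(per_event_probs):
    """Quick approximation: add up the per-event probabilities."""
    return sum(per_event_probs)


def exact_risk(per_event_probs):
    """Exact value: 1 minus the chance that every event was COVID-free."""
    return 1 - math.prod(1 - q for q in per_event_probs)


# Hypothetical per-event probabilities that COVID-19 is present.
low_risk = [0.005] * 6     # six small gatherings
high_risk = [0.274] * 30   # thirty large gatherings

for name, probs in (("low risk", low_risk), ("high risk", high_risk)):
    print(f"{name}: sum = {summed_risk(probs):.3f}, exact = {exact_risk(probs):.3f}")
```

For the low-risk list the two numbers nearly agree, while for the high-risk list the sum runs far past 100% and the exact value stays below it.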

To get a satisfactory answer to a question like this, you only need to do a little work yourself, in three steps.

(1) For each event, use the calculator to calculate the probability that no one there has COVID-19. Just use zeros for the number of guests in all but one event. Then the value you need for the first event is: , and for the second event, it is (this one includes testing). Recalculate as many times as you need for all of your events and write down the results along with how many times each event will happen. The "Find out" button also registers new prevalence and sensitivity values, so you don't need to click the "Calculate" button again if you change those.

As an example, let's say I have a weekly dinner with 4 friends from different households, where no one is tested. And I go on business trips twice this month, meeting with 8 clients per trip, with everyone including myself tested with a test at 70% sensitivity. Let's also say that 160 per 100,000 people have COVID-19 in my area. Each dinner has probability 0.993615 that no one is infected, while each business trip has 0.996166. I get these values if I enter 160 for prevalence, 70 for sensitivity, and then 4 for 1A, 8 for 1B, and 0 for the number of guests at every other event. Keep in mind that if the prevalence changed over time, you can enter different prevalences for your calculations and customize everything to your liking. Just keep track of how many occurrences of the events took place with each set of values for the next steps. Maybe make a table or spreadsheet document.

(2) Multiply all of those probabilities together.

For this month, the dinner happens four times and the business trip twice, so I get: 0.99362*0.99362*0.99362*0.99362*0.99617*0.99617 (or 0.99362^4 * 0.99617^2) = 0.967. (Hint: It always works to use the number of occurrences as an exponent like this.)

(3) The probability of COVID-19 being present for at least one of your events is then just 1-X, where X was what you got in the last step. Multiply by 100 if you want a percentage.

For the example, we get 1-0.967=0.033, which is 0.033*100%=3.3%. Notice that each single event had a probability of only about 0.5%, but they add up over time.
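The three steps can also be done in one go with a short Python sketch, using the numbers from the example (160 per 100,000, four untested dinners with 4 friends, two tested business trips with 8 clients at 70% sensitivity). The function and variable names are made up for this illustration.

```python
def prob_nobody_infected(guests, p, sensitivity=0.0):
    """Step 1: chance that no attending guest is infected at a single event.

    sensitivity=0.0 means nobody was tested.
    """
    return (1 - p * (1 - sensitivity)) ** guests


p = 160 / 100_000

# Step 1: one value per kind of event.
dinner = prob_nobody_infected(4, p)                   # untested dinner
trip = prob_nobody_infected(8, p, sensitivity=0.70)   # tested business trip

# Step 2: multiply, using the number of occurrences as an exponent.
x = dinner ** 4 * trip ** 2

# Step 3: subtract from 1 and convert to a percentage.
risk_percent = (1 - x) * 100

print(f"dinner: {dinner:.6f}, trip: {trip:.6f}")
print(f"X = {x:.3f}, risk = {risk_percent:.1f}%")
```

Working from unrounded values avoids the small rounding differences that creep in when the intermediate results are copied by hand.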

Update: At the time this was written, vaccination was just starting in Germany. I've added an assumption to the model for vaccination uptake. If vaccine uptake is similar to the general population used to measure the incidence, then the model will give good estimates of the risk, perhaps underestimating risk if the incidence is measured only by confirmed cases (rather than an estimate of actual cases). If vaccine uptake proportions are much better than expected at an event, the risk is overestimated by the model. If vaccine uptake proportions are worse than expected at an event, then the model will underestimate the risk.
