# Probabilities Address Your Burning Questions About COVID-19

## By Jen Ciarochi on Apr 15, 2020

The timeline of the COVID-19 chaos began roughly on December 31, 2019, when the Chinese city of Wuhan first reported an outbreak of a novel coronavirus. Within 3 months, a small number of infections—linked to a local seafood market—had ballooned into a pandemic.

About a third of the global population is now subject to mandatory quarantines or border closures. In the United States, which accounts for more than a quarter of the world’s confirmed infections, most states have implemented stay-at-home orders^{[1]}.

**Consequently, many highly anticipated events across the globe have been postponed or cancelled until further notice—including the Summer Olympics, Cannes Film Festival, March Madness, the Masters Golf Tournament, the Kentucky Derby, and the CNN Democratic debate.**

Those who have been economically impacted (or just generally disappointed) by this turn of events may now be wondering:

- Could these events really perpetuate the pandemic?
- What are the risks at small gatherings?
- What are the implications of state-by-state epidemic differences?
- Does social distancing just delay the inevitable?

Probabilities can be used to address these questions and illustrate the reasoning behind event cancellations and social distancing policies.

## Crunching the Numbers

Let’s say you’re a Belieber—a Justin Bieber stan^{[2]}. You’ve scored a ticket to see the Biebs perform during his upcoming 2020 Changes tour. You’ll be sharing an arena packed with about 16,612 other fans^{[3]}.

What is the likelihood of one of your fellow Beliebers being infected with COVID-19?

First, you need to know the probability of a single person in the United States being infected:

- The total population in the United States is around 327,200,000 people.
- As of April 13, the WHO has reported 524,514 confirmed infections in the United States.

Assuming these numbers are correct (more on this in the Caveats section):

- 524,514 infections / 327,200,000 (total US population) = 0.0016, the ratio infected.
- Thus, the ratio uninfected is 0.9984 (1 - 0.0016).

Percentage-wise, this equates to around two tenths of a percent of the population being infected, and the other 99.8% being uninfected. Taken at face value, this sounds pretty promising so far—but wait, there’s more!

To determine the probability of at least one person at the Bieber concert being infected:

- Raise the uninfected ratio (0.9984) to the power of the crowd size (16,613): $0.9984^{16{,}613} \approx 0.0000000000027980808 = 2.7980808 \times 10^{-12}$
- Then, subtract the result from 1: $1 - 2.7980808 \times 10^{-12} \approx 0.999999999997$

In other words, there is a 99.999999999% likelihood of an infected stan being in attendance. Yikes!
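The two steps above fit in a few lines of Python. This is a minimal sketch of the article's simple model, assuming every attendee is independently infected with the national infected ratio; the function name is illustrative.

```python
def prob_at_least_one_infected(infected_ratio: float, crowd_size: int) -> float:
    """Chance that at least one of `crowd_size` attendees is infected.

    Assumes each attendee is independently infected with probability
    `infected_ratio`, which is the simple model used in this article.
    """
    prob_nobody_infected = (1.0 - infected_ratio) ** crowd_size
    return 1.0 - prob_nobody_infected


# Ratio infected: WHO confirmed US infections (April 13) / US population
us_infected_ratio = 524_514 / 327_200_000
print(f"ratio infected: {us_infected_ratio:.4f}")  # ratio infected: 0.0016

# As in the text, use the rounded ratio 0.0016 and a crowd of 16,613
print(f"{prob_at_least_one_infected(0.0016, 16_613):.12f}")  # 0.999999999997
```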

Once they’ve infiltrated the crowd, the infected stan in the relatively confined, dense concert population would likely come well within the risky 6-foot radius^{[4]} of many other concertgoers. After all, crowd-surfing, snack purchasing, hitting up the merchandise table, and screaming “I LOVE YOU, JUSTIN!” are just a few of the infection-promoting activities to account for.

A Justin Bieber concert is actually on the smaller side of the high-profile events that have been postponed due to COVID-19. Consider Summerfest, which attracts about 800,000 people. Not only is the probability of an infected person attending essentially 100%, but about 1,282 infected people would be expected to attend (based on the national infection rate).
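Under the same independence assumption, the number of infected attendees follows a binomial distribution whose mean is simply the crowd size times the infected ratio. A minimal sketch (the function name is illustrative):

```python
def expected_infected_attendees(infected_ratio: float, crowd_size: int) -> float:
    """Mean number of infected attendees: crowd size times infected ratio."""
    return crowd_size * infected_ratio


us_infected_ratio = 524_514 / 327_200_000  # WHO confirmed infections, April 13
summerfest_crowd = 800_000
print(round(expected_infected_attendees(us_infected_ratio, summerfest_crowd)))  # 1282
```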

Expected audience sizes (far right) at various cancelled events in the United States

## A Look at CDC Recommendations

In communities with minimal to moderate transmission levels, the Centers for Disease Control and Prevention (CDC) is now advising against gatherings of more than 250 people. For organizations serving the elderly and other high-risk populations, the recommended cap is only 10 people. In communities with substantial transmission levels, current CDC guidelines discourage gatherings of any size.

Following the same procedure used to plot the likelihood of an infectious attendee at various events:

- At a 250-person event, there is around a 33% chance of at least one attendee being infected, which is almost certainly unacceptable for a high-risk person.
- For groups of 10 people, the odds are about 2%. This may still be too risky for people older than 80, as the mortality rate for patients in this group is thought to exceed 10%.
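The two CDC caps can be checked with the same formula. This sketch again assumes independent attendees and the rounded national ratio of 0.0016:

```python
def prob_at_least_one_infected(infected_ratio: float, crowd_size: int) -> float:
    # Complement of "every attendee is uninfected", assuming independence
    return 1.0 - (1.0 - infected_ratio) ** crowd_size


for cap in (250, 10):  # CDC gathering caps discussed above
    print(f"{cap:>3} people: {prob_at_least_one_infected(0.0016, cap):.1%}")
```

It prints about 33.0% for 250 people and 1.6% for 10 people, in line with the rounded figures above.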

## ~~Safety~~ Danger in Numbers: Are Small Gatherings Safe?

At this point, avoiding a large event—like March Madness—may seem obvious and intuitive. Multiple trips to the grocery store, on the other hand, may still feel inherently justifiable. Are such small gatherings really benign?

Because the COVID-19 data change every time one blinks, it can be helpful to examine the relationship between risk and event size across a large range of infected population sizes.

Let’s take a look at smaller gatherings in the United States, those of fewer than 100 people. The lines in the chart below represent different likelihoods. For example, the green line indicates a 0.5% chance of at least one attendee being infected.

If there are 600,000 infections and you are comfortable with a 0.5% risk of an infected attendee, you should avoid events exceeding about 3 people. If you can tolerate a 10% risk, this crowd cap increases to about 58 people.
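Rather than reading caps off a chart, you can solve $1 - (1-p)^n \le r$ for $n$, which gives $n \le \ln(1-r)/\ln(1-p)$. The sketch below (hypothetical function name, same independence assumption, 327.2 million population) rounds down so the cap never exceeds the tolerance; because of that rounding, its answers can be a person smaller than the approximate chart readings quoted above.

```python
import math


def max_crowd_size(infected_ratio: float, risk_tolerance: float) -> int:
    """Largest crowd whose chance of containing an infected attendee stays
    at or below `risk_tolerance`, assuming independent attendees.

    Solves 1 - (1 - p)^n <= r for n, i.e. n <= ln(1 - r) / ln(1 - p).
    """
    # log1p(-x) computes ln(1 - x) accurately for small x
    bound = math.log1p(-risk_tolerance) / math.log1p(-infected_ratio)
    return math.floor(bound)


us_ratio_at_600k = 600_000 / 327_200_000
for tolerance in (0.005, 0.10):
    print(tolerance, max_crowd_size(us_ratio_at_600k, tolerance))
```

For the 600,000-infection scenario this yields strict caps of 2 people at a 0.5% tolerance and 57 people at 10%.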

While a 10% risk may seem modest, remember that your cumulative risk increases with each gathering you attend.

For example, if you attend three events and there is a 10% chance of an infected attendee at each event, the likelihood of an infected attendee at one or more of these events is about 27.1% ($1 - 0.9^{3}$), assuming the events are independent. Attending 10 or 20 such events raises this likelihood to about 65.1% or 87.8%, respectively.
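The cumulative figures apply the same complement rule to events instead of people. This sketch assumes each outing carries the same, independent 10% risk, which is a simplification:

```python
def cumulative_risk(per_event_risk: float, num_events: int) -> float:
    """Chance that at least one of `num_events` gatherings includes an
    infected attendee, assuming the events are independent."""
    return 1.0 - (1.0 - per_event_risk) ** num_events


for events in (3, 10, 20):
    print(events, f"{cumulative_risk(0.10, events):.1%}")
```

This prints 27.1%, 65.1%, and 87.8% for 3, 10, and 20 events.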

Therefore, it is arguably wise to limit even outings that are individually low-risk—like grocery store runs.

## State-by-State Comparisons

So far, these risk calculations are based on the number of infections in the United States as a whole, but what if you live in New York or California? What about Wyoming or Georgia? Across these states, the COVID-19 situation is radically different.

__New York__

In the state of New York, 195,031 infections have been confirmed as of April 13; this is by far the greatest number of infections in any US state (New Jersey, the second most afflicted state, has 64,584 confirmed infections). The population of New York is about 19.5 million people, which makes the ratio infected 0.01, or 1%. Recall that the national average is only about two tenths of a percent. New Yorkers therefore face a much greater risk of exposure than the average US resident. Focusing on the Y-axis of the chart below, gatherings in New York would need to be restricted to about 10 people in the current situation to avoid exceeding a 10% risk of an infected attendee. New Yorkers who are only comfortable with a 0.5%-1% risk, on the other hand, need to get pretty comfortable hanging out with themselves.

__California__

Fortunately, the situation for Californians is less bleak. The state population is about 39.51 million people, with 23,608 confirmed infections, so the ratio infected is about six hundredths of a percent. Here, the Y axis is much more forgiving. In the current situation, events of 174 people are still within the 10% risk threshold. However, for a risk of 0.5% or 1%, gatherings need to be limited to 9 or 17 people, respectively.

__Wyoming__

Wyoming has an even lower infected ratio than California, likely owing to its sparsely distributed population of around 0.58 million people (in fact, the only US state with a lower population density is Alaska). With 270 confirmed infections, the infected ratio in Wyoming is just under five hundredths of a percent. Here, the crowd caps are similar to California’s in the current situation. If there are 300 infections in the state, there is a 10% likelihood of an infected attendee at events with 203 people and a 0.5% likelihood at events with 9 people.

__Georgia__

In Georgia, the COVID-19 situation is comparable to the national average. Georgia’s population is around 10.62 million people, with 12,759 confirmed infections; thus, the ratio infected is about a tenth of a percent. At 13,000 infections, gatherings of more than 4 people surpass a 0.5% likelihood of an infected attendee, while those of more than 86 people exceed a 10% likelihood.
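The state figures can be approximated with the same two formulas, using the confirmed infections and rounded populations quoted above. This is a sketch: the caps in the text were read from charts at rounded infection counts (for example, 300 infections in Wyoming and 13,000 in Georgia), so the computed caps can differ from them by a few people.

```python
import math

# Confirmed infections (April 13) and approximate populations quoted above
STATES = {
    "New York": (195_031, 19_500_000),
    "California": (23_608, 39_510_000),
    "Wyoming": (270, 580_000),
    "Georgia": (12_759, 10_620_000),
}


def max_crowd_size(infected_ratio: float, risk_tolerance: float) -> int:
    # Largest n with 1 - (1 - p)^n <= r, assuming independent attendees
    return math.floor(math.log1p(-risk_tolerance) / math.log1p(-infected_ratio))


for state, (infections, population) in STATES.items():
    ratio = infections / population
    caps = [max_crowd_size(ratio, r) for r in (0.005, 0.01, 0.10)]
    print(f"{state:<10} {ratio:.3%}  caps at 0.5%/1%/10% risk: {caps}")
```

For New York it gives caps of 0 people at 0.5% and 1% risk and 10 people at 10% risk, matching the text.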

It is probably obvious that these state-by-state comparisons still do not capture the full picture. Residents of New York City, for example, are at a much greater risk of exposure than residents of more rural areas in the state.

However, these plots do demonstrate a few important points. For one, population density is intimately tied to risk. Additionally, public health interventions, like social distancing, can greatly mitigate disease spread. For example, California’s early and robust social distancing measures likely allowed it to keep the infection rate low, despite its dense population.

At the same time, this also makes it likely that California has a greater susceptible population than states that were slow to impose such measures—simply because a lower proportion of the population has been exposed. Californians would be wise to limit their public outings, even after social distancing measures have been relaxed. In particular, tourist attractions—like beaches—could threaten the local population with a second wave of infections imported by travelers.

## Social Distancing and the Reproduction Number

A recent Triplebyte article covering mathematical modeling of COVID-19 and other infectious diseases introduced the **basic reproduction number ($R_0$)**, a metric of how contagious a disease is.

$R_0$ represents the average number of people that an infected person will infect over the course of their infection, provided there is no immunity in the population and no public health interventions are in place. Here, immunity could come from previous exposure or vaccination, and public health interventions may include social distancing, quarantines, travel restrictions, and stay-at-home orders.

When interventions are underway or immunity is present in the population, this quantity is referred to as the **effective reproduction number (R)**.

Like $R_0$, R is not fixed; it is highly malleable and can differ across regions, populations, seasons, and time periods.

The goal of social distancing and other interventions is to lower R. Once R falls below 1, each infected person infects fewer than one other person on average, so the outbreak shrinks and eventually tapers off. The less palatable way this can occur is through depletion of the susceptible population, as the virus courses through it and people either recover or die.

Based on the early-epidemic estimate of $R_0$ of around 2.5, transmission must be reduced by about 60%^{[5]} to push R below 1. This is a lot to accomplish with social distancing, but China’s rebound is an encouraging example. The question, then, is not whether we have done too much, but whether we have done enough and done so with enough urgency.
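The 60% figure follows from a simple relation, assuming that a fractional cut $c$ in transmission scales the reproduction number proportionally and that the population has no immunity:

$$
R = R_0 (1 - c) < 1 \quad\Longleftrightarrow\quad c > 1 - \frac{1}{R_0} = 1 - \frac{1}{2.5} = 0.6
$$

Under the same assumption, an $R_0$ of 2 would lower the required cut to 50%.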

There are still many important unknowns that factor into the progression of the epidemic and the effectiveness of social distancing. For example, it is unclear what percentage of those infected show symptoms, how soon and for how long an infected person is contagious, and what level of immunity an infected person has after they recover. All of this information is critical to the pattern of disease spread.

In any case, social distancing can only improve what has become a grim situation.

## Flattening the Curve: No, You’re Not Just Delaying the Inevitable by Staying Home

To clarify what *is* inevitable in this situation: it is inevitable that many, many people in the United States and across the world will become infected with COVID-19.

What *isn’t* inevitable is an overtaxed healthcare system and a shortage of medical supplies and workers. If the healthcare system is overwhelmed, deaths from both coronavirus and other diseases will increase. A patient who suffers from a stroke, for example, may struggle to receive adequate care. People with cardiac issues, cancer, and other conditions that increase the risk of severe COVID-19 infections must also weigh the benefits and risks of important treatments.

Acting now to reduce the transmission rate can help avoid these scenarios. This is the crux of “flattening the curve,” a prominent concept in epidemiology that is now being widely circulated in media coverage of the COVID-19 pandemic. You may have seen a diagram that looks something like this:

The area under these curves should really overlap more, but you get the idea...

The green curve represents how the epidemic is likely to play out over time if social distancing measures are implemented for the duration of the epidemic. If such measures are relaxed within a few months, a second spike in cases is also likely in autumn, but its peak should be much lower than that of the red curve (the likely epidemic progression in the absence of social distancing).

You might notice a couple of things here about social distancing:

- It extends the duration of the epidemic
- It does not necessarily reduce the number of infections, but it does spread the infections out over time, in addition to delaying and substantially reducing the peak.
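The curves described above are schematic. To see the same qualitative effect in numbers, here is a sketch of a textbook SIR (susceptible, infected, recovered) model. It is not the model behind the diagram, and the 10-day infectious period, the population of one million, and the reduction of $R_0$ from 2.5 to 1.5 under distancing are all illustrative assumptions.

```python
from __future__ import annotations


def simulate_sir(r0: float, infectious_days: float = 10.0,
                 population: float = 1_000_000.0, initial_infected: float = 10.0,
                 days: int = 730) -> tuple[float, int]:
    """Return (peak number infected, day of the peak) for a basic SIR model.

    Uses one-day Euler steps. The recovered count is not tracked because
    it is implied by population - susceptible - infected.
    """
    gamma = 1.0 / infectious_days  # daily recovery rate
    beta = r0 * gamma              # daily transmission rate, since R0 = beta / gamma
    susceptible = population - initial_infected
    infected = initial_infected
    peak_infected, peak_day = infected, 0
    for day in range(1, days + 1):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        if infected > peak_infected:
            peak_infected, peak_day = infected, day
    return peak_infected, peak_day


unmitigated = simulate_sir(r0=2.5)
distanced = simulate_sir(r0=1.5)  # hypothetical 40% cut in transmission
for label, (peak, day) in (("no distancing", unmitigated), ("distancing", distanced)):
    print(f"{label:<14} peak of {peak:,.0f} infected on day {day}")
```

With these assumptions, the distanced peak is markedly lower and arrives later, which is the flattened shape of the green curve.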

What isn’t shown in these curves is the effect of social distancing on the mortality rate. When it comes to preventing deaths, the speed at which new infections occur is critical. Slowing down the rate of new infections prevents an epidemic peak that overwhelms healthcare services (i.e., the red peak).

As such, healthcare providers are better able to treat all their patients—including those suffering from COVID-19—and better poised to avoid becoming infected themselves. Although a vaccine will likely not be available for at least a year, social distancing also buys more time for treatments and vaccines to be developed.

## Caveats

This analysis uses simple probabilities to illustrate important points about social distancing and disease spread. It is by no means intended to be a comprehensive disease model. Like most projections of disease spread, this analysis is rich with caveats:

- It can be difficult to accurately estimate crowd sizes (it’s hard to forget the infamous controversy surrounding presidential inauguration crowds). Additionally, some events occur over multiple days. However, in this case, the result was essentially the same for any event larger than 5,000 people—a 100% chance of an infectious attendee.
- These calculations assume that infected and uninfected people are equally likely to attend a given event. In reality, some infectious people would be expected to stay home due to symptoms; this discrepancy could lead to *overestimation* of the probability of an infected attendee. HOWEVER, see the next point.
- These calculations are also based on the number of confirmed infections in the United States on April 13, 2020 (based on WHO data). The actual number of infections is probably much higher, due to mild and asymptomatic cases, limited testing capacity, and unequal access to the healthcare system (among other factors); this could lead to *underestimation* of the probability of an infected attendee, counteracting the effect described in the previous point.
- These calculations are based on confirmed infections at the national and state levels, and do not account for county-by-county differences, which can be significant. To inform your personal decision-making, it is prudent to consider local statistics (recognizing that data quality varies by state, county, and healthcare center).
- This method of calculating probabilities only works for a sufficiently large population. In a much smaller population (e.g., 4 people), it is necessary to account for individuals being removed from the general population as they are added to the event crowd.
- The calculations don’t factor in people who have recovered from the disease and are no longer contagious. However, little is known about COVID-19 immunity and duration of infectiousness, so it’s probably best to err on the side of caution by considering them infectious anyway.
- These probabilities reflect the likelihood of at least one infected person attending a given event, not the actual likelihood of becoming exposed and infected due to attending said event. Nonetheless, transmission is highest in dense, confined populations, so there is enhanced potential for viral spread in these situations.
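The caveat about small populations amounts to drawing attendees without replacement (a hypergeometric model) instead of treating each attendee independently. A minimal sketch, with illustrative function names, compares the two:

```python
def prob_at_least_one_exact(population: int, infected: int, crowd_size: int) -> float:
    """Chance of at least one infected attendee when attendees are drawn
    without replacement from a finite population (hypergeometric model)."""
    if not 0 <= crowd_size <= population:
        raise ValueError("crowd_size must be between 0 and population")
    prob_nobody_infected = 1.0
    for i in range(crowd_size):
        # Given no infected attendee so far, the next one is drawn from the
        # (population - i) people left, of whom (population - infected - i)
        # are uninfected
        prob_nobody_infected *= (population - infected - i) / (population - i)
    return 1.0 - prob_nobody_infected


def prob_at_least_one_approx(population: int, infected: int, crowd_size: int) -> float:
    # Independence (with-replacement) approximation used throughout the article
    return 1.0 - (1.0 - infected / population) ** crowd_size


# Tiny population: 4 people, 1 of them infected, 2 attend
print(prob_at_least_one_exact(4, 1, 2), prob_at_least_one_approx(4, 1, 2))
# US population, 250-person event
print(prob_at_least_one_exact(327_200_000, 524_514, 250),
      prob_at_least_one_approx(327_200_000, 524_514, 250))
```

For 4 people with 1 infected and 2 attendees, the exact probability is $1 - 3/6 = 0.5$, while the approximation gives $1 - 0.75^2 = 0.4375$; for the US population and a 250-person event, the two agree to well within a millionth.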

## Closing Remarks

To summarize:

- Yes, your favorite event probably had to be cancelled. In this situation, patience can save lives.
- Smaller events are not necessarily low-risk, particularly if many smaller events are attended.
- Risk varies by region and is influenced by population density and public health interventions.
- Regions that implemented robust and early public health interventions are in a better situation now, but may be particularly susceptible to a second wave of infections once social distancing measures are relaxed.
- Social distancing saves lives and improves the capacity of the healthcare system.

__Coding challenge__: can you use public data about infection levels to build a web app that displays these probabilities in real time?

You can access The New York Times county-level data on COVID-19 in the United States here.
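As a possible starting point for the challenge, here is a minimal sketch that computes county-level probabilities from CSV text. It assumes the NYT county file uses the columns `date`, `county`, `state`, `fips`, `cases`, `deaths`, with `cases` as cumulative confirmed cases (not currently infectious people). The sample rows and the county populations below are made up for illustration; real populations would have to come from another source, such as Census estimates.

```python
from __future__ import annotations

import csv
import io

# Made-up rows in the assumed column layout; replace with the real CSV text
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2020-04-12,Example,Somestate,99001,90,1
2020-04-13,Example,Somestate,99001,100,2
2020-04-13,Other,Somestate,99003,5,0
"""

# Hypothetical populations keyed by (county, state)
POPULATIONS = {("Example", "Somestate"): 10_000, ("Other", "Somestate"): 50_000}


def latest_cases(csv_text: str) -> dict[tuple[str, str], int]:
    """Most recent cumulative case count for each (county, state)."""
    latest: dict[tuple[str, str], tuple[str, int]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["county"], row["state"])
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings
        if key not in latest or row["date"] > latest[key][0]:
            latest[key] = (row["date"], int(row["cases"]))
    return {key: cases for key, (_, cases) in latest.items()}


def county_risks(csv_text: str, populations: dict[tuple[str, str], int],
                 crowd_size: int) -> dict[tuple[str, str], float]:
    """Chance of at least one infected attendee per county (independence model)."""
    risks = {}
    for key, cases in latest_cases(csv_text).items():
        if key in populations:  # skip counties without a population figure
            ratio = cases / populations[key]
            risks[key] = 1.0 - (1.0 - ratio) ** crowd_size
    return risks


for (county, state), risk in county_risks(SAMPLE_CSV, POPULATIONS, 10).items():
    print(f"{county}, {state}: {risk:.1%}")
```

A web app could rerun `county_risks` whenever the CSV is refreshed; the design choice of keeping parsing and probability separate makes it easy to swap in a different data source.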

How else could you use this data? Let me know at jen.ciarochi@triplebyte.com!

^{[1]}The handful of straggler exceptions include Arkansas, Iowa, Nebraska, North Dakota, and South Dakota.↩

^{[2]}Stalker fan.↩

^{[3]}Based on the average crowd size of 16,613 during the American leg of Bieber’s 2016 Purpose World Tour.↩

^{[4]}According to the Centers for Disease Control and Prevention (CDC), person-to-person transmission of COVID-19 is most frequent among people who are in close contact (within 6 feet of one another).↩

^{[5]}Possibly less, if transmission potential is lower during summer months. ↩

## References

Allain, Rhett. 2020. “The Promising Math Behind ‘Flattening the Curve.’” WIRED, March 24, 2020. https://www.wired.com/story/the-promising-math-behind-flattening-the-curve/.

Anderson, Roy M., Hans Heesterbeek, Don Klinkenberg, and T Déirdre Hollingsworth. 2020. “How Will Country-based Mitigation Measures Influence the Course of the COVID-19 Epidemic?” The Lancet 395 (10228): 931–934. https://doi.org/10.1016/S0140-6736(20)30567-5.

Downey, Maureen. 2020. “Scientists do the math to show how large events like March Madness could spread coronavirus.” Atlanta Journal Constitution, March 12, 2020. https://www.ajc.com/blog/get-schooled/scientists-the-math-show-how-large-events-like-march-madness-could-spread-coronavirus/g1pVdzQgJS5aoPnadBqyXO/.

Editors, Vulture. 2020. “All the Live Events, Movie Releases, and Productions Affected by the Coronavirus.” Vulture, April 7, 2020. https://www.vulture.com/2020/04/events-cancelled-coronavirus.html.

“Get Your Mass Gatherings or Large Community Events Ready.” 2020. Centers for Disease Control and Prevention, March 15, 2020. https://www.cdc.gov/coronavirus/2019-ncov/community/large-events/mass-gatherings-ready-for-covid-19.html.

Kaplan, Juliana, Lauren Frias, and Morgan McFall-Johnsen. 2020. “A third of the global population is on coronavirus lockdown — here's our constantly updated list of countries and restrictions.” Business Insider, April 7, 2020. https://www.businessinsider.com/countries-on-lockdown-coronavirus-italy-2020-3?r=US&IR=T.

Mervosh, Sarah, Denise Lu, and Vanessa Swales. 2020. “See Which States and Cities Have Told Residents to Stay at Home.” The New York Times, April 7, 2020. https://www.nytimes.com/interactive/2020/us/coronavirus-stay-at-home-order.html.

“Novel Coronavirus (COVID-19) Situation Dashboard.” 2020. World Health Organization (WHO), April 2020. https://who.sprinklr.com.

The New York Times. 2020. “Coronavirus in the U.S.: Latest Map and Case Count.” The New York Times, April 13, 2020. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html#states.

“Using Mathematical Models to Assess Responses to an Outbreak of an Emerged Viral Respiratory Disease: The Reproduction Number.” 2006. Australian Government Department of Health, April 2006. https://www1.health.gov.au/internet/publications/publishing.nsf/Content/mathematical-models~mathematical-models-models.htm~mathematical-models-2.2.htm.
