27 October 2022
How high is the actual infection rate? Researchers Determine Unreported Corona Cases Using Search Analysis
To understand the corona pandemic and evaluate the measures taken against it, we need to know as exactly as possible how many people have gotten sick. Yet the numbers are often incomplete: at the beginning of the pandemic, testing capacity was not extensive enough; some infected people did not get tested despite having symptoms; and others had no symptoms and thus did not realize they were infected.
Analyzing large amounts of data from search engines or social media has already proven effective in tracing infection dynamics for other illnesses, such as the flu, in part because the data are quickly and freely available. A research team headed by Christina Maaß from Universität Hamburg has now, for the first time, studied whether these digital data are suitable for drawing conclusions about the actual number of corona cases in Germany.
To do so, the researchers combined the official data from the Robert Koch Institute (RKI) with data on corona-related searches on the high-traffic search engine Google. These data, known as “Google Trends,” reveal the development of the search volume for a term within a given period. The researchers looked at the terms “loss of smell,” “loss of taste,” “test center,” “quarantine,” “corona test,” and “pneumonia.”
The study detected a significant link between the search volume of individual terms and the RKI numbers during the first three infection waves (March–May 2020; October 2020–January 2021; February–May 2021) on both national and state levels. As official numbers rose or fell, so did the number of searches. It was especially noticeable that the search terms with the strongest links to the official numbers differed from wave to wave: the words “quarantine,” “loss of smell,” and “corona test” were the most searched for in the first, second, and third waves respectively.
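The kind of link described here, where search volume and official case numbers rise and fall together, can be illustrated with a simple correlation coefficient. The following is only a minimal sketch: the weekly values are made up for illustration and are not data from the study, the function name `pearson` is hypothetical, and the study itself used more extensive statistical procedures than a single correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up weekly values, NOT taken from the study:
# a relative search index (0-100) and officially reported cases.
search_index = [10, 25, 60, 100, 70, 35, 15]
reported_cases = [200, 600, 1500, 2400, 1800, 900, 400]

r = pearson(search_index, reported_cases)
print(f"correlation between search index and reported cases: {r:.3f}")
```

A value close to 1 means the two series move together; on its own, it says nothing about the direction of cause and effect.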
Moreover, using various statistical procedures, the researchers were able to show that there is a causal relationship between searches and registered infections and that the development of the search volume can be used to estimate the infection rate. On this basis, they developed a method for calculating the unreported numbers, according to which the number of infected people in the first wave was 31 percent higher than the official number. In winter 2020/21, the number was 43 percent higher, and in spring 2021, as extensive testing was getting underway, it was 28 percent higher.
“Using our method, we were primarily able to add to the official number those infected people who had weak symptoms or who, because of testing capacities or regulations, did not get tested or did not wish to,” explains Christina Maaß, research associate in the team within the professorship for economics with a focus on international economic relations held by Prof. Dr. Thomas Straubhaar. This method, Maaß says, is especially useful for the phases of a pandemic in which testing is not available without restrictions.
“In addition, asymptomatic infected people can be taken into account using a mean from the results of German and international studies; this would be about 20 percent. In combination with our findings, the actual number of infected people would be up to 72 percent higher than the official number,” according to Maaß.
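The article does not spell out how the two percentages are combined. One reading that roughly reproduces the quoted upper figure is to apply the share of about 20 percent for asymptomatic cases on top of the estimate that already includes unreported cases, that is, to multiply the two factors. The sketch below works through that arithmetic for the three waves; the multiplicative combination and the function name `combined_excess` are assumptions, not the published method.

```python
def combined_excess(unreported_share, asymptomatic_share):
    """Excess over the official number when both shares are applied.

    Assumption (not confirmed by the source): the asymptomatic share is
    applied on top of the estimate that already includes unreported cases.
    """
    return (1 + unreported_share) * (1 + asymptomatic_share) - 1


# Excess over the official numbers per wave, as quoted in the article.
waves = {"first wave": 0.31, "winter 2020/21": 0.43, "spring 2021": 0.28}
ASYMPTOMATIC = 0.20  # "about 20 percent", per the article

results = {name: combined_excess(share, ASYMPTOMATIC) for name, share in waves.items()}
for name, excess in results.items():
    print(f"{name}: {excess:.1%} above the official number")
```

Under this assumption, the winter 2020/21 wave gives 1.43 × 1.20 = 1.716, or roughly 72 percent above the official number, which matches the "up to 72 percent" in the quote. Simply adding the two percentages would give 63 percent instead.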
The research team sees the big data approach as a further important tool alongside those already in use. “We could show that by analyzing search requests we could draw conclusions about actual infection dynamics,” says Maaß. By further distinguishing the terms studied, for example using machine learning, Maaß says the method could become an integral building block in determining the infection dynamic.
Maaß, Christina (2022): Shedding light on dark figures: Steps towards a methodology for estimating actual numbers of COVID-19 infections in Germany based on Google Trends. PLoS ONE 17(10): e0276485. https://doi.org/10.1371/journal.pone.0276485