Predict the Flu Season

Imagine you could predict the flu season in the same way that weather forecasters can now predict the chances of rain. That’s just what epidemiologists at the Centers for Disease Control and Prevention (CDC) hope to be able to do. Yesterday they announced the winner of their Predict the Influenza Season Challenge, which asked participants to use digital content to predict the timing, peak, and intensity of the influenza season on a national level. The winning team was led by Jeffrey Shaman of Columbia University, who built an algorithm fed by both Google Flu Trends and CDC’s influenza-like illness (ILI) data.

Current flu surveillance systems are designed to determine where and when influenza activity is occurring every season, and rely on more traditional sources of healthcare data such as outpatient illness, hospitalization, and mortality data. These traditional surveillance systems provide excellent information on flu activity that has already occurred during the season but are not designed to predict what may happen in the future. Digital information such as Twitter feeds and Google and Wikipedia searches offers epidemiologists near-instantaneous feedback on a potentially global scale. As the winning team realized, the real trick in the analysis was to combine complementary data sets and continually fine-tune the predictive models.

In recent years there has been growing interest in leveraging the wealth of social media and other digital data to improve our ability to survey and predict infectious disease trends. This Challenge, run by the federal agency tasked with monitoring trends in infectious diseases, marks an important milestone towards achieving that broader goal. Additionally, CDC’s decision to identify solutions through a prize mechanism offers three important lessons for future prize competitions at HHS:

It is difficult to know the source and method of good solutions.
The Challenge recognized that solutions to this problem could come from almost anywhere, and that there may be many pathways to a solution. This was evident in the diversity of submissions: some participants used Twitter feeds, others Google Flu Trends (based on Google search data), and others Wikipedia searches.

Challenges by themselves do not solve problems.
The team at CDC also recognized that this Challenge was an early step in a larger development process. In many ways this Challenge helped CDC map the baseline capabilities and inform future investment strategies. The team therefore plans to maintain relationships with the winning teams to further develop their capabilities and discuss how digital data can be integrated into the agency’s flu surveillance system.

Government doesn’t always have the data.
With the wealth of government health data being released these days, it’s intriguing to see a challenge that called on participants to use other data sources to help government solve problems. It’s an important realization that the best insights of the future may result from marrying public and private data sources in increasingly innovative ways.

The initial prediction results, although far from perfect, offer a promising path forward, not only for influenza but potentially for other diseases. The use of a prize competition further emphasizes the highly collaborative nature of data analytics, and we hope to see more of these types of challenges at HHS. More information on challenges and competitions can be found on the HHS Competes page.