Early in the coronavirus pandemic, researchers at the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University began tracking, in near real time, the novel coronavirus as it spread beyond Wuhan City, Hubei Province, China, and around the globe. In late January, Johns Hopkins developed an interactive, web-based dashboard to visualize its tracking efforts and shared the underlying data with the public. This dashboard has become the de facto reference for the global pandemic, which has now led to over four million confirmed cases and claimed over 187,000 lives.
In March, the Seattle-based data visualization company Tableau launched a COVID-19 Data Hub in partnership with Johns Hopkins University, the Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), and others, providing free data resources to help public health officials, data scientists, and others stay informed and make data-driven decisions around COVID-19.
Tableau approached Turbine Labs as a leading source of high-quality news coverage and AI-powered data enrichments related to the pandemic that could be shared with the community. The resulting data and visualizations developed by Turbine Labs can be used as a standalone analysis tool or in combination with data and visualizations contributed by other partners.
HOW DOES THE TURBINE LABS VIZ WORK?
The Tableau visualization was created by the Turbine Labs team through an integrated process of machine learning (ML) and human validation to ensure high-quality results. Working from a sample of more than 500,000 English-language news articles related to COVID-19, we were able to synthesize the coverage into trending topics and categories.
Our in-house journalist team, which produces a daily COVID-19 Briefing free of charge to the public, first identified the top news categories (Public Health, Business, Way of Life Disruption, Politics, Economy, and others) to be used in our model. We then used the k-means clustering technique to identify a diverse set of news articles in our dataset, which our team labeled with one of the categories. This human-labeled data was used to train an ML category classifier that could then assign a category to every news article. As a final step, n-gram modeling grouped words together, determined their frequency of use, and scored groups of relevant articles to surface the top phrases per category in the visualization.
By combining human and artificial intelligence, the visualization presents a time sequence, mapping, and ranking of the themes most frequently discussed in news coverage. Users can determine when stories first appeared in the media, how coverage of various topics compares, and how often particular terms appear in news coverage.
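To make two of these steps more concrete, here is a minimal sketch of how k-means could pick a diverse set of articles for labeling and how n-gram counts could surface top phrases per category. It is an illustration only, not the Turbine Labs implementation: the function names, the bag-of-words vectors, the farthest-first initialization, the plain frequency ranking, and the sample headlines are all assumptions, and the classifier training step is left out.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def vectorize(docs):
    """Build bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        vectors.append([counts[w] for w in vocab])
    return vectors

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    clusters = []
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(i)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, members in enumerate(clusters):
            if members:
                cols = zip(*(points[i] for i in members))
                updated.append([sum(col) / len(members) for col in cols])
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids

def diverse_sample(docs, k):
    """Return the index of the article closest to each cluster centroid."""
    points = vectorize(docs)
    clusters, centroids = kmeans(points, k)
    return [min(members, key=lambda i: dist2(points[i], centroids[c]))
            for c, members in enumerate(clusters) if members]

def ngrams(tokens, n):
    """All contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_phrases(labeled_articles, n=2, top=3):
    """Rank n-grams by raw frequency within each category."""
    counts = {}
    for category, text in labeled_articles:
        counts.setdefault(category, Counter()).update(ngrams(tokenize(text), n))
    return {cat: [p for p, _ in ctr.most_common(top)] for cat, ctr in counts.items()}

# Made-up headlines, for illustration only.
DOCS = [
    "hospital cases rise as hospital beds fill",
    "hospital beds fill as cases rise",
    "stock market falls as investors sell",
    "investors sell as stock market falls",
]
to_label = diverse_sample(DOCS, k=2)  # one representative article per cluster
labeled = [("Public Health", DOCS[0]), ("Public Health", DOCS[1]),
           ("Business", DOCS[2]), ("Business", DOCS[3])]
print(to_label)
print(top_phrases(labeled))
```

Picking the article nearest each centroid is one simple way to get a varied labeling set. A production pipeline would more likely use TF-IDF or embedding vectors and a phrase score richer than raw counts.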
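The questions a user can ask of the visualization can be pictured as simple queries over categorized, dated articles. The sketch below is a hypothetical illustration: the record layout (date, category, text), the function names, and the sample records are assumptions, not the structure of the actual dataset.

```python
from collections import Counter
from datetime import date

# Made-up records for illustration: (publication date, category, text).
ARTICLES = [
    (date(2020, 3, 3), "Business", "Retailers close stores"),
    (date(2020, 3, 1), "Public Health", "Officials urge social distancing"),
    (date(2020, 3, 3), "Public Health",
     "Social distancing rules expand; social distancing works"),
]

def first_seen(articles, phrase):
    """Earliest publication date on which the phrase appears, or None."""
    phrase = phrase.lower()
    dates = [d for d, _, text in articles if phrase in text.lower()]
    return min(dates) if dates else None

def daily_counts(articles):
    """Number of articles per (date, category), for comparing coverage over time."""
    return Counter((d, category) for d, category, _ in articles)

def mentions(articles, phrase):
    """Total number of times the phrase occurs across all articles."""
    phrase = phrase.lower()
    return sum(text.lower().count(phrase) for _, _, text in articles)

print(first_seen(ARTICLES, "social distancing"))
print(daily_counts(ARTICLES))
print(mentions(ARTICLES, "social distancing"))
```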
WHAT ARE THE USE CASES FOR THIS CLUSTERING TECHNOLOGY OUTSIDE OF COVID-19?
This dataset and the accompanying visualization serve as a real-world example of how machine learning and clustering can be used to understand key trends and patterns within large volumes of text data. However, the use cases extend far beyond COVID-19. For example, enterprises can use this technology to understand financial, competitive, and market topics and so inform executive decision making more quickly and accurately. Political candidates and campaigns can quickly determine which key messages are resonating among their constituencies or among their opponents. And policy and lobbying firms can more quickly detect shifts in the conversation around the issues for which they advocate.
In addition, other proprietary scoring attributes developed by Turbine Labs, such as content Relevancy, Impact, and Authority, can be woven into the analysis for an even more powerful perspective. When combining these attributes with sentiment analysis, our technology provides a holistic, deep understanding of public sentiment and discussion.
It is our hope that this visualization will provide the public with the vital data and information it needs to make the best possible decisions with regard to health, safety, and wellness. Staying up to date and informed during the pandemic keeps us all safer and more connected.