Five ways we’re using data science during the COVID-19 pandemic

By Alison Donnellan

30 June 2020


From analysing the SARS-CoV-2 genome to outlining in-demand jobs, we break down five ways we’ve employed data science in response to the COVID-19 pandemic.

Data science is an important weapon in the fight against COVID-19, one of the largest global emergencies of this century.

Artificial intelligence, machine learning, natural language processing and computational modelling are some of the tools scientists are employing. Data science helps inform medical organisations, industry leaders and governments, especially as the world figures out how to recover from the pandemic.

Here are five ways we’ve used data science in response to the pandemic.

Tracking the spread of COVID-19 in Australia

Our Australian e-Health Research Centre developed an Australian COVID-19 dashboard for the use of state and federal health agencies. The dashboard collects data from Australia and around the world to identify, analyse and forecast trends in the global outbreak. It also identifies COVID-19 case hotspots.

The dashboard includes visualisations, graphs, interactive maps and models. Some of these are informed by statistical and epidemiological models, and have included visualisations of how SARS-CoV-2’s reproduction number, also known as R0, has tracked over time.
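To make the idea of a reproduction number concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) model in Python. It is not the dashboard's model. The function name, population size and infectious period are illustrative assumptions, chosen only to show how an R0 above or below 1 changes the shape of an outbreak.

```python
def simulate_sir(r0, infectious_days=5.0, population=1_000_000,
                 initial_infected=10, days=200):
    """Run a daily-step SIR model and summarise the outbreak.

    All default values are illustrative assumptions, not estimates
    for SARS-CoV-2.
    """
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate, since R0 = beta / gamma
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    peak = infected
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return {
        "peak_infected": peak,
        "total_infected": population - susceptible,
    }


if __name__ == "__main__":
    for r0 in (0.8, 1.5, 2.5):
        result = simulate_sir(r0)
        print(r0, round(result["peak_infected"]), round(result["total_infected"]))
```

In this sketch an R0 below 1 means the number of infected people shrinks from the first day, while an R0 above 1 produces an epidemic curve that rises before it falls.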

Graphic visual representation of the coronavirus that causes COVID-19.

Our researchers are studying the SARS-CoV-2 virus in order to fast track vaccine development.

Genomics software analysis for genetic codes

Understanding the genetic code of SARS-CoV-2 required a dataset with billions of data points. This dataset is a result of the virus’ 30,000-letter genome and the corresponding tens of thousands of sequences. That is too much data to assess using traditional methods and standard statistical models, making machine learning crucial to the process.

Dr Denis Bauer is a transformational bioinformatics specialist and researcher at our Australian e-Health Research Centre.

“Because the virus genome works as a whole, a change in one location of the genome can be boosted by a location several thousand letters away,” Denis said.

“Therefore the SARS-CoV-2 genome must be analysed as a whole. So machine learning methods are needed because they’re particularly good at forming complex interactions and getting information from them.”

VariantSpark is a software platform that employs a tailored Apache Spark-based machine learning framework to create insights from high-dimensional data.

In this instance, VariantSpark allows the team to identify genetic mutations that cause the virus to behave differently. An example is the D614G mutation, which may allow the virus to spread more easily. To date, there is no clear evidence of whether changes in the virus’ genome have an impact on the disease’s outcome. Investigations are currently limited by the lack of strong data.
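As a rough illustration of what identifying a mutation involves, the sketch below compares aligned protein fragments against a reference and counts each substitution, using the same naming style as D614G (original letter, position, new letter). This is not how VariantSpark works internally. The sequences, start position and function names are made up for the example.

```python
from collections import Counter


def substitutions(reference, sample, start=1):
    """List substitutions in `sample` relative to `reference`.

    Both sequences must already be aligned and of equal length.
    `start` is the position of the first letter of the fragment.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{start + offset}{alt}"
        for offset, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


def tally_substitutions(reference, samples, start=1):
    """Count how many samples carry each substitution."""
    counts = Counter()
    for sample in samples:
        counts.update(substitutions(reference, sample, start))
    return counts


# Hypothetical toy fragment, NOT real spike protein data. It starts at
# position 611 so that the single D sits at position 614.
TOY_REFERENCE = "AAADAAAA"
TOY_START = 611
TOY_SAMPLES = ["AAAGAAAA", "AAADAAAA", "AAAGAAAA", "AAAGAAAT"]

if __name__ == "__main__":
    counts = tally_substitutions(TOY_REFERENCE, TOY_SAMPLES, TOY_START)
    for name, count in counts.most_common():
        print(name, count)
```

Counting substitutions one position at a time is the simple part. As Denis notes above, positions in the genome interact, which is why the real analysis relies on machine learning rather than a tally like this.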

Researchers are asking the international community to share more genomic sequences of the virus. This includes de-identified information about the disease’s clinical symptoms and co-morbidities. Researchers can then monitor the disease’s changes and form a better understanding of how important genetic differences are to its progression.

Supercomputing and modelling

Researchers need a massive amount of data to create a visual simulation of a molecule. And that’s just the static 2D kind. To accurately show the process of the SARS-CoV-2 spike protein binding to a human cell through 3D animation, we need a combination of advanced data science and supercomputing.

Data61’s Dr Michael Kuiper has employed both to create an animated replica of SARS-CoV-2. The simulation helps identify the regions of the virus’ proteins that could be good targets for a drug or vaccine.

“The advantage of using a 3D simulation designed by real-life data is you can optimise the structure in real-time to see how it fits. And you can easily step back if you’ve made a mistake,” Michael said.

“This helps us decide what to make in real life to test in the lab, as that is the most expensive and time-consuming part. So this greatly speeds up the drug development process.”

We use one of our high-performance computers, Bracewell, which generates results much faster. The device’s hundreds of central processing units (CPUs) process the huge amount of data needed to scale SARS-CoV-2 simulations. Virtual reality and high-performance scientific computing are helping scientists understand how COVID-19 behaves.

A screenshot of a virtual reality platform which highlights data science and SARS-CoV-2.

Dr Michael Kuiper and his US-based colleague Dr Michael Bishop exploring the structure of the virus using virtual reality on the Nanome platform.

Detecting and preventing the spread of misinformation

We have an algorithm that can pinpoint misinformation on social media platforms. We used it in 2019 to identify bot activity on Twitter, and it recently received an update to recognise a disturbing COVID-19 trend: disinformation. Disinformation is the deliberate spread of incorrect information, which creates echo chambers and increases polarisation online.

The tool uses machine learning (ML), artificial intelligence (AI) and natural language processing (NLP). It analyses data from posts to identify false and dangerous narratives, factual information taken out of context, and divisions.
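For readers curious about how text classification works in principle, here is a minimal sketch of a bag-of-words Naive Bayes classifier in Python. It is a stand-in for illustration only and is not the algorithm described above. The example posts, labels and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the most likely label, using add-one (Laplace) smoothing."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-written training posts for illustration only.
TOY_POSTS = [
    ("miracle cure kills the virus overnight", "misleading"),
    ("secret cure they hide from you", "misleading"),
    ("wash your hands and keep your distance", "reliable"),
    ("health agency reports new cases today", "reliable"),
]

if __name__ == "__main__":
    model = train(TOY_POSTS)
    print(predict(model, "miracle cure for the virus"))
    print(predict(model, "health agency reports cases"))
```

A real system would need far more training data and careful human review, since word counts alone cannot tell whether a claim is true or merely taken out of context.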

Dr Mehwish Nassim, co-developer from Data61, said the insights the tool generates could inform users about what information is factual or misleading.

“This ultimately prevents the creation of echo chambers and the increase of misinformation,” Mehwish said.

“We’re looking at various ways to reduce the effects of this spread. The most important one is educating users. Misinformation requires an immediate response.”

Linking job demand with available skills in the disrupted labour market

We have an online platform designed to connect job seekers with high-demand areas in the COVID-19 disrupted labour market. It uses data science to turn job market data into insights presented on an online dashboard.

Created by Data61, in partnership with Adzuna Australia, it analyses information from thousands of online job ads posted on websites across the country. These insights show what types of roles are experiencing increased demand, where these jobs are located and what skills employers are looking for.

For example, the website reveals that an aircraft maintenance engineer might use their skills as a sterilisation technician, whereas a travel attendant could become an aged and disabled carer, as they have many of the same skills needed. And waiters from formerly busy restaurants can efficiently become pharmacy sales assistants.
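One simple way to think about matching workers to new roles is to measure how much two skill sets overlap. The Python sketch below uses Jaccard similarity (shared skills divided by all skills across both roles) to rank candidate roles. It is an illustration under stated assumptions, not the platform's method, and the skill lists and function names are invented for the example.

```python
def skill_overlap(skills_a, skills_b):
    """Jaccard similarity between two skill sets, from 0.0 to 1.0."""
    a, b = set(skills_a), set(skills_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_roles(current_skills, roles):
    """Return (role, score) pairs, highest overlap first."""
    scored = [(name, skill_overlap(current_skills, skills))
              for name, skills in roles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical skill lists, not taken from the dashboard.
TRAVEL_ATTENDANT = {"customer service", "first aid",
                    "safety procedures", "communication"}
TOY_ROLES = {
    "aged and disabled carer": {"first aid", "communication",
                                "personal care", "customer service"},
    "software developer": {"programming", "testing", "communication"},
}

if __name__ == "__main__":
    for role, score in rank_roles(TRAVEL_ATTENDANT, TOY_ROLES):
        print(f"{role}: {score:.2f}")
```

With these made-up lists, the carer role shares three of five distinct skills with the travel attendant and so ranks above the developer role, which shares only one of six.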

Dr Claire Mason, from Data61’s Insights Team, said the platform can also support workers considering new career pathways or upskilling opportunities.

“Job seekers, employers and government agencies can use them to support the Australian workforce to stay strong in these challenging times. And to help with these transitions, disrupted workers need to know which roles are experiencing increased demand,” she said.

A screenshot of graphs which outline critical skills needed in a post-pandemic labour market.

Insights provided by the ‘Fast-Growing Skills’ and ‘Declining Skills’ sections of the Data61 Australian Skills Dashboard.

The future

There is still a long way to go. But data science is one of the tools helping the world respond to a pandemic.

We’re continuing to responsibly and securely gather and share information. And this helps us to not only overcome the virus but better prepare for the future.