In a world-first, VariantSpark has allowed us to process one trillion points of genomic data.
Image: a timelapse photo of flying sparks with the VariantSpark logo in front. Caption: VariantSpark is helping to analyse genetic data.

Ever wondered why you are as tall as you are? Or how you inherited a physical trait that your sibling didn’t? What about why some people are more susceptible to disease? The answer lies in our biological make-up, the human genome.

By comprehending how human traits and diseases are driven at a gene level, we can better understand not only ourselves but also how to treat or cure various diseases. However, with more than three billion letters in the human genome and any one of them a potential contributor to a disease or trait, a thorough analysis is a monumental task.

Evaluating such an astronomical amount of data had previously been near-impossible, but we have revolutionised the approach.

How machine learning is analysing genetic data

VariantSpark is a software platform that uses a distributed machine learning (ML) framework to generate insights from high-dimensional biological data.

VariantSpark is the first method to explore all potential genetic variations and interactions. It could provide the missing puzzle piece in explaining how the genome influences complex diseases such as diabetes or Alzheimer’s.

Compared to traditional genome-wide association studies (GWAS), VariantSpark can provide crucial insights 3.6 times faster. This allows it to identify genomic variants more accurately and to scale to ultra-high-dimensional genomic data in more manageable timeframes.
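
To see why looking at interactions matters, consider a toy sketch in Python. It is not VariantSpark's method or API; the data, the variant names A, B and C, and the explained_fraction helper are all hypothetical, made up purely for illustration. The trait appears only when a person carries exactly one of variants A or B, so checking one variant at a time finds nothing, while checking variants jointly reveals the pattern.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

# Toy data (hypothetical): three binary variants A, B, C per sample.
# The trait is present only when exactly one of A or B is carried (A XOR B);
# C is unrelated noise.
VARIANTS = ["A", "B", "C"]
samples = list(product([0, 1], repeat=3))
trait = [a ^ b for a, b, _c in samples]


def explained_fraction(indices):
    """Fraction of samples whose trait is predicted correctly by taking the
    majority trait value within each joint-genotype group."""
    groups = defaultdict(Counter)
    for genotype, t in zip(samples, trait):
        key = tuple(genotype[i] for i in indices)
        groups[key][t] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(samples)


# Score each variant on its own, then each pair of variants together.
single = {VARIANTS[i]: explained_fraction((i,)) for i in range(3)}
pairs = {
    VARIANTS[i] + "+" + VARIANTS[j]: explained_fraction((i, j))
    for i, j in combinations(range(3), 2)
}

print("one variant at a time:", single)  # every variant scores 0.5 (chance)
print("pairs of variants:", pairs)       # only A+B scores 1.0
```

With three variants there are only three pairs to check. With around three billion positions there would be roughly 4.5 billion billion pairs, which gives a sense of why exhaustive checking is out of reach.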

VariantSpark and COVID-19

Dr Denis Bauer, Head of Transformational Bioinformatics, said that with a 30,000-letter genome and tens of thousands of sequences, the interactions in the genome of the virus that causes COVID-19 were impossible to assess using standard statistical models.

By applying machine learning in the form of VariantSpark, our researchers identified mutations in the virus’s genome that could influence characteristics of the disease, such as the severity of the symptoms and infection spread.

While the analysis is still in its early days, the insights extracted by VariantSpark are playing an essential role in identifying regions of the genome that may be susceptible to a drug or vaccine.

How to access VariantSpark

VariantSpark is available on AWS Marketplace and GitHub, allowing for self-serve deployment to national and international organisations. This type of deployment is ideal for organisations looking to become more resilient in a post-COVID world.

For the full version of this article head on over to our Algorithm blog.