Applied AI • March 25, 2023

Drive Efficient Drug Discovery with AI-powered Virus Mutation Prediction Leveraging Sequencing Data

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, became an unprecedented global health crisis as the virus went through a series of mutations that produced a variety of new effects and symptoms in its hosts. The emergence of new variants compounded the impact of the virus, which has led to over six million deaths worldwide to date. This makes planning and executing preventive measures uncertain and challenging.

The scale of the pandemic and of virus transmission made it vital to leverage digital technologies such as Vertex AI to run AI and ML workloads that help the world build resilience and preparedness for the future. Quantiphi worked on bringing certainty to pandemic response planning by enabling the prediction of probable future virus variants through the study of different spike proteins and their structures. Quantiphi also compared the binding affinities of these spike proteins with antibodies, a few antiviral drugs, and human ACE2 (h-ACE2) through various docking workflows, which led to the successful identification of simulated variants that are potentially harmful.

Understanding SARS-CoV-2

SARS-CoV-2 binds to human cells in order to use the host cell's machinery to reproduce rapidly and spread throughout the body. The structure of the virus is what gives it the ability to penetrate human cells. The virus consists of a spherical envelope that contains its genetic material, and this envelope is covered with numerous proteins called surface glycoproteins, or spike proteins.

The interaction of the spike protein with the human ACE2 receptor (h-ACE2) is what lets the virus enter host cells. The structure of the spike protein determines how well it binds to h-ACE2, much like a key fitting a lock. h-ACE2 can bind even to spike proteins whose structure deviates slightly from the ideal fit, which still lets the virus enter.

The closer the structure of the spike protein is to the ideal fit, the better it binds, and stronger binding results in greater infection. Mutations give rise to new variants, and those whose spike proteins fit this lock-and-key arrangement better bind more strongly to h-ACE2.

Vaccines to the rescue

Vaccines prompt the body to produce antibodies that make it immune to the virus and help prevent infection. These antibodies bind to the virus before it can bind to h-ACE2. Once the antibodies bind to the spike protein, an immune response is generated within the body.

The antiviral drugs prescribed for treating COVID-19 act in a similar way to the vaccines. These drugs contain special compounds, known as ligands, that bind to spike proteins and prevent them from binding to h-ACE2. Once the spike proteins can no longer bind to h-ACE2, the virus cannot penetrate the human cell, which hinders viral replication and breaks the reproduction chain.

Binding can be studied through the binding affinity of a protein-protein or protein-ligand complex, which is usually measured by the binding free energy (BFE). Favorable binding releases energy, so the BFE is negative, and the more negative it is, the more stable the complex. These energy calculations were also part of this feasibility study, to characterize the protein-protein and protein-ligand bonds.
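
To make the link between binding free energy and binding strength concrete, the sketch below converts a BFE into the dissociation constant it implies, using the standard thermodynamic relation ΔG = RT ln(Kd) with a 1 M reference state. The function names and the sample energies are illustrative assumptions, not values or code from the study, and docking scores only approximate true free energies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_bfe(delta_g_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (in molar) implied by a binding free energy."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))


def bfe_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) implied by a dissociation constant."""
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    # Illustrative energies only: a more negative BFE gives a smaller Kd,
    # that is, a tighter and more stable complex.
    for dg in (-8.0, -10.0, -12.0):
        print(f"dG = {dg:6.1f} kcal/mol -> Kd ~ {kd_from_bfe(dg):.2e} M")
```

At room temperature, each additional 1.4 kcal/mol or so of favorable binding energy corresponds to roughly a tenfold tighter complex, which is why small BFE differences between variants can matter.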

Binding energy is an important part of the study because it underlies virus infectivity. The efficacy of vaccines and drugs can be estimated from the binding affinity of the spike proteins with the antibodies and ligands. Virtual screening, in which the variants are screened against a set of antibodies and h-ACE2 to compute the respective BFEs, speeds up the study. The resulting binding affinities are used to determine the set of more infectious and harmful variants and the set of robust vaccines and drugs.
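
As a rough illustration of how screening results could be turned into a shortlist, the sketch below flags variants that bind h-ACE2 more tightly than a reference spike protein while binding an antibody less tightly. The `ScreeningResult` type, the `flag_variants` helper, the variant names, and every number are hypothetical placeholders, not results or criteria from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningResult:
    variant: str
    bfe_ace2: float      # kcal/mol; more negative = tighter binding to h-ACE2
    bfe_antibody: float  # kcal/mol; more negative = tighter binding to the antibody


def flag_variants(results, reference):
    """Return variants that bind h-ACE2 more tightly AND the antibody less
    tightly than the reference, ordered from strongest h-ACE2 binding."""
    flagged = [
        r for r in results
        if r.bfe_ace2 < reference.bfe_ace2 and r.bfe_antibody > reference.bfe_antibody
    ]
    return sorted(flagged, key=lambda r: r.bfe_ace2)


if __name__ == "__main__":
    # Made-up placeholder values, for illustration only.
    reference = ScreeningResult("seed", bfe_ace2=-10.0, bfe_antibody=-9.0)
    results = [
        ScreeningResult("sim-A", -11.5, -7.0),
        ScreeningResult("sim-B", -9.0, -9.5),
        ScreeningResult("sim-C", -12.0, -8.5),
        ScreeningResult("sim-D", -10.5, -9.8),
    ]
    for r in flag_variants(results, reference):
        print(r.variant, r.bfe_ace2, r.bfe_antibody)
```

A real analysis would screen against many antibodies and ligands and would need to account for the uncertainty in each computed energy; the two-threshold rule here is only the simplest possible stand-in.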

Quantiphi’s Approach

Quantiphi developed various workflows and pipelines leveraging Vertex AI and other tools to understand the interaction of spike proteins with antibodies, h-ACE2, and antiviral drug ligands. Based on that understanding, we performed an extensive study to check the feasibility of predicting future SARS-CoV-2 variants and the efficacy and robustness of existing vaccines and antiviral drugs.

The workflow consists of the following:

  1. Variant Generation Simulation - This workflow simulated the evolution of the SARS-CoV-2 virus, including its mutation and replication process, to produce simulated variants of SARS-CoV-2 gene sequences. The original spike protein and the Omicron spike protein were used as seed sequences in the Mutation Simulation pipeline to generate simulated nucleotide variants of the spike protein. The Protein Translation pipeline then produced the amino acid sequences for the spike proteins.
  2. Protein-Protein Interaction Simulation - This workflow used the Sequence-Based Protein Screening pipeline to screen the variants generated by the Variant Generation Simulation workflow against antibodies and h-ACE2. It simulated the interaction of the spike protein variants with antibodies and h-ACE2 and produced quantitative binding affinities for the protein-protein complexes. Unlike the Protein-Protein Docking Simulation workflow, this workflow prioritizes speed over accuracy when computing binding affinities.
  3. Protein-Protein Docking Simulation - This workflow used the Structure-Based Protein Screening pipeline to screen all the variants generated by the Variant Generation Simulation workflow against antibodies and h-ACE2 and to compute the binding affinities for the protein-protein complexes. It ran a structure-based simulation of the docking of antibodies and h-ACE2 to the variant spike proteins, using AlphaFold for structure prediction and ClusPro for docking. Unlike the Protein-Protein Interaction Simulation workflow, this workflow involves protein structure prediction and prioritizes accuracy over speed when calculating binding energies.
  4. Protein-Ligand Docking Simulation - This workflow used the Ligand Screening pipeline to screen the variants generated by the Variant Generation Simulation workflow against antiviral drugs and to compute the binding affinities for the protein-ligand complexes. It employed AlphaFold to predict the structures of the variant spike proteins and the AutoDock tool to dock them against the antiviral drugs, producing the binding free energies of the protein-ligand complexes.
  5. Future Variants Prediction - This workflow used the Screening Results Analysis pipeline to analyze the binding affinities of the spike proteins with antibodies, antiviral drugs, and h-ACE2, and compared them to identify the variants most likely to be harmful and infectious to the host. It identified two of the simulated variants generated by the Variant Generation Simulation workflow as probable future variants of concern.
  6. Mutation Metadata Analysis - This workflow used the Mutation Metadata Extraction pipeline to extract the relevant mutation data for each variant generated by the Variant Generation Simulation workflow. The Binding Affinity Interpretation pipeline then organized the metadata into several views for analysis, covering mutation type, occurrence percentage, and sequence coordinates. It correlated factors such as frequency of occurrence and region of mutation with the binding affinities of the spike proteins with h-ACE2, in order to study and rank how much each factor contributes to variant infectivity.
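
The first and last steps above (mutating a seed sequence, translating it, and extracting mutation metadata) can be sketched in a few lines. This is a minimal toy under stated assumptions, not Quantiphi's pipeline: it applies uniform random point substitutions only, uses the standard genetic code, and the function names, mutation rate, and short seed sequence are hypothetical. Insertions, deletions, and selection pressure are not modeled.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Apply independent random point substitutions at a per-base rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def translate(seq: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitutions(seed_protein: str, variant_protein: str) -> list:
    """List amino acid substitutions as <old><1-based position><new>.
    Positions beyond the shorter sequence are ignored."""
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(seed_protein, variant_protein), start=1)
        if a != b
    ]


if __name__ == "__main__":
    rng = random.Random(42)           # fixed seed so the toy run is repeatable
    seed_gene = "ATGTTTGTTTTTCTTGTT"  # short toy coding sequence, not a full spike gene
    seed_protein = translate(seed_gene)
    for n in range(3):
        variant_gene = mutate(seed_gene, rate=0.1, rng=rng)
        variant_protein = translate(variant_gene)
        print(n, variant_gene, variant_protein,
              substitutions(seed_protein, variant_protein))
```

The substitution labels follow the common notation for spike mutations (original residue, position, new residue), which makes it straightforward to tally mutation types and positions across many simulated variants.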

Quantiphi built various workflows and pipelines for variant generation, docking, screening, and mutation analysis. The feasibility study was conducted to identify potentially harmful future variants by simulating them and studying their impact on infectivity and on vaccine and drug efficacy. The study concluded with a pipeline that detects candidate future variants of concern and performs mutation analysis for rapid drug discovery and development.

Get in touch with our experts to understand sequencing data with AI/ML and be equipped for pandemic response and preparedness.

Written by

Aditya Sharma
