IndiCoV: a unique resource for genetic variants and annotations in SARS-CoV-2 genomes from India




Mercy Rophina
12 October 2020

COVID-19, which emerged as an isolated outbreak in the city of Wuhan, China, has since evolved into a global pandemic affecting millions of people worldwide. SARS-CoV-2, a virus belonging to the coronavirus family, has been identified as the causative agent of this novel disease. Ever since the outbreak, researchers around the globe have made strenuous efforts to understand the pathogenesis and dynamics of this pathogen.

Viral genome sequencing has turned out to be one of the approaches that help in understanding the epidemiology of this pandemic. The Wuhan-Hu-1 isolate (GenBank NC_045512), the earliest high-quality viral genome sequence shared by researchers in China, is widely used as the reference against which other global strains are compared, providing insights into the diversity and evolution of the pathogen. Recent months have seen an enormous volume of genome sequence data generated by laboratories across the world. Currently, more than 140,000 genomes of SARS-CoV-2 isolates are available from public sources.


How exactly do we make sense of such a huge volume of data?

Genomic research on SARS-CoV-2 strains isolated from India has been actively undertaken by multiple institutions and laboratories across the nation. Starting from the earliest genome, isolated from a student who travelled from Wuhan to Kerala, India has sequenced over 3600 genomes of clinical isolates from over 21 States and Union Territories, deposited mainly by 20 institutions and laboratories in the country. The major challenge now is to derive comprehensive insights about the pathogen from the available data. Like other viruses, SARS-CoV-2 evolves by acquiring genetic mutations. Extensive phylogenetic analysis of the available data has revealed the prevalence of novel lineages of this pathogen in the Indian subcontinent.


Understanding the genetic variants - a step forward!

While a number of global resources catalog the genomes, there is a paucity of resources providing access to, analysis of and annotation of genomic variants. In-depth analyses of genetic variants have helped identify novel variants that appear at higher frequencies in our country than elsewhere in the world. Some variants were also found to be frequent in particular geographical locations within the country. An appropriately organized and richly annotated variant resource would help make the best use of the available genomic data. To this end we have proposed a collaborative resource, IndiCoV, which provides systematically analysed and annotated genomic variants of SARS-CoV-2 genomes from India. Genomes were compared with the Wuhan-Hu-1 reference to ensure uniform interpretation, and the resulting genetic variants were integrated with a range of annotations. Functionally relevant annotations, including protein domains, secondary structures, potential epitopes, allele frequencies, diagnostic primers/probes and probable error-prone/homoplasic sites, were catalogued from various public resources and custom databases.
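To make the idea of comparing a genome with the reference concrete, here is a minimal sketch in Python. It is not the IndiCoV pipeline: it assumes the sample has already been aligned to the reference so that both strings share the same coordinates, it reports only single-base substitutions, and the function name and toy sequences are hypothetical.

```python
from typing import List, Tuple


def call_substitutions(reference: str, aligned_sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumption: both sequences are already aligned to the same coordinates,
    so they have equal length and gaps are written as '-'.
    """
    if len(reference) != len(aligned_sample):
        raise ValueError("sequences must be aligned to equal length")

    variants = []
    pairs = zip(reference.upper(), aligned_sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        # Ambiguous bases (N) and gaps (-) are skipped rather than called.
        if ref_base in ("N", "-") or sample_base in ("N", "-"):
            continue
        if ref_base != sample_base:
            variants.append((position, ref_base, sample_base))
    return variants


# Toy sequences for illustration only, not real SARS-CoV-2 data.
toy_reference = "ATTAAAGGTT"
toy_sample = "ATTAGAGNTT"
toy_variants = call_substitutions(toy_reference, toy_sample)
print(toy_variants)  # [(5, 'A', 'G')]
```

Calling every genome against the same reference is what makes variants from different laboratories directly comparable: position 5 means the same thing in every sample.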


This resource, we believe, will help the scientific community understand the potential impact of the genetic variants carried by this pathogen. An immediate clinical application is identifying whether these genetic variants fall within the target sites of molecular diagnostic primers and probes. State-wise allele frequencies reveal how prevalent these variants are across the nation. Information on protein domains and secondary structures would shed light on functional impact and pathogenesis.
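Two of the uses described above, state-wise allele frequencies and checking variants against primer/probe target sites, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how IndiCoV computes its annotations: the function names, the state names, the variant labels and the probe coordinates are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def statewise_allele_frequency(
    genomes: Iterable[Tuple[str, Set[str]]]
) -> Dict[str, Dict[str, float]]:
    """For each state, return the fraction of its genomes carrying each variant.

    Each input item is (state, set of variant labels found in one genome).
    """
    genome_totals: Dict[str, int] = defaultdict(int)
    carrier_counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for state, variants in genomes:
        genome_totals[state] += 1
        for variant in variants:
            carrier_counts[state][variant] += 1
    return {
        state: {v: n / genome_totals[state] for v, n in carrier_counts[state].items()}
        for state in genome_totals
    }


def variants_in_probe_sites(
    positions: Iterable[int], probe_sites: Dict[str, Tuple[int, int]]
) -> Dict[str, List[int]]:
    """Map each probe name to the variant positions inside its target site.

    Assumption: sites are 1-based, inclusive (start, end) reference coordinates.
    """
    positions = sorted(positions)
    return {
        name: [p for p in positions if start <= p <= end]
        for name, (start, end) in probe_sites.items()
    }


# Toy inputs for illustration only; none of these values are real data.
toy_genomes = [
    ("State A", {"var1", "var2"}),
    ("State A", {"var1"}),
    ("State B", {"var2"}),
    ("State B", set()),
]
toy_frequencies = statewise_allele_frequency(toy_genomes)
toy_hits = variants_in_probe_sites([5, 120, 300], {"probe_x": (100, 130)})
print(toy_frequencies)
print(toy_hits)
```

A variant that lands inside a primer or probe target site, and that is also frequent in a given state, is the kind of case worth flagging to laboratories using that assay.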


We acknowledge the contributions of all the institutions and laboratories that have made their genomic data available to the scientific community.

IndiCoV is one of the first and most comprehensive resources providing a standardised interface to access information on genomic variants in SARS-CoV-2 genomes.

The web interface to the resource is available at http://clingen.igib.res.in/indicov

We are glad to collaborate, and we invite individual researchers and research groups to contribute to and further enrich this open platform, leading to better scientific approaches for understanding and dealing with the current pandemic.



About the author
Mercy Rophina is a PhD candidate at the CSIR Institute of Genomics and Integrative Biology. All opinions expressed are personal. She can be reached on Twitter at @mercy_rophina
