Building a collaborative resource for SARS-CoV-2 genomic variants and variant annotations

COVID-19 has emerged as a global pandemic affecting lives and livelihoods across the world. It has also accelerated efforts to understand the pathogen, SARS-CoV-2 (the novel coronavirus), and the dynamics and epidemiology of the epidemic at both global and local scales.

One of the ways in which scientists have come together to understand the epidemiology of the epidemic is genome sequencing of the pathogen. Shortly after the outbreak in Wuhan, researchers from China shared a high-quality genome sequence of the virus, now popularly known as the Wuhan-Hu-1 isolate (GenBank NC_045512) and widely considered the reference genome. In the months that followed, multiple laboratories shared genome sequences of isolates from their respective countries in an unprecedented fashion, possibly making this one of the best examples of the #OpenData movement. It is estimated that more than 40,000 genomes of SARS-CoV-2 isolates are now in the public domain.

India has also been at the forefront of SARS-CoV-2 genomics. The earliest genomes came from patients who travelled from Wuhan to Kerala. This was closely followed by a number of labs depositing genomes of clinical isolates from across the country. Today, over 600 genomes of clinical isolates are available in the public domain, covering over 21 states and union territories and deposited by 19 laboratories and institutions across the country. Of special mention is the Gujarat Biotechnology Research Centre (GBRC), a research institute in Gandhinagar, which has deposited over 170 genomes. Thanks to their efforts, Gujarat has the best genetic epidemiology dataset among the states in the country. It is estimated that a few thousand SARS-CoV-2 genomes would eventually become available from India.

The availability of genomes also opens up new challenges and opportunities for researchers in the country and across the world. The first challenge is to make sense of the available genomic data so that it yields insights into the epidemiology and evolution of the virus. Genomic sequences can also shed light on the dynamics of the epidemic and potentially allow for molecular contact tracing. The second challenge is to make sense of the genomic variants themselves, and of how they could affect the virus, its pathogenesis and the host response. Genomic variants could also provide insights for developing better and more efficient diagnostics and vaccines. This is all the more important because there is a paucity of resources providing variants and variant annotations for SARS-CoV-2 genomes.

To accelerate these approaches, it is imperative that the data is appropriately organised, annotated and made available to researchers so that they can make the best use of it. To this end we propose a collaborative resource, IndiCoV, based on the tenets of #OpenSource and #OpenData. In our opinion, this would enable the best minds across the country and across the globe to contribute an in-depth understanding of the genomic variants and their functions and properties, in a searchable and accessible format, to accelerate research in the area. Needless to say, such an initiative would also require a robust credit and acknowledgement system to ensure that all contributors are credited.
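
To make the idea of searchable, credited annotations a little more concrete, here is a minimal sketch in Python. It is purely illustrative: the record layout, the field names and the helper functions `find_by_position` and `contributors` are our own assumptions and do not describe the actual IndiCoV schema, and the example records are made-up placeholders rather than real variants.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real IndiCoV schema is not described here.
@dataclass(frozen=True)
class VariantAnnotation:
    position: int     # 1-based coordinate on the reference genome (NC_045512)
    ref: str          # reference base(s)
    alt: str          # alternate base(s) seen in an isolate
    note: str         # free-text annotation supplied by a contributor
    contributor: str  # group or person to credit for the annotation


def find_by_position(records, position):
    """Return every annotation recorded at the given reference position."""
    return [r for r in records if r.position == position]


def contributors(records):
    """Return the sorted, de-duplicated list of contributors to credit."""
    return sorted({r.contributor for r in records})


# Made-up placeholder records, for illustration only.
records = [
    VariantAnnotation(100, "A", "G", "placeholder note", "Group A"),
    VariantAnnotation(100, "A", "G", "second placeholder note", "Group B"),
    VariantAnnotation(250, "C", "T", "another placeholder note", "Group A"),
]

hits = find_by_position(records, 100)
credits = contributors(records)
```

Keeping the contributor on every annotation, rather than in a separate list, is one simple way to ensure that credit travels with the data whenever a variant is looked up.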

The IndiCoV resource of genomic variants and variant annotations could be accessed at

We invite research groups and individual researchers with deep expertise in, and passion for, a specific area to come together and contribute their in-depth knowledge and analysis of individual variants to IndiCoV.

This would be a great example of how researchers can come together to create a formidable resource on the genomic variants of SARS-CoV-2, helping us understand the pathogen better and potentially leading to better approaches to diagnose, control and prevent infections.

Interested groups may contact Dr Vinod Scaria to discuss the annotations the group wishes to contribute to the IndiCoV resource.

Sofia Banu, Bani Jolly, Payel Mukherjee, Priya Singh, Shagufta Khan, Lamuk Zaveri, Sakshi Shambhavi, Namami Gaur, Rakesh K Mishra, Vinod Scaria, Divya Tej Sowpati
A distinct phylogenetic cluster of Indian SARS-CoV-2 isolates
bioRxiv 2020.05.31.126136; doi: