A timeline for molecular epidemiology of SARS-CoV-2 in India

Vigneshwar Senthilvel
17 July 2020



It has been four months since the World Health Organization (WHO) declared the novel coronavirus disease of 2019 a pandemic on March 11, 2020, and the disease has spread across the world like wildfire. While healthcare workers and lab technicians around the world fight the disease on the frontlines, scientists and research scholars are gathering intel and identifying new tools to better equip the healthcare system, with resources and support provided by governments and corporations. Ever since India went into a country-wide lockdown, our researchers have been working to understand the type, extent of spread and molecular characteristics of the virus affecting the Indian population. This report covers the intel we have gathered from these molecular investigations and where it takes us on the road to developing a vaccine or discovering a cure.


The first epicenter:
In December 2019, multiple local health facilities reported patients who had developed pneumonia of unknown aetiology, with a suspected link to the seafood market in Wuhan, China. The patients were either vendors at the market or people who had visited it frequently. They presented with fever and cough accompanied by chest discomfort. The Chinese Center for Disease Control and Prevention (China CDC) began investigating the aetiology of the disease by collecting lower respiratory tract samples from affected patients. WHO's technical lead for the response to this outbreak noted that there may have been limited human-to-human transmission of the virus. The initial results of the China CDC investigation were negative for all known pathogens that cause respiratory illness. Hence RNA was isolated and genome sequencing was carried out. Viral reads were obtained from the patient samples, and the assembled contigs had more than 85% identity with a known bat SARS-like coronavirus whose sequence had been published previously. The results were confirmed by RT-PCR and the novel virus was named 2019-nCoV. The investigators also visualized the viral particles using electron microscopy, which established that the particles are spherical (60-140 nm in size) and carry spikes (9-12 nm in size). These observations were consistent with the Coronaviridae family of viruses. Although the viruses were similar to some betacoronaviruses found in bats, they clustered distinctly from SARS-CoV and MERS-CoV. The findings were published in the New England Journal of Medicine on January 24, 2020.







A visit to India:
By January 30, the WHO emergency committee that visited Wuhan to assess the situation had released a situation report identifying the spread of 2019-nCoV to 18 countries outside China. India reported its first case in Kerala, in a student returning from Wuhan. Two more students who returned in early February were also reported to have similar symptoms. All three were treated, and India reported no further cases in February, so the country appeared to be free of the virus for the time being. By the end of February, scientists from the National Institutes of Health (NIH, USA) had identified the human ACE2 receptor as the point of entry for the virus. At the beginning of March 2020, cases in India surged after several people with a travel history to affected countries other than China tested positive. By then, the International Committee on Taxonomy of Viruses (ICTV) had officially named the virus SARS-CoV-2, and WHO had named the disease it causes COVID-19 (Coronavirus Disease 2019). On March 11, deeply concerned by the alarming levels of spread and severity, and by the alarming levels of inaction, WHO declared COVID-19 a pandemic. As Europe took a heavy blow and officially became the second epicenter, after China, cases in India rose steadily and crossed 1000 by the end of March.

Samples from the first three reported cases of COVID-19 in India were sequenced at the National Institute of Virology (NIV), Pune. The data showed that the Indian sequences clustered with the virus identified in Wuhan (Wuhan-Hu-1) but had two notable changes in the spike protein compared with the Wuhan sequence. This indicated that there were two different introductions into the country. The RNA genome of SARS-CoV-2 was found to be ~30 kb in length, encoding 9860 amino acids. It was annotated to contain 29 open reading frames (ORFs) coding for a total of 27 proteins: 16 non-structural proteins, 4 structural proteins and 7 accessory proteins. The study also predicted various B-cell and T-cell epitopes which could elicit a strong immune response and could help in vaccine design. On March 5, India shared the first two sequences of SARS-CoV-2 isolates. On March 14, scientists at NIV, Pune successfully isolated and cultured SARS-CoV-2 from Indian clinical samples. With this, India became only the fifth country in the world to isolate and culture the virus, after China, USA, Japan, and Thailand. This enabled India to study the virus at a molecular scale and placed the country a step closer to the goal of developing a vaccine or discovering a drug to treat COVID-19 patients. As COVID-19 cases increased, ICMR gave research labs across the country access to SARS-CoV-2 isolates for genome sequencing. Gujarat Biotechnology Research Centre (GBRC) became the second institute in the country, after NIV, to successfully sequence the SARS-CoV-2 genome. These samples fell under the B4 clade, a branch within the superclade B, which has potential origins in East Asia or Oceania. By the end of March, India went into what is now known as the 'World's largest lockdown', closing its international borders and the borders between states and districts, except for essential commute.
By then we knew that the virus had not come here to visit; it had come to stay.


Sequencing SARS-CoV-2 genomes
India ramped up its SARS-CoV-2 genome sequencing efforts by the end of March. ICMR also sequenced isolates from Indians in Iran and Italy who were stranded at airports. On 29 April, Nature India reported on the need for patient data to make use of large-scale coronavirus sequencing. In the article, experts collectively agreed that unless the clinical parameters are well defined, the molecular epidemiology and the nature of the clusters within the country cannot be fully understood. In a collaborative effort uniting multiple research laboratories across the country, the Government of India announced that within two months India would sequence more than 1000 isolates to understand the spread of the virus within the country and to study viral and host genomics. By the end of April, India had sequenced more than 200 isolates from India and from Indians abroad. On May 5, the National Institute of Biomedical Genomics (NIBMG) reported sequencing 5 isolates from Kolkata, most of which belonged to the A2a clade, suggesting that this clade was spreading widely in the population. They also identified one sample which belonged to the B4 clade.

The Council of Scientific and Industrial Research (CSIR) has been leading the sequencing efforts in the country since then. A study from the National Centre for Disease Control (NCDC), in collaboration with CSIR-IGIB, analyzed around 104 genomes from Indian isolates and found that most variations were in non-structural proteins. D614G, a spike protein mutation commonly found worldwide, was also identified, but in only 26 samples. Viruses accumulate mutations as they multiply and spread within a population. A mutation can be advantageous, disadvantageous or simply neutral. To explain the multiple outbreaks within the country, the authors described a 3-wave viral entry into India: first from European and American travelers, second from the Middle East and third from regions of South-east Asia. On 13 May, the CSIR-Centre for Cellular and Molecular Biology (CSIR-CCMB) and the CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB) reported an analysis of more than 200 Indian sequences that identified a novel cluster of SARS-CoV-2 in India, found in about 41% of the isolates. The new clade, named A3i, had previously been reported as unclassified and turned out to be the second most common clade in India. This lineage, defined by its 4 mutations, was suspected to have entered India from South-east Asia and comprises 3.5% of the isolates sequenced globally.
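To make the mutation notation concrete: a label such as D614G reads as "reference residue D at position 614 replaced by G". The sketch below is only an illustration and not the pipeline used in the studies above. It compares toy aligned protein sequences against a toy reference, labels each substitution in this notation, and counts how often each one occurs. The sequences, isolate names and function names are made up for the example; real analyses work on full genomes and need a proper alignment step first.

```python
import re
from collections import Counter


def parse_substitution(label):
    """Split a label such as 'D614G' into (reference residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def call_substitutions(reference, sample):
    """List substitutions between two already-aligned protein sequences of equal length."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Toy 10-residue sequences, NOT real spike protein data (assumption for illustration).
reference = "MKTDLVAGSR"
samples = {
    "isolate_1": "MKTGLVAGSR",  # carries D4G
    "isolate_2": "MKTDLVAGSR",  # identical to the reference
    "isolate_3": "MKTGLVACSR",  # carries D4G and G8C
    "isolate_4": "MKTDLVAGSR",  # identical to the reference
}

# Count in how many isolates each substitution appears.
counts = Counter()
for sequence in samples.values():
    counts.update(call_substitutions(reference, sequence))

frequency = {label: n / len(samples) for label, n in counts.items()}

# The same arithmetic applied to the figures quoted in the text, treating
# "around 104" as exactly 104 genomes for the sake of the example.
d614g_share = 26 / 104

print(parse_substitution("D614G"))
print(frequency)
print(f"D614G share in the quoted study: {d614g_share:.0%}")
```

On the figures quoted above, 26 of roughly 104 genomes means D614G was present in about a quarter of the Indian isolates analyzed in that study.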

One of the viral isolates from Kerala, which belonged to the B clade, was reported to harbor a mutation in the receptor binding domain of the spike protein at position 407. This was identified by scientists at the Indian Institute of Chemical Biology (CSIR-IICB). They also identified two unique mutations in the spike protein at positions 723 and 1124. Using molecular dynamics simulations, they showed that the mutations change the secondary structure of the protein and affect its flexibility. They stated that the mutations could potentially alter the binding of the virus. They are looking into mutations which could provide insights to identify potential drug targets and aid in developing vaccines.


What is molecular phylogeny and what are clades?
Phylogeny is the study of the evolutionary development of organisms to identify their evolutionary ancestry; molecular phylogeny does this by comparing genetic sequences. Betacoronaviruses are a group of viruses in the family Coronaviridae. They are spherical, enveloped RNA viruses that infect mammals. Bats and rodents act as natural reservoirs of this group of viruses. They are further subdivided into 4 lineages, viz., A, B, C, and D. SARS-CoV, which caused Severe Acute Respiratory Syndrome, and the novel SARS-CoV-2 both fall under lineage B. They are close on the evolutionary tree yet distinct in their genetic code. So why not call SARS-CoV-2 a completely new species? Generally, because the identified variations are not meaningful enough to call it a separate species. According to ICTV, SARS-CoV-2 belongs to the same species as SARS-CoV (Severe acute respiratory syndrome-related coronavirus), because that species is identified by specific sequences in its genome which have also been found in SARS-CoV-2. So as long as those species-defining sequences remain the same, a newly identified virus is treated as a member of that species, forming a clade within it, rather than as a new species.

Multiple variations have been identified among the clades spreading across the country. Some of these alter viral proteins in ways that help the virus adapt to environmental or host conditions, while others weaken it so that it cannot spread and infect as effectively.
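The idea that a clade is recognized by a set of defining variants can be sketched in a few lines. The example below is a simplified illustration, not the method used by any of the groups mentioned here: the clade names and variant labels are placeholders invented for the example, and real clade assignment relies on a phylogenetic tree built from whole genomes.

```python
from collections import Counter

# Hypothetical clade definitions (placeholders, not real SARS-CoV-2 clades):
# each clade is described by the variants that every member must carry.
CLADE_DEFINITIONS = {
    "clade_X": {"C100T", "G250A"},
    "clade_Y": {"C100T", "G250A", "A400G"},  # a sub-branch of clade_X
    "clade_Z": {"T75C"},
}


def assign_clade(variants, definitions=CLADE_DEFINITIONS):
    """Return the most specific clade whose defining variants are all present."""
    observed = set(variants)
    matches = [name for name, defining in definitions.items() if defining <= observed]
    if not matches:
        return "unclassified"
    # Prefer the clade with the most defining variants (the deepest branch).
    return max(matches, key=lambda name: len(definitions[name]))


# Toy isolates with made-up variant lists.
isolates = {
    "isolate_1": ["C100T", "G250A"],
    "isolate_2": ["C100T", "G250A", "A400G", "C999T"],
    "isolate_3": ["T75C"],
    "isolate_4": ["A1G"],
}

assignments = {name: assign_clade(variants) for name, variants in isolates.items()}
distribution = Counter(assignments.values())

print(assignments)
print(distribution)
```

An isolate may carry extra variants beyond the defining set (as isolate_2 does) and still belong to the clade, which mirrors how private mutations accumulate within a lineage without changing its classification.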

A milestone and not the goal:
Meanwhile, several COVID-19-centric resources were developed to aid data sharing and make better use of the information generated.

· PhyloVis (CSIR-IGIB) is a resource to display the genetic phylogeny of SARS-CoV-2 isolates from Indian laboratories.

· COVID-19 Genomepedia (CSIR-IGIB) is a resource which catalogs the genome information of the novel coronavirus sequenced across the globe.

· SARS-nCoV-2 Genome Resource (CSIR-IGIB) is a collection of SARS-CoV-2 genomes from across the globe compiled for analysis.

· COVID19 Beacon (CSIR-IGIB, New Delhi & CSIRO, Australia) is a searchable interface to query genomic variants in SARS-CoV-2 genomes, built as a collaborative effort between the two institutes.

· GEAR19 (CSIR-CCMB) is a comprehensive tool which integrates all the details about the COVID-19 samples sequenced in India.

The data are being collected and compiled from about 35 laboratories across the country, and the compiled resources are maintained by CSIR laboratories. The database holds genome sequence data for more than 1500 SARS-CoV-2 isolates and catalogues more than 2000 variations identified in the viral genome. It also contains the age demographics of the patients from whom the isolates were obtained and a clade distribution timeline.

India reached its target of sequencing 1000 SARS-CoV-2 genomes by the end of June. Although that was the goal for the pilot phase, India continues to sequence as many SARS-CoV-2 genomes as it can. We can now clearly say that sequencing 1000 genomes was just a milestone in the extended fight against COVID-19. CSIR laboratories also published a detailed protocol for assembly and analysis of SARS-CoV-2 genomes. It can be found here. Links to all the resources can also be found here.

Vaccine development in India:
There were around 30 vaccines against COVID-19 in development in India. They were being developed by Indian companies in collaboration with research institutes within and outside the country. On May 25, the Union Health Minister of India, Dr. Harsh Vardhan, said that 14 of the vaccines had shown promising developments and that 4 of those 14 candidates were ready to enter clinical trials. The Department of Biotechnology has been made the central agency to coordinate vaccine development efforts in India. One of the promising candidates is Covaxin, which is being developed by Bharat Biotech, Hyderabad, in collaboration with the ICMR-National Institute of Virology, Pune. The vaccine was approved for clinical trials on June 29. As of 14 July, it has entered the first phase of clinical trials with around 1500 participants from 14 different locations across the country. On 16 July, a second indigenously developed vaccine candidate, ZyCoV-D, was approved for clinical trials by the Drugs Controller General of India (DCGI).

Conclusion:
As we witness the ever-increasing numbers of COVID-19 patients, these are the developments happening in the background. India is equipped with the latest technologies in molecular biology and genomics, and these tools come in handy in a time of crisis such as the one we are in right now. These resources help us understand the threat in greater detail and can help us reach the larger goal of a COVID-19-free world.



References:

· https://www.who.int/news-room/detail/29-06-2020-covidtimeline

· Yadav PD et al. Full-genome sequences of the first two SARS-CoV-2 viruses from India. Indian J Med Res

· Na Zhu et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med 2020; 382:727-733

· Gorbalenya, A.E., Baker, S.C., Baric, R.S. et al. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol 5, 536–544 (2020).

· World Health Organization. Novel coronavirus (2019-nCoV) situation reports. WHO; 2020. Available from: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports

· https://www.thehindu.com/sci-tech/science/coronavirus-india-shares-two-sars-cov-2-genome-sequences/article31007227.ece

· https://www.thehindubusinessline.com/news/science/csir-lab-working-on-genome-sequencing-of-covid-19/article31175131.ece

· https://www.business-standard.com/article/pti-stories/guj-institute-identifies-3-new-mutations-of-novel-coronavirus-120041700900_1.html

· Potdar V et al. Genomic analysis of SARS-CoV-2 strains among Indians returning from Italy, Iran & China, & Italian tourists in India. Indian J Med Res 2020

· Sarkale P, Patil S, Yadav PD, Nyayanit DA, Sapkal G, Baradkar S, et al. First isolation of SARS-CoV-2 from clinical samples in India. Indian J Med Res 2020; 151. doi:10.4103/ijmr.IJMR_1029_20

· Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020; 579 : 270-3

· Maitra A, Sarkar MC, Raheja H, et al. Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility. J Biosci. 2020

· Saha P, Banerjee AK, Tripathi PP, Srivastava AK, Ray U. A virus that has gone viral: amino acid mutation in S protein of Indian isolate of Coronavirus COVID-19 might impact receptor binding, and thus, infectivity. Biosci Rep. 2020

· https://www.hindustantimes.com/india-news/india-to-sequence-1-000-genomes-to-understand-covid-19-virus/story-QwV3xAIkQJAlpXm6kCZKjO.html

· http://vinodscaria.rnabiology.org/covid-19

· Sofia Banu et al. A distinct phylogenetic cluster of Indian SARS-CoV-2 isolates. bioRxiv 2020 
· Pramod Kumar et al. Integrated genomic view of SARS-CoV-2 in India. bioRxiv 2020.06.04.128751; doi: https://doi.org/10.1101/2020.06.04.128751


· https://www.natureasia.com/en/nindia/article/10.1038/nindia.2020.75

· Ruby Dhar et al. Genotypic and antigenic study of SARS-CoV-2 from an Indian isolate. bioRxiv 2020

· Letko, M., Marzi, A. & Munster, V. Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat Microbiol 5, 562–569 (2020)

· https://weather.com/en-IN/india/coronavirus/news/2020-07-04-covid-19-vaccine-candidates-india-and-worldwide-their-current

· https://www.rediff.com/news/interview/beware-of-a3i-the-new-coronavirus-strain/20200615.htm

· https://economictimes.indiatimes.com/industry/healthcare/biotech/pharmaceuticals/bharat-bio-begins-human-trials-of-covaxin/articleshow/76947209.cms





About the Author
Vigneshwar Senthilvel is a graduate student at the CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB). He can be reached on Twitter at @vickymj93.
