Download


Basic information about 9,832 mutations
For each of the 13,344,494 SARS-CoV-2 genomes, we conducted mutation analysis. Based on published literature, the filtering criteria were as follows: 1) a mutation needed to have a frequency greater than 0.01 in at least one of the 2,735 viral lineages and to occur at least twice within that lineage; 2) a mutation needed to be present in two or more of the 2,735 lineages. After filtering, we identified a total of 9,832 common mutations from 2,372 viral lineages.
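The two filtering criteria can be sketched in Python as follows. This is a minimal illustration, not the pipeline used to build the dataset: the input dictionaries, the mutation labels, and the function name are hypothetical, and the sketch assumes that both criteria must hold for a mutation to be kept.

```python
from collections import defaultdict

# Hypothetical input: (lineage, mutation) -> number of genomes in that
# lineage carrying the mutation.
mutation_counts = {
    ("B.1.1.7", "S:N501Y"): 950,
    ("BA.2", "S:N501Y"): 480,
    ("B.1.1.7", "S:A222V"): 1,
    ("BA.2", "ORF1a:T3255I"): 300,
}
# Hypothetical input: lineage -> total number of genomes in that lineage.
lineage_sizes = {"B.1.1.7": 1000, "BA.2": 500}

MIN_FREQUENCY = 0.01  # criterion 1: frequency must exceed this value
MIN_COUNT = 2         # criterion 1: at least two occurrences in that lineage
MIN_LINEAGES = 2      # criterion 2: present in at least two lineages


def filter_common_mutations(mutation_counts, lineage_sizes):
    """Return mutations that satisfy both criteria (assumed to be combined with AND)."""
    passing = defaultdict(set)  # lineages in which criterion 1 holds
    present = defaultdict(set)  # lineages in which the mutation is observed at all
    for (lineage, mutation), count in mutation_counts.items():
        if count > 0:
            present[mutation].add(lineage)
        frequency = count / lineage_sizes[lineage]
        if frequency > MIN_FREQUENCY and count >= MIN_COUNT:
            passing[mutation].add(lineage)
    return sorted(
        m for m in present
        if passing[m] and len(present[m]) >= MIN_LINEAGES
    )


common = filter_common_mutations(mutation_counts, lineage_sizes)
print(common)
```

In this toy input only S:N501Y is kept: S:A222V occurs once in a single lineage, and ORF1a:T3255I is frequent but confined to one lineage.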
Mutation distribution in lineages
The frequency of each mutation across the 2,735 lineages is displayed, showing only mutations with a frequency greater than 0.01. The data were obtained from GISAID's metadata, specifically the lineage assigned to each genomic sequence.
Alternative non-human animal hosts
Mutations observed in sequences from alternative non-human animal hosts (35 different species) may have implications for viral adaptation and cross-species transmission. Mutations were retained if they were present in the genomic sequences of at least 3 non-human animal hosts. The data were obtained from GISAID's metadata, specifically the host recorded for each genomic sequence.
The co-mutation patterns of SARS-CoV-2 across 2,735 viral lineages
The Spearman correlation coefficient is used to measure the correlation between two mutations within the same lineage. The Holm–Bonferroni method is used for multiple-testing adjustment. We retained mutation pairs with correlation coefficients greater than 0.6 or less than -0.6 and Holm–Bonferroni corrected p-values less than 0.05.
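The adjustment and filtering step can be sketched as below. This is an illustration under stated assumptions: the Spearman coefficients and raw p-values are taken as already computed, the mutation pairs and their values are made up, and the function name is hypothetical. Only the Holm–Bonferroni step-down adjustment and the two thresholds come from the description above.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical (mutation_a, mutation_b, spearman_rho, raw_p_value) records.
pairs = [
    ("S:N501Y", "S:P681H", 0.82, 0.0001),
    ("S:N501Y", "N:R203K", -0.65, 0.02),
    ("S:D614G", "N:R203K", 0.40, 0.0005),
    ("S:A222V", "S:P681H", 0.70, 0.06),
]

adjusted = holm_bonferroni([p for _, _, _, p in pairs])

# Keep pairs with |rho| > 0.6 and adjusted p-value < 0.05.
retained = [
    (a, b)
    for (a, b, rho, _), p_adj in zip(pairs, adjusted)
    if abs(rho) > 0.6 and p_adj < 0.05
]
print(retained)
```

With these toy values the first two pairs are retained; the third fails the correlation threshold and the fourth fails the adjusted p-value threshold.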
Association between mutation and patients of different ages, genders, and infection status
We employed a logistic regression model to investigate the association between mutations and patient age, patient gender, and patient status. A p-value of less than 0.001 indicates a significant difference between the population carrying the mutation and the population without it. The data were obtained from GISAID's metadata, specifically the host status, gender, and age recorded for each genomic sequence.
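As a rough illustration of the test, the sketch below covers only the simplest special case: a logistic regression with a single binary predictor (mutation present or absent) and a binary outcome, with no other covariates. In that case the fitted slope equals the log odds ratio of the 2x2 table and its Wald standard error has a closed form. The counts, the outcome, and the function name are hypothetical; the actual model specification (covariates, coding of age and status) is not given here and may differ.

```python
import math
from statistics import NormalDist


def binary_logistic_wald(a, b, c, d):
    """Wald test for the slope of a one-predictor logistic regression.

    a: carriers with the outcome      b: carriers without the outcome
    c: non-carriers with the outcome  d: non-carriers without the outcome
    All four counts must be non-zero.
    """
    log_odds_ratio = math.log((a * d) / (b * c))  # maximum-likelihood slope
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_odds_ratio / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return log_odds_ratio, standard_error, z, p_value


# Hypothetical counts for one mutation and one binary patient attribute.
log_or, se, z, p_value = binary_logistic_wald(a=60, b=40, c=30, d=70)
odds_ratio = math.exp(log_or)
significant = p_value < 0.001  # threshold stated in the description
print(round(odds_ratio, 2), significant)
```

For a continuous predictor such as age, or for models with several covariates, an iterative fit from a statistics package would be needed instead of this closed form.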
Mutation effects on antigenicity and immunogenicity
The VaxiJen software is utilized for antigenicity analysis, while the IEDB software is used for immunogenicity analysis. The threshold value to accept the antigenicity score was fixed at 0.4.
Mutation effects on physicochemical parameters
ProtParam software is used for the analysis of physicochemical properties.
Results of natural selection analysis
The MEME, FEL, and FUBAR methods of the HyPhy software are used for natural selection analysis. MEME (episodic selection): list of sites found to be under episodic selection (p < 0.05). FEL (pervasive selection on small datasets): list of sites found to be under pervasive selection (p < 0.05). FUBAR (pervasive selection on datasets): list of sites found to be under pervasive selection (prob > 0.95).
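The three cutoffs can be applied to per-site results as in the sketch below. The records and field names are hypothetical and do not reflect HyPhy's actual output schema; only the thresholds come from the description above.

```python
# Hypothetical per-site results; field names are illustrative only.
site_results = [
    {"site": 484, "meme_p": 0.01, "fel_p": 0.20, "fubar_prob": 0.97},
    {"site": 501, "meme_p": 0.03, "fel_p": 0.04, "fubar_prob": 0.99},
    {"site": 614, "meme_p": 0.40, "fel_p": 0.60, "fubar_prob": 0.50},
]


def selected_sites(results):
    """Group sites by the method whose significance cutoff they pass."""
    return {
        "MEME": [r["site"] for r in results if r["meme_p"] < 0.05],
        "FEL": [r["site"] for r in results if r["fel_p"] < 0.05],
        "FUBAR": [r["site"] for r in results if r["fubar_prob"] > 0.95],
    }


selected = selected_sites(site_results)
print(selected)
```

Note that MEME and FEL are filtered on a p-value (smaller is stronger evidence), while FUBAR is filtered on a posterior probability (larger is stronger evidence), so the comparison direction differs.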
Mutation-induced alteration in protein stability
I-Mutant 2.0 is a software tool used to analyze the effects of mutations on protein stability. Predictions are made under two sets of conditions, pH 7 at 25°C and pH 7.4 at 37°C, each used to approximate the extracellular environment in in vitro experiments.
Mutation-induced changes in protein function
The MutPred2 software is used to predict the functional impact of protein variations. A score above the threshold of 0.5 indicates potential pathogenicity. A p-value less than 0.05 is considered a significant effect. "Pr" is the abbreviation for "proportion". "P" is the abbreviation for "p-value".
Mutation-induced alteration in binding affinity between RBD and ACE2
The results of the deep mutational scanning approach were employed to empirically assess the impact of all potential amino acid mutations within the SARS-CoV-2 RBD on ACE2-binding affinity. A t-test was utilized to analyze the effect of mutations on binding affinity, where a p-value < 0.05 indicated a significant impact.
Mutation-induced alteration in binding affinity between RBD and antibody/serum
The deep mutational scanning approach was utilized to empirically evaluate the effects of all possible amino acid mutations within the SARS-CoV-2 RBD on antibody/serum affinity. We considered a mutation to mediate immune escape if the escape score was greater than 0.1 (10% of the maximum score of 1).
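The escape criterion amounts to a strict threshold on the escape score, as in this short sketch. The mutation names and scores are made up for illustration, and scores are assumed to be on a 0 to 1 scale as stated above.

```python
ESCAPE_THRESHOLD = 0.1  # 10% of the maximum score of 1

# Hypothetical escape scores for one antibody or serum sample.
escape_scores = {"E484K": 0.85, "K417N": 0.12, "N501Y": 0.03, "L452R": 0.10}

# A mutation is flagged only if its score is strictly greater than the threshold.
escape_mutations = sorted(
    mutation
    for mutation, score in escape_scores.items()
    if score > ESCAPE_THRESHOLD
)
print(escape_mutations)
```

A score exactly equal to 0.1 does not pass, because the criterion is "greater than 0.1".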
GISAID accession IDs of the 13,344,494 SARS-CoV-2 genome sequences used in this study
We retrieved genome sequences and metadata from GISAID (https://gisaid.org/). Detailed information about the GISAID data used can be found at https://doi.org/10.55876/gis8.230705yx. Some GISAID accession IDs are not available at that link; they can be found in the downloaded file.