COV2Var

The count of genome sequences harboring this mutation and its distribution across global regions offer insights into regional variations.

Note: The figure shows the distribution of this mutation across 218 geographical regions. Color represents the genome sequence count in each region. The data is obtained from GISAID's metadata, specifically capturing the regional distribution of genomic sequences.

The dynamic count of genome sequences containing this mutation over time.

Note: Clicking the "Count" or "Cumulative Count" button toggles the view. Count represents the number of genome sequences per month. Cumulative count represents the accumulated total count up to the respective month. The data is obtained from GISAID's metadata, specifically capturing the collection date of genomic sequences.
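
The relationship between the two views can be sketched in a few lines of Python. This is only an illustration, assuming the "Total lineage monthly counts" column of the lineage table on this page gives the monthly count; the function name `cumulative_counts` is hypothetical, and only three months are included, so the running totals ignore the few earlier sequences.

```python
from itertools import accumulate

# Monthly totals copied from the lineage table on this page (Aug-Oct 2020 only).
monthly_counts = {"2020-10": 936, "2020-08": 192, "2020-09": 458}

def cumulative_counts(counts):
    """Return {month: running total}, with months in chronological order."""
    months = sorted(counts)  # "YYYY-MM" strings sort chronologically
    totals = accumulate(counts[m] for m in months)
    return dict(zip(months, totals))

cumulative = cumulative_counts(monthly_counts)
for month, total in cumulative.items():
    print(month, monthly_counts[month], total)
```

Sorting the "Year-Month" keys before accumulating matters because the cumulative count is only meaningful in chronological order.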

For each time point in the graph above, the three lineages with the highest count of genome sequences carrying this mutation are listed, which helps pinpoint noteworthy lineages for further analysis.

Note: Users can filter the lineages by entering a "Year-Month" term in the search box. For example, entering 2020-01 will display lineages that appeared in January 2020. The data is obtained from GISAID's metadata, specifically capturing the collection date of genomic sequences.

Collection date	Lineage	Total lineage monthly counts	Lineage-specific monthly counts	Lineage-specific monthly frequency
2020-10	B.1.177	936	449	4.80e-1
2020-10	B.1.177.44	936	208	2.22e-1
2020-10	B.1.177.12	936	170	1.82e-1
2020-11	B.1.177.12	2110	753	3.57e-1
2020-11	B.1.177	2110	608	2.88e-1
2020-11	B.1.177.44	2110	374	1.77e-1
2020-12	B.1.177.12	4474	2405	5.38e-1
2020-12	B.1.177	4474	704	1.57e-1
2020-12	B.1.177.44	4474	444	9.92e-2
2020-03	B.1.177.12	1	1	1.00e+0
2020-06	B.1.177	2	1	5.00e-1
2020-06	B.1.240	2	1	5.00e-1
2020-07	B.1.177	2	1	5.00e-1
2020-07	B.1.240	2	1	5.00e-1
2020-08	B.1.177	192	127	6.61e-1
2020-08	B.1.177.44	192	62	3.23e-1
2020-08	B.1.177.12	192	1	5.21e-3
2020-09	B.1.177	458	303	6.62e-1
2020-09	B.1.177.44	458	69	1.51e-1
2020-09	B.1.177.12	458	60	1.31e-1
2021-01	B.1.177.12	5191	2184	4.21e-1
2021-01	B.1.177	5191	948	1.83e-1
2021-01	B.1.177.35	5191	722	1.39e-1
2021-10	AY.88	9	5	5.56e-1
2021-10	AY.103	9	1	1.11e-1
2021-10	AY.20	9	1	1.11e-1
2021-11	AY.122	8	2	2.50e-1
2021-11	AY.4	8	2	2.50e-1
2021-11	AY.103	8	1	1.25e-1
2021-12	AY.25.1.1	8	2	2.50e-1
2021-12	BA.1	8	2	2.50e-1
2021-12	AY.4.2.3	8	1	1.25e-1
2021-02	B.1.177	2216	932	4.21e-1
2021-02	B.1.177.12	2216	574	2.59e-1
2021-02	B.1.177.35	2216	344	1.55e-1
2021-03	B.1.177	677	263	3.88e-1
2021-03	B.1.177.12	677	199	2.94e-1
2021-03	B.1.177.35	677	71	1.05e-1
2021-04	B.1.177	96	42	4.38e-1
2021-04	B.1.177.12	96	35	3.65e-1
2021-04	B.1.177.41	96	9	9.38e-2
2021-05	B.1.177	11	11	1.00e+0
2021-06	AY.20	6	2	3.33e-1
2021-06	B.1.177.44	6	2	3.33e-1
2021-06	B.1.177	6	1	1.67e-1
2021-07	AY.20	4	2	5.00e-1
2021-07	AY.4	4	1	2.50e-1
2021-07	P.1	4	1	2.50e-1
2021-08	AY.14	15	10	6.67e-1
2021-08	AY.4	15	3	2.00e-1
2021-08	AY.122	15	1	6.67e-2
2021-09	AY.14	15	9	6.00e-1
2021-09	AY.88	15	3	2.00e-1
2021-09	AY.20	15	1	6.67e-2
2022-01	BA.1.1	15	9	6.00e-1
2022-01	BA.1	15	4	2.67e-1
2022-01	BA.1.21	15	2	1.33e-1
2022-10	BF.7	1	1	1.00e+0
2022-11	BN.1.3	1	1	1.00e+0
2022-12	BN.1.3	8	7	8.75e-1
2022-12	BQ.1.3	8	1	1.25e-1
2022-02	BA.1.1	11	8	7.27e-1
2022-02	BA.1.21	11	2	1.82e-1
2022-02	BA.2	11	1	9.09e-2
2022-03	BA.1.1	15	4	2.67e-1
2022-03	BA.2.12	15	4	2.67e-1
2022-03	BA.2.9	15	4	2.67e-1
2022-04	BA.2.12	12	7	5.83e-1
2022-04	BA.2	12	4	3.33e-1
2022-04	BA.1.1	12	1	8.33e-2
2022-05	BA.2.12	2	2	1.00e+0
2022-07	BF.5	5	3	6.00e-1
2022-07	BA.2.18	5	1	2.00e-1
2022-07	BA.5	5	1	2.00e-1
2022-08	BF.5	6	3	5.00e-1
2022-08	BA.2.76	6	1	1.67e-1
2022-08	BA.4.6	6	1	1.67e-1
2023-01	XBF	10	6	6.00e-1
2023-01	XBB.1.5	10	2	2.00e-1
2023-01	BQ.1.1.32	10	1	1.00e-1
2023-02	XBF	10	9	9.00e-1
2023-02	XBB.1.5	10	1	1.00e-1
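
The last column of the table above appears to be the lineage-specific monthly count divided by the total monthly count. A minimal Python sketch of that calculation and of the top-3 selection follows, using the 2020-10 rows; the function name `top_lineages` is hypothetical and the division rule is inferred from the table values rather than stated by the source.

```python
def top_lineages(lineage_counts, monthly_total, k=3):
    """Return the k lineages with the highest counts as (lineage, count, frequency)."""
    ranked = sorted(lineage_counts.items(), key=lambda item: item[1], reverse=True)
    # Frequency is relative to ALL sequences with the mutation in that month,
    # not just to the lineages that made the top-k list.
    return [(name, n, n / monthly_total) for name, n in ranked[:k]]

# 2020-10 rows from the table above (deliberately unordered); total is 936.
oct_2020 = {"B.1.177.44": 208, "B.1.177": 449, "B.1.177.12": 170}
rows = top_lineages(oct_2020, monthly_total=936)

for name, count, freq in rows:
    print(f"2020-10\t{name}\t936\t{count}\t{freq:.2e}")
```

Note that the three frequencies need not sum to 1, because the monthly total also includes lineages outside the top three.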

The count of genome sequences and the frequency of this mutation in each lineage.

Note: Only lineages with a mutation frequency above 0.01 are displayed, out of 2,735 lineages. Mutation count represents the count of sequences carrying this mutation. Users can filter the lineages by entering a search term in the search box. For example, entering "A.1" will display A.1 lineages. The data is obtained from GISAID's metadata, specifically capturing the lineage of genomic sequences.

Mutation ID	Lineage	Mutation frequency	Mutation count	Earliest lineage emergence	Latest lineage emergence
V8517	B.1.177.35	9.57e-1	1487	2020-10-17	2021-4-23
V8517	B.1.177.42	9.99e-1	855	2020-10-26	2021-3-22
V8517	B.1.177.43	9.71e-1	605	2020-8-17	2021-12-29
V8517	B.1.177.44	9.70e-1	1742	2020-8-7	2021-6-7
V8517	B.1.177.12	9.97e-1	6382	2020-3-30	2021-4-12
V8517	B.1.177	6.13e-2	4390	2020-2-2	2022-5-25
V8517	B.1.160.10	1.69e-2	2	2020-8-27	2021-2-20
V8517	B.1.160.32	1.09e-2	2	2020-10-2	2021-1-26
V8517	B.1.177.31	3.94e-1	13	2020-8-24	2021-2-3
V8517	B.1.177.36	9.48e-1	55	2020-9-14	2021-3-2
V8517	B.1.177.37	1.00e+0	34	2020-10-12	2021-2-25
V8517	B.1.177.38	1.00e+0	93	2020-9-11	2021-3-17
V8517	B.1.177.39	1.00e+0	59	2020-10-5	2021-3-2
V8517	B.1.177.40	9.88e-1	170	2020-10-14	2021-3-10
V8517	B.1.177.41	9.92e-1	368	2020-9-20	2021-4-24

Examining mutation (20397T>C) found in abundant sequences of non-human animal hosts

Exploring mutation presence across 35 non-human animal hosts for cross-species transmission.

Note: We retained only mutations that appear in at least three non-human animal host sequences. The data is obtained from GISAID's metadata, specifically capturing the host of genomic sequences.

Animal host	Lineage	Source region	Collection date	Accession ID

Association between mutation (20397T>C) and patients of different ages, genders, and statuses

Note: A logistic regression model, fitted using the glm function in R, was employed to compare patient data between sequences with and without the mutation. The data is obtained from GISAID's metadata, specifically capturing the patient status, gender, and age of genomic sequences.

Analyzing the association between mutation and patient status.

Note: We categorized the data into patient statuses (ambulatory, deceased, homebound, hospitalized, mild, and recovered) based on GISAID classifications. In the analysis of the association between mutation and patient status, the model included mutation, patient status, patient age, gender, sequence region of origin, and sequence collection time point. A direction of 'increase' means that the presence of this mutation increases the proportion of the corresponding effect; a direction of 'decrease' means that it decreases that proportion. A p-value lower than 0.001 indicates a significant difference between the populations with and without the mutation.

Attribute	Effect	Estimate	SE	Z-value	P-value	Direction
Patient status	Ambulatory	8.33e+0	2.96e+0	2.82e+0	4.85e-3	Increase
	Deceased	-6.67e-1	7.70e-1	-8.66e-1	3.87e-1	Decrease
	Homebound	-1.50e+1	1.70e+3	-8.80e-3	9.93e-1	Decrease
	Hospitalized	1.25e+0	3.95e-1	3.16e+0	1.55e-3	Increase
	Mild	-8.89e-1	1.21e+0	-7.36e-1	4.62e-1	Decrease
	Recovered	-2.42e+0	6.79e-1	-3.57e+0	3.59e-4	Decrease
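
The Z-value, P-value, and Direction columns in these regression tables are consistent with a standard Wald test on each coefficient. The sketch below reproduces them from the Estimate and SE columns, assuming a two-sided normal test (which is what R's glm summary reports for binomial models); the function name `wald_test` is hypothetical, and small differences from the table come from rounding of the published estimates.

```python
import math

def wald_test(estimate, se, alpha=0.001):
    """Return (z, two-sided p-value, direction, significant) for one coefficient."""
    z = estimate / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    direction = "Increase" if estimate > 0 else "Decrease"
    return z, p, direction, p < alpha

# Rows taken from the patient status table above.
hospitalized = wald_test(1.25, 0.395)
recovered = wald_test(-2.42, 0.679)

for label, (z, p, direction, significant) in [("Hospitalized", hospitalized),
                                              ("Recovered", recovered)]:
    print(f"{label}: z={z:.2f} p={p:.2e} {direction} significant={significant}")
```

Under the page's threshold of 0.001, only the Recovered row of the patient status table counts as a significant difference.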

Analyzing the association between mutation and patient age.

Note: We categorized the data into patient age groups (0-17, 18-39, 40-64, 65-84, and 85+). In the analysis of the association between mutation and patient age, the model included mutation, patient age, gender, sequence region of origin, and sequence collection time point. A direction of 'increase' means that the presence of this mutation increases the proportion of the corresponding effect; a direction of 'decrease' means that it decreases that proportion. A p-value lower than 0.001 indicates a significant difference between the populations with and without the mutation.

Attribute	Effect	Estimate	SE	Z-value	P-value	Direction
Patient age, years	0-17	-1.14e-1	1.06e-1	-1.07e+0	2.84e-1	Decrease
	18-39	-1.35e-2	6.41e-2	-2.10e-1	8.34e-1	Decrease
	40-64	6.18e-2	5.83e-2	1.06e+0	2.90e-1	Increase
	65-84	-4.25e-2	7.14e-2	-5.95e-1	5.52e-1	Decrease
	>=85	3.95e-2	9.66e-2	4.09e-1	6.83e-1	Increase

Analyzing the association between mutation and patient gender.

Note: We categorized the data by patient gender (male and female). In the analysis of the association between mutation and patient gender, the model included mutation, patient gender, patient age, sequence region of origin, and sequence collection time point. A direction of 'increase' means that the presence of this mutation increases the proportion of the corresponding effect; a direction of 'decrease' means that it decreases that proportion. A p-value lower than 0.001 indicates a significant difference between the populations with and without the mutation.

Attribute	Effect	Estimate	SE	Z-value	P-value	Direction
Patient gender	Male	-1.00e-1	5.66e-2	-1.77e+0	7.66e-2	Decrease

Investigating natural selection at mutation (20397T>C) site for genetic adaptation and diversity

Note: Investigating the occurrence of positive selection or negative selection at this mutation site reveals implications for genetic adaptation and diversity.

The MEME method within the HyPhy software was employed to analyze positive selection. MEME: episodic selection.

Note: List of sites found to be under episodic selection by MEME (p < 0.05). "Protein Start" corresponds to the protein's starting genomic position. "Protein End" corresponds to the protein's ending genomic position. The term 'site' represents a selection site within the protein.

Protein name	Protein start	Protein end	Protein length	Site	P-value	Lineage	Method

The FEL method within the HyPhy software was employed to analyze both positive and negative selection. FEL: pervasive selection on small datasets.

Note: List of sites found to be under pervasive selection by FEL (p < 0.05). A beta value greater than alpha signifies positive selection, while a beta value smaller than alpha signifies negative selection. "Protein Start" corresponds to the protein's starting genomic position. "Protein End" corresponds to the protein's ending genomic position. The term 'site' represents a selection site within the protein.

Protein name	Protein start	Protein end	Protein length	Site	Alpha	Beta	P-value	Lineage	Method
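
The FEL decision rule in the note above can be written as a small Python function. This is a sketch of the rule as stated on this page, not of HyPhy itself; the function name `classify_fel` and the example values are hypothetical, since no FEL sites are listed for this mutation.

```python
def classify_fel(alpha, beta, p_value, threshold=0.05):
    """Classify one site from FEL output using the rule in the note.

    alpha: synonymous substitution rate, beta: non-synonymous rate.
    """
    if p_value >= threshold:
        return "not significant"
    if beta > alpha:
        return "positive selection"
    if beta < alpha:
        return "negative selection"
    return "neutral"

# Hypothetical values for illustration only.
print(classify_fel(alpha=0.4, beta=1.9, p_value=0.01))
print(classify_fel(alpha=1.2, beta=0.1, p_value=0.03))
print(classify_fel(alpha=0.4, beta=1.9, p_value=0.20))
```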

The FUBAR method within the HyPhy software was employed to analyze both positive and negative selection. FUBAR: pervasive selection on large datasets.

Note: List of sites found to be under pervasive selection by FUBAR (prob > 0.95). A prob[alpha < beta] value exceeding 0.95 indicates positive selection, while a prob[alpha > beta] value exceeding 0.95 indicates negative selection. "Protein Start" corresponds to the protein's starting genomic position. "Protein End" corresponds to the protein's ending genomic position. The term 'site' represents a selection site within the protein.

Protein name	Protein start	Protein end	Protein length	Site	Prob[alpha>beta]	Prob[alpha<beta]	Lineage	Method
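
FUBAR reports posterior probabilities rather than p-values, so the rule in the note compares each probability with 0.95. A minimal Python sketch, with a hypothetical function name `classify_fubar` and made-up example values:

```python
def classify_fubar(prob_alpha_gt_beta, prob_alpha_lt_beta, cutoff=0.95):
    """Classify one site from FUBAR posterior probabilities."""
    if prob_alpha_lt_beta > cutoff:
        return "positive selection"  # strong support for beta > alpha
    if prob_alpha_gt_beta > cutoff:
        return "negative selection"  # strong support for alpha > beta
    return "not significant"

# Hypothetical values for illustration only.
print(classify_fubar(prob_alpha_gt_beta=0.01, prob_alpha_lt_beta=0.97))
print(classify_fubar(prob_alpha_gt_beta=0.99, prob_alpha_lt_beta=0.00))
print(classify_fubar(prob_alpha_gt_beta=0.60, prob_alpha_lt_beta=0.30))
```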

Alterations in protein physicochemical properties induced by mutation (20397T>C)

Understanding the alterations in protein physicochemical properties can reveal the evolutionary processes and adaptive changes of viruses.

Note: ProtParam software was used for the analysis of physicochemical properties. Significant change threshold: A change exceeding 10% compared to the reference is considered a significant change. "GRAVY" is an abbreviation for "grand average of hydropathicity".

Group	Protein name	Molecular weight	Theoretical PI	Extinction coefficients	Aliphatic index	GRAVY
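
The 10% threshold in the note can be applied to any of the ProtParam properties in the same way. The Python sketch below assumes the change is measured relative to the absolute value of the reference (so that negative properties such as GRAVY behave sensibly); that interpretation, the function names, and the example values are assumptions, not taken from the source.

```python
def relative_change(reference, mutant):
    """Change of the mutant value relative to the reference value."""
    if reference == 0:
        raise ValueError("relative change is undefined for a zero reference")
    return (mutant - reference) / abs(reference)

def is_significant_change(reference, mutant, threshold=0.10):
    """True when the property changes by more than 10% of the reference."""
    return abs(relative_change(reference, mutant)) > threshold

# Hypothetical property values for illustration only.
print(is_significant_change(100.0, 111.0))  # aliphatic index, +11%
print(is_significant_change(100.0, 105.0))  # aliphatic index, +5%
print(is_significant_change(-0.50, -0.56))  # GRAVY, 12% more negative
```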

Alterations in protein stability induced by mutation (20397T>C)

The impact of mutations on protein stability directly or indirectly affects the biological characteristics, adaptability, and transmission capacity of the virus.

Note: iMutant 2.0 was utilized to analyze the effects of mutations on protein stability. pH 7 and a temperature of 25°C are employed to replicate the in vitro environment. pH 7.4 and a temperature of 37°C are utilized to simulate the in vivo environment.

Mutation	Protein name	Mutation type	Position	ΔDDG	Stability	pH	Temperature	Condition

Impact on protein function induced by mutation (20397T>C)

The impact of mutations on protein function

Note: The MutPred2 software was used to predict the pathogenicity of a mutation and the molecular mechanism of pathogenicity. A score above 0.5 indicates an increased likelihood of pathogenicity. "Pr" is the abbreviation for "proportion". "P" is the abbreviation for "p-value".

Mutation	Protein name	Mutation type	Score	Molecular mechanisms

Exploring mutation (20397T>C) distribution within intrinsically disordered protein regions

Intrinsically disordered proteins (IDPs) are proteins or protein regions that have no unique 3D structure. In viral proteins, mutations in the disordered regions are critical for immune evasion and antibody escape, suggesting potential additional implications for vaccines and monoclonal therapeutic strategies.

Note: The IUPred3 software was utilized for analyzing IDPs. A score greater than 0.5 is considered indicative of an IDP. In the plot, "POS" represents the position of the mutation.

Alterations in enzyme cleavage sites induced by mutation (20397T>C)

Exploring the impact of mutations on the cleavage sites of 28 enzymes.

Note: The PeptideCutter software was used to detect enzyme cleavage sites. Increased cleavage sites are sites present in the mutated protein but absent from the reference protein. Conversely, decreased cleavage sites are sites present in the reference protein but absent from the mutated protein.

Mutation	Protein name	Genome position	Enzyme name	Increased cleavage sites	Decreased cleavage sites
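
The increased and decreased cleavage sites described in the note amount to two set differences between the site positions predicted for the mutated and the reference protein. A minimal Python sketch for a single enzyme follows; the function name `compare_cleavage_sites` and the positions are hypothetical, and obtaining the positions from PeptideCutter is outside the scope of this sketch.

```python
def compare_cleavage_sites(reference_sites, mutant_sites):
    """Return (increased, decreased) cleavage positions for one enzyme.

    increased: positions cleaved only in the mutated protein.
    decreased: positions cleaved only in the reference protein.
    """
    ref, mut = set(reference_sites), set(mutant_sites)
    return sorted(mut - ref), sorted(ref - mut)

# Hypothetical cleavage positions for illustration only.
increased, decreased = compare_cleavage_sites(
    reference_sites=[10, 25, 40],
    mutant_sites=[10, 40, 52],
)
print("increased:", increased)
print("decreased:", decreased)
```

For a synonymous mutation such as this one, the protein sequence is unchanged, so both lists would be expected to be empty.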

Impact of spike protein mutation (20397T>C) on antigenicity and immunogenicity

Investigating the impact of mutations on antigenicity and immunogenicity carries important implications for vaccine design and our understanding of immune responses.

Note: The VaxiJen tool was utilized for antigenicity analysis; antigens with a VaxiJen prediction score of more than 0.4 are considered candidate antigens. The IEDB tool was used for immunogenicity analysis; an MHC I immunogenicity score > 0 indicates a higher probability of stimulating an immune response. An absolute change greater than 0.0102 (three times the median across sites) in antigenicity score is considered significant. An absolute change greater than 0.2754 (three times the median across sites) in immunogenicity score is considered significant.

Group	Protein name	Protein region	Antigenicity score	Immunogenicity score

Impact of mutation (20397T>C) on viral transmissibility by the affinity between RBD and ACE2 receptor

Unraveling the impact of mutations on the interaction between the receptor binding domain (RBD) and ACE2 receptor using deep mutational scanning (DMS) experimental data to gain insights into their effects on viral transmissibility.

Note: The ΔBinding affinity represents the disparity between the binding affinity of a mutation and the reference binding affinity. A positive Δbinding affinity value (Δlog10(KD,app) > 0) signifies an increased affinity between RBD and ACE2 receptor due to the mutation. Conversely, a negative value (Δlog10(KD,app) < 0) indicates a reduced affinity between RBD and ACE2 receptor caused by the mutation. A p-value smaller than 0.05 indicates significance. "Ave mut bind" represents the average binding affinity of this mutation. "Ave ref bind" refers to the average binding affinity at a site without any mutation (reference binding affinity).

Mutation	Protein name	Protein region	Mutation Position	Ave mut bind	Ave ref bind	ΔBinding affinity	P-value	Image
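
The ΔBinding affinity column is described in the note as the difference between "Ave mut bind" and "Ave ref bind". The Python sketch below applies that definition together with the 0.05 significance cutoff, following the sign convention stated on this page; the function name `delta_binding` and the example values are hypothetical.

```python
def delta_binding(ave_mut_bind, ave_ref_bind, p_value, alpha=0.05):
    """Return (delta, interpretation) following the note's sign convention."""
    delta = ave_mut_bind - ave_ref_bind
    if p_value >= alpha:
        effect = "not significant"
    elif delta > 0:
        effect = "increased affinity"
    elif delta < 0:
        effect = "reduced affinity"
    else:
        effect = "no change"
    return delta, effect

# Hypothetical binding values for illustration only.
print(delta_binding(9.1, 8.6, p_value=0.01))
print(delta_binding(8.0, 8.6, p_value=0.01))
print(delta_binding(9.1, 8.6, p_value=0.20))
```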

The interface between the receptor binding domain (RBD) and ACE2 receptor is depicted in the crystal structure 6M0J.

Note: The structure 6M0J encompasses the RBD range of 333 to 526. The binding sites (403-406, 408, 417, 439, 445-447, 449, 453, 455-456, 473-478, 484-498, and 500-506) on the RBD that interface with ACE2 are indicated in magenta. These RBD binding sites were identified through the interface footprints experiment. The ACE2 binding sites within the interface are shown in cyan, representing residues within 5Å proximity to the RBD binding sites. The mutation within the RBD range of 333 to 526 is depicted in red.

Impact of mutation (20397T>C) on immune escape by the affinity between RBD and antibody/serum

By utilizing experimental data from deep mutational scanning (DMS), we can uncover how mutations affect the interaction between the receptor binding domain (RBD) and antibodies/serum. This approach provides valuable insights into strategies for evading the host immune response.

Note: We considered a mutation to mediate strong escape if the escape score exceeded 0.1 (10% of the maximum score of 1). A total of 1,504 antibodies/serum data were collected for this analysis. "Condition name" refers to the name of the antibodies/serum. "Mut escape score" represents the escape score of the mutation in that specific condition. "Avg mut escape score" indicates the average escape score of the mutation site in that condition, considering the occurrence of this mutation and other mutations. Class 1 antibodies bind to an epitope only in the RBD “up” conformation, and are the most abundant. Class 2 antibodies bind to the RBD both in “up” and “down” conformations. Class 3 and class 4 antibodies both bind outside the ACE2 binding site. Class 3 antibodies bind the RBD in both the open and closed conformation, while class 4 antibodies bind only in the open conformation.

Mutation	Condition name	Condition type	Condition subtype	Condition year	Mut escape score	Avg mut escape score

Investigating the co-mutation patterns of mutation (20397T>C) across 2,735 viral lineages

Investigating the co-mutation patterns of SARS-CoV-2 across 2,735 viral lineages to unravel the cooperative effects of different mutations. In biological research, correlation analysis of mutation sites helps us understand whether there is a close relationship or interaction between certain mutations.

Note: The Spearman correlation coefficient is used to calculate the correlation between two mutations within each Pango lineage. Holm–Bonferroni method was used for multiple test adjustment. We retained mutation pairs with correlation values greater than 0.6 or less than -0.6 and Holm–Bonferroni corrected p-values less than 0.05.

Associated mutation ID	DNA mutation	Mutation type	Protein name	Protein mutation	Correlation coefficient	Lineage
V5600	1093C>T	missense_variant	N	P365S	7.47e-1	B.1.177
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	8.15e-1	B.1.177
V9790	1113C>T	synonymous_variant	N	D371D	9.59e-1	B.1.177
V1626	7034C>T	missense_variant	ORF1ab_pp1a	A2345V	6.07e-1	AY.20
V392	997A>G	missense_variant	ORF1ab_pp1a	T333A	8.33e-1	B.1.429
V5329	118C>T	missense_variant	N	R40C	1.00e+0	AY.79
V6219	2445C>T	synonymous_variant	ORF1ab_pp1a	L815L	1.00e+0	AY.88
V7358	11409C>T	synonymous_variant	ORF1ab_pp1a	Y3803Y	1.00e+0	AY.88
V1828	8507C>T	missense_variant	ORF1ab_pp1a	T2836I	7.06e-1	B.1.1.141
V5575	976C>T	missense_variant	N	P326S	7.06e-1	B.1.1.141
V9287	15C>T	synonymous_variant	M	N5N	7.06e-1	B.1.1.141
V3508	21204G>T	missense_variant	ORF1ab_pp1ab	M7068I	1.00e+0	B.1.1.189
V5505	659C>T	missense_variant	N	A220V	1.00e+0	B.1.1.189
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.1.189
V5713	88G>T	missense_variant	ORF10	V30L	1.00e+0	B.1.1.189
V5866	180T>C	synonymous_variant	ORF1ab_pp1a	V60V	1.00e+0	B.1.1.189
V6689	6021C>T	synonymous_variant	ORF1ab_pp1a	T2007T	1.00e+0	B.1.1.189
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	1.00e+0	B.1.1.189
V8591	20991G>C	synonymous_variant	ORF1ab_pp1ab	A6997A	1.00e+0	B.1.1.189
V9322	279C>G	synonymous_variant	M	L93L	1.00e+0	B.1.1.189
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.1.189
V2182	11114C>T	missense_variant	ORF1ab_pp1a	A3705V	7.07e-1	B.1.1.39
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	7.07e-1	B.1.1.39
V3748	665C>T	missense_variant	S	A222V	6.32e-1	B.1.1.39
V4796	3G>T	start_lost	ORF6	M1?	7.07e-1	B.1.1.39
V5505	659C>T	missense_variant	N	A220V	7.07e-1	B.1.1.39
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.1.39
V5713	88G>T	missense_variant	ORF10	V30L	6.32e-1	B.1.1.39
V6689	6021C>T	synonymous_variant	ORF1ab_pp1a	T2007T	6.32e-1	B.1.1.39
V8436	19620C>T	synonymous_variant	ORF1ab_pp1ab	Y6540Y	7.07e-1	B.1.1.39
V8886	1989C>T	synonymous_variant	S	D663D	7.07e-1	B.1.1.39
V9322	279C>G	synonymous_variant	M	L93L	7.07e-1	B.1.1.39
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	6.52e-1	B.1.160
V2182	11114C>T	missense_variant	ORF1ab_pp1a	A3705V	1.00e+0	B.1.1.70
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	1.00e+0	B.1.1.70
V4390	148G>A	missense_variant	ORF3a	V50I	1.00e+0	B.1.1.70
V5505	659C>T	missense_variant	N	A220V	1.00e+0	B.1.1.70
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.1.70
V8300	18540C>T	synonymous_variant	ORF1ab_pp1ab	S6180S	1.00e+0	B.1.1.70
V8591	20991G>C	synonymous_variant	ORF1ab_pp1ab	A6997A	7.07e-1	B.1.1.70
V9322	279C>G	synonymous_variant	M	L93L	1.00e+0	B.1.1.70
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.1.70
V3508	21204G>T	missense_variant	ORF1ab_pp1ab	M7068I	8.94e-1	B.1.177.17
V5628	1148C>T	missense_variant	N	P383L	7.74e-1	B.1.177.17
V6170	2151C>T	synonymous_variant	ORF1ab_pp1a	Y717Y	8.94e-1	B.1.177.17
V6457	4305C>T	synonymous_variant	ORF1ab_pp1a	I1435I	8.94e-1	B.1.177.17
V7505	12519C>T	synonymous_variant	ORF1ab_pp1a	N4173N	7.74e-1	B.1.177.17
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	1.00e+0	B.1.177.17
V9790	1113C>T	synonymous_variant	N	D371D	8.94e-1	B.1.177.17
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	6.32e-1	B.1.177.43
V6170	2151C>T	synonymous_variant	ORF1ab_pp1a	Y717Y	1.00e+0	B.1.214.2
V677	2041C>T	missense_variant	ORF1ab_pp1a	L681F	1.00e+0	B.1.214.2
V2927	17254C>T	missense_variant	ORF1ab_pp1ab	L5752F	6.42e-1	B.1.221
V3785	770G>A	missense_variant	S	G257D	9.82e-1	B.1.221
V4919	186_188delATT	disruptive_inframe_deletion	ORF7a	Q62_F63delinsH	9.31e-1	B.1.221
V5505	659C>T	missense_variant	N	A220V	8.50e-1	B.1.221
V5600	1093C>T	missense_variant	N	P365S	9.16e-1	B.1.221
V5601	1094C>T	missense_variant	N	P365L	8.38e-1	B.1.221
V5713	88G>T	missense_variant	ORF10	V30L	6.00e-1	B.1.221
V8591	20991G>C	synonymous_variant	ORF1ab_pp1ab	A6997A	7.78e-1	B.1.221
V9117	3690G>T	synonymous_variant	S	V1230V	9.64e-1	B.1.221
V9322	279C>G	synonymous_variant	M	L93L	8.62e-1	B.1.221
V9790	1113C>T	synonymous_variant	N	D371D	9.47e-1	B.1.221
V6467	4368C>T	synonymous_variant	ORF1ab_pp1a	G1456G	6.67e-1	B.1.258
V3508	21204G>T	missense_variant	ORF1ab_pp1ab	M7068I	1.00e+0	B.1.36
V5505	659C>T	missense_variant	N	A220V	1.00e+0	B.1.36
V9322	279C>G	synonymous_variant	M	L93L	7.07e-1	B.1.36
V6176	2184C>T	synonymous_variant	ORF1ab_pp1a	G728G	7.07e-1	B.1.565
V7263	10521A>G	synonymous_variant	ORF1ab_pp1a	Q3507Q	8.89e-1	BA.2.12
V5368	307G>T	missense_variant	N	D103Y	7.07e-1	BA.2.18
V7607	13251C>T	synonymous_variant	ORF1ab_pp1ab	G4417G	7.07e-1	BA.2.18
V1220	4702A>G	missense_variant	ORF1ab_pp1a	I1568V	1.00e+0	BA.5.2.21
V2088	10628C>T	missense_variant	ORF1ab_pp1a	T3543I	1.00e+0	BQ.1.1.32
V4564	578G>T	missense_variant	ORF3a	W193L	7.07e-1	BQ.1.1.32
V8034	16605C>T	synonymous_variant	ORF1ab_pp1ab	Y5535Y	1.00e+0	BQ.1.1.32
V4366	119C>T	missense_variant	ORF3a	S40L	8.66e-1	XBF
V5369	337C>A	missense_variant	N	L113I	9.68e-1	XBF
V7993	16257T>C	synonymous_variant	ORF1ab_pp1ab	N5419N	9.68e-1	XBF
V8001	16311C>T	synonymous_variant	ORF1ab_pp1ab	D5437D	1.00e+0	AY.25.1.1
V3508	21204G>T	missense_variant	ORF1ab_pp1ab	M7068I	1.00e+0	B.1.1.521
V3748	665C>T	missense_variant	S	A222V	7.05e-1	B.1.1.521
V5505	659C>T	missense_variant	N	A220V	7.05e-1	B.1.1.521
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.1.521
V5713	88G>T	missense_variant	ORF10	V30L	7.05e-1	B.1.1.521
V5866	180T>C	synonymous_variant	ORF1ab_pp1a	V60V	7.05e-1	B.1.1.521
V6170	2151C>T	synonymous_variant	ORF1ab_pp1a	Y717Y	1.00e+0	B.1.1.521
V6689	6021C>T	synonymous_variant	ORF1ab_pp1a	T2007T	7.05e-1	B.1.1.521
V8429	19575T>C	synonymous_variant	ORF1ab_pp1ab	N6525N	-7.05e-1	B.1.1.521
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	1.00e+0	B.1.1.521
V8591	20991G>C	synonymous_variant	ORF1ab_pp1ab	A6997A	1.00e+0	B.1.1.521
V9322	279C>G	synonymous_variant	M	L93L	7.05e-1	B.1.1.521
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.1.521
V1935	9261G>T	missense_variant	ORF1ab_pp1a	M3087I	-6.24e-1	B.1.160.10
V2182	11114C>T	missense_variant	ORF1ab_pp1a	A3705V	1.00e+0	B.1.160.10
V2329	12194C>T	missense_variant	ORF1ab_pp1a	T4065I	-7.01e-1	B.1.160.10
V2536	13729G>T	missense_variant	ORF1ab_pp1ab	A4577S	-8.13e-1	B.1.160.10
V2723	15502G>T	missense_variant	ORF1ab_pp1ab	V5168L	-8.13e-1	B.1.160.10
V2836	16625A>G	missense_variant	ORF1ab_pp1ab	K5542R	-8.13e-1	B.1.160.10
V2857	16755G>T	missense_variant	ORF1ab_pp1ab	E5585D	-8.13e-1	B.1.160.10
V2990	17735C>T	missense_variant	ORF1ab_pp1ab	T5912I	-8.13e-1	B.1.160.10
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	1.00e+0	B.1.160.10
V3748	665C>T	missense_variant	S	A222V	1.00e+0	B.1.160.10
V3896	1430G>A	missense_variant	S	S477N	-6.24e-1	B.1.160.10
V4390	148G>A	missense_variant	ORF3a	V50I	1.00e+0	B.1.160.10
V4406	171G>T	missense_variant	ORF3a	Q57H	-1.00e+0	B.1.160.10
V460	1249C>T	missense_variant	ORF1ab_pp1a	H417Y	-6.24e-1	B.1.160.10
V5505	659C>T	missense_variant	N	A220V	1.00e+0	B.1.160.10
V5512	702G>C	missense_variant	N	M234I	-6.24e-1	B.1.160.10
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.160.10
V5713	88G>T	missense_variant	ORF10	V30L	1.00e+0	B.1.160.10
V5866	180T>C	synonymous_variant	ORF1ab_pp1a	V60V	1.00e+0	B.1.160.10
V6454	4278C>T	synonymous_variant	ORF1ab_pp1a	T1426T	-7.01e-1	B.1.160.10
V6689	6021C>T	synonymous_variant	ORF1ab_pp1a	T2007T	1.00e+0	B.1.160.10
V7336	11232C>T	synonymous_variant	ORF1ab_pp1a	Y3744Y	-8.13e-1	B.1.160.10
V8300	18540C>T	synonymous_variant	ORF1ab_pp1ab	S6180S	1.00e+0	B.1.160.10
V8315	18613C>T	synonymous_variant	ORF1ab_pp1ab	L6205L	-8.13e-1	B.1.160.10
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	1.00e+0	B.1.160.10
V8591	20991G>C	synonymous_variant	ORF1ab_pp1ab	A6997A	1.00e+0	B.1.160.10
V9195	318C>T	synonymous_variant	ORF3a	L106L	-7.01e-1	B.1.160.10
V9314	213C>T	synonymous_variant	M	Y71Y	-1.00e+0	B.1.160.10
V9322	279C>G	synonymous_variant	M	L93L	1.00e+0	B.1.160.10
V9339	354T>C	synonymous_variant	M	I118I	-8.13e-1	B.1.160.10
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.160.10
V3508	21204G>T	missense_variant	ORF1ab_pp1ab	M7068I	1.00e+0	B.1.160.30
V5505	659C>T	missense_variant	N	A220V	7.05e-1	B.1.160.30
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.160.30
V6170	2151C>T	synonymous_variant	ORF1ab_pp1a	Y717Y	1.00e+0	B.1.160.30
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.160.30
V2182	11114C>T	missense_variant	ORF1ab_pp1a	A3705V	1.00e+0	B.1.160.32
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	1.00e+0	B.1.160.32
V3748	665C>T	missense_variant	S	A222V	8.14e-1	B.1.160.32
V4796	3G>T	start_lost	ORF6	M1?	1.00e+0	B.1.160.32
V5505	659C>T	missense_variant	N	A220V	1.00e+0	B.1.160.32
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.160.32
V5627	1147C>T	missense_variant	N	P383S	1.00e+0	B.1.160.32
V5713	88G>T	missense_variant	ORF10	V30L	7.03e-1	B.1.160.32
V5866	180T>C	synonymous_variant	ORF1ab_pp1a	V60V	8.14e-1	B.1.160.32
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	8.14e-1	B.1.160.32
V8591	20991G>C	synonymous_variant	ORF1ab_pp1ab	A6997A	1.00e+0	B.1.160.32
V9322	279C>G	synonymous_variant	M	L93L	7.03e-1	B.1.160.32
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.160.32
V3508	21204G>T	missense_variant	ORF1ab_pp1ab	M7068I	1.00e+0	B.1.177.23
V5600	1093C>T	missense_variant	N	P365S	1.00e+0	B.1.177.23
V85	-44C>T	upstream_gene_variant	ORF1ab_pp1a	None	1.00e+0	B.1.177.23
V9790	1113C>T	synonymous_variant	N	D371D	1.00e+0	B.1.177.23
V53	-80C>T	upstream_gene_variant	ORF1ab_pp1a	None	-7.05e-1	B.1.177.40
V531	1555G>A	missense_variant	ORF1ab_pp1a	G519S	-7.05e-1	B.1.177.40
V4011	2025G>T	missense_variant	S	Q675H	1.00e+0	B.1.177.89
V5281	53G>T	missense_variant	N	G18V	1.00e+0	B.1.177.89
V6170	2151C>T	synonymous_variant	ORF1ab_pp1a	Y717Y	1.00e+0	B.1.177.89
V6265	2772C>T	synonymous_variant	ORF1ab_pp1a	F924F	-1.00e+0	B.1.177.89
V1571	6689T>C	missense_variant	ORF1ab_pp1a	I2230T	1.00e+0	B.1.1.83
V2536	13729G>T	missense_variant	ORF1ab_pp1ab	A4577S	1.00e+0	B.1.1.83
V3950	1709C>A	missense_variant	S	A570D	1.00e+0	B.1.1.83
V5248	7G>C	missense_variant	N	D3H	1.00e+0	B.1.1.83
V5250	8A>T	missense_variant	N	D3V	1.00e+0	B.1.1.83
V5251	9T>A	missense_variant	N	D3E	1.00e+0	B.1.1.83
V5514	704C>T	missense_variant	N	S235F	7.06e-1	B.1.1.83
V5954	648C>T	synonymous_variant	ORF1ab_pp1a	S216S	1.00e+0	B.1.1.83
V7758	14412C>T	synonymous_variant	ORF1ab_pp1ab	P4804P	7.06e-1	B.1.1.83
V7836	15015C>T	synonymous_variant	ORF1ab_pp1ab	H5005H	1.00e+0	B.1.1.83
V7948	15912T>C	synonymous_variant	ORF1ab_pp1ab	T5304T	7.06e-1	B.1.1.83
V3104	18410G>A	missense_variant	ORF1ab_pp1ab	R6137K	1.00e+0	B.1.36.9
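
The filtering rule in the note above the table (Spearman correlation, Holm–Bonferroni adjustment, |correlation| > 0.6 and adjusted p < 0.05) can be sketched in plain Python. The sketch assumes each mutation is encoded as a 0/1 presence vector over the sequences of one lineage; that encoding, the function names, and the example vectors and p-values are assumptions for illustration, not details given by the source.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one mutation never varies
    return cov / math.sqrt(var_x * var_y)

def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted

def keep_pair(rho, adjusted_p, rho_cutoff=0.6, alpha=0.05):
    """Retention rule from the note: strong correlation and adjusted p < 0.05."""
    return abs(rho) > rho_cutoff and adjusted_p < alpha

# Hypothetical presence/absence vectors for two mutations in four sequences.
print(spearman([1, 1, 0, 0], [1, 1, 0, 0]))
print(holm_adjust([0.01, 0.04, 0.03]))
```

With 0/1 vectors and only a handful of sequences, the coefficient can take just a few distinct values, which may explain why values such as 1.00e+0 and 7.07e-1 recur in lineages with few carrier sequences.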

Manual curation of mutation (20397T>C)-related literature from PubMed

The pubmed.mineR and pubmed-mapper tools were used to extract literature from PubMed, followed by manual filtering.

Note: PubMed: (COVID-19 [Title/Abstract] OR SARS-COV-2 [Title/Abstract]) AND (DNA mutation [Title/Abstract] OR Protein mutation-1 letter [Title/Abstract] OR Protein mutation-3 letter [Title/Abstract]).

DNA level	Protein level	Paper title	Journal name	Publication year	Pubmed ID

Gene Information	Gene ID	GU280_gp01_pp1ab
	Gene Name	ORF1ab_pp1ab
	Gene Type	protein_coding
	Genome position	20661
	Reference genome	GenBank ID: NC_045512.2
	Mutation type	synonymous_variant
DNA Level	DNA Mutation: 20397T>C
	Ref Seq: T
	Mut Seq: C
Protein Level	Protein 1-letter Mutation: S6799S
	Protein 3-letter Mutation: Ser6799Ser
