COV2Var

The count of genome sequences harboring this mutation and its distribution across global regions offer insights into regional variations.

Note: Distribution of this mutation across 218 geographical regions, with colors representing genome sequence counts. The data is obtained from GISAID's metadata, specifically capturing the regional distribution of genomic sequences.

The number of genome sequences containing this mutation over time.

Note: Clicking the "Count" or "Cumulative Count" button toggles the view. Count represents the number of genome sequences per month. Cumulative count represents the accumulated total count up to the respective month. The data is obtained from GISAID's metadata, specifically capturing the collection date of genomic sequences.
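The two views are related by a running sum: each cumulative value is the monthly count for that month plus all earlier months. The sketch below shows this, assuming the monthly counts are available as a mapping from "YYYY-MM" strings to sequence counts; the function name is hypothetical, and the example values are the 2020 monthly totals from the lineage table on this page (whether the plotted counts match those totals exactly is an assumption).

```python
from itertools import accumulate

def cumulative_counts(monthly: dict[str, int]) -> dict[str, int]:
    """Running total of sequence counts up to and including each month.

    Keys are zero-padded "YYYY-MM" strings, so sorting them as plain
    strings also sorts them chronologically.
    """
    months = sorted(monthly)
    running = accumulate(monthly[m] for m in months)
    return dict(zip(months, running))

# Monthly totals for 2020, taken from the lineage table on this page.
monthly = {"2020-01": 1, "2020-03": 1, "2020-06": 4, "2020-07": 1,
           "2020-08": 12, "2020-09": 17, "2020-10": 39}
print(cumulative_counts(monthly))
# {'2020-01': 1, '2020-03': 2, '2020-06': 6, '2020-07': 7, '2020-08': 19, '2020-09': 36, '2020-10': 75}
```

Zero-padding matters here: a key such as "2020-3" would sort after "2020-12" as a string, which silently breaks the time order.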

For every time point represented in the graph above, identifying the top 3 lineages with the highest count of genome sequences carrying this mutation aids in pinpointing noteworthy lineages for further analysis.

Note: Users can filter the lineages by entering a "Year-Month" term in the search box. For example, entering 2020-01 will display lineages that appeared in January 2020. The data is obtained from GISAID's metadata, specifically capturing the collection date of genomic sequences.

Collection date	Lineage	Total monthly count (all lineages)	Lineage-specific monthly count	Lineage-specific monthly frequency
2020-01	B.1.438	1	1	1.00e+0
2020-03	B.1.513	1	1	1.00e+0
2020-06	B.1.438	4	4	1.00e+0
2020-07	B.1.438	1	1	1.00e+0
2020-08	B.1.438	12	12	1.00e+0
2020-09	B.1.438	17	15	8.82e-1
2020-09	B.1.438.1	17	2	1.18e-1
2020-10	B.1.438	39	30	7.69e-1
2020-10	B.1.177	39	6	1.54e-1
2020-10	B.1.438.1	39	2	5.13e-2
2020-11	B.1.438.1	119	82	6.89e-1
2020-11	B.1.234	119	12	1.01e-1
2020-11	B.1.438.4	119	10	8.40e-2
2020-12	B.1.438.1	550	491	8.93e-1
2020-12	B.1.438.4	550	26	4.73e-2
2020-12	B.1.243	550	13	2.36e-2
2021-01	B.1.438.1	1322	1241	9.39e-1
2021-01	B.1.438.2	1322	47	3.56e-2
2021-01	B.1.438.4	1322	18	1.36e-2
2021-02	B.1.438.1	1734	1671	9.64e-1
2021-02	B.1.177	1734	13	7.50e-3
2021-02	B.1.438	1734	12	6.92e-3
2021-03	B.1.438.1	2835	2785	9.82e-1
2021-03	B.1.2	2835	8	2.82e-3
2021-03	B.1	2835	7	2.47e-3
2021-04	B.1.438.1	1999	1978	9.89e-1
2021-04	P.1	1999	9	4.50e-3
2021-04	B.1.1.7	1999	5	2.50e-3
2021-05	B.1.438.1	582	565	9.71e-1
2021-05	B.1.1.7	582	6	1.03e-2
2021-05	P.1	582	5	8.59e-3
2021-06	B.1.438.1	225	203	9.02e-1
2021-06	AY.16	225	6	2.67e-2
2021-06	P.1	225	5	2.22e-2
2021-07	B.1.438.1	29	6	2.07e-1
2021-07	P.1	29	5	1.72e-1
2021-07	AY.102	29	3	1.03e-1
2021-08	AY.4	74	13	1.76e-1
2021-08	B.1.438	74	9	1.22e-1
2021-08	B.1.617.2	74	7	9.46e-2
2021-09	AY.41	50	12	2.40e-1
2021-09	AY.4	50	6	1.20e-1
2021-09	AY.103	50	5	1.00e-1
2021-10	AY.4	102	47	4.61e-1
2021-10	AY.41	102	16	1.57e-1
2021-10	AY.127	102	10	9.80e-2
2021-11	AY.4	107	18	1.68e-1
2021-11	AY.4.2	107	13	1.21e-1
2021-11	AY.43	107	13	1.21e-1
2021-12	AY.118	111	24	2.16e-1
2021-12	AY.4	111	17	1.53e-1
2021-12	AY.43	111	15	1.35e-1
2022-01	BA.1.1	58	18	3.10e-1
2022-01	BA.1	58	7	1.21e-1
2022-01	BA.1.17.2	58	6	1.03e-1
2022-02	BA.1.1	42	27	6.43e-1
2022-02	BA.1.1.1	42	4	9.52e-2
2022-02	BA.2	42	4	9.52e-2
2022-03	BA.2	51	36	7.06e-1
2022-03	BA.1.1	51	7	1.37e-1
2022-03	BA.2.9.2	51	3	5.88e-2
2022-04	BA.2.10	10	5	5.00e-1
2022-04	BA.2	10	2	2.00e-1
2022-04	BA.1	10	1	1.00e-1
2022-05	BA.2	12	7	5.83e-1
2022-05	BA.2.9	12	2	1.67e-1
2022-05	AY.126	12	1	8.33e-2
2022-06	BE.1.1	8	3	3.75e-1
2022-06	BA.2	8	2	2.50e-1
2022-06	BA.5.3	8	2	2.50e-1
2022-07	BA.2.36	9	2	2.22e-1
2022-07	BF.10	9	2	2.22e-1
2022-07	BA.4	9	1	1.11e-1
2022-08	BA.5.5	10	2	2.00e-1
2022-08	BF.5	10	2	2.00e-1
2022-08	BA.2.9.5	10	1	1.00e-1
2022-09	BE.1	13	4	3.08e-1
2022-09	XBD	13	3	2.31e-1
2022-09	BA.2.75.2	13	2	1.54e-1
2022-10	XBD	21	17	8.10e-1
2022-10	BA.2.75.2	21	1	4.76e-2
2022-10	BA.5.1.23	21	1	4.76e-2
2022-11	BF.7	10	2	2.00e-1
2022-11	XBD	10	2	2.00e-1
2022-11	BA.5.2.20	10	1	1.00e-1
2022-12	BA.1.1	9	1	1.11e-1
2022-12	BA.5.1.12	9	1	1.11e-1
2022-12	BE.9	9	1	1.11e-1
2023-01	BA.5.2.28	8	2	2.50e-1
2023-01	BQ.1	8	2	2.50e-1
2023-01	BF.7	8	1	1.25e-1
2023-02	XBB.1.5	4	2	5.00e-1
2023-02	BF.7.14	4	1	2.50e-1
2023-02	BQ.1.1	4	1	2.50e-1
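In the table, each lineage-specific frequency is the lineage-specific count divided by the month's total (for example, 30/39 ≈ 7.69e-1 for B.1.438 in 2020-10). A minimal sketch of the top-3 ranking, assuming per-month (month, lineage, count) records that cover every lineage in the month; the function name is hypothetical, and ties are broken alphabetically here because the site's tie-breaking rule is not stated.

```python
from collections import defaultdict

def top_lineages(records, k=3):
    """Rank lineages within each month by mutation-carrying sequence count.

    records: iterable of (month, lineage, count) tuples covering *all*
    lineages in a month, so the monthly total is the sum of the counts.
    Returns {month: [(lineage, count, frequency), ...]} for the k largest
    lineages, where frequency = count / monthly total.
    """
    by_month = defaultdict(lambda: defaultdict(int))
    for month, lineage, count in records:
        by_month[month][lineage] += count
    result = {}
    for month in sorted(by_month):
        counts = by_month[month]
        total = sum(counts.values())
        # Highest count first; equal counts fall back to alphabetical order.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
        result[month] = [(lineage, n, n / total) for lineage, n in ranked]
    return result

# August and September 2020 are fully listed in the table (12 and 15 + 2 = 17).
records = [("2020-09", "B.1.438", 15), ("2020-09", "B.1.438.1", 2),
           ("2020-08", "B.1.438", 12)]
for month, rows in top_lineages(records).items():
    for lineage, n, freq in rows:
        print(month, lineage, n, f"{freq:.2e}")
```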

The count of genome sequences and the frequency of this mutation in each lineage.

Note: Only lineages in which the mutation frequency exceeds 0.01 are shown, out of 2,735 lineages. "Mutation count" is the number of sequences carrying this mutation. Users can filter the lineages by entering a search term in the search box; for example, entering "A.1" displays A.1 lineages. The data is obtained from GISAID's metadata, specifically capturing the lineage of genomic sequences.

Mutation ID	Lineage	Mutation frequency	Mutation count	Earliest lineage emergence	Latest lineage emergence
V365	B.1.438	9.92e-1	120	2020-01-10	2021-08-07
V365	B.1.438.1	9.98e-1	9030	2020-09-16	2021-12-07
V365	B.1.438.2	1.00e+0	53	2021-01-06	2021-02-15
V365	B.1.438.3	1.00e+0	13	2020-12-09	2021-03-18
V365	B.1.438.4	1.00e+0	64	2020-10-19	2021-03-23
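The page does not spell out how "Mutation frequency" is computed. A natural reading, assumed here and not confirmed by the source, is the number of sequences carrying the mutation divided by all sequences assigned to that lineage; under that reading, B.1.438's 120 carriers at 9.92e-1 would correspond to about 121 sequences. The sketch applies the > 0.01 display filter to hypothetical lineages and counts.

```python
def lineage_frequencies(carriers, totals, min_freq=0.01):
    """Per-lineage mutation frequency, keeping only lineages above min_freq.

    Assumed definition: frequency = carriers / all sequences in the lineage.
    carriers: {lineage: number of sequences carrying the mutation}
    totals:   {lineage: number of sequences assigned to the lineage}
    """
    result = {}
    for lineage, n_mut in carriers.items():
        n_total = totals.get(lineage, 0)
        if n_total == 0:
            continue  # frequency is undefined without any sequences
        freq = n_mut / n_total
        if freq > min_freq:  # strictly greater, as in "(>0.01)"
            result[lineage] = freq
    return result

# Hypothetical lineages and counts, for illustration only.
carriers = {"X.1": 50, "X.2": 3}
totals = {"X.1": 100, "X.2": 1000}
print(lineage_frequencies(carriers, totals))  # {'X.1': 0.5}
```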
V365	XBD	7.33e-2	22	2022-8-25	2023-2-7

Examining mutation (889A>G) found in abundant sequences of non-human animal hosts

Exploring the presence of this mutation across 35 non-human animal hosts to assess potential cross-species transmission.

Note: We retained mutations that appear in at least three non-human animal hosts' sequences. The data is obtained from GISAID's metadata, specifically capturing the host of genomic sequences.

Animal host	Lineage	Source region	Collection date	Accession ID

Association between mutation (889A>G) and patients of different ages, genders, and statuses

Note: A logistic regression model, fitted with the glm function in R, was used to compare patient data between sequences with and without the mutation. The data is obtained from GISAID's metadata, specifically capturing the patient status, gender, and age of genomic sequences.

Analyzing the association between mutation and patient status.

Note: We categorized the data into patient statuses (ambulatory, deceased, homebound, hospitalized, mild, and recovered) based on GISAID classifications. In the analysis of the association between mutation and patient status, the model included mutation, patient status, patient age, gender, sequence region of origin, and sequence collection time point. A direction of 'Increase' means that the mutation is associated with a higher proportion of the corresponding category; 'Decrease' means that it is associated with a lower proportion. The direction follows the sign of the estimate. A p-value below 0.001 indicates a significant difference between the populations with and without the mutation.

Attribute	Effect	Estimate	SE	Z-value	P-value	Direction
Patient status	Ambulatory	3.85e+1	2.60e+5	1.48e-4	1.00e+0	Increase
	Deceased	3.73e-14	4.41e+5	8.46e-20	1.00e+0	Increase
	Homebound	3.73e-14	4.41e+5	8.46e-20	1.00e+0	Increase
	Hospitalized	1.26e+1	1.58e+3	7.98e-3	9.94e-1	Increase
	Mild	1.49e+1	2.76e+3	5.39e-3	9.96e-1	Increase
	Recovered	-1.34e+0	5.55e+5	-2.42e-6	1.00e+0	Decrease
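The columns follow the usual Wald summary of an R glm fit: Z-value = Estimate / SE, and the p-value is two-sided under the standard normal. The sketch below reproduces that summary for the simplest case only, an unadjusted model with a single binary mutation indicator, where the maximum-likelihood estimate has a closed form (the log odds ratio of a 2×2 table, with Woolf's standard error). It assumes the patient category is the outcome and mutation presence the predictor, omits the covariates (age, gender, region, collection time) that the models on this page include, and uses hypothetical counts; it is an illustration, not the page's model. An empty cell makes the estimate undefined; in glm that situation (separation) typically shows up as very large estimates and standard errors, which may explain standard errors on the order of 1e5 such as those in the patient-status table.

```python
import math

def wald_logistic_2x2(a, b, c, d):
    """Unadjusted logistic regression of an outcome on mutation presence.

    a: mutation present, outcome present    b: mutation present, outcome absent
    c: mutation absent, outcome present     d: mutation absent, outcome absent
    With one binary predictor, the maximum-likelihood coefficient equals the
    log odds ratio, and its standard error is Woolf's formula.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell: the MLE does not exist (separation)")
    estimate = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
    direction = "Increase" if estimate > 0 else "Decrease"
    return estimate, se, z, p, direction

# Hypothetical counts for one outcome category.
est, se, z, p, direction = wald_logistic_2x2(a=30, b=70, c=10, d=90)
print(f"{est:.2e} {se:.2e} {z:.2e} {p:.2e} {direction}")
```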

Analyzing the association between mutation and patient age.

Note: We categorized patients into age groups (0-17, 18-39, 40-64, 65-84, and 85+ years). In the analysis of the association between mutation and patient age, the model included mutation, patient age, gender, sequence region of origin, and sequence collection time point. A direction of 'Increase' means that the mutation is associated with a higher proportion of the corresponding category; 'Decrease' means that it is associated with a lower proportion. A p-value below 0.001 indicates a significant difference between the populations with and without the mutation.

Attribute	Effect	Estimate	SE	Z-value	P-value	Direction
Patient age, years	0-17	5.25e+0	1.27e+0	4.14e+0	3.41e-5	Increase
	18-39	-1.57e+0	6.52e-1	-2.41e+0	1.58e-2	Decrease
	40-64	-1.21e+0	6.35e-1	-1.91e+0	5.62e-2	Decrease
	65-84	2.62e+0	9.57e-1	2.74e+0	6.14e-3	Increase
	>=85	2.15e+0	1.65e+0	1.30e+0	1.92e-1	Increase

Analyzing the association between mutation and patient gender.

Note: We categorized patients by gender (male and female). In the analysis of the association between mutation and patient gender, the model included mutation, patient gender, patient age, sequence region of origin, and sequence collection time point. A direction of 'Increase' means that the mutation is associated with a higher proportion of the corresponding category; 'Decrease' means that it is associated with a lower proportion. A p-value below 0.001 indicates a significant difference between the populations with and without the mutation.

Attribute	Effect	Estimate	SE	Z-value	P-value	Direction
Patient gender	Male	2.63e-1	6.00e-1	4.38e-1	6.62e-1	Increase

Investigating natural selection at mutation (889A>G) site for genetic adaptation and diversity

Note: Testing for positive or negative selection at this mutation site can reveal implications for genetic adaptation and diversity.

The MEME method within the HyPhy software was employed to analyze positive selection. MEME: episodic selection.

Note: List of sites found to be under episodic selection by MEME (p < 0.05). "Protein Start" corresponds to the protein's starting genomic position. "Protein End" corresponds to the protein's ending genomic position. The term 'site' represents a selection site within the protein.

Protein name	Protein start	Protein end	Protein length	Site	P-value	Lineage	Method

The FEL method within the HyPhy software was employed to analyze both positive and negative selection. FEL: pervasive selection on small datasets.

Note: List of sites found to be under pervasive selection by FEL (p < 0.05). Alpha is the synonymous and beta the nonsynonymous substitution rate at a site; a beta value greater than alpha signifies positive selection, while a beta value smaller than alpha signifies negative selection. "Protein Start" corresponds to the protein's starting genomic position. "Protein End" corresponds to the protein's ending genomic position. The term 'site' represents a selection site within the protein.

Protein name	Protein start	Protein end	Protein length	Site	Alpha	Beta	P-value	Lineage	Method

The FUBAR method within the HyPhy software was employed to analyze both positive and negative selection. FUBAR: pervasive selection on large datasets.

Note: List of sites found to be under pervasive selection by FUBAR (prob > 0.95). A prob[alpha < beta] value exceeding 0.95 indicates positive selection, while a prob[alpha > beta] value exceeding 0.95 indicates negative selection. "Protein Start" corresponds to the protein's starting genomic position. "Protein End" corresponds to the protein's ending genomic position. The term 'site' represents a selection site within the protein.

Protein name	Protein start	Protein end	Protein length	Site	Prob[alpha>beta]	Prob[alpha<beta]	Lineage	Method
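The FEL and FUBAR decision rules in the notes above can be written as two small functions. This sketch assumes the alpha, beta, p-value, and posterior probabilities have already been read from HyPhy's output; the function names are hypothetical and are not part of HyPhy.

```python
def classify_fel(alpha, beta, p_value, p_cut=0.05):
    """FEL rule: among sites with p < p_cut, beta > alpha is positive selection
    and beta < alpha is negative selection. Returns None otherwise."""
    if p_value >= p_cut:
        return None
    if beta > alpha:
        return "positive"
    if beta < alpha:
        return "negative"
    return None

def classify_fubar(prob_alpha_gt_beta, prob_alpha_lt_beta, cut=0.95):
    """FUBAR rule: posterior Prob[alpha < beta] > cut is positive selection,
    Prob[alpha > beta] > cut is negative selection. Returns None otherwise."""
    if prob_alpha_lt_beta > cut:
        return "positive"
    if prob_alpha_gt_beta > cut:
        return "negative"
    return None

# Hypothetical site values.
print(classify_fel(alpha=0.5, beta=2.0, p_value=0.01))  # positive
print(classify_fubar(0.98, 0.01))                       # negative
```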

Alterations in protein physicochemical properties induced by mutation (889A>G)

Understanding the alterations in protein physicochemical properties can reveal the evolutionary processes and adaptive changes of viruses.

Note: ProtParam was used to analyze physicochemical properties. A change of more than 10% relative to the reference is considered significant. "GRAVY" stands for "grand average of hydropathicity".

Group	Protein name	Molecular weight (Da)	Theoretical pI	Extinction coefficient (M⁻¹ cm⁻¹)	Aliphatic index	GRAVY
Mutation	ORF1ab_pp1a	489956.85	6.04	543550	89.05	-0.023
Reference	ORF1ab_pp1a	489988.91	6.04	543550	88.99	-0.023
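The 10% rule can be checked property by property as a relative change against the reference. The sketch below applies it to the values in the table; the function name is hypothetical. The 32.06 Da drop in molecular weight equals the mass difference between methionine (149.21 Da) and valine (117.15 Da), as expected for M297V, and no property comes close to the 10% threshold.

```python
def significant_changes(mutant, reference, threshold=0.10):
    """Return {property: relative change} for properties whose change
    exceeds `threshold` (10%) relative to the reference value."""
    flagged = {}
    for name, ref in reference.items():
        if ref == 0:
            continue  # relative change is undefined for a zero reference
        rel = abs(mutant[name] - ref) / abs(ref)
        if rel > threshold:
            flagged[name] = rel
    return flagged

# Values from the ProtParam table for ORF1ab_pp1a.
reference = {"Molecular weight": 489988.91, "Theoretical pI": 6.04,
             "Extinction coefficient": 543550, "Aliphatic index": 88.99,
             "GRAVY": -0.023}
mutant = {"Molecular weight": 489956.85, "Theoretical pI": 6.04,
          "Extinction coefficient": 543550, "Aliphatic index": 89.05,
          "GRAVY": -0.023}
print(significant_changes(mutant, reference))  # {}
```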

Alterations in protein stability induced by mutation (889A>G)

The impact of mutations on protein stability directly or indirectly affects the biological characteristics, adaptability, and transmission capacity of the virus.

Note: I-Mutant 2.0 was used to predict the effects of mutations on protein stability. pH 7 and a temperature of 25°C are used to represent the in vitro environment; pH 7.4 and a temperature of 37°C are used to simulate the in vivo environment.

Mutation	Protein name	Mutation type	Position	ΔΔG (kcal/mol)	Stability	pH	Temperature (°C)	Condition
M297V	ORF1ab_pp1a	Point	297	-0.68	Decrease	7	25	In vitro
M297V	ORF1ab_pp1a	Point	297	-0.69	Decrease	7.4	37	In vivo

Impact on protein function induced by mutation (889A>G)

The impact of mutations on protein function

Note: MutPred2 was used to predict the pathogenicity of the mutation and the molecular mechanisms underlying it. A score above 0.5 indicates an increased likelihood of pathogenicity. "Pr" stands for "proportion" and "P" for "p-value".

Mutation	Protein name	Mutation type	Score	Molecular mechanisms
M297V	ORF1ab_pp1a	Point	0.16	Altered Cytoplasmic_loop (Pr = 0.25 | P = 1.7e-03); Gain of ADP-ribosylation at R301 (Pr = 0.17 | P = 0.08); Altered Calmodulin_binding (Pr = 0.11 | P = 0.10)
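The mechanism column packs several (name, Pr, P) triples into one string. The sketch below is a hypothetical parser for that text as displayed on this page, not a MutPred2 API; it also tolerates a stray backslash before the "|". The 0.05 cutoff used to pick out mechanisms is an illustrative assumption, not a threshold stated on this page. With a score of 0.16, the mutation falls below the 0.5 pathogenicity threshold.

```python
import re

# Hypothetical pattern for "Name (Pr = x | P = y)" entries, optionally ";"-separated.
MECHANISM = re.compile(
    r"\s*;?\s*(?P<name>[^();]+?)\s*"
    r"\(Pr\s*=\s*(?P<pr>[0-9.eE+-]+)\s*\\?\|\s*P\s*=\s*(?P<p>[0-9.eE+-]+)\)"
)

def parse_mechanisms(text):
    """Return a list of (mechanism, proportion, p_value) tuples."""
    return [(m["name"], float(m["pr"]), float(m["p"]))
            for m in MECHANISM.finditer(text)]

score = 0.16
mechanisms = parse_mechanisms(
    "Altered Cytoplasmic_loop (Pr = 0.25 | P = 1.7e-03); "
    "Gain of ADP-ribosylation at R301 (Pr = 0.17 | P = 0.08); "
    "Altered Calmodulin_binding (Pr = 0.11 | P = 0.10)"
)
likely_pathogenic = score > 0.5                                  # False here
significant = [name for name, pr, p in mechanisms if p < 0.05]   # assumed cutoff
print(likely_pathogenic, significant)  # False ['Altered Cytoplasmic_loop']
```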

Exploring mutation (889A>G) distribution within intrinsically disordered protein regions

Intrinsically disordered proteins (IDPs) are proteins, or protein regions, that lack a unique 3D structure. In viral proteins, mutations in disordered regions can be critical for immune evasion and antibody escape, with potential implications for vaccine and monoclonal antibody therapeutic strategies.

Note: IUPred3 was used to analyze intrinsically disordered regions. A score greater than 0.5 is considered indicative of disorder. In the plot, "POS" marks the position of the mutation.
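Applying the 0.5 threshold amounts to marking residues whose score exceeds it and grouping consecutive ones into regions; a mutation then lies in a disordered region if its position falls inside one. A minimal sketch with hypothetical per-residue scores (real scores would come from IUPred3 output); the function names are hypothetical.

```python
def disordered_regions(scores, threshold=0.5):
    """Return 1-based (start, end) runs of residues scoring above threshold.

    scores[0] is the score of residue 1.
    """
    regions = []
    for pos, score in enumerate(scores, start=1):
        if score <= threshold:
            continue
        if regions and regions[-1][1] == pos - 1:
            regions[-1] = (regions[-1][0], pos)  # extend the current run
        else:
            regions.append((pos, pos))  # start a new run
    return regions

def in_disordered_region(position, regions):
    return any(start <= position <= end for start, end in regions)

# Hypothetical per-residue scores for a 7-residue stretch.
scores = [0.20, 0.62, 0.71, 0.40, 0.55, 0.58, 0.30]
regions = disordered_regions(scores)
print(regions)                           # [(2, 3), (5, 6)]
print(in_disordered_region(5, regions))  # True
```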

Alterations in enzyme cleavage sites induced by mutation (889A>G)

Exploring the impact of mutations on the cleavage sites of 28 enzymes.

Note: PeptideCutter was used to detect enzyme cleavage sites. Increased cleavage sites are sites present in the mutant protein but absent from the reference protein; decreased cleavage sites are sites present in the reference protein but lost in the mutant protein.

Mutation	Protein name	Genome position	Enzyme name	Increased cleavage sites	Decreased cleavage sites
M297V	ORF1ab_pp1a	1154	Proteinase K	DGFVGRIRSV (pos: 297)	NA
M297V	ORF1ab_pp1a	1154	Chymotrypsin-low specificity	NA	DGFMGRIRSV (pos: 297)
M297V	ORF1ab_pp1a	1154	CNBr	NA	DGFMGRIRSV (pos: 297)
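Of the three enzymes listed, CNBr has the simplest rule: it cleaves on the C-terminal side of methionine, so replacing Met297 with Val removes a site, matching the "Decreased" entry. The increased/decreased comparison is a set difference between mutant and reference sites. The sketch assumes the displayed windows span residues 294-303 (so the Met is residue 297); the function name is hypothetical, and the more involved Proteinase K and low-specificity chymotrypsin rules are not reproduced.

```python
def cnbr_sites(window, first_pos):
    """Protein positions of Met residues; CNBr cleaves on the C-terminal side of Met.

    window: peptide string; first_pos: protein position of window[0].
    """
    return {first_pos + i for i, aa in enumerate(window) if aa == "M"}

# Windows from the table, assumed to span residues 294-303 (Met at 297).
reference = cnbr_sites("DGFMGRIRSV", 294)
mutant = cnbr_sites("DGFVGRIRSV", 294)
print("increased:", sorted(mutant - reference))  # increased: []
print("decreased:", sorted(reference - mutant))  # decreased: [297]
```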

Impact of spike protein mutation (889A>G) on antigenicity and immunogenicity

Investigating the impact of mutations on antigenicity and immunogenicity carries important implications for vaccine design and our understanding of immune responses.

Note: The VaxiJen tool was used for antigenicity analysis; proteins with a VaxiJen prediction score above 0.4 are considered candidate antigens. The IEDB tool was used for immunogenicity analysis; an MHC class I immunogenicity score above 0 indicates a higher probability of stimulating an immune response. An absolute change in antigenicity score greater than 0.0102 (three times the median across sites) is considered significant, as is an absolute change in immunogenicity score greater than 0.2754 (three times the median across sites).

Group	Protein name	Protein region	Antigenicity score	Immunogenicity score
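The thresholds in the note combine into a few comparisons. This sketch assumes the cutoffs are three times the median absolute change observed across sites, as the note describes, and uses hypothetical reference and mutant scores; the function names are hypothetical.

```python
from statistics import median

def significance_cutoff(abs_changes):
    """Three times the median absolute score change across sites."""
    return 3 * median(abs_changes)

def assess(ref_ag, mut_ag, ref_im, mut_im, ag_cutoff=0.0102, im_cutoff=0.2754):
    """Flag antigenicity/immunogenicity changes using the page's thresholds."""
    d_ag, d_im = mut_ag - ref_ag, mut_im - ref_im
    return {
        "antigenicity_change_significant": abs(d_ag) > ag_cutoff,
        "candidate_antigen": mut_ag > 0.4,        # VaxiJen threshold
        "immunogenicity_change_significant": abs(d_im) > im_cutoff,
        "immunogenic": mut_im > 0,                # MHC class I score > 0
    }

# Hypothetical scores for illustration.
print(assess(ref_ag=0.45, mut_ag=0.47, ref_im=0.10, mut_im=-0.25))
```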

Impact of mutation (889A>G) on viral transmissibility by the affinity between RBD and ACE2 receptor

Unraveling the impact of mutations on the interaction between the receptor binding domain (RBD) and ACE2 receptor using deep mutational scanning (DMS) experimental data to gain insights into their effects on viral transmissibility.

Note: The ΔBinding affinity is the difference between the binding affinity of the mutation and the reference binding affinity. A positive ΔBinding affinity (Δlog10(KD,app) > 0) signifies increased affinity between the RBD and the ACE2 receptor due to the mutation; a negative value (Δlog10(KD,app) < 0) indicates reduced affinity between the RBD and the ACE2 receptor caused by the mutation. A p-value smaller than 0.05 indicates significance. "Ave mut bind" is the average binding affinity of this mutation. "Ave ref bind" is the average binding affinity at a site without any mutation (the reference binding affinity).


Mutation	Protein name	Protein region	Mutation Position	Ave mut bind	Ave ref bind	ΔBinding affinity	P-value	Image
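The ΔBinding affinity and its interpretation follow directly from the note: subtract the reference value from the mutant value, then read the sign only when the p-value is below 0.05. This mutation lies in ORF1ab_pp1a rather than the spike RBD, so the values below are hypothetical; how the page computes its p-values is not stated and is taken as given here.

```python
def binding_effect(ave_mut_bind, ave_ref_bind, p_value, alpha=0.05):
    """Δlog10(KD,app) = mutant minus reference; > 0 means tighter RBD-ACE2 binding."""
    delta = ave_mut_bind - ave_ref_bind
    if p_value >= alpha:
        return delta, "not significant"
    if delta > 0:
        return delta, "increased affinity"
    if delta < 0:
        return delta, "reduced affinity"
    return delta, "no change"

# Hypothetical values for an RBD mutation.
print(binding_effect(ave_mut_bind=0.12, ave_ref_bind=-0.05, p_value=0.01))
```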

The interface between the receptor binding domain (RBD) and ACE2 receptor is depicted in the crystal structure 6M0J.

Note: The structure 6M0J covers RBD residues 333 to 526. The RBD binding sites that interface with ACE2 (403-406, 408, 417, 439, 445-447, 449, 453, 455-456, 473-478, 484-498, and 500-506) are shown in magenta; these binding sites were identified through interface footprint experiments. The ACE2 binding sites within the interface are shown in cyan and comprise residues within 5 Å of the RBD binding sites. Mutations within RBD residues 333 to 526 are shown in red.


Impact of mutation (889A>G) on immune escape by the affinity between RBD and antibody/serum

By utilizing experimental data from deep mutational scanning (DMS), we can uncover how mutations affect the interaction between the receptor binding domain (RBD) and antibodies/serum. This approach provides insights into how the virus may evade the host immune response.

Note: A mutation was considered to mediate strong escape if its escape score exceeded 0.1 (10% of the maximum score of 1). Data for a total of 1,504 antibodies/sera were collected for this analysis. "Condition name" is the name of the antibody or serum. "Mut escape score" is the escape score of this mutation under that condition. "Avg mut escape score" is the average escape score at the mutated site under that condition, taking into account this mutation and other mutations at the same site. Class 1 antibodies bind an epitope only in the RBD “up” conformation and are the most abundant. Class 2 antibodies bind the RBD in both the “up” and “down” conformations. Class 3 and class 4 antibodies both bind outside the ACE2 binding site: class 3 antibodies bind the RBD in both the open (“up”) and closed (“down”) conformations, while class 4 antibodies bind only in the open conformation.

Mutation	Condition name	Condition type	Condition subtype	Condition year	Mut escape score	Avg mut escape score
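A sketch of how the two escape columns relate, assuming the site average is the mean escape score over all mutations observed at the same residue under a given condition (one reading of the note, not confirmed by the source). The antibody names and scores are hypothetical, and the helper names are hypothetical as well.

```python
import re
from statistics import mean

def site_of(mutation):
    """Residue number of a 1-letter protein mutation, e.g. 'E484K' -> 484."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z*]", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    return int(match.group(1))

def escape_summary(rows, mutation, strong=0.1):
    """rows: (condition, mutation, escape_score) tuples from a DMS dataset.

    Returns {condition: (mut escape score, site average, strong escape?)}.
    """
    site = site_of(mutation)
    summary = {}
    for condition in sorted({c for c, _, _ in rows}):
        at_site = [s for c, m, s in rows if c == condition and site_of(m) == site]
        own = [s for c, m, s in rows if c == condition and m == mutation]
        if own:
            summary[condition] = (own[0], mean(at_site), own[0] > strong)
    return summary

# Hypothetical antibodies and escape scores.
rows = [("Ab1", "E484K", 0.30), ("Ab1", "E484A", 0.10), ("Ab1", "F486S", 0.50),
        ("Ab2", "E484K", 0.02), ("Ab2", "E484Q", 0.04)]
print(escape_summary(rows, "E484K"))
```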

Investigating the co-mutation patterns of mutation (889A>G) across 2,735 viral lineages

Investigating the co-mutation patterns of SARS-CoV-2 across 2,735 viral lineages to unravel the cooperative effects of different mutations. In biological research, correlation analysis of mutation sites helps us understand whether there is a close relationship or interaction between certain mutations.

Note: The Spearman correlation coefficient was used to measure the correlation between two mutations within each Pango lineage, and the Holm–Bonferroni method was used to adjust for multiple testing. We retained mutation pairs with correlation coefficients greater than 0.6 or less than -0.6 and Holm–Bonferroni-adjusted p-values below 0.05.

Associated mutation ID	DNA mutation	Mutation type	Protein name	Protein mutation	Correlation coefficient	Lineage
V3001	17819T>C	missense_variant	ORF1ab_pp1ab	I5940T	7.09e-1	AY.4.2
V2847	16711G>T	missense_variant	ORF1ab_pp1ab	V5571F	8.02e-1	AY.98
V3014	17896G>A	missense_variant	ORF1ab_pp1ab	D5966N	8.00e-1	B.1.2
V9476	294T>C	synonymous_variant	ORF7a	S98S	7.07e-1	BA.1.15.1
V1844	8590T>C	missense_variant	ORF1ab_pp1a	F2864L	1.00e+0	AY.10
V7081	9144A>G	synonymous_variant	ORF1ab_pp1a	V3048V	1.00e+0	AY.10
V7246	10377G>T	synonymous_variant	ORF1ab_pp1a	T3459T	1.00e+0	AY.10
V7966	16059C>T	synonymous_variant	ORF1ab_pp1ab	C5353C	7.07e-1	AY.10
V9279	195G>T	synonymous_variant	E	L65L	1.00e+0	AY.10
V553	1625G>A	missense_variant	ORF1ab_pp1a	R542H	6.71e-1	AY.112
V5776	*4358G>A	downstream_gene_variant	S	None	7.07e-1	AY.112
V2605	14264G>A	missense_variant	ORF1ab_pp1ab	S4755N	9.71e-1	AY.118
V7734	14265C>T	synonymous_variant	ORF1ab_pp1ab	S4755S	9.57e-1	AY.118
V8267	18306C>T	synonymous_variant	ORF1ab_pp1ab	L6102L	9.06e-1	AY.118
V858	2897C>T	missense_variant	ORF1ab_pp1a	S966F	7.07e-1	AY.120
V7809	14784C>T	synonymous_variant	ORF1ab_pp1ab	I4928I	8.11e-1	AY.127
V4720	8A>G	missense_variant	M	D3G	7.74e-1	AY.134
V3680	519G>T	missense_variant	S	Q173H	6.32e-1	AY.14
V8435	19617C>T	synonymous_variant	ORF1ab_pp1ab	D6539D	7.30e-1	AY.16
V8872	1908T>C	synonymous_variant	S	Y636Y	6.79e-1	AY.16
V693	2167T>C	missense_variant	ORF1ab_pp1a	S723P	8.16e-1	AY.34.1
V7373	11502C>T	synonymous_variant	ORF1ab_pp1a	S3834S	1.00e+0	AY.36
V694	2168C>T	missense_variant	ORF1ab_pp1a	S723F	7.07e-1	AY.39.1.4
V8259	18219C>T	synonymous_variant	ORF1ab_pp1ab	H6073H	7.07e-1	AY.39.1.4
V1387	5840A>G	missense_variant	ORF1ab_pp1a	Y1947C	1.00e+0	AY.41
V2983	17700G>T	missense_variant	ORF1ab_pp1ab	M5900I	9.72e-1	AY.41
V3266	19630G>T	missense_variant	ORF1ab_pp1ab	A6544S	8.59e-1	AY.41
V3423	20666C>T	missense_variant	ORF1ab_pp1ab	T6889M	7.67e-1	AY.41
V9195	318C>T	synonymous_variant	ORF3a	L106L	9.85e-1	AY.41
V9394	645C>T	synonymous_variant	M	D215D	6.17e-1	AY.41
V1154	4295C>T	missense_variant	ORF1ab_pp1a	A1432V	7.07e-1	AY.46.6
V905	3050C>T	missense_variant	ORF1ab_pp1a	T1017I	8.16e-1	AY.46.6
V8477	20025A>G	synonymous_variant	ORF1ab_pp1ab	E6675E	7.07e-1	AY.4.6
V9628	345T>C	synonymous_variant	N	T115T	7.07e-1	AY.4.6
V301	577C>T	missense_variant	ORF1ab_pp1a	P193S	1.00e+0	AY.71
V3551	76C>T	missense_variant	S	P26S	1.00e+0	AY.71
V4446	262G>T	missense_variant	ORF3a	V88L	7.07e-1	AY.71
V2995	17764G>T	missense_variant	ORF1ab_pp1ab	A5922S	1.00e+0	AY.85
V7942	15837C>T	synonymous_variant	ORF1ab_pp1ab	Y5279Y	1.00e+0	AY.85
V3047	18034A>G	missense_variant	ORF1ab_pp1ab	I6012V	1.00e+0	AY.98.1
V9769	996C>A	synonymous_variant	N	T332T	1.00e+0	B.1.1.519
V1387	5840A>G	missense_variant	ORF1ab_pp1a	Y1947C	1.00e+0	B.1.1.63
V1938	9296C>T	missense_variant	ORF1ab_pp1a	S3099L	1.00e+0	B.1.1.63
V2212	11257T>G	missense_variant	ORF1ab_pp1a	F3753V	1.00e+0	B.1.1.63
V4597	670G>C	missense_variant	ORF3a	G224R	1.00e+0	B.1.1.63
V6958	8124C>T	synonymous_variant	ORF1ab_pp1a	N2708N	1.00e+0	B.1.1.63
V7155	9702C>T	synonymous_variant	ORF1ab_pp1a	L3234L	1.00e+0	B.1.1.63
V9672	540T>C	synonymous_variant	N	S180S	1.00e+0	B.1.1.63
V8324	18682C>T	synonymous_variant	ORF1ab_pp1ab	L6228L	9.17e-1	B.1.234
V1667	7491G>T	missense_variant	ORF1ab_pp1a	K2497N	8.09e-1	B.1.243
V7531	12705C>T	synonymous_variant	ORF1ab_pp1a	N4235N	7.83e-1	B.1.243
V8569	20859G>T	synonymous_variant	ORF1ab_pp1ab	G6953G	7.54e-1	B.1.243
V2027	10076C>T	missense_variant	ORF1ab_pp1a	P3359L	1.00e+0	B.1.36.8
V2143	10957G>T	missense_variant	ORF1ab_pp1a	V3653F	7.07e-1	B.1.36.8
V382	945G>A	missense_variant	ORF1ab_pp1a	M315I	1.00e+0	B.1.36.8
V4546	524C>T	missense_variant	ORF3a	T175I	1.00e+0	B.1.36.8
V7141	9612T>C	synonymous_variant	ORF1ab_pp1a	Y3204Y	1.00e+0	B.1.36.8
V7429	11856A>G	synonymous_variant	ORF1ab_pp1a	P3952P	1.00e+0	B.1.36.8
V8378	19116C>T	synonymous_variant	ORF1ab_pp1ab	Y6372Y	7.07e-1	B.1.36.8
V9607	210A>G	synonymous_variant	N	Q70Q	1.00e+0	B.1.36.8
V9388	606C>T	synonymous_variant	M	G202G	6.71e-1	B.1.427
V2813	16441G>T	missense_variant	ORF1ab_pp1ab	V5481L	1.00e+0	B.1.466.2
V7571	12987C>T	synonymous_variant	ORF1ab_pp1a	Y4329Y	7.07e-1	B.1.621
V8418	19491T>C	synonymous_variant	ORF1ab_pp1ab	N6497N	1.00e+0	B.1.621
V3375	20255A>G	missense_variant	ORF1ab_pp1ab	D6752G	1.00e+0	B.1.637
V3520	13C>T	missense_variant	S	L5F	6.53e-1	BA.2.9.2
V9178	234C>T	synonymous_variant	ORF3a	H78H	1.00e+0	BA.2.9.2
V2965	17486C>T	missense_variant	ORF1ab_pp1ab	A5829V	7.07e-1	BA.2.9.5
V9564	39C>T	synonymous_variant	N	P13P	7.07e-1	BA.2.9.5
V3814	968C>T	missense_variant	S	T323I	7.07e-1	BA.4.1.4
V6069	1473G>T	synonymous_variant	ORF1ab_pp1a	V491V	1.00e+0	BA.4.1.4
V2817	16477G>T	missense_variant	ORF1ab_pp1ab	V5493F	7.07e-1	BA.5.1.12
V5914	411C>T	synonymous_variant	ORF1ab_pp1a	G137G	1.00e+0	BA.5.1.12
V8345	18852A>G	synonymous_variant	ORF1ab_pp1ab	K6284K	7.07e-1	BA.5.1.23
V8530	20526C>T	synonymous_variant	ORF1ab_pp1ab	V6842V	1.00e+0	BA.5.1.3
V3082	18256G>A	missense_variant	ORF1ab_pp1ab	V6086I	1.00e+0	BA.5.2.28
V5927	486C>T	synonymous_variant	ORF1ab_pp1a	N162N	7.07e-1	BA.5.2.28
V6336	3297G>A	synonymous_variant	ORF1ab_pp1a	V1099V	8.16e-1	BA.5.2.28
V7105	9327C>T	synonymous_variant	ORF1ab_pp1a	Y3109Y	7.07e-1	BA.5.2.28
V8283	18417C>T	synonymous_variant	ORF1ab_pp1ab	A6139A	7.07e-1	BA.5.2.28
V7556	12915T>C	synonymous_variant	ORF1ab_pp1a	G4305G	1.00e+0	BE.1
V5665	-15G>C	upstream_gene_variant	ORF10	None	1.00e+0	BE.4
V1320	5407C>T	missense_variant	ORF1ab_pp1a	P1803S	7.07e-1	BE.9
V7450	12018T>C	synonymous_variant	ORF1ab_pp1a	D4006D	1.00e+0	BE.9
V3602	250C>A	missense_variant	S	L84I	7.07e-1	BF.7.14
V7966	16059C>T	synonymous_variant	ORF1ab_pp1ab	C5353C	7.07e-1	BF.7.14
V1212	4654A>G	missense_variant	ORF1ab_pp1a	T1552A	7.07e-1	BM.1.1.1
V1911	9095C>T	missense_variant	ORF1ab_pp1a	T3032I	7.07e-1	BM.1.1.1
V2094	10712C>T	missense_variant	ORF1ab_pp1a	A3571V	7.07e-1	BM.1.1.1
V2185	11140G>A	missense_variant	ORF1ab_pp1a	V3714I	7.07e-1	BM.1.1.1
V3698	544A>G	missense_variant	S	K182E	7.07e-1	BM.1.1.1
V459	1243G>A	missense_variant	ORF1ab_pp1a	G415S	7.07e-1	BM.1.1.1
V4886	106T>C	missense_variant	ORF7a	S36P	7.07e-1	BM.1.1.1
V5656	1250C>T	missense_variant	N	T417I	7.07e-1	BM.1.1.1
V6517	4734C>T	synonymous_variant	ORF1ab_pp1a	N1578N	7.07e-1	BM.1.1.1
V7190	9921C>T	synonymous_variant	ORF1ab_pp1a	C3307C	7.07e-1	BM.1.1.1
V7802	14700A>G	synonymous_variant	ORF1ab_pp1ab	K4900K	7.07e-1	BM.1.1.1
V8462	19851C>T	synonymous_variant	ORF1ab_pp1ab	V6617V	7.07e-1	BM.1.1.1
V3607	284C>A	missense_variant	S	T95N	1.00e+0	BN.1.5
V6168	2145A>G	synonymous_variant	ORF1ab_pp1a	G715G	7.07e-1	BN.1.5
V2975	17558C>T	missense_variant	ORF1ab_pp1ab	P5853L	7.07e-1	BQ.1.28
V4786	562G>A	missense_variant	M	A188T	8.16e-1	C.37
V3357	20164C>T	missense_variant	ORF1ab_pp1ab	P6722S	1.00e+0	CM.2
V6467	4368C>T	synonymous_variant	ORF1ab_pp1a	G1456G	8.16e-1	CM.8.1
V428	1127C>T	missense_variant	ORF1ab_pp1a	S376L	1.00e+0	P.1.15
V8941	2448A>G	synonymous_variant	S	S816S	1.00e+0	P.1.15
V2831	16595C>T	missense_variant	ORF1ab_pp1ab	A5532V	-1.00e+0	AY.18
V3953	1715C>T	missense_variant	S	T572I	1.00e+0	AY.18
V8435	19617C>T	synonymous_variant	ORF1ab_pp1ab	D6539D	1.00e+0	AY.18
V9120	3714C>T	synonymous_variant	S	T1238T	-1.00e+0	AY.18
V4100	2533G>T	missense_variant	S	A845S	1.00e+0	AY.3.2
V3343	20120C>T	missense_variant	ORF1ab_pp1ab	A6707V	1.00e+0	B.1.227
V3782	764C>T	missense_variant	S	S255F	1.00e+0	B.1.227
V4011	2025G>T	missense_variant	S	Q675H	1.00e+0	B.1.227
V4406	171G>T	missense_variant	ORF3a	Q57H	1.00e+0	B.1.227
V4456	285G>T	missense_variant	ORF3a	L95F	1.00e+0	B.1.227
V4560	555G>T	missense_variant	ORF3a	Q185H	1.00e+0	B.1.227
V4626	766G>T	missense_variant	ORF3a	V256F	-6.89e-1	B.1.227
V6338	3318C>T	synonymous_variant	ORF1ab_pp1a	S1106S	1.00e+0	B.1.227
V6742	6381A>G	synonymous_variant	ORF1ab_pp1a	L2127L	1.00e+0	B.1.227
V8063	16803T>C	synonymous_variant	ORF1ab_pp1ab	Y5601Y	1.00e+0	B.1.227
V8814	1467C>T	synonymous_variant	S	Y489Y	1.00e+0	B.1.227
V9396	12C>T	synonymous_variant	ORF6	L4L	1.00e+0	B.1.227
V9479	315G>A	synonymous_variant	ORF7a	A105A	1.00e+0	B.1.227
V9546	300G>A	synonymous_variant	ORF8	V100V	1.00e+0	B.1.227
V4558	550T>C	missense_variant	ORF3a	Y184H	7.57e-1	XBD
V5897	318C>T	synonymous_variant	ORF1ab_pp1a	V106V	1.00e+0	XBD
V903	3043G>A	missense_variant	ORF1ab_pp1a	E1015K	1.00e+0	XBD
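A minimal sketch of the correlation and multiple-testing steps, assuming each mutation is encoded as a per-sequence presence (1) / absence (0) vector within a lineage; that encoding is an assumption, not stated on this page. For 0/1 vectors, Spearman's rho equals the phi coefficient, and small tables easily give 1/√2 ≈ 7.07e-1, which may be why that value recurs above. The p-values for each pair are taken as given here (a library routine such as `scipy.stats.spearmanr` also returns one); the helper names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

def holm(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(sorted(range(m), key=lambda k: pvalues[k])):
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# Hypothetical presence/absence of two mutations in six sequences.
focal = [0, 0, 1, 1, 1, 0]
other = [0, 0, 1, 1, 0, 0]
rho = spearman(focal, other)
print(round(rho, 3))             # 0.707
print(holm([0.01, 0.04, 0.03]))  # approx. [0.03, 0.06, 0.06]
keep = abs(rho) > 0.6            # a retained pair also needs adjusted p < 0.05
```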

Manual curation of mutation (889A>G)-related literature from PubMed

pubmed.mineR and pubmed-mapper were used to extract literature from PubMed, followed by manual filtering.

Note: PubMed: (COVID-19 [Title/Abstract] OR SARS-COV-2 [Title/Abstract]) AND (DNA mutation [Title/Abstract] OR Protein mutation-1 letter [Title/Abstract] OR Protein mutation-3 letter [Title/Abstract]).

DNA level	Protein level	Paper title	Journal name	Publication year	Pubmed ID
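The search template in the note can be filled in mechanically for each mutation. The sketch below substitutes the DNA, 1-letter, and 3-letter notations of this mutation into that template; the function name is hypothetical, and whether the site quotes or escapes terms such as "889A>G" before submitting them to PubMed is not stated.

```python
def pubmed_query(dna_mutation, protein_1letter, protein_3letter):
    """Fill the PubMed search template from the note for one mutation."""
    disease = "(COVID-19 [Title/Abstract] OR SARS-COV-2 [Title/Abstract])"
    variants = " OR ".join(
        f"{term} [Title/Abstract]"
        for term in (dna_mutation, protein_1letter, protein_3letter)
    )
    return f"{disease} AND ({variants})"

print(pubmed_query("889A>G", "M297V", "Met297Val"))
```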

Gene Information	Gene ID	GU280_gp01_pp1a
	Gene Name	ORF1ab_pp1a
	Gene Type	protein_coding
	Genome position	1154
	Reference genome	GenBank ID: NC_045512.2
	Mutation type	missense_variant
DNA Level	DNA Mutation: 889A>G
	Ref Seq: A
	Mut Seq: G
Protein Level	Protein 1-letter Mutation: M297V
	Protein 3-letter Mutation: Met297Val
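The coordinates above are mutually consistent. ORF1ab starts at nucleotide 266 of NC_045512.2, so coding position 889 maps to genome position 266 + 889 - 1 = 1154, and 889 is the first base of codon 297. Methionine has a single codon (ATG), so A>G at the first codon position gives GTG, which encodes valine, i.e. M297V. A short sketch of that arithmetic; the function names are hypothetical.

```python
ORF1AB_START = 266  # first nucleotide of the ORF1ab start codon in NC_045512.2

def cds_to_genome(cds_pos, cds_start=ORF1AB_START):
    """Convert a 1-based CDS coordinate to a 1-based genome coordinate."""
    return cds_start + cds_pos - 1

def codon_of(cds_pos):
    """Return (codon number, position within the codon), both 1-based."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1

codon, offset = codon_of(889)
ref_codon = "ATG"  # Met has a single codon
mut_codon = ref_codon[:offset - 1] + "G" + ref_codon[offset:]
print(cds_to_genome(889), codon, offset, mut_codon)  # 1154 297 1 GTG
```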

COV2Var annotation categories

Summary information of mutation

Analyzing the distribution of mutation across geographic regions, temporal trends, and lineages

Examining mutation found in abundant sequences of non-human animal hosts

Investigating the association between mutation and patients of different ages, genders, and statuses

Investigating natural selection at mutation site for genetic adaptation and diversity

Alterations in protein physicochemical properties induced by mutation

Alterations in protein stability induced by mutation

Impact on protein function induced by mutation

Exploring mutation distribution within intrinsically disordered protein regions

Alterations in enzyme cleavage sites induced by mutation

Impact of spike protein mutation on antigenicity and immunogenicity

Impact of mutation on viral transmissibility by the affinity between receptor binding domain (RBD) and ACE2 receptor

Impact of mutation on immune escape by the affinity between receptor binding domain (RBD) and antibody/serum

Investigating the co-mutation patterns of SARS-CoV-2 across 2,735 viral lineages

Manual curation of mutation-related literature from PubMed
