CN114242166A - High myopia prediction model based on SNP susceptible sites and application thereof - Google Patents

High myopia prediction model based on SNP susceptible sites and application thereof

Info

Publication number
CN114242166A
Authority
CN
China
Prior art keywords
snp
model
high myopia
sites
following
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111667221.0A
Other languages
Chinese (zh)
Inventor
徐良德
王宏
王文灿
林鹏
张秀峰
陈琪
吕帆
瞿佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eye Hospital of Wenzhou Medical University
Original Assignee
Eye Hospital of Wenzhou Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eye Hospital of Wenzhou Medical University
Priority to CN202111667221.0A
Publication of CN114242166A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers

Abstract

The invention belongs to the field of biomedicine and particularly relates to a high myopia prediction model based on SNP susceptible sites and an application thereof. Specifically, the SNPs comprise at least rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198 and rs76542212.

Description

High myopia prediction model based on SNP susceptible sites and application thereof
Technical Field
The invention belongs to the field of biomedicine, and particularly relates to a high myopia prediction model based on SNP susceptible sites and application thereof.
Background
Myopia is a refractive condition in which, with accommodation relaxed, parallel rays of light passing through the eye's refractive system are focused in front of the retina rather than on it, so that no clear image forms on the retina. Myopic eyes with a refractive error of -6 D or worse (D denotes diopter) are classed as highly myopic. High myopia is the second leading cause of severe visual impairment and blindness after cataract. Patients with high myopia have a markedly higher risk than patients with ordinary myopia of complications such as floaters (muscae volitantes), strabismus, retinal detachment and fundus lesions. They are also at high risk of intraocular circulatory disorders and tissue degeneration, and are more prone to complications such as cataract and glaucoma. The incidence of high myopia has risen sharply in recent decades, and it has been projected that by 2050 the number of people with high myopia may reach 938 million (9.8% of the world population). Studies have shown that the prevalence of high myopia in young Asian populations (6.8% to 21.6%) is much higher than in non-Asian populations (2.0% to 2.3%). Both genetic and environmental factors are known to contribute to the onset and progression of myopia, and high myopia is the more strongly genetically determined.
In recent years, with the development of genomics and sequencing technologies, the genome-wide association study (GWAS) has become a major method for studying genetic susceptibility sites for myopia and high myopia. The single nucleotide polymorphism (SNP) is a widespread form of genomic variation. An SNP is a position in genomic DNA at which normal individuals in a population carry different bases at a single base pair. Of the bases that occur at an SNP site, the least frequent is called the minor allele, and its frequency is the minor allele frequency (MAF). Although DNA is built from four bases, an SNP generally involves only two of them, so it is a two-state, i.e. biallelic, marker. Because SNPs are biallelic, genomic screening usually only needs to score each SNP as present or absent (+/-) rather than measure fragment lengths, which facilitates the development of automated techniques for screening or detecting SNPs.
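The definition of the minor allele frequency can be illustrated with a minimal sketch. The function name, the two-letter genotype strings and the sample calls are illustrative assumptions and do not come from the patent data.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return (minor_allele, MAF) for one biallelic SNP.

    `genotypes` is a list of two-letter genotype strings such as "AG";
    missing calls are given as None and ignored.
    """
    counts = Counter()
    for gt in genotypes:
        if gt is not None:
            counts.update(gt)  # count each of the two alleles
    if len(counts) > 2:
        raise ValueError("expected a biallelic SNP")
    if len(counts) < 2:
        return None, 0.0  # monomorphic (or empty) in this sample
    total = sum(counts.values())
    allele, n = min(counts.items(), key=lambda kv: kv[1])
    return allele, n / total

# Made-up calls for eight subjects, one of them missing.
calls = ["AA", "AG", "AA", "GG", None, "AA", "AG", "AA"]
allele, maf = minor_allele_frequency(calls)
print(allele, round(maf, 4))
```

Here the G allele is counted 4 times out of 14 called alleles, so it is the minor allele in this toy sample.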
SNP sites associated with high myopia have been reported, but most have not been validated. Moreover, most published studies have focused on non-Asian populations, in which the incidence of high myopia is much lower than in Asians. Owing to differences in genetic background, sites identified in non-Asian populations may be absent from, or have different effects in, Asian populations.
Disclosure of Invention
To develop products and methods for diagnosing high myopia that are suitable for Asian populations, the invention performed whole exome sequencing (WES) on a large number of subjects and carried out a genome-wide case-control association study on the data. From the analysis of the participants' sequencing data (9,852 high myopia patients and 11,375 healthy controls), the invention identified new susceptibility SNP sites and established a prediction (diagnosis) model for high myopia, thereby providing a new basis for the clinical prediction (diagnosis) of high myopia.
In a first aspect, the present invention provides a high myopia-related SNP site combination comprising any one of:
1) rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs16898116, rs866140198 and rs76542212 (the 11 sites of "model 1_1", "model 1_2" and "model 1_3" of the present invention);
2) rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198, rs76542212, rs9264670, rs3094609, rs2001181, rs11554776, rs7380272, rs62626261, rs422951, rs2233580, rs520692, rs111265204, rs520803 and rs72500812 (the 24 sites of "model 2_1" or "model 2_2" of the present invention);
3) rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198, rs76542212, rs9264670, rs3094609, rs2001181, rs11554776, rs1065711, rs7380272, rs62626261, rs7380824, rs422951, rs2233580, rs520692, rs111265204, rs520803 and rs72500812 (the 26 sites of "model 2_3" of the present invention);
4) SNP sites 1 to 12 below (the sites of "model 1" of the present invention);
or SNP sites 1 to 27 below (the sites of "model 2" of the present invention);
or SNP sites 1 to 43 below (the sites of "model 3" of the present invention);
or SNP sites 1 to 195 below (the sites of "model 4" of the present invention);
or SNP sites 1 to 568 below (the sites of "model 5" of the present invention):
[Table of SNP sites 1 to 568: reproduced only as images in the original publication (Figures BDA0003451438340000031, BDA0003451438340000041, BDA0003451438340000051 and BDA0003451438340000061)]
Preferably, the SNP site combination further includes at least one, or all 297, of the following SNP sites:
[Table of the 297 additional SNP sites: reproduced only as images in the original publication (Figures BDA0003451438340000062 and BDA0003451438340000071)]
The SNP sites listed in the invention are ordered from left to right and from top to bottom: the first row contains SNP sites 1 to 5 from left to right, the second row contains SNP sites 6 to 10 from left to right, and so on.
Preferably, an "or" between two SNP identifiers indicates that they are the names of the same site in different versions of the dbSNP database and are substantively equivalent. For example, the 63rd SNP site is rs1280768485 or rs368234054. rs1280768485 lies at chromosomal position 190388283 and mRNA position 412, within the codon for amino acid 56, and causes a frameshift mutation in the expressed gene; rs368234054 lies at chromosomal position 190388285 and mRNA position 414, also within the codon for amino acid 56, and likewise causes a frameshift mutation (Table 1). rs1280768485 and rs368234054 are therefore interchangeable. The same applies to the other SNP sites joined by "or".
TABLE 1. Comparison of rs1280768485 and rs368234054
dbSNP rs ID     Chromosomal location    mRNA location    Amino acid position    Function
rs1280768485    190388283               412              56                     Frameshift mutation
rs368234054     190388285               414              56                     Frameshift mutation
In the present invention, an SNP (single nucleotide polymorphism) refers to variation at a single base position in DNA, for which a subject may be homozygous or heterozygous. The SNP sites of the invention are identified by "rs" numbers, and from these a person skilled in the art can determine their exact position, nucleotide sequence and related information using suitable databases and information systems, such as the single nucleotide polymorphism database (dbSNP).
In another aspect, the invention provides a model construction method for constructing a high myopia diagnosis model using the SNP site combination.
Preferably, the method uses logistic regression.
Preferably, the method further comprises a step of 10-fold cross-validation. Specifically, using the sklearn model_selection module, 10 random numbers between 0 and 100 are selected as random seeds, the samples are divided into 10 groups of training and validation sets, 10-fold cross-validation is performed, and the average of the 10 results is taken as the model result.
In another aspect, the invention also provides a high myopia diagnosis model constructed by the model construction method.
Preferably, the model may take the form of a formula, a nomogram, or another form that is convenient to apply to a subject; using the model, it can be calculated directly whether the subject currently has high myopia.
In another aspect, the present invention provides a method for diagnosing high myopia, comprising the step of determining the high myopia status of a subject from the detection result of the SNP site combination of the present invention, or the step of inputting the detection result of the SNP site combination into the high myopia diagnosis model to obtain the determination result.
In particular, the method may comprise the steps of:
1) collecting a sample from the subject, preferably an oral swab (oral test strip);
2) performing SNP detection on the sample, wherein the detection also comprises a DNA extraction step;
3) judging the high myopia status of the subject from the detection result of step 2); specifically, detection of at least 1, 10, 100 or 500 sites of the SNP site combination in the subject's sample indicates that the subject is a high myopia patient or belongs to a population susceptible to high myopia; the specific numbers of SNP sites are given in Table 4 of the present invention.
The determination may be performed manually, automatically, or by a combination of the two: the result can be calculated by hand from the detection result, or obtained automatically by inputting the detection result into the system.
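As a rough illustration of the automatic route, the sketch below turns additive genotype codes (0, 1 or 2 mutant alleles) into a predicted probability with a logistic formula. The intercept and weights are hypothetical placeholders invented for this example; they are not fitted values from the invention, and only the rs identifiers are taken from the text.

```python
import math

# Hypothetical coefficients for illustration only (not from the source).
HYPOTHETICAL_INTERCEPT = -0.4
HYPOTHETICAL_WEIGHTS = {
    "rs9513522": 0.30,
    "rs9517546": 0.25,
    "rs16898116": -0.10,
}

def predict_probability(dosages, weights=HYPOTHETICAL_WEIGHTS,
                        intercept=HYPOTHETICAL_INTERCEPT):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    z = intercept + sum(w * dosages[rsid] for rsid, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Made-up detection result for one subject (number of mutant alleles per site).
subject = {"rs9513522": 2, "rs9517546": 1, "rs16898116": 0}
p = predict_probability(subject)
print(f"predicted probability of high myopia: {p:.3f}")
```

A nomogram is a graphical version of the same calculation: each site contributes points in proportion to its weight, and the point total is read off against a probability scale.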
In another aspect, the present invention provides a diagnosis system for high myopia diagnosis, which includes a computing device for performing computation using the above SNP locus combination or the above high myopia diagnosis model.
Preferably, the diagnostic system may further comprise a means for detecting the SNP site.
In another aspect, the invention provides the use of the model, the system, or a reagent for detecting the SNP site combination in the preparation of a product for diagnosing high myopia.
Preferably, the reagent for detecting the SNP site combination includes, but is not limited to, reagents used to detect SNPs by the following methods: TaqMan probe method, sequencing, chip method, time-of-flight mass spectrometry (MALDI-TOF MS), restriction fragment length polymorphism (PCR-RFLP), single-strand conformation polymorphism (PCR-SSCP), allele-specific PCR (AS-PCR), SNaPshot method, SNPlex method, denaturing high performance liquid chromatography (DHPLC) and denaturing gradient gel electrophoresis (DGGE). A person skilled in the art can select any one or more of these methods, provided that the SNP sites can be detected.
Specifically, the reagents include, but are not limited to, primers, probes, chips, and the like.
Drawings
FIG. 1 is a Manhattan plot of the 12 susceptibility SNPs.
FIG. 2 shows the mapping of the 12 SNPs to their nearest gene regions.
FIG. 3 is a heatmap from the GO analysis.
FIG. 4 shows ROC curves of models 1-11 provided by the present invention in diagnosing high myopia. A: models 1-5; B: models 6-10; C: models 5, 11 and 9.
FIG. 5 shows ROC curves, in diagnosing high myopia, of the models constructed from SNPs after feature screening. A: models constructed after Lasso screening; B: models constructed after LinearSVC screening; C: models constructed after Logistic Regression screening.
Detailed Description
The present invention is further described with reference to the following examples, which are illustrative only and do not limit the invention in any way. A person skilled in the art may use the teachings disclosed above to modify the invention into equivalent embodiments. Any simple modification of, or equivalent change to, the following embodiments that is made according to the technical essence of the present invention and does not depart from its technical spirit falls within the scope of the present invention.
Example 1 SNP site screening
The invention recruited 21,227 subjects (9,852 high myopia patients and 11,375 healthy controls). Genomic DNA of all subjects was isolated from oral mucosal samples (oral strips) according to standard procedures.
Exclusion criteria for the study subjects were:
a missing rate above 10% across SNPs; low mean sequencing depth (<10); low mean genotype quality (<65); close relatedness (PI_HAT > 0.2); abnormal sex information; non-East Asian descent; and excessively high or low heterozygosity (more than ±4 SD from the mean heterozygosity rate).
Exclusion criteria for SNP sites were:
a missing rate above 10% across individuals; significant deviation from Hardy-Weinberg equilibrium (P < 1.0×10^-6); a significant difference in missing rate between the high myopia samples and the control samples (>0.007); and a minor allele frequency below 0.01.
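The SNP-level criteria can be sketched as a small filter. This is a minimal illustration that assumes a chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (the source does not say which test was used) and omits the case-versus-control missing-rate comparison; the function names and default thresholds simply mirror the values quoted above.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.10, hwe_p=1.0e-6, min_maf=0.01):
    """Apply the missing-rate, MAF and HWE filters to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    if n_missing / (called + n_missing) > max_missing:
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p

print(passes_snp_qc(360, 480, 160, n_missing=20))  # counts in HWE proportions
print(passes_snp_qc(500, 0, 500, n_missing=0))     # no heterozygotes at all
```

In practice an exact HWE test is often preferred for rare variants; the chi-square version is used here only because it fits in a few lines.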
After this screening, 89,268 SNPs (MAF > 0.01) from 20,955 individuals (9,730 high myopia patients and 11,255 healthy control subjects) were used for the subsequent genome-wide association analysis.
Finally, a P value < 5.6×10^-7 (= 0.05/89,268) was considered statistically significant, and the corresponding SNP was taken to be a susceptibility SNP for high myopia. A total of 12 susceptibility SNPs for high myopia were identified; the results are shown in FIG. 1. Six of them (50%) are located on chromosome 6, indicating that chromosome 6 is the major enrichment region for high myopia risk variants.
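The significance threshold is a Bonferroni correction: the nominal level of 0.05 divided by the number of SNPs tested. A minimal sketch of the arithmetic follows; the helper name `is_significant` is an assumption for illustration.

```python
ALPHA = 0.05
N_TESTS = 89_268  # SNPs retained after quality control

threshold = ALPHA / N_TESTS
print(f"{threshold:.2e}")  # prints 5.60e-07

def is_significant(p_value, alpha=ALPHA, n_tests=N_TESTS):
    """True if the association P value passes the Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests
```

For example, `is_significant(1e-8)` is true while `is_significant(1e-6)` is false.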
The genomic inflation factor λ was 0.944, which rules out false positive results caused by population stratification. The 12 SNPs were mapped to their nearest gene regions (Table 2, FIG. 2).
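For context, the genomic inflation factor is commonly computed as the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The sketch below assumes that standard definition and two-sided P values as input; the patent does not state how its value of 0.944 was computed.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def genomic_inflation_factor(p_values):
    """lambda = median(observed chi-square statistics) / expected median.

    Each P value must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    # Convert each two-sided P value back to a 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced P values mimic a null distribution, so lambda should be close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation_factor(null_like)
print(round(lam, 3))
```

Values well above 1 suggest inflation from stratification or relatedness, while values near or slightly below 1, such as the 0.944 reported here, do not.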
TABLE 2. Details of the 12 susceptibility SNPs
[Table 2: reproduced only as images in the original publication (Figures BDA0003451438340000101 and BDA0003451438340000111)]
These 12 SNPs are newly identified susceptibility SNPs for high myopia, and no previous study has reported an association between these genes and high myopia. In previous reports, among the 9 gene regions:
3 were reported to be associated with refractive error: DOCK9 (rs9513522, rs9517546), ZNF204P (rs16898116) and ZSCAN9 (rs76542212); 4 were reported to be associated with ophthalmic diseases: MDC1 with Behcet syndrome (PMID: 20622878); 7q11.2 (rs1228787360) with corneal astigmatism (PMID: 29422769); HLA-B (rs41563615, rs41558815) with Stevens-Johnson syndrome (OMIM: 142830); and HLA-DRB1 with uveomeningoencephalitic syndrome (PMID: 25108386) and Sjogren's syndrome (PMID: 28076899).
Example 2 Linkage disequilibrium analysis and functional analysis of the SNPs
The extent and distribution of linkage disequilibrium (LD) in the human genome play an important role in gene mapping; LD serves both as a tool for fine-mapping complex diseases and as the basis of genome-wide association analysis. The basic idea of linkage disequilibrium mapping is to examine a large number of genetic markers across the whole genome or near candidate genes in order to find markers that appear associated with disease because they lie close enough to the causal site. In this study, the PLINK block method (plink --blocks) was used for haplotype block estimation to identify SNPs in high LD with the susceptibility SNPs. LD was calculated only for SNPs within 200 kb of each other.
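To make the notion of pairwise LD concrete, the sketch below computes r² as the squared Pearson correlation between two SNPs coded as 0/1/2 mutant-allele counts, and checks the 200 kb distance limit. This genotype-based r² is only an approximation of haplotype-based LD and is not the haplotype block algorithm used in the study; the function names are illustrative assumptions.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete observations")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)

def within_window(pos_a, pos_b, max_kb=200):
    """True if two base-pair positions are no more than max_kb kilobases apart."""
    return abs(pos_a - pos_b) <= max_kb * 1000

snp_a = [0, 1, 2, 1, 0, None, 2]  # made-up genotype codes
snp_b = [0, 1, 2, 1, 0, 1, 2]
print(ld_r2(snp_a, snp_b), within_window(1_000_000, 1_150_000))
```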
Linkage disequilibrium analysis identified 13 SNP sites in high LD with the above 12 SNPs. The functions of the selected susceptibility SNPs were studied using RegulomeDB v2.0 and 3DSNP v2.0.
The RegulomeDB database showed that 8 of the 12 SNPs had ranking scores between 2b and 4, giving strong evidence that they can influence transcriptional regulation. The 3DSNP v2.0 database showed that, of the 12 SNPs, 7 are located in enhancer histone marks, 6 are located in promoter histone marks, 7 are located in transcription factor binding sites (TFBSs), and 6 can alter transcription factor binding motifs. These results indicate that most of the 12 SNPs may have a significant effect on gene expression.
The GTEx Portal database showed that 5 SNPs (rs9513522, rs9517546, rs41563615, rs76542212 and rs16898116) significantly affect gene expression. rs9513522 and rs9517546 affect the expression of DOCK9 and DOCK9-AS2. rs41563615 affects the expression of multiple genes, including ABCF1, C6orf15, CCHCR1, HCG22, GTF2H4, HCG27, HLA-B, HLA-C, HLA-S, LY6G5B, LY6G5C, MICA, MIR6891, POU5F1, PSORS1C1, PSORS1C2, PSORS1C3, TCF19, XXbac-BPG181B23.7 and XXbac-BPG29F13.17. rs76542212 affects the expression of AL022393.9, LINC01012, TRIM27, ZNF603P, ZSCAN12, ZSCAN23, ZSCAN26 and ZSCAN31. rs16898116 affects the expression of GUSBP2 (Table 3).
TABLE 3. Functional annotation of the 12 susceptibility SNPs
[Table 3: reproduced only as an image in the original publication (Figure BDA0003451438340000121)]
Example 3 Gene Ontology (GO) and pathway enrichment analysis
GO enrichment analysis of the genes annotated to the 12 susceptibility SNPs yielded 82 significantly enriched GO terms (P < 0.05, Supplementary Table 1): 43 biological processes, 26 cellular components and 13 molecular functions. The GO terms were clustered by pairwise term similarity using the R package simplifyEnrichment. We found that these genes were enriched in immune-related terms, including antigen processing and presentation of peptide antigens via MHC and the regulation of immune-mediated signal transduction (FIG. 3). To demonstrate the association between the genes and immunity, we collected immune GO terms from AmiGO v2 using "immunity" as the keyword. We then calculated the similarity score between the GO terms of these genes and the immune GO terms, obtaining 0.685 (P < 0.0001).
In total, 32 KEGG pathways were enriched (P < 0.005, Supplementary Table 1). They involve immune-related pathways, including antigen processing and presentation, the intestinal immune network for IgA production, and human immunodeficiency virus 1 infection.
Example 4 construction of a predictive model
We built logistic regression prediction models from the genotypes of SNPs selected at different association P-value thresholds or from different sources. The P-value thresholds were 5.6×10^-7, 1.0×10^-5, 5.0×10^-5, 1.0×10^-3 and 5.0×10^-3. Using these thresholds, together with previously reported SNP sites, 11 groups of models based on different SNP combinations were obtained. Each SNP combination was also modeled in different ways, so that three further models can be established for each SNP combination.
1) Obtaining the SNP genotype profile: the required SNPs (1354 SNPs) were extracted from the VCF using Python, and an additive model was applied (0 for no mutant allele, 1 for one mutant allele, 2 for two mutant alleles) to obtain the original SNP genotype profile;
2) Screening the SNP genotype profile: the number of missing SNPs was calculated for each sample in the original SNP genotype profile, and samples with more than 5% missing (N_missing/N_allsamples) were removed. For samples with less than 5% missing, an artificial neural network was used to impute the missing values. Preferably, the artificial neural network has three layers (an input layer, a hidden layer and an output layer): the number of input nodes is len(label_data), i.e. the number of SNPs with no missing values; the hidden layer has 20 nodes; the output layer has 3 nodes; the learning rate is 0.003 and the number of training iterations is 20;
3) Feature screening: linear models penalized with the L1 norm have sparse solutions, i.e. many of their estimated coefficients are 0. When the goal is to reduce the dimensionality of the data for use with another classifier, such models can be used for feature selection. The 3 sparse estimators chosen were Lasso (for regression) and Logistic Regression and LinearSVC (for classification);
4) Constructing the model: the model was constructed by logistic regression, which was used to classify the samples, with the following parameters:
[Model parameters: reproduced only as an image in the original publication (Figure BDA0003451438340000141)]
5) Random 10-fold cross-validation: using the sklearn model_selection module, 10 random numbers between 0 and 100 were selected as random seeds, the samples were divided into 10 groups of training and validation sets, 10-fold cross-validation was performed, and the average of the 10 results was taken as the model result to ensure the objectivity of the model.
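Steps 1) and 2) above can be sketched in plain Python. The sketch assumes biallelic sites and standard VCF GT strings such as "0/1", treats any non-reference allele as a mutant allele, and interprets the 5% limit as the fraction of SNPs missing within each sample; the function names and toy genotypes are illustrative, and the neural-network imputation step is not shown.

```python
def additive_code(gt):
    """Map a VCF GT string to the number of mutant alleles (None if missing)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")

def build_profile(vcf_rows, sample_ids, max_missing=0.05):
    """vcf_rows: {rsid: [GT per sample]} -> {sample: {rsid: 0/1/2/None}}.

    Samples whose fraction of missing SNPs exceeds max_missing are dropped.
    """
    n_snps = len(vcf_rows)
    if n_snps == 0:
        return {}
    profile = {s: {} for s in sample_ids}
    for rsid, gts in vcf_rows.items():
        for s, gt in zip(sample_ids, gts):
            profile[s][rsid] = additive_code(gt)
    kept = {}
    for s, calls in profile.items():
        n_missing = sum(v is None for v in calls.values())
        if n_missing / n_snps <= max_missing:
            kept[s] = calls
    return kept

# Toy input: two SNPs, three samples (genotypes are made up).
rows = {
    "rs9513522": ["0/0", "0/1", "./."],
    "rs9517546": ["1|1", "0/0", "0/1"],
}
profile = build_profile(rows, ["S1", "S2", "S3"])
print(profile)
```

With only two SNPs in the toy input, a single missing call already exceeds the 5% limit, so sample S3 is removed.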
The models obtained directly when step 3) is omitted were named models 1-11; the mean AUC obtained for each from 10-fold cross-validation is plotted in FIG. 4.
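For readers unfamiliar with the metric, the AUC is the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The rank-based sketch below computes it for one validation fold and averages over folds; the function names and toy scores are assumptions for illustration and are unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC from ranks (Mann-Whitney U); labels are 0/1, ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both cases and controls")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def mean_auc(fold_results):
    """Average AUC over a list of (labels, scores) pairs, one per fold."""
    return sum(roc_auc(y, s) for y, s in fold_results) / len(fold_results)

folds = [([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
         ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7])]
print(mean_auc(folds))
```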
Feature screening was then applied to models 1-11 to construct more compact SNP models that are easier to use in detection and clinical practice. Lasso screening is denoted by the suffix _1, LinearSVC screening by _2 and Logistic Regression screening by _3; that is, the SNP sites in models 1_1, 1_2 and 1_3 are all selected from model 1, and likewise for the other models. A logistic regression model was constructed from each screened set of SNP sites and its AUC was calculated (FIG. 5). The number of SNPs and the AUC of each model are summarized in Table 4.
TABLE 4. Models constructed by the invention and verification of their performance
[Table 4: reproduced only as images in the original publication (Figures BDA0003451438340000142 and BDA0003451438340000151)]
Note: in the table, lasso_L1 denotes screening features with Lasso (L1 regularization) and then constructing a logistic regression model; sv_L1 denotes screening features with LinearSVC (L1 regularization) and then constructing a logistic regression model; log_L1 denotes screening features with Logistic Regression (L1 regularization) and then constructing a logistic regression model.
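The screening-then-modeling workflow of Example 4 can be sketched with scikit-learn as follows. This is a hedged illustration on synthetic genotypes, not a reproduction of the study: the data, the penalty strength `C=0.1`, the use of stratified folds and the default logistic regression settings are all assumptions, since the actual model parameters are only available as an image in the original publication. It shows the LinearSVC variant of L1 screening followed by logistic regression, evaluated by 10-fold cross-validation repeated with 10 random seeds drawn from 0 to 100.

```python
import random

import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in data: 400 subjects x 30 SNPs coded 0/1/2 (not patent data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(400, 30)).astype(float)
logit = 1.2 * (X[:, 0] - 1) + 1.0 * (X[:, 1] - 1) - 1.1 * (X[:, 2] - 1)
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# L1-penalized LinearSVC screens features; logistic regression is the final model.
# Putting both in one pipeline keeps the screening inside each training fold.
model = make_pipeline(
    SelectFromModel(LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)),
    LogisticRegression(max_iter=1000),
)

seeds = random.Random(0).sample(range(101), 10)  # 10 random seeds from 0..100
aucs = []
for seed in seeds:
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    aucs.append(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())

mean_auc = float(np.mean(aucs))
print(f"mean AUC over {len(seeds)} runs of 10-fold CV: {mean_auc:.3f}")
```

Swapping the selector for an L1-penalized Lasso or LogisticRegression estimator would give the other two screening variants described in the note above.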
Model 1 of the present invention refers to the first 12 of the SNP sites listed below.
Model 2 of the present invention refers to the first 27 of the SNP sites listed below.
Model 3 of the present invention refers to the first 43 of the SNP sites listed below.
Model 4 of the present invention refers to the first 195 of the SNP sites listed below.
Model 5 of the present invention refers to the first 568 of the SNP sites listed below.
[Table of SNP sites 1 to 568: reproduced only as images in the original publication (Figures BDA0003451438340000161, BDA0003451438340000171, BDA0003451438340000181 and BDA0003451438340000191)]
Model 6 of the present invention refers to a model constructed by combining the SNP sites of model 1 with the following 297 SNP sites.
Model 7 of the present invention refers to a model constructed by combining the SNP sites of model 2 with the following 297 SNP sites.
Model 8 of the present invention refers to a model constructed by combining the SNP sites of model 3 with the following 297 SNP sites.
Model 9 of the present invention refers to a model constructed by combining the SNP sites of model 4 with the following 297 SNP sites.
Model 10 of the present invention refers to a model constructed by combining the SNP sites of model 5 with the following 297 SNP sites.
The specific SNP sites represented by SNPs_environment are listed below:
[Table of the 297 SNPs_environment sites: reproduced only as images in the original publication (Figures BDA0003451438340000192 and BDA0003451438340000201)]
"Model 1_1", "model 1_2" and "model 1_3" of the present invention refer to the following sites: rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs16898116, rs866140198, rs76542212;
"Model 2_1" or "model 2_2" of the present invention refers to the following sites: rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198, rs76542212, rs9264670, rs3094609, rs2001181, rs11554776, rs7380272, rs62626261, rs422951, rs2233580, rs520692, rs111265204, rs520803, rs72500812;
"Model 2_3" of the present invention refers to the following sites: rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198, rs76542212, rs9264670, rs3094609, rs2001181, rs11554776, rs1065711, rs7380272, rs62626261, rs7380824, rs422951, rs2233580, rs520692, rs111265204, rs520803, rs72500812.

Claims (10)

1. An SNP site combination comprising any one of:
1) rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs16898116, rs866140198, rs76542212;
2) rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198, rs76542212, rs9264670, rs3094609, rs2001181, rs11554776, rs7380272, rs62626261, rs422951, rs2233580, rs520692, rs111265204, rs520803, rs72500812;
3) rs199970974, rs199873247, rs117708355, rs1228787360, rs41563615, rs9513522, rs9517546, rs533280354, rs41558815, rs16898116, rs866140198, rs76542212, rs9264670, rs3094609, rs2001181, rs11554776, rs1065711, rs7380272, rs62626261, rs7380824, rs422951, rs2233580, rs520692, rs111265204, rs520803, rs72500812;
4) SNP sites 1 to 12 below,
or SNP sites 1 to 27 below,
or SNP sites 1 to 43 below,
or SNP sites 1 to 195 below,
or SNP sites 1 to 568 below:
[Table of SNP sites 1 to 568: reproduced only as images in the original publication (Figures FDA0003451438330000011, FDA0003451438330000021, FDA0003451438330000031 and FDA0003451438330000041)]
2. The SNP site combination of claim 1, further comprising at least one of the following SNP sites:
[Table of additional SNP sites: reproduced only as images in the original publication (Figures FDA0003451438330000042 and FDA0003451438330000051)]
3. A model construction method for constructing a high myopia diagnosis model, which comprises constructing a model using the SNP site combination according to claim 1;
preferably, the method uses logistic regression.
4. The method of claim 3, further comprising a step of 10-fold cross-validation.
5. A high myopia diagnostic model constructed by the method of claim 3.
6. The high myopia diagnostic model of claim 5, wherein the model comprises a formula or a nomogram.
7. A diagnostic system for high myopia diagnosis, the diagnostic system comprising a computing device for determining the high myopia status of a subject using the high myopia diagnosis model of claim 5 and the SNP site combination of claim 1.
8. The diagnostic system of claim 7, further comprising means for detecting an SNP.
9. Use of the high myopia diagnostic model of claim 5, the diagnostic system of claim 7, or a reagent for detecting the SNP site combination of claim 1 in the preparation of a product for diagnosing high myopia.
10. The use according to claim 9, wherein the reagent for detecting the SNP site combination of claim 1 includes, but is not limited to, reagents used to detect SNPs by the following methods:
TaqMan probe method, sequencing, chip method, time-of-flight mass spectrometry, restriction fragment length polymorphism, single-strand conformation polymorphism, allele-specific PCR, SNaPshot method, SNPlex method, denaturing high performance liquid chromatography and denaturing gradient gel electrophoresis.
CN202111667221.0A 2021-12-31 2021-12-31 High myopia prediction model based on SNP susceptible sites and application thereof Withdrawn CN114242166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111667221.0A CN114242166A (en) 2021-12-31 2021-12-31 High myopia prediction model based on SNP susceptible sites and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111667221.0A CN114242166A (en) 2021-12-31 2021-12-31 High myopia prediction model based on SNP susceptible sites and application thereof

Publications (1)

Publication Number Publication Date
CN114242166A true CN114242166A (en) 2022-03-25

Family

ID=80745300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111667221.0A Withdrawn CN114242166A (en) 2021-12-31 2021-12-31 High myopia prediction model based on SNP susceptible sites and application thereof

Country Status (1)

Country Link
CN (1) CN114242166A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114891876A (en) * 2022-05-13 2022-08-12 上海谱希和光基因科技有限公司 Functional genome area biomarker combination for diagnosing high myopia

Similar Documents

Publication Publication Date Title
Oh et al. Predicting autism spectrum disorder using blood-based gene expression signatures and machine learning
CN108624650B (en) Method for judging whether solid tumor is suitable for immunotherapy and detection kit
KR101542529B1 (en) Examination methods of the bio-marker of allele
Haasnoot et al. Identification of an amino acid motif in HLA–DR β1 that distinguishes uveitis in patients with juvenile idiopathic arthritis
KR101460520B1 (en) Detecting method for disease markers of NGS data
KR101693510B1 (en) Genotype analysis system and methods using genetic variants data of individual whole genome
CN103571847B (en) FOXC1 gene mutation bodies and its application
CN114891876A (en) Functional genome area biomarker combination for diagnosing high myopia
CN114242166A (en) High myopia prediction model based on SNP susceptible sites and application thereof
KR20150024232A (en) Examination methods of the origin marker of resistance from drug resistance gene about disease
KR101693717B1 (en) Bioactive variant analysis system using genetic variants data of individual whole genome
CN116287204A (en) Application of mutation condition of detection characteristic gene in preparation of venous thromboembolism risk detection product
KR20180069651A (en) Analysis platform for personalized medicine based personal genome map and Analysis method using thereof
CN110423809A (en) A kind of FBN1 G2125A mutation and its application influencing the diagnosis and treatment of people's marfan's syndrome
CN114783613A (en) Myopia prediction analysis method
CN115505638A (en) Application of biomarker combination in risk prediction of highly myopic male susceptible population
KR20190000341A (en) Analysis platform for personalized medicine based personal genome map and Analysis method using thereof
Sabbagh et al. Clinico-biological refinement of BCL11B-related disorder and identification of an episignature: A series of 20 unreported individuals
CN108629148A (en) The genome analytical method and device of ocular physiology information based on phenotypic analysis
CN107841551B (en) Application of single nucleotide polymorphism site in wound sepsis risk assessment
KR101818103B1 (en) Apparatus and method for companion diagnosis
CN110484617A (en) A kind of FBN1 C2223A mutation and its application influencing the diagnosis and treatment of people's marfan's syndrome
KR20190000340A (en) Analysis platform for personalized medicine based personal genome map and Analysis method using thereof
WO2023197442A2 (en) Group of myopia and high myopia related snp markers and application thereof
EP3055425B1 (en) Predicting increased risk for cancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220325