CN116042810B - Molecular marker of motion sickness and method for establishing genetic risk assessment model by molecular marker - Google Patents

Info

Publication number
CN116042810B
Authority
CN
China
Prior art keywords
locus
base
human chromosome
motion sickness
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211600938.8A
Other languages
Chinese (zh)
Other versions
CN116042810A (en)
Inventor
赵志虎
沈文龙
张彦
李平
史姝
李进让
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Military Medical Sciences AMMS of PLA
Original Assignee
Academy of Military Medical Sciences AMMS of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Military Medical Sciences AMMS of PLA
Priority to CN202211600938.8A
Publication of CN116042810A
Application granted
Publication of CN116042810B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Public Health (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Pathology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a molecular marker of motion sickness and a method for establishing a genetic risk assessment model thereof, belonging to the technical field of medical detection. The invention uses 21 SNP loci as molecular markers of motion sickness, and uses the molecular markers for model construction to carry out genetic risk assessment on motion sickness. Experimental results show that the model evaluation index AUC is 0.811, and the method has the characteristics of strong specificity and high sensitivity; of the 38 low risk individuals with risk factors lower than 0.33, 31 are individuals without seasickness, the accuracy reaches 81.58%, and of the 38 high risk individuals with risk factors higher than 0.67, 33 are individuals with seasickness, the accuracy reaches 86.84%. Therefore, the motion sickness genetic risk assessment model has the characteristics of strong specificity, high sensitivity and high accuracy, and provides a more comprehensive, accurate and individualized scientific basis for motion sickness risk assessment.

Description

Molecular marker of motion sickness and method for establishing genetic risk assessment model by molecular marker
Technical Field
The invention belongs to the technical field of medical detection, and particularly relates to a molecular marker for motion sickness and a method for establishing a genetic risk assessment model.
Background
Motion sickness (MS) is a syndrome in which exposure to an inappropriate motion environment triggers vestibular and autonomic nervous reactions. It is a common, frequently occurring condition, most familiar as carsickness and seasickness. When a person rides in a vehicle, the signals from the balance receptors in the vestibule of the inner ear conflict with those from the visual receptors, producing cold sweats, dizziness, nausea, vomiting and other symptoms. This sensory conflict hypothesis is the most widely accepted of the current hypotheses on the etiology of motion sickness. With the development of virtual reality technology, motion sickness induced by immersion in virtual reality has also become a prominent problem in recent years. Motion sickness is a complex disease that is related not only to environmental factors but also strongly to genetic factors, and there is evidence that genetic factors play an important role in its occurrence.
The occurrence of motion sickness is related to genetic factors, vision, individual constitution and other factors. It occurs most often in people with a family history of motion sickness, in people with vestibular dysfunction and in those aged 3-20 years, and it can be induced by environmental changes such as high temperature and high humidity. Identifying key markers of the relevant genetic and individual physiological characteristics is therefore central to screening for MS susceptibility.
Fast-moving vehicles are essential in modern society, and the high incidence of motion sickness can cause serious non-combat attrition; it is a significant challenge for long-range personnel, pilots and the like in particular. Establishing a genetic risk assessment model for motion sickness is therefore of great significance for the selection of long-distance aviation personnel and the like and for improving their adaptability to the relevant posts.
Disclosure of Invention
Therefore, the invention aims to provide a molecular marker of motion sickness and a method for establishing a genetic risk assessment model, and the molecular marker has the characteristics of high sensitivity, strong specificity and high accuracy.
In order to achieve the above object, the present invention provides the following technical solutions:
a molecular marker for motion sickness, the marker comprising 21 SNP sites; the SNP loci are as follows: rs2476191, rs1475565, rs2551802, rs6748491, rs2318131, rs59171567, rs16836687, rs11914835, rs11928839, rs16860023, rs1801072, rs186097, rs34917904, rs6862443, rs768229, rs10253374, chr7_65737334, rs17350191, rs10970305, rs60003319, and rs36008205.
Preferably, the position and bases of each SNP locus are as follows:
rs2476191: human chromosome 1, position 62113791, base G/A;
rs1475565: human chromosome 1, position 62113851, base C/T;
rs2551802: human chromosome 2, position 176157430, base C/G;
rs6748491: human chromosome 2, position 207626021, base G/A;
rs2318131: human chromosome 2, position 237025323, base A/C;
rs59171567: human chromosome 3, position 63501069, base A/G;
rs16836687: human chromosome 3, position 125271985, base C/T;
rs11914835: human chromosome 3, position 125486620, base C/T;
rs11928839: human chromosome 3, position 132347347, base C/A;
rs16860023: human chromosome 3, position 148012347, base T/C;
rs1801072: human chromosome 4, position 16254060, base A/T;
rs186097: human chromosome 4, position 160306119, base C/A;
rs34917904: human chromosome 4, position 160306240, base T/C;
rs6862443: human chromosome 5, position 170556069, base A/G;
rs768229: human chromosome 7, position 28851132, base A/G;
rs10253374: human chromosome 7, position 30913434, base C/T;
chr7_65737334: human chromosome 7, position 65737334, base T/C;
rs17350191: human chromosome 8, position 123745421, base C/T;
rs10970305: human chromosome 9, position 31372585, base A/C;
rs60003319: human chromosome 9, position 115159056, base A/G;
rs36008205: human chromosome 16, position 7238678, base G/A.
The invention also provides application of the molecular marker in preparation of a product for predicting genetic risk assessment of motion sickness.
Preferably, the product comprises reagents for detecting expression of the above molecular markers.
The invention also provides a method for establishing a motion sickness genetic risk assessment model by using the molecular marker, which comprises the following steps: carrying out targeted high-throughput sequencing on the sample to obtain the 21 SNP locus information; and inputting the motion sickness condition of the sample and the SNP locus information as training data into a random forest model to obtain a motion sickness genetic risk assessment model.
The invention also provides a method for evaluating motion sickness of people based on the model obtained by the method, which comprises the following steps: and inputting the 21 SNP locus information of the sample into the model to obtain a risk coefficient.
Preferably, when the risk coefficient is lower than 0.33, the individual is judged to be low-risk; when the risk coefficient is 0.33-0.67, the individual is judged to be medium-risk; and when the risk coefficient is higher than 0.67, the individual is judged to be high-risk.
Preferably, the sample is blood.
Compared with the prior art, the invention has the following beneficial effects:
the invention uses 21 SNP loci as molecular markers of motion sickness, and uses the molecular markers for constructing a model to carry out risk assessment on the motion sickness. Experimental results show that the model evaluation index AUC is 0.811, and the method has the characteristics of strong specificity and high sensitivity; of the 38 low risk individuals with risk factors below 0.33, 31 were individuals without seasickness, with an accuracy of 81.58%, while of the 38 high risk individuals with risk factors above 0.67, 33 were individuals with seasickness, with an accuracy of 86.84%. Therefore, the model has the characteristic of high accuracy in motion sickness genetic risk assessment, and can provide more comprehensive, accurate and individualized scientific basis for motion sickness risk assessment.
Drawings
FIG. 1 is a Manhattan diagram showing associated genetic loci;
FIG. 2 is the ROC curve, with its AUC, used to assess the sensitivity and specificity of the model for assessing motion sickness risk;
FIG. 3 shows the risk predicted from the 21 SNPs for the 120 validation individuals; the genotypes at the 21 loci of each sample are shown as a color heatmap, in which gray represents the reference genotype, light blue heterozygous, and dark blue homozygous variant; the samples are ordered from top to bottom by increasing predicted risk; the type column shows the actual motion sickness status of each individual, dark blue for onset and light blue for no onset.
Detailed Description
The invention provides a molecular marker of motion sickness, which comprises 21 SNP loci; the SNP loci are as follows: rs2476191, rs1475565, rs2551802, rs6748491, rs2318131, rs59171567, rs16836687, rs11914835, rs11928839, rs16860023, rs1801072, rs186097, rs34917904, rs6862443, rs768229, rs10253374, chr7_65737334, rs17350191, rs10970305, rs60003319, and rs36008205.
In the invention, the position and bases of each SNP locus are as follows:
rs2476191: human chromosome 1, position 62113791, base G/A;
rs1475565: human chromosome 1, position 62113851, base C/T;
rs2551802: human chromosome 2, position 176157430, base C/G;
rs6748491: human chromosome 2, position 207626021, base G/A;
rs2318131: human chromosome 2, position 237025323, base A/C;
rs59171567: human chromosome 3, position 63501069, base A/G;
rs16836687: human chromosome 3, position 125271985, base C/T;
rs11914835: human chromosome 3, position 125486620, base C/T;
rs11928839: human chromosome 3, position 132347347, base C/A;
rs16860023: human chromosome 3, position 148012347, base T/C;
rs1801072: human chromosome 4, position 16254060, base A/T;
rs186097: human chromosome 4, position 160306119, base C/A;
rs34917904: human chromosome 4, position 160306240, base T/C;
rs6862443: human chromosome 5, position 170556069, base A/G;
rs768229: human chromosome 7, position 28851132, base A/G;
rs10253374: human chromosome 7, position 30913434, base C/T;
chr7_65737334: human chromosome 7, position 65737334, base T/C;
rs17350191: human chromosome 8, position 123745421, base C/T;
rs10970305: human chromosome 9, position 31372585, base A/C;
rs60003319: human chromosome 9, position 115159056, base A/G;
rs36008205: human chromosome 16, position 7238678, base G/A.
The invention also provides application of the molecular marker in preparation of a product for predicting genetic risk assessment of motion sickness.
In the present invention, the product comprises a reagent for detecting the expression of the above molecular marker; the present invention is not particularly limited in the kind of the product, and any product conventionally used in the art for detecting or diagnosing motion sickness may be used, and a chip or a kit is preferable.
The invention also provides a method for establishing a motion sickness genetic risk assessment model by using the molecular marker, which comprises the following steps: carrying out targeted high-throughput sequencing on the sample to obtain the 21 SNP locus information; and inputting the motion sickness condition of the sample and the SNP locus information as training data into a random forest model to obtain a motion sickness genetic risk assessment model.
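As an illustration of the "SNP locus information" used as training data, the following minimal sketch turns per-locus genotypes into a numeric feature vector. The text does not state how genotypes are encoded; counting variant alleles (0, 1 or 2) is an assumption that mirrors the reference, heterozygous and homozygous-variant categories of FIG. 3. Treating the first listed base as the reference allele is also an assumption, and `SNP_ALLELES`, `encode_genotype` and `encode_sample` are hypothetical names.

```python
# Subset of the 21 loci: id -> (first base, second base) as listed in the text.
# Assumption: the first base is the reference allele, the second the variant.
SNP_ALLELES = {
    "rs2476191": ("G", "A"),
    "rs1475565": ("C", "T"),
    "rs2551802": ("C", "G"),
}


def encode_genotype(snp_id, genotype):
    """Return the number of variant alleles (0, 1 or 2) in a two-letter genotype."""
    ref, alt = SNP_ALLELES[snp_id]
    if len(genotype) != 2 or any(base not in (ref, alt) for base in genotype):
        raise ValueError(f"unexpected genotype {genotype!r} for {snp_id}")
    return genotype.count(alt)


def encode_sample(genotypes):
    """Encode one sample as a feature vector ordered like SNP_ALLELES."""
    return [encode_genotype(snp_id, genotypes[snp_id]) for snp_id in SNP_ALLELES]


# Made-up genotypes for one sample, for illustration only.
sample = {"rs2476191": "GA", "rs1475565": "CC", "rs2551802": "GG"}
features = encode_sample(sample)
print(features)  # [1, 0, 2]
```

A vector of this kind, one per sample and paired with the sample's motion sickness status, is the sort of input a random forest implementation would typically accept.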
The SNP loci for motion sickness are preferably screened as follows. First, a target capture range is designed: genes and sites related to motion sickness are retrieved from published papers, databases and the like, all sites are merged as potential genetic loci that may play a role in motion sickness, and targeted capture probes are designed and synthesized. Second, case-control samples are collected: a motion sickness susceptibility scale is administered to a certain number of people, individuals who are susceptible or not susceptible to motion sickness are selected, blood samples are collected, and blood DNA is extracted. Third, targeted high-throughput sequencing and association analysis are performed: a genome sequencing library is built for each DNA sample, the library is enriched with the targeted capture probes, high-throughput sequencing is carried out, the variant sites of each sample are detected, and association analysis is used to identify genetic variants associated with motion sickness susceptibility, yielding the 21 SNP sites.
The random forest model according to the invention is preferably built with the motion sickness risk prediction software (certificate number: software copyright registration No. 10470617).
The invention also provides a method for evaluating motion sickness of people based on the model obtained by the method, which comprises the following steps: and inputting the 21 SNP locus information of the sample into the model to obtain a risk coefficient.
The model according to the invention is preferably obtained with the motion sickness risk prediction software (certificate number: software copyright registration No. 10470617). After the 21 SNP locus information of the sample is input into the software, the method preferably further comprises running a shell script to generate a VCF file and running a command-line script on the VCF file with PLINK software to obtain the risk coefficient.
In the present invention, an individual is judged to be low-risk when the risk coefficient is lower than 0.33, medium-risk when the risk coefficient is 0.33-0.67, and high-risk when the risk coefficient is higher than 0.67. An independent set of 120 samples was evaluated with the evaluation model of the invention and compared with the actual seasickness symptoms. Of the 38 low-risk individuals with a risk coefficient lower than 0.33, 31 did not have seasickness, an accuracy of 81.58%; of the 38 high-risk individuals with a risk coefficient higher than 0.67, 33 had seasickness, an accuracy of 86.84%. The model of the invention therefore has high accuracy in motion sickness genetic risk assessment.
The evaluation results of the invention can be used to guide the selection and recruitment of long-distance aviation personnel and the like and to improve adaptability to long-distance aviation posts; in particular, long-distance aviation operations are not recommended for high-risk individuals.
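The thresholds and the two reported accuracies can be checked with a short sketch. It assumes the risk coefficient lies in [0, 1] and that the boundary values 0.33 and 0.67 fall in the middle tier, which is how the stated range reads; the tier label "medium" and the function names are illustrative choices, not taken from the text.

```python
def risk_tier(coefficient):
    """Map a risk coefficient to the three tiers described in the text."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("risk coefficient is assumed to lie in [0, 1]")
    if coefficient < 0.33:
        return "low"
    if coefficient <= 0.67:
        return "medium"
    return "high"


def percent(correct, total):
    """Share of correct calls, as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


print(risk_tier(0.20), risk_tier(0.50), risk_tier(0.90))  # low medium high

# Counts reported for the 120 independent samples.
low_accuracy = percent(31, 38)   # low-risk individuals without seasickness
high_accuracy = percent(33, 38)  # high-risk individuals with seasickness
print(low_accuracy, high_accuracy)  # 81.58 86.84
```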
The technical solutions provided by the present invention are described in detail below with reference to examples, but they should not be construed as limiting the scope of the present invention.
Example 1
1. Designing a target capture range
(1) The NCBI database was searched with the keyword "motion sickness", and 7 genes reported in the literature to be related to motion sickness were identified; the exon regions of these genes comprise 411 fragments in total.
(2) Potential enhancers and regulatory elements of the motion sickness genes were searched using the GeneHancer search function of the GeneCards website, giving 453 elements. The most probable regulatory elements were then screened according to the element score (the likelihood of being a regulatory element) and the degree of association between the element and the gene. A GeneHancer score > 1 and an association score > 100 were required. These thresholds select more reliable regulatory elements while narrowing the range of subsequent targeted sequencing; the 12 most relevant regulatory elements were selected.
(3) Genetic loci associated with motion sickness were searched from published papers and databases. The known literature (Hum Mol Genet. 2015 May 1;24(9):2700-8. doi:10.1093/hmg/ddv028. Epub 2015) reports 35 SNPs associated with susceptibility to motion sickness, and these were included in the study.
(4) Sites in linkage disequilibrium were mined using HapMap data and publicly available 1000 Genomes Project data, with preference given to sites with regulatory properties. In this example, the GWAS4D website (mulinlab.tmu.edu.cn/GWAS4D), which integrates linkage disequilibrium analysis and regulatory SNP prediction, was used. The 35 SNPs were entered, the East Asian population genome was selected for linkage disequilibrium expansion, and a total of 459 potentially associated SNPs were obtained.
(5) All fragments were merged and deduplicated, giving 829 fragments whose variation may be associated with motion sickness.
(6) The 829 fragments were submitted for targeted probe design. The probes were designed by Ai Jitai Kangshensu Biotech Co., Ltd., and the design results are shown in Table 1.
TABLE 1 Targeted Probe design results
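The two-threshold screening of regulatory elements in step (2) above can be sketched as a simple filter. The record layout, the identifiers and the scores below are made up for illustration; only the two cut-offs (element score > 1, association score > 100) come from the text.

```python
from dataclasses import dataclass


@dataclass
class RegulatoryElement:
    element_id: str           # hypothetical identifier
    element_score: float      # likelihood of being a regulatory element
    association_score: float  # strength of the element-gene association


def select_elements(elements, min_element_score=1.0, min_association_score=100.0):
    """Keep elements whose two scores both strictly exceed the thresholds."""
    return [
        e for e in elements
        if e.element_score > min_element_score
        and e.association_score > min_association_score
    ]


# Made-up candidates, for illustration only.
candidates = [
    RegulatoryElement("elem_A", 1.8, 250.0),
    RegulatoryElement("elem_B", 0.9, 400.0),  # fails the element score
    RegulatoryElement("elem_C", 2.1, 60.0),   # fails the association score
]
kept = [e.element_id for e in select_elements(candidates)]
print(kept)  # ['elem_A']
```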
2. Collecting case control samples
(1) A motion sickness susceptibility scale was administered on the basis of informed consent.
Individuals who were or were not susceptible to motion sickness were screened and their blood samples were collected, giving 539 samples in total. Of these, 57 persons had moderate or more severe seasickness. From the 358 persons without seasickness, 58 persons with no family history of seasickness were selected as the control group.
(2) Blood genomic DNA was extracted and basic DNA quality control was performed, covering DNA purity, concentration and integrity.
(3) Targeted high-throughput sequencing was performed on the 57 individuals with moderate or greater susceptibility to motion sickness and the 58 individuals not susceptible to motion sickness.
3. Targeted high throughput sequencing and correlation analysis of case control samples
(1) A genome sequencing library was established for each DNA sample, and the library was enriched using the targeted capture probes obtained in step 1 (designing a target capture range).
(2) High-throughput sequencing and basic data quality control were performed.
Per sample, the raw sequencing output averaged 1335 Mbp and the effective data after removal of low-quality reads averaged 1172 Mbp. The average sequencing quality Q30 reached 87.8%, the average read length was 135 bp, the average insert size was 187 bp, and the average number of reads was 8.66 M. The effective alignment rate exceeded 99.4%. In the target region, the average number of reads was 4.56 M and the average effective sequencing output was 219 Mbp; sequencing coverage exceeded 99.8%, the average sequencing depth reached 4063X, and the area with at least 30X coverage exceeded 98.9% of the whole target region.
(3) The variant site information of all samples was integrated with GATK software, shell scripts were run (see Elgart, M., Lyons, G., Romero-Brufau, S., Kurniansyah, N., Brody, J.A., Guo, X., Lin, H.J., Raffield, L., Gao, Y., Chen, H., et al. (2022). Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Commun Biol 5, 856. 10.1038/s42003-022-03812-z), and variant sites with only two alleles were extracted using PLINK software, giving 1468 variant sites after processing.
(4) Quality control was performed on the variant sites by filtering on missing data: variant sites with a missing rate above 10% across samples were removed, as were samples with a missing rate above 10% across variant sites. After filtering, 115 valid samples and 1458 variant sites remained.
(5) Variant sites with a minor allele frequency (MAF) below 0.05 were filtered out, because variant sites with very low allele frequencies can introduce false positives. In addition, variant sites with a p value below 0.01 in the Hardy-Weinberg equilibrium goodness-of-fit test were removed. After processing, 730 variant sites usable for the subsequent association analysis were obtained.
(6) Logistic regression analysis identified 56 variant sites with a p value below 0.05, as shown in FIG. 1.
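The per-site filters of steps (4) and (5) above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline used in the example, which relied on PLINK: genotypes are assumed to be coded as variant-allele counts (0, 1, 2) with `None` for a missing call, the Hardy-Weinberg check uses a simple chi-square goodness-of-fit test with one degree of freedom, and all function names and genotype data are hypothetical.

```python
import math


def call_rate_ok(genotypes, max_missing=0.10):
    """True if the share of missing calls (None) does not exceed max_missing."""
    missing = sum(g is None for g in genotypes)
    return missing / len(genotypes) <= max_missing


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def hwe_p_value(genotypes):
    """Chi-square goodness-of-fit p value (1 df) for Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = 1.0 - sum(called) / (2 * n)  # frequency of the reference allele
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(genotypes, min_maf=0.05, min_hwe_p=0.01):
    """Apply the thresholds from the text: missing <= 10%, MAF >= 0.05, HWE p >= 0.01."""
    return (call_rate_ok(genotypes)
            and minor_allele_frequency(genotypes) >= min_maf
            and hwe_p_value(genotypes) >= min_hwe_p)


# Made-up sites, for illustration only.
site_ok = [0] * 36 + [1] * 48 + [2] * 16                     # in equilibrium
site_rare = [0] * 98 + [1] * 2                               # MAF = 0.01
site_hwe = [0] * 50 + [2] * 50                               # no heterozygotes
site_missing = [None] * 20 + [0] * 30 + [1] * 40 + [2] * 10  # 20% missing

for name, site in [("ok", site_ok), ("rare", site_rare),
                   ("hwe", site_hwe), ("missing", site_missing)]:
    print(name, passes_qc(site))
```

Only the first site passes; the other three fail on MAF, Hardy-Weinberg equilibrium and missing rate respectively.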
Example 2
Construction of risk assessment model using associated genetic loci
(1) Determination of genetic loci
Associated genetic variant sites were selected, a Gradient Boosting-based random forest model was constructed, and the hyperparameters were optimized by a Bayesian method. The raw high-throughput sequencing files of the case samples were input into the motion sickness risk prediction software (certificate number: software copyright registration No. 10470617), the sample names and corresponding file directories were filled in the settings.config file, and a shell script was run to generate a processed VCF file containing the information on the 56 SNP loci obtained by sequencing. The importance of each of the 56 SNPs was calculated, feature engineering was carried out, and the 21 SNPs with an importance index greater than 0.01 were retained. The specific results are shown in Table 2.
TABLE 2 Information on the 21 SNPs
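The retention rule in this example (keep SNPs whose importance index is greater than 0.01) amounts to a threshold filter over per-feature importances. The sketch below uses made-up identifiers and scores; how the software computes the importance index is not described in the text, so the values here stand in for whatever a fitted model reports.

```python
def select_by_importance(importances, threshold=0.01):
    """Return SNP ids whose importance exceeds the threshold, most important first."""
    kept = [(snp, score) for snp, score in importances.items() if score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [snp for snp, _ in kept]


# Hypothetical importance scores, for illustration only.
importances = {"snp_a": 0.120, "snp_b": 0.004, "snp_c": 0.035, "snp_d": 0.010}
selected = select_by_importance(importances)
print(selected)  # ['snp_a', 'snp_c']
```

Note that a score exactly equal to the threshold is dropped, matching the "greater than 0.01" wording.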
Example 3
Constructing a risk assessment model and validating
(1) Establishment of risk assessment model
Targeted high-throughput sequencing was performed on the blood of the case samples to obtain their 21 SNP locus information. The 21 variant sites and the variant site information of all sequenced samples were input into the motion sickness risk prediction software, a random forest model was established, a shell script was run to generate a VCF file, and a command-line script was run on the VCF file with PLINK software to obtain the risk coefficient (see Elgart, M., Lyons, G., Romero-Brufau, S., Kurniansyah, N., Brody, J.A., Guo, X., Lin, H.J., Raffield, L., Gao, Y., Chen, H., et al. (2022). Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Commun Biol 5, 856. 10.1038/s42003-022-03812-z), thereby evaluating the motion sickness risk of the case samples.
The following thresholds were set: when the risk coefficient is lower than 0.33, the individual is judged to be low-risk; when the risk coefficient is 0.33-0.67, the individual is judged to be medium-risk; and when the risk coefficient is higher than 0.67, the individual is judged to be high-risk.
(2) Verification
The samples (57 individuals with moderate or greater susceptibility to motion sickness and 58 individuals not susceptible to motion sickness) were divided into training data and test data at a ratio of 8:2, and repeated cross-validation was performed. The receiver operating characteristic curve (ROC curve, plotted with the true positive rate on the vertical axis and the false positive rate on the horizontal axis) is shown in FIG. 2. The model evaluation index AUC (the area under the ROC curve, which measures the overall ability of the model; the larger the AUC, the higher the classification accuracy) is 0.811, indicating that the evaluation model has high accuracy.
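The two ingredients of this verification, an 8:2 split and the AUC, can be sketched without any external library. The split below is a single random partition, whereas the example reports repeated cross-validation; the AUC is computed from its rank interpretation (the probability that a randomly chosen affected individual scores higher than a randomly chosen unaffected one, ties counting half), which is equivalent to the area under the ROC curve. The labels and scores are made up, and the function names are hypothetical.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 115 sequenced samples (57 cases + 58 controls) split 8:2.
train, test = train_test_split(list(range(115)))
print(len(train), len(test))  # 92 23

# Made-up labels (1 = motion sickness) and predicted risk coefficients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889
```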
Example 4
An additional 120 samples were independently enrolled, and their risk was assessed using the 21 SNP information and the established assessment model and compared to the actual symptoms of seasickness. The specific results of the genotype thermogram information of the samples are shown in FIG. 3.
Experimental results show that 31 out of 38 low risk individuals with risk factors lower than 0.33 are individuals without seasickness, the accuracy reaches 81.58%, and 33 out of 38 high risk individuals with risk factors higher than 0.67 are individuals with seasickness, the accuracy reaches 86.84%.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of protection of the present invention.

Claims (2)

1. A method for establishing a genetic risk assessment model for motion sickness by using molecular markers, which is characterized by comprising the following steps: carrying out targeted high-throughput sequencing on the sample to obtain 21 SNP locus information; inputting the motion sickness condition of the sample and the SNP locus information as training data into a random forest model to obtain a motion sickness genetic risk assessment model;
the 21 SNP loci are: rs2476191, rs1475565, rs2551802, rs6748491, rs2318131, rs59171567, rs16836687, rs11914835, rs11928839, rs16860023, rs1801072, rs186097, rs34917904, rs6862443, rs768229, rs10253374, chr7_65737334, rs17350191, rs10970305, rs60003319, and rs36008205; the rs2476191 locus is located at position 62113791 of human chromosome 1, and the base is G/A; the rs1475565 locus is located at position 62113851 of human chromosome 1, and the base is C/T; the rs2551802 locus is located at position 176157430 of human chromosome 2, and the base is C/G; the rs6748491 locus is located at position 207626021 of human chromosome 2, and the base is G/A; the rs2318131 locus is located at position 237025323 of human chromosome 2, and the base is A/C; the rs59171567 locus is located at position 63501069 of human chromosome 3, and the base is A/G; the rs16836687 locus is located at position 125271985 of human chromosome 3, and the base is C/T; the rs11914835 locus is located at position 125486620 of human chromosome 3, and the base is C/T; the rs11928839 locus is located at position 132347347 of human chromosome 3, and the base is C/A; the rs16860023 locus is located at position 148012347 of human chromosome 3, and the base is T/C; the rs1801072 locus is located at position 16254060 of human chromosome 4, and the base is A/T; the rs186097 locus is located at position 160306119 of human chromosome 4, and the base is C/A; the rs34917904 locus is located at position 160306240 of human chromosome 4, and the base is T/C; the rs6862443 locus is located at position 170556069 of human chromosome 5, and the base is A/G; the rs768229 locus is located at position 28851132 of human chromosome 7, and the base is A/G; the rs10253374 locus is located at position 30913434 of human chromosome 7, and the base is C/T; the chr7_65737334 locus is located at position 65737334 of human chromosome 7, and the base is T/C; the rs17350191 locus is located at position 123745421 of human chromosome 8, and the base is C/T; the rs10970305 locus is located at position 31372585 of human chromosome 9, and the base is A/C; the rs60003319 locus is located at position 115159056 of human chromosome 9, and the base is A/G; the rs36008205 locus is located at position 7238678 of human chromosome 16, and the base is G/A.
2. Use of a reagent for detecting the expression of a molecular marker in the preparation of a chip or a kit for predicting the genetic risk of motion sickness, characterized in that the molecular marker comprises 21 SNP loci; the SNP loci are: rs2476191, rs1475565, rs2551802, rs6748491, rs2318131, rs59171567, rs16836687, rs11914835, rs11928839, rs16860023, rs1801072, rs186097, rs34917904, rs6862443, rs768229, rs10253374, chr7_65737334, rs17350191, rs10970305, rs60003319, and rs36008205; the rs2476191 locus is located at position 62113791 of human chromosome 1, and the base is G/A; the rs1475565 locus is located at position 62113851 of human chromosome 1, and the base is C/T; the rs2551802 locus is located at position 176157430 of human chromosome 2, and the base is C/G; the rs6748491 locus is located at position 207626021 of human chromosome 2, and the base is G/A; the rs2318131 locus is located at position 237025323 of human chromosome 2, and the base is A/C; the rs59171567 locus is located at position 63501069 of human chromosome 3, and the base is A/G; the rs16836687 locus is located at position 125271985 of human chromosome 3, and the base is C/T; the rs11914835 locus is located at position 125486620 of human chromosome 3, and the base is C/T; the rs11928839 locus is located at position 132347347 of human chromosome 3, and the base is C/A; the rs16860023 locus is located at position 148012347 of human chromosome 3, and the base is T/C; the rs1801072 locus is located at position 16254060 of human chromosome 4, and the base is A/T; the rs186097 locus is located at position 160306119 of human chromosome 4, and the base is C/A; the rs34917904 locus is located at position 160306240 of human chromosome 4, and the base is T/C; the rs6862443 locus is located at position 170556069 of human chromosome 5, and the base is A/G; the rs768229 locus is located at position 28851132 of human chromosome 7, and the base is A/G; the rs10253374 locus is located at position 30913434 of human chromosome 7, and the base is C/T; the chr7_65737334 locus is located at position 65737334 of human chromosome 7, and the base is T/C; the rs17350191 locus is located at position 123745421 of human chromosome 8, and the base is C/T; the rs10970305 locus is located at position 31372585 of human chromosome 9, and the base is A/C; the rs60003319 locus is located at position 115159056 of human chromosome 9, and the base is A/G; the rs36008205 locus is located at position 7238678 of human chromosome 16, and the base is G/A.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211600938.8A CN116042810B (en) 2022-12-13 2022-12-13 Molecular marker of motion sickness and method for establishing genetic risk assessment model by molecular marker

Publications (2)

Publication Number Publication Date
CN116042810A (en) 2023-05-02
CN116042810B (en) 2023-10-20

Family

ID=86130639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211600938.8A Active CN116042810B (en) 2022-12-13 2022-12-13 Molecular marker of motion sickness and method for establishing genetic risk assessment model by molecular marker

Country Status (1)

Country Link
CN (1) CN116042810B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114657269A (en) * 2022-05-08 2022-06-24 公安部物证鉴定中心 High-performance autosomal mini-haplotype genetic markers and primer set for detecting genetic markers

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Genetic influences on motion sickness susceptibility in adult women: a classical twin study; Caroline M Reavley et al.; Aviat Space Environ Med; Vol. 77, No. 11; pp. 1148-1152 *
Genetic variants associated with motion sickness point to roles for inner ear development, neurological processes and glucose homeostasis; Bethann S Hromatka et al.; Hum Mol Genet; Vol. 24, No. 9; pp. 2700-2708 *
Motion Sickness: Current Knowledge and Recent Advance; Li-Li Zhang et al.; CNS Neurosci Ther; Vol. 22, No. 1; pp. 15-24 *
Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations; Michael Elgart et al.; Commun Biol; Vol. 5, No. 1; p. 856 *
The Predictive Role of ADRA2A rs1800544 and HTR3B rs3758987 Polymorphisms in Motion Sickness Susceptibility; Xinchen Zhang et al.; Int J Environ Res Public Health; Vol. 18, No. 24; p. 13163 *
Variants in ACPP are associated with cerebrospinal fluid Prostatic Acid Phosphatase levels; Lyndsay A Staley et al.; BMC Genomics; Vol. 17, No. 3; p. 439 *

Also Published As

Publication number Publication date
CN116042810A (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant