WO2022022663A1 - Method for constructing alcohol tolerance prediction model - Google Patents


Info

Publication number
WO2022022663A1
Authority
WO
WIPO (PCT)
Prior art keywords
drinking
alcohol
prediction model
sample
locus
Prior art date
Application number
PCT/CN2021/109450
Other languages
French (fr)
Chinese (zh)
Inventor
朱慧彬
何荣军
王丽香
赵宗宝
Original Assignee
苏州因顿医学检验实验室有限公司
Priority date
Filing date
Publication date
Application filed by 苏州因顿医学检验实验室有限公司
Publication of WO2022022663A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/172 Haplotypes

Definitions

  • The invention relates to the technical field of biological genes, and in particular to a method for constructing an alcohol tolerance prediction model.
  • After alcohol enters the human body, it passes through the biological membranes of the oral cavity, esophagus, stomach, intestines and other organs directly into the bloodstream, and is rapidly transported to tissues and organs throughout the body to be metabolized.
  • Alcohol metabolism is accomplished mainly by two enzymes, alcohol dehydrogenase and acetaldehyde dehydrogenase, so differences in drinking capacity (alcohol tolerance) between individuals are determined mainly by the activities of these two enzymes. Because enzyme activity is determined by genes, a person's alcohol tolerance is ultimately determined by genetics.
  • The present invention aims to solve, at least to some extent, one of the technical problems in the above technologies.
  • To this end, the purpose of the present invention is to propose a method for constructing an alcohol tolerance prediction model by machine learning. The relationship between the drinking capacity and the drinking amount of each sample is collected, and a first database is established from that relationship and the drinking tiers, which improves the accuracy of the data. The genetic data of the samples are collected, formatted, and matched against the data in the first database, and a first drinking-amount prediction model is constructed with a machine learning model. This model provides a criterion for judging drinking amount and quantifies an individual's drinking capacity, which supports effective alcohol tolerance prediction for users.
  • An embodiment of the present invention proposes a method for constructing an alcohol tolerance prediction model, including:
  • The drinking capacity and drinking amount of each sample are obtained through a questionnaire survey, and the data are analyzed to obtain the correspondence between drinking capacity and drinking amount. The drinking amount is divided into a first preset number of drinking tiers and a first database is established. Grading drinking amounts into tiers quantifies drinking capacity concretely, which helps in giving more useful drinking advice.
  • A machine learning model is used to construct the first drinking-amount prediction model, which provides a criterion for judging drinking amount, quantifies an individual's drinking capacity, and supports effective alcohol tolerance prediction for users.
  • Acquiring the genetic data of the samples and formatting the genetic data includes:
  • S23: Process the sequencing data to obtain, for each sample, the genotypes at the loci related to drinking amount;
  • Locus screening is performed on the formatted genetic data of the samples, including:
  • S241: For each locus, split the first database by that locus's feature value and calculate the purity gain (or uncertainty reduction) of the resulting data subsets relative to the data set before splitting;
  • The loci related to drinking amount include the rs1229984 locus and the rs671 locus. The rs1229984 locus is located in the ADH1B gene: when its genotype is TT, alcohol dehydrogenase activity is strong and ethanol is metabolized quickly; when it is CT, the activity is moderate and ethanol is metabolized at a moderate rate; when it is CC, the activity is weak and ethanol is metabolized slowly. The rs671 locus is located in the ALDH2 gene: when its genotype is GG, acetaldehyde dehydrogenase activity is strong and acetaldehyde is metabolized quickly; when it is GA or AA, acetaldehyde dehydrogenase activity is weak and acetaldehyde is metabolized slowly.
  • The method further includes:
  • Performing DNA extraction on the saliva of the sample includes:
  • The method also includes:
  • S71: Acquire second information that affects the user's drinking amount, the second information including disease history, type of drink, alcohol strength of the drink, and drinking frequency;
  • FIG. 1 is a flowchart of a method for constructing an alcohol tolerance prediction model according to an embodiment of the present invention;
  • FIG. 2 is a flowchart of processing the genetic data of the samples according to an embodiment of the present invention;
  • FIG. 3 is a flowchart of screening the loci related to drinking amount according to an embodiment of the present invention;
  • FIG. 4 is a flowchart of establishing a second drinking-amount prediction model according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of a decision tree relating drinking-amount-related genes to drinking tiers according to an embodiment of the present invention.
  • A method for constructing an alcohol tolerance prediction model according to an embodiment of the present invention is described below with reference to FIG. 1 to FIG. 6.
  • FIG. 1 is a flowchart of a method for constructing an alcohol tolerance prediction model according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
  • The drinking capacity and drinking amount of each sample are obtained through a questionnaire survey, and the data are analyzed to obtain the correspondence between drinking capacity and drinking amount. The drinking amount is divided into a first preset number of drinking tiers and a first database is established. Grading drinking amounts into tiers quantifies drinking capacity concretely, which helps in giving more useful drinking advice.
  • The genetic data of the samples are obtained and formatted, and a machine learning model is selected to construct the first drinking-amount prediction model according to the formatted genetic data and the first database. In this embodiment the model is built with a decision tree, which provides a criterion for judging drinking amount, quantifies an individual's drinking capacity, and supports effective alcohol tolerance prediction for users.
  • The machine learning model includes at least one of linear classification, linear regression, support vector machine (SVM), decision tree classification, naive Bayes, random forest, and neural network models.
  • SVM: support vector machine.
  • A decision tree classification model is easy to understand and implement, and logical expressions can readily be derived from the trained result, so it is highly interpretable.
  • FIG. 2 is a flowchart of processing the genetic data of the samples according to an embodiment of the present invention. As shown in FIG. 2, acquiring and formatting the genetic data of the samples includes:
  • S23: Process the sequencing data to obtain, for each sample, the genotypes at the loci related to drinking amount;
  • DNA extraction, gene sequencing, and genotyping are performed on the saliva of each sample to obtain its genetic data; the sequencing method includes at least one of chip sequencing, second-generation sequencing, third-generation sequencing, PCR sequencing, and panel sequencing.
  • The genotypes at the loci related to drinking amount are thus obtained for each sample.
  • The loci are formatted as numbers according to genotype: for example, wild type is 0, heterozygous mutant is 1, and homozygous mutant is 2. At the rs1229984 locus, for instance:
  • CC is homozygous mutant, formatted as 2;
  • TT is wild type, formatted as 0;
  • CT is heterozygous mutant, formatted as 1. At the rs671 locus:
  • AA is homozygous mutant, formatted as 2;
  • GG is wild type, formatted as 0;
  • AG is heterozygous mutant, formatted as 1.
  • Locus screening is performed on the formatted genetic data of the samples, including:
  • The purity and uncertainty of the data set before and after splitting are measured by calculating at least one of information gain, information gain ratio, and the Gini coefficient. In this embodiment the Gini coefficient is used: the larger the Gini coefficient, the higher the uncertainty of the data and the lower the purity of the samples, meaning that the target class makes up a smaller proportion of the samples in the data set; the smaller the Gini coefficient, the lower the uncertainty of the data and the higher its purity.
  • When the Gini coefficient equals 0, all samples in the data set belong to the same class.
  • It is determined whether the sample's rs671 genotype is GG, i.e., whether its formatted rs671 value is 0, and the first database is split according to whether rs671 is 0 into two data sets: a first data set, in which every sample's rs671 genotype is GG, and a second data set, in which the samples' rs671 genotypes are AA or AG. The Gini coefficient of the feature values of each locus is then calculated in the first and second data sets, and the locus A with the smallest Gini coefficient and its feature value a are selected; locus A is used as a child node, and the data set is split again by grouping on feature value a. For example, in the first data set, …
  • In this example there are 8 drinking tiers.
  • When every group contains samples of only one class, i.e., its Gini coefficient is 0, splitting stops, finally yielding the loci related to drinking amount and the relationship between those loci and the drinking tiers.
  • Because the drinking-related genes are assigned to the corresponding drinking tiers according to their genotypes, the result is easy to remember and shows the correspondence between genotype and drinking amount accurately and at a glance, which improves the user experience.
  • Other loci directly related to drinking amount include rs6413413, rs698, rs2298755 and others; loci related to alcohol dependence include rs2066702, rs55768019, rs1789891 and others; loci related to alcohol use disorder include rs4975012, rs7078436, rs3114045 and others; loci related to alcoholism include rs9556711, rs2140418, rs8040009 and others; loci related to alcohol sensitivity include rs112834343, rs75536499, rs146298733 and others.
  • Loci related to the drinking response include rs143894582, rs200848948, rs397813807 and others; loci related to impulsive behavior in alcohol dependence include the rs34997829 locus; according to the different permutations and combinations of the genotypes …
  • A decision tree model is used to construct the drinking-amount prediction model.
  • The algorithm includes:
  • Main parameter settings of the DecisionTreeClassifier module:
  • criterion = 'gini': the Gini coefficient is used to measure the quality of a split;
  • splitter = 'best': the best split point is searched for among all features;
  • max_depth = None: the maximum depth of the decision tree is not constrained, and nodes are expanded until the samples in each leaf node belong to the same class;
  • min_samples_split = 2: an internal node must contain at least 2 samples to be split;
  • min_samples_leaf = 1: each leaf node must contain at least 1 sample;
  • The rs1229984 and rs671 loci were found to be related to drinking amount.
  • FIG. 4 is a flowchart of establishing a second drinking-amount prediction model according to an embodiment of the present invention. As shown in FIG. 4, after the first drinking-amount prediction model is constructed, the method further includes:
  • Working principle and beneficial effects of the above technical solution: the first drinking-amount prediction model is optimized by converting the first preset number of drinking tiers into a second preset number of drinking tiers (the second preset number may be, for example, 9), and a second database is established according to the relationship between the samples' drinking capacity and drinking amount and the second preset number of drinking tiers.
  • According to the loci related to drinking amount and the second database, a machine learning model is selected to reconstruct the drinking-amount prediction model, yielding the second drinking-amount prediction model; this reduces the amount and complexity of computation and improves the prediction accuracy of the second model.
  • The relationship between the rs1229984 and rs671 loci and drinking amount is shown in Table 1.
  • In one example the second preset number is 7, i.e., there are 7 drinking tiers, and the relationship between the rs1229984 and rs671 loci and drinking amount is as shown in Table 2.
  • Working principle and beneficial effects of the above technical solution: a drinking tier of 0 covers three cases:
  • 1. the rs1229984 locus is CC and the rs671 locus is AA;
  • 2. the rs1229984 locus is TT and the rs671 locus is …;
  • 3. the rs1229984 locus is CT and the rs671 locus is AA.
  • Drinking tiers are named discontinuously, for example with tiers 3 and 6 skipped, so that the tier number matches the actual amount of alcohol consumed: for example, drinking tier 9 means the user can drink 9 liang (taels) or more.
  • The user's drinking amount is predicted with the second drinking-amount prediction model from the user's genetic data. This provides a criterion for judging drinking amount, quantifies the individual's drinking capacity, and, together with the user's physical condition, gives a more intuitive and useful drinking-amount assessment and drinking advice, improving the user experience.
  • FIG. 5 is a flowchart of DNA extraction from saliva according to an embodiment of the present invention. As shown in FIG. 5, DNA extraction from saliva includes:
  • Saliva contains exfoliated cells from the oral cavity, and these cells contain the genetic material DNA.
  • The user's DNA can therefore be extracted from saliva for subsequent processing and analysis.
  • The genetic data of the sample can also be obtained by collecting blood for DNA extraction.
  • DNA extraction from blood has high sensitivity, and the extracted DNA data are more accurate.
  • The method also includes:
  • S71: Acquire second information that affects the user's drinking amount, the second information including disease history, type of drink, alcohol strength of the drink, and drinking frequency;
  • The prediction output by the second drinking-amount prediction model from the user's genetic data does not take the user's actual situation into account, so the predicted drinking amount needs to be revised according to the second information affecting the user's drinking amount.
  • Establishing a third drinking-amount prediction model therefore requires a correction mechanism for drinking amount based on the second information.
  • For example, if the user's rs1229984 genotype is CC and rs671 genotype is GG, the user's drinking tier is predicted to be 7, i.e., the user can drink 7 liang or more (taking 50° liquor as an example); but if the user has recently had stomach problems, the user cannot drink alcohol.
  • Drinking in that condition could easily cause gastric perforation and seriously endanger health. Likewise, the type of drink, its alcohol strength, and the user's drinking frequency also affect the prediction of the user's drinking amount. Making the prediction according to the user's actual situation makes the result of the third drinking-amount prediction model more accurate.
  • The method for revising the second drinking-amount prediction model according to the second information includes:
  • $V_1 = A \times c$
  • where A is the drinking volume (ml) output by the second drinking-amount prediction model from the user's genetic data, and
  • c is the preset alcohol concentration (%vol) in the second drinking-amount prediction model;
  • $V_2 = V_1 \times d \times t \times f$
  • where d is the correlation coefficient between the user's disease history and drinking amount,
  • t is the correlation coefficient between drink type and drinking amount,
  • f is the correlation coefficient between drinking frequency and drinking amount, and
  • cu is the alcohol strength of the drink entered by the user.
  • For some disease histories, the correlation coefficient d between disease history and drinking amount is 0, meaning the user cannot drink alcohol; for other disease histories, d lies between 0 and 1. The values of the correlation coefficient t between drink type and drinking amount are shown in Table 3, and the values of the correlation coefficient f between drinking frequency and drinking amount are shown in Table 4. With the above algorithm, the second drinking-amount prediction model is revised according to the second information to obtain a third drinking-amount prediction model, which predicts more effectively according to the user's actual situation; the results are more accurate, and the user receives more appropriate drinking advice, improving the user experience.
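The revision formulas above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: c is taken as a fraction (50 %vol as 0.50), one liang of liquor is taken as 50 ml so that the tier-7 example can be turned into A (the helper `tier_to_ml` is hypothetical), and the values used for t and f are invented because Tables 3 and 4 are not reproduced here. The source defines cu without showing the formula that uses it, so `volume_at_strength`, which divides V2 by cu to express the result as a volume of the user's own drink, is a hypothetical guess.

```python
LIANG_ML = 50.0  # assumption: 1 liang of liquor is taken as 50 ml


def tier_to_ml(tier: int) -> float:
    """Hypothetical helper: convert a drinking tier named in liang to ml."""
    return tier * LIANG_ML


def corrected_volumes(A: float, c: float, d: float, t: float, f: float):
    """Return (V1, V2) with V1 = A * c and V2 = V1 * d * t * f.

    A: drink volume in ml from the second prediction model
    c: preset alcohol concentration as a fraction (assumed: 50 %vol -> 0.50)
    d, t, f: disease-history, drink-type and drinking-frequency coefficients
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie between 0 and 1")
    v1 = A * c            # pure-alcohol volume implied by the genetic prediction
    v2 = v1 * d * t * f   # adjusted by the user-specific second information
    return v1, v2


def volume_at_strength(v2: float, cu: float) -> float:
    """Hypothetical: express V2 (ml of pure alcohol) as ml of a drink of strength cu."""
    return v2 / cu


# Example from the text: tier 7 of 50 %vol liquor; t and f are made-up values.
A = tier_to_ml(7)  # 350.0 ml
v1, v2 = corrected_volumes(A, 0.50, d=1.0, t=1.0, f=0.8)
print(v1, v2, volume_at_strength(v2, cu=0.40))
```

Keeping the two steps separate mirrors the source: V1 is the pure-alcohol volume implied by the genetic prediction, and V2 applies the user-specific coefficients, so a d of 0 drives the result to zero, matching the case in which the user cannot drink.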

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Zoology (AREA)
  • Evolutionary Biology (AREA)
  • Wood Science & Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed is a method for constructing an alcohol tolerance prediction model, the method comprising: S1, acquiring relationships between alcohol consumption capacities and alcohol consumption amounts of respective samples, dividing the alcohol consumption amounts into a first preset number of alcohol consumption tiers, and establishing a first database according to the alcohol consumption tiers and the relationships between the alcohol consumption capacities and the alcohol consumption amounts of the samples; S2, acquiring genetic data of the samples and formatting the genetic data; and S3, selecting, according to the formatted genetic data of the samples and the first database, a machine learning model, and constructing a first alcohol consumption amount prediction model. The beneficial effects are as follows: an alcohol tolerance prediction model is constructed by means of machine learning, so as to provide a standard for determining the alcohol consumption amount and to quantify the alcohol consumption capacity of an individual, thereby facilitating effective prediction of alcohol tolerance levels of users.

Description

A method for constructing an alcohol tolerance prediction model
Technical Field
The invention relates to the technical field of biological genes, and in particular to a method for constructing an alcohol tolerance prediction model.
Background Art
After alcohol enters the human body, it passes through the biological membranes of the oral cavity, esophagus, stomach, intestines and other organs directly into the bloodstream, and is rapidly transported to tissues and organs throughout the body to be metabolized. The human body metabolizes alcohol with two enzymes: alcohol dehydrogenase catalyzes the oxidation of ethanol to acetaldehyde, and acetaldehyde dehydrogenase converts acetaldehyde to acetic acid. Because alcohol metabolism is accomplished mainly by these two enzymes, differences in drinking capacity (alcohol tolerance) between individuals are determined mainly by their activities; and because enzyme activity is determined by genes, a person's alcohol tolerance is ultimately determined by genetics.
Alcoholic beverages are an important drink in some people's lives; various drinking cultures have grown up around them, and they have become indispensable on certain occasions. However, research shows that drinking is not suitable for everyone and that excessive drinking is very harmful to the body. Drinking capacity also varies considerably between individuals, so it is important to understand one's own ability to metabolize alcohol correctly and to have a healthy drinking standard. The prior art lacks such a standard and therefore cannot effectively predict a user's drinking amount.
Summary of the Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the above technologies. To this end, the purpose of the present invention is to propose a method for constructing an alcohol tolerance prediction model by machine learning. The relationship between the drinking capacity and the drinking amount of each sample is collected, and a first database is established from that relationship and the drinking tiers, which improves the accuracy of the data. The genetic data of the samples are collected, formatted, and matched against the data in the first database, and a first drinking-amount prediction model is constructed with a machine learning model. This model provides a criterion for judging drinking amount and quantifies an individual's drinking capacity, which supports effective alcohol tolerance prediction for users.
To achieve the above purpose, an embodiment of the present invention proposes a method for constructing an alcohol tolerance prediction model, including:
S1: Obtain the relationship between the drinking capacity and the drinking amount of the samples, divide the drinking amount into a first preset number of drinking tiers, and establish a first database according to that relationship and the drinking tiers;
S2: Obtain the genetic data of the samples and format the genetic data;
S3: Select a machine learning model to construct a first drinking-amount prediction model according to the formatted genetic data of the samples and the first database.
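Step S3 leaves the model choice open, but the parameter settings given for the DecisionTreeClassifier module in the Definitions section (criterion 'gini', splitter 'best', max_depth None, min_samples_split 2, min_samples_leaf 1) match the constructor arguments of scikit-learn's `sklearn.tree.DecisionTreeClassifier`. Assuming that is the library meant, S3 could look like the sketch below. The feature rows (rs1229984 and rs671 encoded 0/1/2) and the tier labels are invented for illustration and are not the patent's data.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is [rs1229984, rs671] encoded as
# 0 (wild type), 1 (heterozygous) or 2 (homozygous mutant).
# The tier labels are invented for illustration, not taken from the patent.
X = [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 2]]
y = [7, 5, 2, 1, 1, 0]

# Parameter values as listed in the source's DecisionTreeClassifier settings.
clf = DecisionTreeClassifier(
    criterion="gini",      # Gini coefficient measures split quality
    splitter="best",       # search all features for the best split point
    max_depth=None,        # no depth limit; grow until leaves are pure
    min_samples_split=2,   # a node needs at least 2 samples to be split
    min_samples_leaf=1,    # every leaf keeps at least 1 sample
)
clf.fit(X, y)

# Predict the drinking tier for a new, hypothetical genotype vector.
print(clf.predict([[0, 0]]))
```

Because max_depth is None and every training row is distinct, the fitted tree reproduces the training labels exactly; on real questionnaire data an unconstrained tree can overfit, so limiting max_depth or raising min_samples_leaf may be worth testing.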
In the method for constructing an alcohol tolerance prediction model proposed by the present invention, the drinking capacity and drinking amount of each sample are obtained through a questionnaire survey and analyzed to obtain the correspondence between drinking capacity and drinking amount; the drinking amount is divided into a first preset number of drinking tiers and a first database is established. Grading drinking amounts into tiers quantifies drinking capacity concretely, which helps in giving more useful drinking advice. The genetic data of the samples are obtained and formatted, and a machine learning model is selected to construct the first drinking-amount prediction model from the formatted genetic data and the first database; the model provides a criterion for judging drinking amount, quantifies an individual's drinking capacity, and supports effective alcohol tolerance prediction for users.
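Dividing questionnaire drinking amounts into tiers (S1) could be sketched as below. The tier labels are an assumption drawn from the statement in the Definitions that tiers are named after amounts in liang, with tiers 3 and 6 skipped in one example; treating each label as the lower bound of its tier, and the helper names `assign_tier` and `build_first_database`, are hypothetical.

```python
from bisect import bisect_right

# Hypothetical tier labels in liang (tiers 3 and 6 skipped, as in the example).
# Each label is assumed to be the lower bound of its tier.
TIER_LABELS = (0, 1, 2, 4, 5, 7, 8, 9)


def assign_tier(amount_liang: float, labels=TIER_LABELS) -> int:
    """Return the highest tier label not exceeding the reported amount."""
    if amount_liang < labels[0]:
        raise ValueError("drinking amount must be non-negative")
    return labels[bisect_right(labels, amount_liang) - 1]


def build_first_database(records):
    """Map (sample_id, amount_liang) questionnaire records to drinking tiers."""
    return {sample_id: assign_tier(amount) for sample_id, amount in records}


print(build_first_database([("s1", 6.0), ("s2", 0.5)]))  # {'s1': 5, 's2': 0}
```

`bisect_right` finds the first label greater than the amount, so stepping back one position gives the tier whose lower bound the amount has reached; an amount of 3 liang therefore falls into tier 2 under this assumption.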
According to some embodiments of the present invention, acquiring the genetic data of the samples and formatting the genetic data includes:
S21: Collect saliva from each sample;
S22: Extract DNA from the saliva of the sample and sequence the extracted DNA;
S23: Process the sequencing data to obtain, for each sample, the genotypes at the loci related to drinking amount;
S24: Format the loci as numbers according to genotype.
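Step S24 can be sketched as counting mutant alleles, which reproduces the mapping given in the Definitions (wild type 0, heterozygous 1, homozygous mutant 2, with TT and GG as wild type and CC and AA as homozygous mutant at rs1229984 and rs671). The allele table and function names are illustrative assumptions; a real pipeline would read genotype calls from the sequencing output of S23.

```python
# Hypothetical allele table: (wild-type allele, mutant allele) per locus,
# following the examples in the source (TT and GG are wild type).
ALLELES = {
    "rs1229984": ("T", "C"),
    "rs671": ("G", "A"),
}


def format_genotype(locus: str, genotype: str) -> int:
    """Encode a genotype as its number of mutant alleles: 0, 1 or 2."""
    wild, mutant = ALLELES[locus]
    genotype = genotype.upper()
    if len(genotype) != 2 or any(a not in (wild, mutant) for a in genotype):
        raise ValueError(f"unexpected genotype {genotype!r} at {locus}")
    return genotype.count(mutant)


def format_sample(genotypes: dict) -> list:
    """Build one feature row, in ALLELES order, from {locus: genotype}."""
    return [format_genotype(locus, genotypes[locus]) for locus in ALLELES]


print(format_sample({"rs1229984": "CT", "rs671": "GG"}))  # [1, 0]
```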
According to some embodiments of the present invention, locus screening is performed on the formatted genetic data of the samples, including:
S241. For each locus, calculate the purity improvement (or uncertainty reduction) between the data set before division and the data subsets obtained by dividing the first database according to that locus's feature values;
S242. Select the locus N with the largest purity improvement (or uncertainty reduction) together with its feature value n; take locus N as a node and split the first database into two sub-data sets according to the grouping defined by feature value n of locus N;
S243. In each of the two sub-data sets in turn, calculate the purity improvement (or uncertainty reduction) produced by the feature values of each locus within that sub-data set; select the locus M with the largest purity improvement (or uncertainty reduction) together with its feature value m, take locus M as a child node, and split the sub-data set again according to the grouping defined by feature value m;
S244. Stop splitting once the purity of each resulting sub-data set exceeds a preset purity threshold, or its uncertainty falls below a preset uncertainty threshold, thereby obtaining the loci associated with drinking amount and the relationship between these loci and the drinking tiers.
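Steps S241 to S244 describe greedy recursive partitioning. The following is a minimal pure-Python sketch of that procedure, written under a few assumptions: Gini impurity is the purity measure, each test has the binary form "encoded genotype == value", and splitting stops at zero impurity. The function names and the toy tiers are hypothetical and are not taken from the patent's data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2); 0.0 means every sample shares one tier."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, loci):
    """S241/S242: score every test 'locus == value' by its Gini decrease."""
    n, parent = len(labels), gini(labels)
    best = None  # (decrease, locus, value)
    for locus in loci:
        for value in sorted({row[locus] for row in rows}):
            yes = [y for row, y in zip(rows, labels) if row[locus] == value]
            no = [y for row, y in zip(rows, labels) if row[locus] != value]
            if not yes or not no:
                continue  # this test does not divide the data set
            child = len(yes) / n * gini(yes) + len(no) / n * gini(no)
            if best is None or parent - child > best[0]:
                best = (parent - child, locus, value)
    return best


def build_tree(rows, labels, loci, max_impurity=0.0):
    """S243/S244: split recursively until a subset is pure enough."""
    split = None if gini(labels) <= max_impurity else best_split(rows, labels, loci)
    if split is None:
        return majority(labels)  # leaf: predicted drinking tier
    _, locus, value = split
    yes = [i for i, row in enumerate(rows) if row[locus] == value]
    no = [i for i, row in enumerate(rows) if row[locus] != value]
    return {
        "test": (locus, value),
        "yes": build_tree([rows[i] for i in yes], [labels[i] for i in yes], loci, max_impurity),
        "no": build_tree([rows[i] for i in no], [labels[i] for i in no], loci, max_impurity),
    }


def predict(tree, row):
    """Walk from the root, answering one genotype test per node."""
    while isinstance(tree, dict):
        locus, value = tree["test"]
        tree = tree["yes"] if row[locus] == value else tree["no"]
    return tree


# Toy data: encoded genotypes (0/1/2) with hypothetical drinking tiers.
rows = [
    {"rs671": 0, "rs1229984": 0}, {"rs671": 0, "rs1229984": 0},
    {"rs671": 0, "rs1229984": 1}, {"rs671": 0, "rs1229984": 2},
    {"rs671": 1, "rs1229984": 0}, {"rs671": 1, "rs1229984": 2},
    {"rs671": 2, "rs1229984": 0}, {"rs671": 2, "rs1229984": 1},
]
tiers = [8, 8, 5, 5, 2, 2, 0, 0]
tree = build_tree(rows, tiers, ["rs671", "rs1229984"])
print(tree)
```

Each internal node holds one genotype test and each leaf holds a tier, so the nested dictionary has the same shape as a decision tree of the kind shown in Fig. 6. When several tests tie on Gini decrease, this sketch keeps the first one it encounters (up to floating-point rounding), so the exact tree layout can vary with the order of the loci.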
According to some embodiments of the present invention, the loci associated with drinking amount include rs1229984 and rs671. The rs1229984 locus is located in the ADH1B gene. Genotype TT indicates strong alcohol dehydrogenase activity and fast ethanol metabolism, CT indicates moderate alcohol dehydrogenase activity and a moderate rate of ethanol metabolism, and CC indicates weak alcohol dehydrogenase activity and slow ethanol metabolism. The rs671 locus is located in the ALDH2 gene. Genotype GG indicates strong acetaldehyde dehydrogenase activity and fast acetaldehyde metabolism, while GA or AA indicates weak acetaldehyde dehydrogenase activity and slow acetaldehyde metabolism.
According to some embodiments of the present invention, after the first drinking amount prediction model is constructed, the method further includes:
S4. Re-divide the drinking amount into a second preset number of drinking tiers, and establish a second database from the relationship between the samples' drinking ability and drinking amount and the second preset number of drinking tiers;
S5. Select a machine learning model and rebuild the drinking amount prediction model from the samples' loci associated with drinking amount and the second database, obtaining a second drinking amount prediction model.
According to some embodiments of the present invention, extracting DNA from the saliva of the sample includes:
S421. Place 0.2 mL of saliva in a centrifuge tube, add 600 µL of 0.01 mol/L PBS, and centrifuge at 10,000 × g for 3 min;
S422. To the pellet, add 50 µL of 5 mol/L potassium iodide solution, 75 µL of 0.9% sodium chloride solution, and 120 µL of phenol:chloroform (20:13) solution; shake for 1 min and centrifuge at 10,000 × g for 3 min;
S423. Take 80 µL of the supernatant, add 80 µL of isopropanol, shake for 30 s, and centrifuge at 10,000 × g for 3 min;
S424. Wash the pellet with 500 µL of absolute ethanol and centrifuge at 10,000 × g for 3 min;
S425. Air-dry the pellet at room temperature and dissolve it in TE buffer.
According to some embodiments of the present invention, the method further includes:
S71. Obtain second information that affects the user's drinking amount, the second information including disease history, type of alcoholic beverage, alcohol strength, and drinking frequency;
S72. Correct the second drinking amount prediction model according to the second information to obtain a third drinking amount prediction model.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will be apparent from the description or may be learned by practicing the invention. The objectives and other advantages of the invention may be realized and attained by the structures particularly pointed out in the written description and the drawings.
The technical solutions of the present invention are described in further detail below with reference to the drawings and embodiments.
Description of Drawings
The drawings provide a further understanding of the present invention and constitute part of the specification. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for constructing an alcohol tolerance prediction model according to an embodiment of the present invention;
Fig. 2 is a flowchart of processing the genetic data of a sample according to an embodiment of the present invention;
Fig. 3 is a flowchart of screening loci associated with drinking amount according to an embodiment of the present invention;
Fig. 4 is a flowchart of establishing the second drinking amount prediction model according to an embodiment of the present invention;
Fig. 5 is a flowchart of extracting DNA from saliva according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a decision tree relating drinking-amount-associated genes to drinking tiers according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that these preferred embodiments are only intended to illustrate and explain the invention, not to limit it.
A method for constructing an alcohol tolerance prediction model according to embodiments of the present invention is described below with reference to Figs. 1 to 6.
Fig. 1 is a flowchart of a method for constructing an alcohol tolerance prediction model according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S1. Obtain the relationship between the samples' drinking ability and drinking amount, divide the drinking amount into a first preset number of drinking tiers, and establish a first database from that relationship and the drinking tiers;
S2. Obtain the genetic data of the samples and format it;
S3. Select a machine learning model and construct a first drinking amount prediction model from the formatted genetic data and the first database.
In the method for constructing an alcohol tolerance prediction model proposed by the present invention, the drinking ability and drinking amount of each sample are collected through a questionnaire survey. The data are analyzed to derive the correspondence between drinking ability and drinking amount, the drinking amount is divided into a first preset number of drinking tiers, and a first database is established. Grading drinking amount into tiers quantifies drinking ability concretely, which helps in giving more useful drinking advice. The genetic data of the samples are then obtained and formatted, and a machine learning model is selected to construct a first drinking amount prediction model from the formatted genetic data and the first database; specifically, the alcohol tolerance prediction model is built with a decision tree model. The model provides a criterion for judging drinking amount and quantifies an individual's drinking ability, enabling effective prediction of a user's alcohol tolerance. The machine learning model includes at least one of linear classification, linear regression, support vector machine (SVM), decision tree classification, naive Bayes, random forest, and neural network models. Among these, the decision tree classification model is easy to understand and implement, and a trained tree is readily translated into the corresponding logical expressions, which makes it highly readable.
Fig. 2 is a flowchart of processing the genetic data of a sample according to an embodiment of the present invention. As shown in Fig. 2, obtaining and formatting the genetic data of the sample includes:
S21. Collect saliva from the sample;
S22. Extract DNA from the saliva of the sample and sequence the extracted DNA;
S23. Process the sequencing data to obtain, for each sample, the genotypes at the loci associated with drinking amount;
S24. Encode each locus as a number according to its genotype.
Working principle and beneficial effects of the above technical solution: the genetic data of a sample are obtained by extracting DNA from the sample's saliva, followed by sequencing and genotyping. The sequencing method includes at least one of chip sequencing, second-generation sequencing, third-generation sequencing, PCR sequencing, and panel sequencing. This yields, for each sample, the genotypes at the loci associated with drinking amount. To compute each locus's effect on drinking amount, the loci are encoded as numbers according to genotype: for example, wild type is encoded as 0, heterozygous mutant as 1, and homozygous mutant as 2. At rs1229984, CC is homozygous mutant (encoded as 2), TT is wild type (0), and CT is heterozygous mutant (1). At rs671, AA is homozygous mutant (2), GG is wild type (0), and AG is heterozygous mutant (1).
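The 0/1/2 encoding above amounts to counting the non-wild-type alleles at each locus. The following is a minimal Python sketch of step S24. It assumes that genotype calls arrive as two-letter strings such as "CT" or "G/A", and it takes the wild-type alleles from the encoding stated above. The names WILD_TYPE, ALLELES, LOCI, encode_genotype and encode_sample are hypothetical.

```python
# Wild-type allele per locus, as implied by the encoding described above
# (TT is wild type at rs1229984, GG is wild type at rs671).
WILD_TYPE = {"rs1229984": "T", "rs671": "G"}
# Alleles expected at each locus, used to reject unexpected calls.
ALLELES = {"rs1229984": {"C", "T"}, "rs671": {"A", "G"}}
LOCI = ["rs671", "rs1229984"]  # hypothetical fixed feature order


def encode_genotype(locus: str, genotype: str) -> int:
    """Return 0 (wild type), 1 (heterozygous) or 2 (homozygous mutant)."""
    call = genotype.upper().replace("/", "")
    if len(call) != 2 or not set(call) <= ALLELES[locus]:
        raise ValueError(f"unexpected genotype {genotype!r} at {locus}")
    # Additive encoding: count the alleles that differ from the wild type.
    return sum(allele != WILD_TYPE[locus] for allele in call)


def encode_sample(calls: dict) -> list:
    """Turn {locus: genotype} into a numeric feature vector in LOCI order."""
    return [encode_genotype(locus, calls[locus]) for locus in LOCI]


print(encode_sample({"rs671": "GA", "rs1229984": "TT"}))
```

Because the code counts alleles rather than matching exact strings, "CT" and "TC" both map to 1. A fixed locus order keeps the feature columns consistent across samples.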
In one embodiment, locus screening on the formatted genetic data of the samples includes:
S241. For each locus, calculate the purity improvement (or uncertainty reduction) between the data set before division and the data subsets obtained by dividing the first database according to that locus's feature values;
S242. Select the locus N with the largest purity improvement (or uncertainty reduction) together with its feature value n; take locus N as a node and split the first database into two sub-data sets according to the grouping defined by feature value n of locus N;
S243. In each of the two sub-data sets in turn, calculate the purity improvement (or uncertainty reduction) produced by the feature values of each locus within that sub-data set; select the locus M with the largest purity improvement (or uncertainty reduction) together with its feature value m, take locus M as a child node, and split the sub-data set again according to the grouping defined by feature value m;
S244. Stop splitting once the purity of each resulting sub-data set exceeds a preset purity threshold, or its uncertainty falls below a preset uncertainty threshold, thereby obtaining the loci associated with drinking amount and the relationship between these loci and the drinking tiers.
Working principle and beneficial effects of the above technical solution: the purity and uncertainty of the data set before and after division can be measured with at least one of information gain, information gain ratio, and the Gini coefficient (Gini impurity). When purity and uncertainty are measured with the Gini coefficient, a larger value means higher uncertainty and lower purity, i.e., the target samples make up a smaller proportion of the data set. A smaller value means lower uncertainty and higher purity, i.e., the target samples make up a larger proportion. Once the Gini coefficient of a divided sub-data set falls below a preset value, its purity exceeds the preset purity threshold (equivalently, its uncertainty is below the preset uncertainty threshold). Splitting then stops, yielding the loci associated with drinking amount and the relationship between those loci and the drinking tiers. For example, a Gini coefficient of 0 means that all samples in the data set belong to the same class.
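The three split-quality measures named above can be illustrated with a short Python sketch, assuming the class labels are drinking tiers. Gini impurity, entropy-based information gain, and gain ratio use their textbook definitions. The helper names and the example tiers are hypothetical.

```python
import math
from collections import Counter


def proportions(labels):
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini coefficient (impurity): larger means less pure, 0.0 means pure."""
    return 1.0 - sum(p * p for p in proportions(labels))


def entropy(labels):
    """Shannon entropy in bits, the uncertainty behind information gain."""
    return sum(p * math.log2(1.0 / p) for p in proportions(labels))


def impurity_decrease(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    return measure(parent) - sum(len(c) / n * measure(c) for c in children)


def gain_ratio(parent, children):
    """Information gain divided by the entropy of the split sizes.

    Assumes at least two non-empty children, otherwise split_info is 0.
    """
    split_info = sum(
        len(c) / len(parent) * math.log2(len(parent) / len(c)) for c in children
    )
    return impurity_decrease(parent, children, entropy) / split_info


parent = [8, 8, 5, 5]          # hypothetical tiers before a split
children = [[8, 8], [5, 5]]    # a split that separates them perfectly
print(gini(parent), impurity_decrease(parent, children, gini))
print(entropy(parent), impurity_decrease(parent, children, entropy))
print(gain_ratio(parent, children))
```

In this example both children are pure (Gini 0), which corresponds to the stopping case described above in which every sample in a subset belongs to the same class.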
In one embodiment, as shown in Fig. 6, the first test is whether a sample's rs671 genotype is GG, i.e., whether its encoded rs671 value is 0. Based on this test, the first database is divided into two data sets: a first data set whose samples have rs671 genotype GG, and a second data set whose samples have rs671 genotype AA or AG. Within the first and second data sets, the Gini coefficient is computed for the feature values of each locus. The locus A with the smallest Gini coefficient is selected together with its feature value a, locus A becomes a child node, and the data set is split again according to the grouping defined by feature value a. For example, in the first data set the test is whether a sample's rs1229984 genotype is CC or CT. If the answer is False, the sample's rs1229984 genotype is TT, so every sample in this group has rs671 GG and rs1229984 TT, and, as shown in Table 1, its drinking tier is 8. Splitting stops once every group contains samples of a single class, i.e., once the Gini coefficient is 0, yielding the loci associated with drinking amount and the relationship between those loci and the drinking tiers. Mapping the genotypes of the drinking-amount-associated genes to drinking tiers makes the results easy to remember and shows the correspondence between genotype and drinking amount accurately and at a glance, improving the user experience.
According to some embodiments of the present invention, the loci associated with drinking amount include rs1229984 and rs671. The rs1229984 locus is located in the ADH1B gene. Genotype TT indicates strong alcohol dehydrogenase activity and fast ethanol metabolism, CT indicates moderate alcohol dehydrogenase activity and a moderate rate of ethanol metabolism, and CC indicates weak alcohol dehydrogenase activity and slow ethanol metabolism. The rs671 locus is located in the ALDH2 gene. Genotype GG indicates strong acetaldehyde dehydrogenase activity and fast acetaldehyde metabolism, while GA or AA indicates weak acetaldehyde dehydrogenase activity and slow acetaldehyde metabolism.
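The genotype-to-phenotype correspondence just described is a small lookup table. The following Python sketch encodes it, assuming genotype calls are two-letter strings whose allele order may vary. The dictionary and function names are hypothetical, and the descriptions paraphrase the text above.

```python
# Interpretation of the two key loci as described above. Keys are genotypes
# with their alleles sorted alphabetically, so "TC" and "CT" share one key.
ADH1B_RS1229984 = {
    "TT": "strong alcohol dehydrogenase activity, fast ethanol metabolism",
    "CT": "moderate alcohol dehydrogenase activity, moderate ethanol metabolism",
    "CC": "weak alcohol dehydrogenase activity, slow ethanol metabolism",
}
ALDH2_RS671 = {
    "GG": "strong acetaldehyde dehydrogenase activity, fast acetaldehyde metabolism",
    "AG": "weak acetaldehyde dehydrogenase activity, slow acetaldehyde metabolism",
    "AA": "weak acetaldehyde dehydrogenase activity, slow acetaldehyde metabolism",
}


def normalize(genotype: str) -> str:
    """Upper-case, drop a '/' separator, and sort the two alleles."""
    return "".join(sorted(genotype.upper().replace("/", "")))


def interpret(rs1229984: str, rs671: str) -> dict:
    return {
        "ADH1B": ADH1B_RS1229984[normalize(rs1229984)],
        "ALDH2": ALDH2_RS671[normalize(rs671)],
    }


print(interpret("TC", "GA"))
```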
In one embodiment, the loci directly associated with drinking amount further include rs6413413, rs698, rs2298755, and others. Loci associated with alcohol dependence include rs2066702, rs55768019, rs1789891, and others. Loci associated with alcohol use disorder include rs4975012, rs7078436, rs3114045, and others. Loci associated with alcoholism include rs9556711, rs2140418, rs8040009, and others. Loci associated with alcohol sensitivity include rs112834343, rs75536499, rs146298733, and others. Loci associated with drinking reactions include rs143894582, rs200848948, rs397813807, and others. The locus associated with alcohol-dependent impulsive behavior is rs34997829. It should be understood that those skilled in the art can build a drinking amount prediction model from different combinations of genotypes at the loci associated with drinking amount, and doing so falls within the protection scope of the present invention.
In one embodiment, a decision tree model is selected to construct the drinking amount prediction model.
The algorithm is as follows:
Python is used to call the DecisionTreeClassifier class of scikit-learn (sklearn) for data mining and for constructing the drinking amount prediction model.
Main parameter settings of DecisionTreeClassifier:
criterion='gini': use the Gini coefficient to measure the quality of a node split;
splitter='best': choose the best split point among all features;
max_depth=None: maximum depth of the tree; None places no limit, so nodes are expanded until every leaf contains samples of a single class;
min_samples_split=2: an internal node must contain at least 2 samples to be split;
min_samples_leaf=1: each leaf node must contain at least 1 sample.
The result shows that the rs1229984 and rs671 loci are associated with drinking amount.
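The parameter settings listed above map directly onto scikit-learn's DecisionTreeClassifier. The following sketch assumes genotypes are encoded 0/1/2 as described earlier. The toy genotypes and tiers are invented for illustration. The random_state argument does not appear in the text and is added only to make tie-breaking between equally good splits reproducible.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: encoded genotypes [rs671, rs1229984] (0 = wild type,
# 1 = heterozygous, 2 = homozygous mutant).
# Labels: hypothetical drinking tiers for illustration only.
X = [[0, 0], [0, 0], [0, 1], [0, 2], [1, 0], [1, 2], [2, 0], [2, 1]]
y = [8, 8, 5, 5, 2, 2, 0, 0]

clf = DecisionTreeClassifier(
    criterion="gini",      # Gini coefficient as the split-quality measure
    splitter="best",       # best split point among all features
    max_depth=None,        # no depth limit: grow until leaves are pure
    min_samples_split=2,   # an internal node needs >= 2 samples to be split
    min_samples_leaf=1,    # a leaf may hold a single sample
    random_state=0,        # fixes tie-breaking between equally good splits
)
clf.fit(X, y)

# Human-readable rules, comparable in spirit to the decision tree of Fig. 6.
print(export_text(clf, feature_names=["rs671", "rs1229984"]))
print(clf.predict([[0, 0]]))  # rs671 GG, rs1229984 TT
```

scikit-learn splits numeric features on thresholds, so a rule such as "rs671 <= 0.50" is the same binary test as "rs671 is GG versus GA/AA" under this encoding.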
Fig. 4 is a flowchart of establishing the second drinking amount prediction model according to an embodiment of the present invention. As shown in Fig. 4, after the first drinking amount prediction model is constructed, the method further includes:
S4. Re-divide the drinking amount into a second preset number of drinking tiers, and establish a second database from the relationship between the samples' drinking ability and drinking amount and the second preset number of drinking tiers;
S5. Select a machine learning model and rebuild the drinking amount prediction model from the samples' loci associated with drinking amount and the second database, obtaining a second drinking amount prediction model.
Working principle and beneficial effects of the above technical solution: the first drinking amount prediction model is optimized by converting the first preset number of drinking tiers into a second preset number of drinking tiers, where the second preset number may be 9. A second database is established from the relationship between the samples' drinking ability and drinking amount and the second preset number of drinking tiers. When the prediction model is rebuilt, only the loci associated with drinking amount are used. A machine learning model is selected and rebuilt from those loci and the second database to obtain the second drinking amount prediction model, which reduces computation and complexity and improves prediction accuracy. The resulting relationship between the rs1229984 and rs671 loci and drinking amount is shown in Table 1.
Table 1
[Table 1 is provided as images PCTCN2021109450-appb-000001 and PCTCN2021109450-appb-000002 in the original publication.]
In one embodiment, the second preset number is 7, i.e., there are 7 drinking tiers. The resulting relationship between the rs1229984 and rs671 loci and drinking amount is shown in Table 2.
Table 2
[Table 2 is provided as image PCTCN2021109450-appb-000003 in the original publication.]
Working principle and beneficial effects of the above technical solution: drinking tier 0 covers three cases: (1) rs1229984 CC with rs671 AA; (2) rs1229984 TT with rs671 AA; (3) rs1229984 CT with rs671 AA. The drinking tiers are numbered non-consecutively (for example, tiers 3 and 6 are absent) so that each tier number matches a specific amount of alcohol. For example, tier 9 means the user can drink 9 liang (1 liang = 50 g) or more.
After the second drinking amount prediction model is obtained, it is used to predict the user's drinking amount from the user's genetic data. This provides a criterion for judging drinking amount, quantifies the individual's drinking ability, and, based on the user's physical condition, gives more intuitive and useful tolerance assessments and drinking advice, improving the user experience.
Fig. 5 is a flowchart of extracting DNA from saliva according to an embodiment of the present invention. As shown in Fig. 5, extracting DNA from saliva includes:
S421. Place 0.2 mL of saliva in a centrifuge tube, add 600 µL of 0.01 mol/L PBS, and centrifuge at 10,000 × g for 3 min;
S422. To the pellet, add 50 µL of 5 mol/L potassium iodide solution, 75 µL of 0.9% sodium chloride solution, and 120 µL of phenol:chloroform (20:13) solution; shake for 1 min and centrifuge at 10,000 × g for 3 min;
S423. Take 80 µL of the supernatant, add 80 µL of isopropanol, shake for 30 s, and centrifuge at 10,000 × g for 3 min;
S424. Wash the pellet with 500 µL of absolute ethanol and centrifuge at 10,000 × g for 3 min;
S425. Air-dry the pellet at room temperature and dissolve it in TE buffer.
Working principle and beneficial effects of the above technical solution: saliva contains exfoliated oral cells, and these cells carry the genetic material DNA. The above procedure therefore allows the user's DNA to be extracted from saliva for subsequent processing and analysis.
According to some embodiments of the present invention, the genetic data of a sample may also be obtained by collecting blood for DNA extraction. Extraction from blood offers high sensitivity and yields more accurate DNA data.
In one embodiment, the method further includes:
S71. Obtain second information that affects the user's drinking amount, the second information including disease history, type of alcoholic beverage, alcohol strength, and drinking frequency;
S72. Correct the second drinking amount prediction model according to the second information to obtain a third drinking amount prediction model.
Working principle and beneficial effects of the above technical solution: the prediction that the second drinking amount prediction model outputs from the user's genetic data does not take the user's actual circumstances into account. The prediction therefore needs to be corrected according to the second information affecting the user's drinking amount, and establishing the third drinking amount prediction model requires a mechanism for correcting the drinking amount on the basis of that information. For example, according to Table 2, a user whose genotype is CC at rs1229984 and GG at rs671 is predicted to be in drinking tier 7, i.e., able to drink 7 liang or more (taking 50% vol baijiu as an example). If, however, the user currently has a stomach condition, the user should not drink, because drinking can easily cause gastric perforation and seriously endanger health. Likewise, the type of beverage, its alcohol strength, and the drinking frequency all affect the prediction of the user's drinking amount. Basing the prediction on the user's actual circumstances makes the third model's predictions more accurate.
In one embodiment, the method for correcting the second drinking amount prediction model according to the second information includes:
Calculate the amount of ethanol in the first prediction result given by the second drinking amount prediction model:
$V_1 = A \times c$
where A is the drinking amount (mL) output by the second drinking amount prediction model from the user's genetic data, and c is the alcohol concentration (%vol) preset in the second drinking amount prediction model;
Calculate the amount of ethanol the user can drink according to the second information:
$V_2 = V_1 \times d \times t \times f$
where d is the correlation coefficient between the user's disease history and drinking amount, t is the correlation coefficient between beverage type and drinking amount, and f is the correlation coefficient between drinking frequency and drinking amount;
The drinking amount in the second prediction result given by the third drinking amount prediction model is:
Figure PCTCN2021109450-appb-000004
Figure PCTCN2021109450-appb-000004
其中,c u为用户输入的饮酒度数。 Among them, cu is the drinking degree input by the user.
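The three steps can be traced with a short Python sketch. The function names are hypothetical, and the sketch rests on two assumptions that the source does not state. First, concentrations are written as fractions (0.5 for 50 %vol). Second, the equation image is read as $V_2 / c_u$. That is the dimensionally consistent way to turn an ethanol allowance back into a volume of the user's drink, but the formula itself is not reproduced in the text.

```python
# Minimal sketch of the correction step (hypothetical helper names).
# Assumption: the formula image PCTCN2021109450-appb-000004 is read as
# volume = V_2 / c_u; the source does not reproduce the formula in text.


def ethanol_in_prediction(a_ml: float, c: float) -> float:
    """V_1 = A * c: ethanol contained in the genetic prediction A at strength c."""
    return a_ml * c


def ethanol_allowance(v1: float, d: float, t: float, f: float) -> float:
    """V_2 = V_1 * d * t * f: ethanol the user may drink after applying the second information."""
    return v1 * d * t * f


def corrected_volume(v2: float, c_u: float) -> float:
    """Assumed reading of the formula image: volume of the user's drink at strength c_u."""
    if c_u <= 0:
        raise ValueError("the user's alcohol strength c_u must be positive")
    return v2 / c_u


# Hypothetical inputs: A = 350 ml of 50 %vol liquor, no contraindication (d = 1),
# liquor (t = 1), drinking once every 7 days (f = 0.8), user drinks 50 %vol liquor.
v1 = ethanol_in_prediction(350, 0.5)        # 175 ml of ethanol
v2 = ethanol_allowance(v1, 1.0, 1.0, 0.8)   # about 140 ml of ethanol
volume = corrected_volume(v2, 0.5)          # about 280 ml of liquor
```

Because c enters as a factor and $c_u$ as a divisor, the result is the same whether both are written as fractions or both as %vol, provided one convention is used for both.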
Working principle and beneficial effects of the above technical solution: if the user has stomach disease, liver disease, or cardiovascular or cerebrovascular disease, is pregnant, or is taking cold medicine, sleeping pills, or tranquilizers, the correlation coefficient d between disease history and drinking amount is 0, meaning the user should not drink. For any other disease history, d takes a value between 0 and 1. Table 3 lists the values of the correlation coefficient t between the type of drink and drinking amount, and Table 4 lists the values of the correlation coefficient f between drinking frequency and drinking amount. Applying the above algorithm corrects the second alcohol consumption prediction model with the second information and yields the third alcohol consumption prediction model. The third model predicts drinking amount from the user's actual circumstances and gives more accurate results. It also provides the user with the most appropriate drinking advice and improves the user experience.
Table 3
Type of drink | Correlation coefficient t
Liquor (baijiu) | 1
Beer | 1.5
Wine | 1.8
Table 4
Drinking frequency | Correlation coefficient f
Every day | 0.3
Once every 3 days | 0.6
Once every 7 days | 0.8
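The coefficient rules above amount to a small lookup. The sketch below hard-codes the values from Tables 3 and 4 and the list of contraindications. The condition labels and dictionary keys are assumptions. So is the default d = 1.0 for a user with no listed condition, because the source only says that d lies between 0 and 1 in other cases.

```python
# Coefficient lookup for the correction formula (hypothetical labels and keys).

# Conditions for which the source sets d = 0 (the user should not drink).
CONTRAINDICATIONS = {
    "stomach disease",
    "liver disease",
    "cardiovascular or cerebrovascular disease",
    "pregnancy",
    "cold medicine",
    "sleeping pills",
    "tranquilizers",
}

T_BY_DRINK = {"liquor": 1.0, "beer": 1.5, "wine": 1.8}  # Table 3
F_BY_FREQUENCY = {"daily": 0.3, "every 3 days": 0.6, "every 7 days": 0.8}  # Table 4


def disease_coefficient(conditions, other_d=1.0):
    """Return d: 0 for any listed contraindication, otherwise a caller-supplied value in [0, 1]."""
    if any(cond in CONTRAINDICATIONS for cond in conditions):
        return 0.0
    if not 0.0 <= other_d <= 1.0:
        raise ValueError("d must lie between 0 and 1")
    return other_d


def correction_coefficients(conditions, drink_type, frequency, other_d=1.0):
    """Return (d, t, f); raises KeyError for a drink type or frequency not in the tables."""
    return (
        disease_coefficient(conditions, other_d),
        T_BY_DRINK[drink_type],
        F_BY_FREQUENCY[frequency],
    )
```

For example, `correction_coefficients(["pregnancy"], "beer", "daily")` returns `(0.0, 1.5, 0.3)`. Because d is 0, $V_2$ and the corrected drinking amount are both 0.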
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from its spirit and scope. Thus, to the extent that such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to encompass them.

Claims (7)

  1. A method for constructing an alcohol tolerance prediction model, comprising:
    S1. obtaining the relationship between the drinking capacity and the drinking amount of each sample, dividing the drinking amounts into a first preset number of drinking grades, and building a first database from the relationship between each sample's drinking capacity and drinking amount, together with the drinking grades;
    S2. obtaining genetic data of the samples and formatting the genetic data;
    S3. selecting a machine learning model and constructing a first alcohol consumption prediction model from the formatted genetic data of the samples and the first database.
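Step S1 does not say where the grade boundaries fall. The following minimal sketch assumes hypothetical upper boundaries in ml and an open-ended top grade. It shows one way to turn measured drinking amounts into grades and collect them into the first database. The record fields and function names are also hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    """One row of the (hypothetical) first database."""
    sample_id: str
    amount_ml: float
    grade: int


def assign_grade(amount_ml: float, boundaries: List[float]) -> int:
    """Grade 1 lies below boundaries[0]; the last grade covers everything at or above boundaries[-1]."""
    return bisect_right(boundaries, amount_ml) + 1


def build_first_database(amounts: Dict[str, float], boundaries: List[float]) -> List[Record]:
    """Attach a drinking grade to every sample's measured drinking amount."""
    if list(boundaries) != sorted(boundaries):
        raise ValueError("grade boundaries must be sorted in ascending order")
    return [Record(sid, amt, assign_grade(amt, boundaries)) for sid, amt in amounts.items()]


# Six hypothetical boundaries give seven grades (the "first preset number" is 7 here).
BOUNDARIES = [50, 100, 150, 200, 250, 300]
db = build_first_database({"s01": 40, "s02": 120, "s03": 360}, BOUNDARIES)
```

`bisect_right` places an amount that sits exactly on a boundary in the higher grade. For example, 50 ml falls in grade 2, not grade 1.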
  2. The method for constructing an alcohol tolerance prediction model according to claim 1, wherein obtaining the genetic data of the samples and formatting the genetic data comprises:
    S21. collecting saliva from each sample;
    S22. extracting DNA from the saliva of the sample and sequencing the extracted DNA;
    S23. processing the sequencing data to obtain, for each sample, the genotypes at loci associated with drinking amount;
    S24. encoding each locus as a number according to its genotype.
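S24 does not specify the numeric encoding. One common choice, used here only as an assumption, is additive coding: each genotype is encoded as the number of copies (0, 1 or 2) of a chosen allele. The choice of counted allele at each locus is also hypothetical in the sketch below.

```python
from typing import Dict, List

# Hypothetical choice of counted allele per locus (not specified in the source).
COUNTED_ALLELE = {"rs1229984": "T", "rs671": "A"}


def encode_genotype(genotype: str, counted_allele: str) -> int:
    """Additive coding: number of copies of counted_allele in a biallelic genotype such as 'CT'."""
    g = genotype.upper().replace("/", "")
    if len(g) != 2 or not set(g) <= set("ACGT"):
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return g.count(counted_allele.upper())


def encode_sample(genotypes: Dict[str, str]) -> List[int]:
    """Encode one sample's genotypes as a fixed-order feature vector (loci sorted by rsID string)."""
    return [encode_genotype(genotypes[rsid], allele) for rsid, allele in sorted(COUNTED_ALLELE.items())]
```

With this encoding, a sample with CC at rs1229984 and GA at rs671 becomes `[0, 1]`.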
  3. The method for constructing an alcohol tolerance prediction model according to claim 2, wherein
    locus screening is performed on the formatted genetic data of the samples, comprising:
    S241. for each locus, splitting the first database according to the feature values of that locus and calculating the purity gain or uncertainty reduction of the resulting data subsets relative to the data set before splitting;
    S242. selecting the locus N with the largest purity gain or uncertainty reduction, together with its feature value n, taking locus N as a node, and splitting the first database into two sub-data sets according to the grouping by feature value n;
    S243. in each of the two sub-data sets in turn, calculating the purity gain or uncertainty reduction of each locus's feature values within that sub-data set, selecting the locus M with the largest purity gain or uncertainty reduction, together with its feature value m, taking locus M as a child node, and splitting the sub-data set again according to the grouping by feature value m;
    S244. stopping the splitting once the purity of a resulting sub-data set exceeds a preset purity threshold or its uncertainty falls below a preset uncertainty threshold, thereby obtaining the loci associated with drinking amount and the relationship between those loci and the drinking grades.
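Steps S241 to S244 describe greedy splitting in the style of a decision tree. The sketch below implements this under stated assumptions. The source permits either a purity or an uncertainty criterion without naming one, so Gini impurity is used here as the uncertainty measure. Each split separates the samples whose locus equals a chosen value from the rest. A `max_depth` guard, which the source does not mention, prevents unbounded recursion. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def gini(labels: Sequence) -> float:
    """Gini impurity: 0 for a pure subset, larger when grades are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: List[Sequence[int]], labels: List) -> Tuple[float, Optional[int], Optional[int]]:
    """S241/S242: return (impurity decrease, locus index, value) for the best 'locus == value' split."""
    parent, n = gini(labels), len(labels)
    best: Tuple[float, Optional[int], Optional[int]] = (0.0, None, None)
    for j in range(len(rows[0])):
        for v in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] == v]
            right = [y for r, y in zip(rows, labels) if r[j] != v]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[0]:
                best = (parent - child, j, v)
    return best


def build_tree(rows, labels, max_impurity=0.0, depth=0, max_depth=5) -> Dict:
    """S243/S244: split recursively until a node's impurity is at or below max_impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) <= max_impurity or depth >= max_depth:
        return {"grade": majority}
    _, j, v = best_split(rows, labels)
    if j is None:  # no split reduces impurity
        return {"grade": majority}
    eq = [i for i, r in enumerate(rows) if r[j] == v]
    ne = [i for i, r in enumerate(rows) if r[j] != v]
    return {
        "locus": j,
        "value": v,
        "eq": build_tree([rows[i] for i in eq], [labels[i] for i in eq], max_impurity, depth + 1, max_depth),
        "ne": build_tree([rows[i] for i in ne], [labels[i] for i in ne], max_impurity, depth + 1, max_depth),
    }


def predict(tree: Dict, row: Sequence[int]):
    """Follow the splits from the root to a leaf and return its grade."""
    while "grade" not in tree:
        tree = tree["eq"] if row[tree["locus"]] == tree["value"] else tree["ne"]
    return tree["grade"]
```

The locus index stored at the root plays the role of locus N in S242, and the indices stored at deeper nodes play the role of locus M in S243.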
  4. The method for constructing an alcohol tolerance prediction model according to claim 3, wherein the loci associated with drinking amount comprise the rs1229984 locus and the rs671 locus. The rs1229984 locus is located in the ADH1B gene. A TT genotype indicates strong alcohol dehydrogenase activity and fast ethanol metabolism. A CT genotype indicates moderate alcohol dehydrogenase activity and a moderate rate of ethanol metabolism. A CC genotype indicates weak alcohol dehydrogenase activity and slow ethanol metabolism. The rs671 locus is located in the ALDH2 gene. A GG genotype indicates strong acetaldehyde dehydrogenase activity and fast acetaldehyde metabolism, while a GA or AA genotype indicates weak acetaldehyde dehydrogenase activity and slow acetaldehyde metabolism.
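Claim 4's genotype-to-phenotype statements can be expressed as a lookup table. The sketch below only restates the claim, and the function name and output keys are hypothetical. Genotypes are normalised so that, for example, "TC" and "CT" are treated alike.

```python
from typing import Dict, Tuple


def _normalise(genotype: str) -> str:
    """Sort the two alleles so that 'TC' and 'CT' map to the same key."""
    return "".join(sorted(genotype.upper().replace("/", "")))


# rs1229984 (ADH1B): alcohol dehydrogenase activity, ethanol metabolism rate.
_ADH1B: Dict[str, Tuple[str, str]] = {
    _normalise(k): v
    for k, v in {"TT": ("strong", "fast"), "CT": ("moderate", "moderate"), "CC": ("weak", "slow")}.items()
}
# rs671 (ALDH2): acetaldehyde dehydrogenase activity, acetaldehyde metabolism rate.
_ALDH2: Dict[str, Tuple[str, str]] = {
    _normalise(k): v
    for k, v in {"GG": ("strong", "fast"), "GA": ("weak", "slow"), "AA": ("weak", "slow")}.items()
}


def metabolism_profile(rs1229984: str, rs671: str) -> Dict[str, str]:
    """Restate claim 4 for one sample; raises KeyError for genotypes the claim does not cover."""
    adh_activity, ethanol_rate = _ADH1B[_normalise(rs1229984)]
    aldh_activity, acetaldehyde_rate = _ALDH2[_normalise(rs671)]
    return {
        "ADH activity": adh_activity,
        "ethanol metabolism": ethanol_rate,
        "ALDH activity": aldh_activity,
        "acetaldehyde metabolism": acetaldehyde_rate,
    }
```

The CC/GG example in the description (weak ADH activity, strong ALDH activity) corresponds to `metabolism_profile("CC", "GG")`.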
  5. The method for constructing an alcohol tolerance prediction model according to claim 4, further comprising, after constructing the first alcohol consumption prediction model:
    S4. re-dividing the drinking amounts into a second preset number of drinking grades, and building a second database from the relationship between each sample's drinking capacity and drinking amount, together with the second preset number of drinking grades;
    S5. selecting a machine learning model and rebuilding the drinking amount prediction model from the samples' loci associated with drinking amount and the second database, to obtain a second alcohol consumption prediction model.
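Steps S4 and S5 re-grade the same samples into a different number of grades and then retrain. The source does not name the machine learning model. The sketch below therefore uses a stand-in model: a lookup that assigns each genotype combination its most frequent grade. The boundaries and encoded genotypes are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def regrade(amounts_ml: Sequence[float], boundaries: List[float]) -> List[int]:
    """S4: assign each sample to one of len(boundaries) + 1 grades."""
    return [bisect_right(boundaries, a) + 1 for a in amounts_ml]


def fit_lookup_model(codes: Sequence[Sequence[int]], grades: Sequence[int]) -> Dict[Tuple[int, ...], int]:
    """S5 stand-in: most frequent grade for each encoded genotype combination."""
    counts: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for code, grade in zip(codes, grades):
        counts[tuple(code)][grade] += 1
    return {code: c.most_common(1)[0][0] for code, c in counts.items()}


amounts = [40, 60, 260, 380]
codes = [(0, 1), (0, 1), (2, 0), (1, 0)]           # hypothetical encoded rs1229984, rs671
second_grades = regrade(amounts, [100, 200, 300])  # four grades in the second database
model = fit_lookup_model(codes, second_grades)
```

If two grades are equally frequent for a genotype combination, `Counter.most_common` returns the one counted first. A real model would need an explicit tie-breaking rule.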
  6. The method for constructing an alcohol tolerance prediction model according to claim 2, wherein extracting DNA from the saliva of the sample comprises:
    S421. placing 0.2 mL of saliva in a centrifuge tube, adding 600 μL of 0.01 mol/L PBS solution, and centrifuging at 10,000 × g for 3 min;
    S422. adding to the pellet 50 μL of 5 mol/L potassium iodide solution, 75 μL of 0.9% sodium chloride solution, and 120 μL of phenol:chloroform (20:13) solution, shaking for 1 min, and centrifuging at 10,000 × g for 3 min;
    S423. taking 80 μL of the supernatant, adding 80 μL of isopropanol, shaking for 30 s, and centrifuging at 10,000 × g for 3 min;
    S424. washing the pellet with 500 μL of absolute ethanol and centrifuging at 10,000 × g for 3 min;
    S425. air-drying the pellet at room temperature and dissolving it in TE buffer.
  7. The method for constructing an alcohol tolerance prediction model according to claim 5, further comprising:
    S71. acquiring second information that affects the user's drinking amount, the second information comprising disease history, type of drink, alcohol strength of the drink, and drinking frequency;
    S72. correcting the second alcohol consumption prediction model according to the second information to obtain a third alcohol consumption prediction model.
PCT/CN2021/109450 2020-07-30 2021-07-30 Method for constructing alcohol tolerance prediction model WO2022022663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010748369.6 2020-07-30
CN202010748369.6A CN112002375B (en) 2020-07-30 2020-07-30 Construction method of alcohol capacity prediction model

Publications (1)

Publication Number Publication Date
WO2022022663A1 (en) 2022-02-03

Family

ID=73462482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109450 WO2022022663A1 (en) 2020-07-30 2021-07-30 Method for constructing alcohol tolerance prediction model

Country Status (2)

Country Link
CN (1) CN112002375B (en)
WO (1) WO2022022663A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102920B (en) * 2020-07-30 2023-11-10 苏州因顿医学检验实验室有限公司 Drinking volume prediction system based on gene screening
CN112002375B (en) * 2020-07-30 2022-10-14 苏州因顿医学检验实验室有限公司 Construction method of alcohol capacity prediction model
CN114908146A (en) * 2022-05-31 2022-08-16 因顿健康科技(苏州)有限公司 Method for rapidly detecting and judging alcohol content by gene

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101109705A (en) * 2006-07-21 2008-01-23 上海主健生物工程有限公司 Reagent kit for detecting alcoholism and alcohol addiction susceptibility
CN106319052A (en) * 2016-08-22 2017-01-11 广东药科大学 Method for detecting gene type of polymorphic site rs671 of gene ALDH2 (Acetaldehyde Dehydrogenase 2) and kit
KR101925096B1 (en) * 2018-05-02 2018-12-04 아미코젠주식회사 A manufacturing method of hangover-eliminating enzyme powder and a composition for relieving hangover comprising thereof
CN111312396A (en) * 2020-02-21 2020-06-19 光瀚健康咨询管理(上海)有限公司 Individual wine capacity evaluation method and system based on wine capacity related factors
CN112002375A (en) * 2020-07-30 2020-11-27 苏州因顿医学检验实验室有限公司 Construction method of alcohol capacity prediction model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101210266A (en) * 2006-12-30 2008-07-02 苏州市长三角系统生物交叉科学研究院有限公司 Measuring method for relativity of interaction and genetic character between genome genetic markers
US8697361B2 (en) * 2008-02-28 2014-04-15 University Of Virginia Patent Foundation Serotonin transporter gene and treatment of alcoholism

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116735811A (en) * 2023-08-14 2023-09-12 山东百脉泉酒业股份有限公司 Method and system for measuring total acid and total ester content of wine
CN116735811B (en) * 2023-08-14 2023-10-10 山东百脉泉酒业股份有限公司 Method and system for measuring total acid and total ester content of wine

Also Published As

Publication number Publication date
CN112002375B (en) 2022-10-14
CN112002375A (en) 2020-11-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21850353

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21850353

Country of ref document: EP

Kind code of ref document: A1