CN106960133B - Disease prediction method and device - Google Patents

Info

Publication number
CN106960133B
Authority
CN
China
Prior art keywords
predicted
disease
probability
user
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710371749.0A
Other languages
Chinese (zh)
Other versions
CN106960133A (en
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vishuo Medical Data Technology Beijing Co ltd
Original Assignee
Vishuo Medical Data Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vishuo Medical Data Technology Beijing Co ltd
Priority to CN201710371749.0A
Publication of CN106960133A
Application granted
Publication of CN106960133B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Theoretical Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Genetics & Genomics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a disease prediction method and device. The method comprises: after receiving a disease prediction request sent by a terminal, acquiring a gene sequencing result of a user to be predicted, wherein the disease prediction request carries an identification of a disease to be predicted; determining mutation site information of the user to be predicted according to the gene sequencing result; determining the occurrence probability of the mutation sites in the disease population to be predicted, the occurrence probability of the mutation sites in the random population, and the incidence probability of the disease to be predicted; and predicting, according to these three probabilities, the probability that the user to be predicted has the disease to be predicted, and sending that probability to the terminal. The invention yields the probability that the user has the disease to be predicted as a specific probability value, which has high reference value; because no manual search is involved, accuracy and efficiency are high.

Description

Disease prediction method and device
Technical Field
The invention relates to the technical field of biological information and communication, in particular to a disease prediction method and device.
Background
Genetic variation refers to a sudden, heritable change in genomic deoxyribonucleic acid (DNA) molecules. Current research shows that the pathogenesis of common, frequently occurring diseases such as tumors, hypertension, coronary heart disease, diabetes, cardiovascular disease, and osteoarthropathy is related to genetic variation in patients. The development of such diseases involves alterations in the structure or expression regulation of more than two genes.
Based on these findings, attempts have been made to predict the onset of diseases from genetic variation. In the prior art, however, literature or databases are generally searched manually to identify mutation sites that may have a harmful effect on a disease, and whether a patient may suffer from the disease is then predicted from the patient's gene sequencing result.
Because the harmful mutation sites are found manually, efficiency and accuracy are low. Moreover, the prior art can only predict whether a mutation site carried by a patient may cause the patient to suffer from certain diseases, so only a qualitative prediction result is obtained, which has poor reference value.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a disease prediction method and apparatus, so as to solve or try to alleviate the above technical problems.
In a first aspect, an embodiment of the present invention provides a disease prediction method, where the method includes:
after a disease prediction request sent by a terminal is received, acquiring a gene sequencing result of a user to be predicted, wherein the disease prediction request carries an identification of a disease to be predicted;
determining the variation site information of the user to be predicted according to the gene sequencing result;
determining the occurrence probability of the variant locus in the disease population to be predicted, the occurrence probability of the variant locus in the random population and the incidence probability of the disease to be predicted;
and predicting the probability that the user to be predicted has the disease to be predicted according to the occurrence probability of the mutation sites in the disease population to be predicted, the occurrence probability of the mutation sites in the random population and the incidence probability of the disease to be predicted, and sending the probability that the user to be predicted has the disease to be predicted to the terminal.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the determining, according to the gene sequencing result, variation site information of the user to be predicted includes:
comparing the gene sequencing result with a reference genome to obtain a comparison result;
and determining the variation site information of the user to be predicted according to the comparison result.
With reference to the first aspect, the present invention provides a second possible implementation manner of the first aspect, wherein the determining the occurrence probability of the mutation site in the population with the disease to be predicted includes:
counting the number of people with the disease to be predicted in a pre-established database and the number of people with the mutation sites in the people with the disease to be predicted;
calculating a first ratio of the number of people with the variant sites in the people with the diseases to be predicted to the number of people with the diseases to be predicted;
and determining the first ratio as the occurrence probability of the variation site in the population with the disease to be predicted.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the predicting, according to the occurrence probability of the mutation site in the population with the disease to be predicted, the occurrence probability of the mutation site in the random population, and the incidence probability of the disease to be predicted, the probability that the user with the disease to be predicted has the disease to be predicted includes:
calculating the product of the occurrence probability of the mutation site in the disease population to be predicted and the incidence probability of the disease to be predicted;
and calculating a second ratio of the product to the occurrence probability of the mutation site in the random population, and determining the second ratio as the probability that the user to be predicted has the disease to be predicted.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where after the sending, to the terminal, the probability that the user to be predicted has the disease to be predicted, the method further includes:
receiving a suggestion acquisition request sent by the user to be predicted, wherein the suggestion acquisition request carries the probability that the user to be predicted has the disease to be predicted;
acquiring suggestion information corresponding to the probability that the user to be predicted has the disease to be predicted from a third-party server associated with the disease to be predicted;
and sending the suggestion information to the terminal.
In a second aspect, an embodiment of the present invention provides a disease prediction apparatus, where the apparatus includes:
a first obtaining module, configured to obtain a gene sequencing result of a user to be predicted after receiving a disease prediction request sent by a terminal, wherein the disease prediction request carries an identification of a disease to be predicted;
the first determination module is used for determining the mutation site information of the user to be predicted according to the gene sequencing result;
a second determination module, configured to determine the occurrence probability of the variant loci in the population with the disease to be predicted, the occurrence probability of the variant loci in the random population, and the incidence probability of the disease to be predicted;
the prediction module is used for predicting the probability that the user to be predicted has the disease to be predicted according to the occurrence probability of the mutation sites in the disease population to be predicted, the occurrence probability of the mutation sites in the random population and the incidence probability of the disease to be predicted;
and the sending module is used for sending the probability that the user to be predicted has the disease to be predicted to the terminal.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the first determining module includes:
the comparison unit is used for comparing the gene sequencing result with a reference genome to obtain a comparison result;
and the first determining unit is used for determining the variation site information of the user to be predicted according to the comparison result.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the second determining module includes:
the statistical unit is used for counting the number of people suffering from the disease to be predicted in a pre-established database and the number of people carrying the variation sites in the people suffering from the disease to be predicted;
the first calculation unit is used for calculating a first ratio of the number of people with the variant loci in the people with the diseases to be predicted to the number of the people with the diseases to be predicted;
and the second determining unit is used for determining the first ratio as the occurrence probability of the mutation site in the disease population to be predicted.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the prediction module includes:
the second calculation unit is used for calculating the product of the occurrence probability of the mutation site in the disease population to be predicted and the incidence probability of the disease to be predicted;
and the third calculating unit is used for calculating a second ratio of the product to the occurrence probability of the mutation site in the random population, and determining the second ratio as the probability that the user to be predicted has the disease to be predicted.
With reference to the second aspect, an embodiment of the present invention provides a fourth possible implementation manner of the second aspect, where the apparatus further includes:
a receiving module, configured to receive a suggestion acquisition request sent by the user to be predicted, where the suggestion acquisition request carries a probability that the user to be predicted has the disease to be predicted;
the second obtaining module is used for obtaining suggestion information corresponding to the probability that the user to be predicted has the disease to be predicted from a third-party server associated with the disease to be predicted;
the sending module is further configured to send the suggestion information to the terminal.
The disease prediction method and device provided by the embodiments of the invention yield the probability that the user has the disease to be predicted. The prediction result is a specific probability value, which has high reference value, and because no manual search is involved, accuracy and efficiency are high.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a disease prediction method provided by an embodiment of the present invention;
FIG. 2 is a flow chart of determining the occurrence probability of mutation sites in a disease population to be predicted in the disease prediction method provided by the embodiment of the present invention;
FIG. 3 is a flow chart of predicting the probability that a user to be predicted has a disease to be predicted in the disease prediction method provided by the embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a disease prediction apparatus provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
In the prior art, mutation sites that have a harmful effect on a disease are mostly searched for manually when the disease is predicted, so efficiency and accuracy are low. Moreover, the prior art can only predict whether the mutation sites carried by a patient may cause the patient to suffer from certain diseases, so only a qualitative prediction result of poor reference value is obtained. In view of this, embodiments of the present invention provide a disease prediction method and apparatus, which are described below by way of examples.
Referring to fig. 1, an embodiment of the present invention provides a disease prediction method, which includes steps S110 to S140 as follows:
S110: after a disease prediction request sent by a terminal is received, acquiring a gene sequencing result of a user to be predicted, wherein the disease prediction request carries an identification of a disease to be predicted;
S120: determining the variation site information of the user to be predicted according to the gene sequencing result;
S130: determining the occurrence probability of the mutation sites in the disease population to be predicted, the occurrence probability of the mutation sites in the random population, and the incidence probability of the disease to be predicted;
S140: predicting the probability that the user to be predicted has the disease to be predicted according to the occurrence probability of the mutation sites in the disease population to be predicted, the occurrence probability of the mutation sites in the random population, and the incidence probability of the disease to be predicted, and sending that probability to the terminal.
The execution subject of the method provided by the embodiment of the invention is the server.
The terminal can be a computer, a mobile phone or a tablet computer.
In the embodiment of the invention, when medical workers need to predict the probability that a user has a certain disease, a disease prediction request needs to be sent to a server through a terminal.
In step S110, the disease prediction request carries the identification of the disease to be predicted and the personal information of the user to be predicted, where the personal information includes the user's name, gender, identification number, and the like.
The identification of the disease to be predicted can be the name of the disease, such as liver cancer, coronary heart disease, or diabetes.
The gene sequencing result comprises a plurality of gene sequences.
In step S110, the gene sequencing result of the user to be predicted is obtained in one of the following two ways:
Case 1:
The gene sequencing result of the user to be predicted is stored in the server, and the server looks it up according to the personal information of the user to be predicted.
Case 2:
The server is connected to a gene sequencing device, and the gene sequencing result of the user to be predicted is obtained from the gene sequencing device.
The gene sequencing device may be a gene sequencer.
In step S120, the mutation site information of the user to be predicted may be determined according to the gene sequencing result by the following steps:
comparing the gene sequencing result with a reference genome to obtain a comparison result; and determining the variation site information of the user to be predicted according to the comparison result.
After the gene sequencing result of the user is obtained, it is first preprocessed: adapter sequences and low-quality reads are removed from the gene sequences in the result, yielding a series of short gene sequences. These short sequences are then aligned to a reference genome to determine their positions on the genome, and are assembled into a complete human genome. The alignment of the short sequences to the reference genome is performed with BWA (Burrows-Wheeler Alignment tool) software.
After the complete human genome is assembled, Samtools software is used to sort the short gene sequences in the human genome and to convert the data format, namely from the original SAM format into a BAM file.
Next, redundant information and noise are removed from the resulting data using Picard software.
Then, GATK software is used to search the data for differences from the reference sequence. Finally, the mutation sites are annotated to generate mutation site information comprising the chromosome where the mutation is located, the start physical position of the mutation on the chromosome, the end physical position of the mutation on the chromosome, the reference sequence information, and a list of observed sequence information.
Thus, the mutation site information includes the chromosome where the mutation is located and the position of the mutation on that chromosome, the position comprising the start physical position and the end physical position of the mutation.
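The shape of the resulting mutation site record can be illustrated with a toy sketch. It is not a stand-in for BWA, Samtools, Picard, or GATK: it assumes the user's sequence has already been aligned to the reference with no gaps, so that both strings have the same length, and it uses 1-based positions. The names `VariantSite` and `call_variants` and the `offset` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantSite:
    chromosome: str
    start: int      # 1-based start physical position on the chromosome
    end: int        # 1-based end physical position (inclusive)
    reference: str  # bases in the reference sequence
    observed: str   # bases observed in the user's sequence


def call_variants(chromosome, reference, observed, offset=1):
    """Report runs of mismatching bases between two gap-free, equal-length sequences."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    run_start = None
    # Iterate one step past the end so that a run reaching the last base is closed.
    for i in range(len(reference) + 1):
        mismatch = i < len(reference) and reference[i] != observed[i]
        if mismatch and run_start is None:
            run_start = i
        elif not mismatch and run_start is not None:
            sites.append(VariantSite(chromosome, offset + run_start, offset + i - 1,
                                     reference[run_start:i], observed[run_start:i]))
            run_start = None
    return sites


# Hypothetical toy input: an 8-base window starting at position 100 of chr1.
sites = call_variants("chr1", "ACGTACGT", "ACCTACAA", offset=100)
for site in sites:
    print(site)
```

Each record carries the same five fields the description lists: chromosome, start position, end position, reference sequence, and observed sequence.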
In step S130, determining the occurrence probability of the mutation site in the disease population to be predicted is performed through steps S210-S230, which are shown in fig. 2 and specifically described as follows:
S210: counting the number of people with the disease to be predicted in a pre-established database and the number of people carrying the mutation sites among the people with the disease to be predicted;
S220: calculating a first ratio of the number of people carrying the mutation sites among the people with the disease to be predicted to the number of people with the disease to be predicted;
S230: determining the first ratio as the occurrence probability of the mutation sites in the disease population to be predicted.
In the embodiment of the present invention, the database is established in advance by the following steps: collecting the personal basic information and gene sequencing results of patients suffering from the same disease, converting each patient's gene sequencing result into mutation site information, and storing the mutation site information and personal basic information of the patients suffering from that disease in a sub-database. A plurality of sub-databases are established in the same way, each storing the personal basic information and mutation site information of patients with a certain disease.
In addition, the personal basic information and gene sequencing results of a random population are collected, the gene sequencing results of the random population are converted into mutation site information, and the personal basic information and mutation site information of the random population are stored in another sub-database. The sub-databases obtained in this way together constitute the database.
The personal basic information of a patient includes the patient's name, sex, age, identification number, disease, and the like.
For example, if the mutation site carried by the user to be predicted is denoted A, then in step S210 the number of people with the disease to be predicted in the database is counted and denoted $m_1$, and the number of those people who carry mutation site A is counted and denoted $m_2$. The first ratio is calculated by the following formula:
$$P(\text{mutation site } A \mid \text{disease to be predicted}) = \frac{m_2}{m_1}$$
In the above formula, P(mutation site A | disease to be predicted) represents the occurrence probability of mutation site A in the disease population to be predicted.
The probability of the mutation sites in the random population is calculated by the following process:
Similarly, with the mutation site carried by the user to be predicted denoted A, the number of people in the random population in the pre-established database is counted and denoted $n_1$, and the number of people in the random population who carry mutation site A is counted and denoted $n_2$. The occurrence probability of mutation site A in the random population is then calculated by the following formula:
$$P(\text{mutation site } A) = \frac{n_2}{n_1}$$
In the above formula, P(mutation site A) represents the occurrence probability of mutation site A of the user to be predicted in the random population.
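The two ratios described above, the carrier fraction in the disease population and the carrier fraction in the random population, can be sketched with one helper. This is a minimal illustration that assumes each person in a sub-database is represented by the set of mutation sites they carry; the function name `carrier_frequency` and the toy cohorts are hypothetical.

```python
def carrier_frequency(cohort, site):
    """Fraction of people in `cohort` whose set of mutation sites contains `site`."""
    if not cohort:
        raise ValueError("cohort is empty")
    carriers = sum(1 for person_sites in cohort if site in person_sites)
    return carriers / len(cohort)


# Hypothetical toy sub-databases; each entry is the set of sites one person carries.
disease_cohort = [{"A", "B"}, {"A"}, {"B"}, {"A", "C"}]  # m1 = 4, m2 = 3
random_cohort = [{"A"}, set(), {"C"}, set(),
                 {"B"}, set(), set(), {"A", "B"}]        # n1 = 8, n2 = 2

p_site_given_disease = carrier_frequency(disease_cohort, "A")  # m2 / m1
p_site = carrier_frequency(random_cohort, "A")                 # n2 / n1
print(p_site_given_disease, p_site)
```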
The incidence rate of the disease to be predicted refers to the incidence rate of the disease to be predicted in the population, and specific values can be obtained from some disease research documents or reports.
In step S140, the probability that the user to be predicted has the disease to be predicted is predicted according to the occurrence probability of the mutation sites in the disease population to be predicted, the occurrence probability of the mutation sites in the random population, and the incidence probability of the disease to be predicted. As shown in fig. 3, this includes steps S310 to S320, as follows:
S310: calculating the product of the occurrence probability of the mutation sites in the disease population to be predicted and the incidence probability of the disease to be predicted;
S320: calculating a second ratio of the product to the occurrence probability of the mutation sites in the random population, and determining the second ratio as the probability that the user to be predicted has the disease to be predicted.
In the embodiment of the present invention, the mutation site information of the user to be predicted includes the following two cases:
Case 1:
The mutation site information of the user to be predicted includes information on only one mutation site, for example only mutation site A. The probability that the user to be predicted has the disease to be predicted is then calculated by the following formula:
$$P(\text{disease to be predicted} \mid \text{mutation site } A) = \frac{P(\text{mutation site } A \mid \text{disease to be predicted}) \cdot P(\text{disease to be predicted})}{P(\text{mutation site } A)}$$
In the formula, P(disease to be predicted | mutation site A) represents the probability that a user to be predicted who carries mutation site A has the disease to be predicted; P(mutation site A | disease to be predicted) represents the occurrence probability of mutation site A in the disease population to be predicted; P(disease to be predicted) represents the incidence probability of the disease to be predicted; and P(mutation site A) represents the occurrence probability of mutation site A in the random population.
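As a worked illustration of the single-site case, the sketch below multiplies the occurrence probability in the disease population by the incidence probability and divides by the occurrence probability in the random population, following steps S310 and S320. The function name and all numeric inputs are hypothetical and carry no clinical meaning.

```python
def posterior_single_site(p_site_given_disease, p_disease, p_site):
    """P(disease | site) = P(site | disease) * P(disease) / P(site)."""
    if p_site <= 0:
        raise ValueError("the site must occur in the random population (p_site > 0)")
    product = p_site_given_disease * p_disease  # step S310
    return product / p_site                     # step S320, the second ratio


# Hypothetical inputs: the site occurs in 75% of patients and 25% of the random
# population, and the incidence probability of the disease is 1%.
risk = posterior_single_site(0.75, 0.01, 0.25)
print(f"{risk:.4f}")
```

With these inputs the predicted probability is about 0.03, three times the incidence probability, because the site is three times as frequent among patients as in the random population.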
Case 2:
The mutation site information of the user to be predicted includes information on two or more mutation sites. The mutation sites of the user to be predicted are denoted mutation site 1, mutation site 2, ..., mutation site N, where N is a positive integer (N ≥ 2 in this case). The probability that the user to be predicted has the disease to be predicted is then calculated by the following formula:
$$P(\text{disease to be predicted} \mid \text{mutation site } 1, \ldots, \text{mutation site } N) = \frac{P(\text{mutation site } 1, \ldots, \text{mutation site } N \mid \text{disease to be predicted}) \cdot P(\text{disease to be predicted})}{P(\text{mutation site } 1, \ldots, \text{mutation site } N)}$$
In the formula, P(disease to be predicted | mutation site 1, ..., mutation site N) represents the probability that a user to be predicted who carries mutation sites 1 to N has the disease to be predicted; P(mutation site 1, ..., mutation site N | disease to be predicted) represents the occurrence probability of mutation sites 1 to N together in the disease population to be predicted; P(disease to be predicted) represents the incidence probability of the disease to be predicted; and P(mutation site 1, ..., mutation site N) represents the occurrence probability of mutation sites 1 to N together in the random population.
Specifically, in the embodiment of the present invention, the occurrences of the mutation sites are independent of one another, and therefore P(mutation site 1, ..., mutation site N | disease to be predicted) can be calculated by the following formula:
$$P(\text{mutation site } 1, \ldots, \text{mutation site } N \mid \text{disease to be predicted}) = \prod_{i=1}^{N} P(\text{mutation site } i \mid \text{disease to be predicted})$$
The formula states that the occurrence probability of mutation sites 1 to N together in the disease population to be predicted equals the product of the occurrence probabilities of mutation site 1, mutation site 2, ..., and mutation site N in the disease population to be predicted.
Specifically, in the embodiment of the present invention, P(mutation site 1, ..., mutation site N) can be calculated by the following formula:
$$P(\text{mutation site } 1, \ldots, \text{mutation site } N) = \prod_{i=1}^{N} P(\text{mutation site } i)$$
The formula states that the occurrence probability of mutation sites 1 to N together in the random population equals the product of the occurrence probabilities of mutation site 1, mutation site 2, ..., and mutation site N in the random population.
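The multi-site case can be sketched under the same independence assumption stated above: both joint probabilities are taken as products of per-site probabilities before the ratio is formed. The function name and the numeric inputs are hypothetical.

```python
import math


def posterior_multi_site(p_sites_given_disease, p_disease, p_sites):
    """Posterior for N sites, assuming the sites occur independently of one another."""
    if len(p_sites_given_disease) != len(p_sites):
        raise ValueError("both lists must describe the same N sites")
    if any(p <= 0 for p in p_sites):
        raise ValueError("every site must occur in the random population")
    joint_given_disease = math.prod(p_sites_given_disease)  # product over P(site i | disease)
    joint_random = math.prod(p_sites)                       # product over P(site i)
    return joint_given_disease * p_disease / joint_random


# Hypothetical inputs for two sites and an incidence probability of 1%.
risk = posterior_multi_site([0.5, 0.4], 0.01, [0.25, 0.2])
print(f"{risk:.4f}")
```

With these inputs the result is about 0.04. The sketch does not clamp its output: when the per-site frequencies are estimated from separate samples, or the sites are in fact correlated, the ratio can exceed 1, so a caller may want to check the returned value.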
In the embodiment of the present invention, after the probability that the user to be predicted has the disease to be predicted has been sent to the terminal and displayed to the user, the user can also obtain corresponding recommendations. The specific process comprises the following steps:
receiving a suggestion acquisition request sent by the user to be predicted, wherein the suggestion acquisition request carries the probability that the user to be predicted has the disease to be predicted; acquiring, from a third-party server associated with the disease to be predicted, recommendation information corresponding to the probability that the user to be predicted has the disease to be predicted; and sending the recommendation information to the terminal.
Specifically, in addition to the probability that the user to be predicted has the disease to be predicted, the suggestion acquisition request carries an identifier of the terminal. The identifier may be the Internet Protocol (IP) address of the terminal, or it may be account information of the user to be predicted.
In the embodiment of the present invention, after receiving the request sent by the terminal, the server may link to a third-party server associated with the disease to be predicted according to the identifier of the disease to be predicted carried in the request. For example, if the disease to be predicted is breast cancer, the server may link to a website associated with breast cancer, such as that of the National Comprehensive Cancer Network (NCCN), to search for relevant recommendation information.
In the embodiment of the present invention, recommendation information corresponding to the probability that the user to be predicted has the disease to be predicted is obtained from the third-party server according to that probability. For example, if the probability is greater than or equal to a first preset value, the user to be predicted is considered to have a high probability of having the disease to be predicted, and recommendation information related to that probability is obtained. The recommendation information may be a diet recommendation, an exercise recommendation, a treatment recommendation, or the like. The obtained recommendation information is then sent to the terminal.
Alternatively, the method can be implemented as follows: first, all recommendation information associated with the disease to be predicted is acquired from the third-party server; then, according to the probability that the user to be predicted has the disease to be predicted, the recommendation information matching that probability is screened out from all the recommendation information, sent to the terminal, and presented to the user to be predicted.
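The second approach above, in which all recommendation information is fetched first and then screened by probability, might look roughly like the following. This is a hedged sketch: the `Recommendation` record, its `min_probability` field, the sample entries, and the value chosen for `FIRST_PRESET_VALUE` are hypothetical, since the embodiment does not specify how recommendations are stored or what the first preset value is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recommendation:
    category: str           # e.g. "diet", "exercise", "treatment"
    text: str
    min_probability: float  # hypothetical: lowest probability this entry applies to


FIRST_PRESET_VALUE = 0.5  # placeholder; the source does not give a value


def screen_recommendations(
    probability: float, all_recommendations: List[Recommendation]
) -> List[Recommendation]:
    """Keep only the recommendations whose threshold the probability reaches."""
    return [r for r in all_recommendations if probability >= r.min_probability]


# Hypothetical data standing in for what a third-party server might return.
all_recommendations = [
    Recommendation("diet", "General dietary advice", 0.0),
    Recommendation("exercise", "Regular exercise plan", 0.0),
    Recommendation("treatment", "Consult a specialist", FIRST_PRESET_VALUE),
]

low_risk = screen_recommendations(0.1, all_recommendations)
high_risk = screen_recommendations(0.7, all_recommendations)
```

In this sketch a user at or above the first preset value additionally receives the treatment-oriented entry, while a user below it receives only the general entries.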
The disease prediction method provided by the embodiment of the present invention obtains the probability that the user has the disease to be predicted. Because the prediction result is a specific probability value, it has higher reference value; and because no manual analysis is required, accuracy and efficiency are improved.
Referring to fig. 4, an embodiment of the present invention provides a disease prediction apparatus, which is used for executing a disease prediction method provided by an embodiment of the present invention, and the apparatus may be a server, and includes a first obtaining module 410, a first determining module 420, a second determining module 430, a predicting module 440, and a sending module 450;
the first obtaining module 410 is configured to obtain a gene sequencing result of a user to be predicted after receiving a disease prediction request sent by a terminal, where the disease prediction request carries an identifier of a disease to be predicted;
the first determining module 420 is configured to determine mutation site information of the user to be predicted according to the gene sequencing result;
the second determining module 430 is configured to determine the occurrence probability of the mutation site in the population with the disease to be predicted, the occurrence probability of the mutation site in a random population, and the incidence probability of the disease to be predicted;
the prediction module 440 is configured to predict the probability that the user to be predicted has the disease to be predicted according to the occurrence probability of the mutation site in the population with the disease to be predicted, the occurrence probability of the mutation site in the random population, and the incidence probability of the disease to be predicted;
the sending module 450 is configured to send the probability that the user to be predicted has the disease to be predicted to the terminal.
Specifically, the first determining module 420 determines the mutation site information of the user to be predicted according to the gene sequencing result by means of a comparison unit and a first determining unit:
the comparison unit is configured to compare the gene sequencing result with a reference genome to obtain a comparison result; the first determining unit is configured to determine the mutation site information of the user to be predicted according to the comparison result.
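As a toy illustration of the comparison step, the sketch below reports the positions at which a sample sequence differs from a reference sequence. It assumes two already-aligned sequences of equal length and handles substitutions only; real pipelines rely on dedicated alignment and variant-calling tools, which the embodiment does not name. The function name `find_mutation_sites` and the sequences are hypothetical.

```python
from typing import List, Tuple


def find_mutation_sites(sample: str, reference: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, observed base) for every
    position where the aligned sample differs from the reference."""
    if len(sample) != len(reference):
        raise ValueError("this sketch only handles aligned, equal-length sequences")
    return [
        (position, ref_base, obs_base)
        for position, (ref_base, obs_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != obs_base
    ]


# Hypothetical sequences for illustration.
reference_sequence = "ACGTAA"
sample_sequence = "ACGTTA"
mutation_sites = find_mutation_sites(sample_sequence, reference_sequence)
```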
As an embodiment, the second determining module 430 determines the occurrence probability of the mutation site in the population with the disease to be predicted by means of a statistical unit, a first calculating unit, and a second determining unit:
the statistical unit is configured to count, in a pre-established database, the number of people who have the disease to be predicted and, among them, the number of people who carry the mutation site; the first calculating unit is configured to calculate a first ratio of the number of people carrying the mutation site among those who have the disease to be predicted to the number of people who have the disease to be predicted; the second determining unit is configured to determine the first ratio as the occurrence probability of the mutation site in the population with the disease to be predicted.
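The counting and ratio steps can be sketched as follows. The record layout (a dictionary with `diseases` and `sites` sets), the function name, and the sample data are assumptions made for illustration; the embodiment does not describe the schema of the pre-established database.

```python
from typing import Any, Dict, List


def site_probability_in_disease_population(
    records: List[Dict[str, Any]], disease_id: str, site: str
) -> float:
    """First ratio: carriers of `site` among people with `disease_id`,
    divided by the number of people with `disease_id`."""
    patients = [r for r in records if disease_id in r["diseases"]]
    if not patients:
        raise ValueError("no record in the database has the disease to be predicted")
    carriers = sum(1 for r in patients if site in r["sites"])
    return carriers / len(patients)


# Hypothetical database contents (site and disease names are placeholders).
database = [
    {"diseases": {"disease_x"}, "sites": {"site_1", "site_2"}},
    {"diseases": {"disease_x"}, "sites": {"site_1"}},
    {"diseases": {"disease_x"}, "sites": set()},
    {"diseases": set(), "sites": {"site_1"}},
]

first_ratio = site_probability_in_disease_population(database, "disease_x", "site_1")
```

Here three records have the disease and two of them carry `site_1`, so the first ratio is 2/3.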
Specifically, the prediction module 440 predicts the probability that the user to be predicted has the disease to be predicted by means of a second calculating unit, a third calculating unit, and a third determining unit:
the second calculating unit is configured to calculate the product of the occurrence probability of the mutation site in the population with the disease to be predicted and the incidence probability of the disease to be predicted; the third calculating unit is configured to calculate a second ratio of the product to the occurrence probability of the mutation site in the random population; the third determining unit is configured to determine the second ratio as the probability that the user to be predicted has the disease to be predicted.
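The product and second ratio described above correspond to Bayes' rule, P(disease | sites) = P(sites | disease) · P(disease) / P(sites). A minimal sketch follows; the function name, parameter names, and numeric inputs are illustrative assumptions rather than values from the embodiment.

```python
def predict_disease_probability(
    p_sites_given_disease: float,
    p_disease: float,
    p_sites_random: float,
) -> float:
    """Second ratio: (P(sites | disease) * P(disease)) / P(sites)."""
    if p_sites_random <= 0.0:
        raise ValueError("occurrence probability in the random population must be positive")
    product = p_sites_given_disease * p_disease  # second calculating unit
    return product / p_sites_random              # third calculating unit


# Hypothetical inputs: P(sites | disease) = 0.2, P(disease) = 0.01, P(sites) = 0.02.
predicted_probability = predict_disease_probability(0.2, 0.01, 0.02)
```

With these inputs the predicted probability is about 0.1. If the three inputs are estimated from inconsistent data, the ratio can exceed 1, so an implementation may want to validate or clamp the result.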
As an embodiment, the apparatus provided in the embodiment of the present invention further includes a receiving module and a second obtaining module;
the receiving module is configured to receive a suggestion acquisition request sent by the user to be predicted, where the suggestion acquisition request carries a probability that the user to be predicted has a disease to be predicted; the second obtaining module is configured to obtain recommendation information corresponding to a probability that the user to be predicted has the disease to be predicted from a third-party server associated with the disease to be predicted; the sending module is further configured to send the recommendation information to a terminal.
The disease prediction device provided by the embodiment of the present invention obtains the probability that the user has the disease to be predicted. Because the prediction result is a specific probability value, it has higher reference value; and because no manual analysis is required, accuracy and efficiency are improved.
The disease prediction device provided by the embodiment of the present invention may be specific hardware on a piece of equipment, or software or firmware installed on the equipment. The device has the same implementation principle and technical effect as the method embodiments; for brevity, where the device embodiments do not mention a detail, reference may be made to the corresponding contents of the method embodiments. Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments and are not described again here.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, which are used to illustrate the technical solutions of the present invention and not to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and are intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A disease prediction apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain a gene sequencing result of a user to be predicted after receiving a disease prediction request sent by a terminal, wherein the disease prediction request carries an identifier of a disease to be predicted;
a first determining module, configured to determine mutation site information of the user to be predicted according to the gene sequencing result;
a second determining module, configured to determine the occurrence probability of the mutation site in a population with the disease to be predicted, the occurrence probability of the mutation site in a random population, and the incidence probability of the disease to be predicted;
a prediction module, configured to predict the probability that the user to be predicted has the disease to be predicted according to the occurrence probability of the mutation site in the population with the disease to be predicted, the occurrence probability of the mutation site in the random population, and the incidence probability of the disease to be predicted;
wherein the prediction module comprises:
a second calculating unit, configured to calculate the product of the occurrence probability of the mutation site in the population with the disease to be predicted and the incidence probability of the disease to be predicted;
a third calculating unit, configured to calculate a second ratio of the product to the occurrence probability of the mutation site in the random population, and to determine the second ratio as the probability that the user to be predicted has the disease to be predicted;
and a sending module, configured to send the probability that the user to be predicted has the disease to be predicted to the terminal.
2. The apparatus of claim 1, wherein the first determining module comprises:
a comparison unit, configured to compare the gene sequencing result with a reference genome to obtain a comparison result;
and a first determining unit, configured to determine the genetic variation information of the user according to the comparison result.
3. The apparatus of claim 1, wherein the second determining module comprises:
a statistical unit, configured to count, in a pre-established database, the number of people who have the disease to be predicted and, among them, the number of people who carry the mutation site;
a first calculating unit, configured to calculate a first ratio of the number of people carrying the mutation site among those who have the disease to be predicted to the number of people who have the disease to be predicted;
and a second determining unit, configured to determine the first ratio as the occurrence probability of the mutation site in the population with the disease to be predicted.
4. The apparatus of claim 1, further comprising:
a receiving module, configured to receive a suggestion acquisition request sent by the user to be predicted, where the suggestion acquisition request carries a probability that the user to be predicted has the disease to be predicted;
a second obtaining module, configured to obtain, from a third-party server associated with the disease to be predicted, recommendation information corresponding to the probability that the user to be predicted has the disease to be predicted;
the sending module is further configured to send the recommendation information to the terminal.
CN201710371749.0A 2017-05-24 2017-05-24 Disease prediction method and device Active CN106960133B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710371749.0A CN106960133B (en) 2017-05-24 2017-05-24 Disease prediction method and device

Publications (2)

Publication Number Publication Date
CN106960133A CN106960133A (en) 2017-07-18
CN106960133B true CN106960133B (en) 2020-08-11

Family

ID=59482352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710371749.0A Active CN106960133B (en) 2017-05-24 2017-05-24 Disease prediction method and device

Country Status (1)

Country Link
CN (1) CN106960133B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577907B (en) * 2017-09-08 2021-04-02 成都奇恩生物科技有限公司 Rare disease auxiliary diagnosis system based on Internet and use method
CN108986878A (en) * 2018-08-03 2018-12-11 武汉白原科技有限公司 Health guidance report-generating method, device and server based on genetic test
CN109620147A (en) * 2018-11-29 2019-04-16 深圳市衣信互联网科技有限公司 A kind of monitoring mammary gland early warning system and its monitoring method based on NFC
CN111554404B (en) * 2020-04-13 2023-09-08 吾征智能技术(北京)有限公司 Disease prediction system and method based on indoor environment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542179A (en) * 2010-10-27 2012-07-04 三星Sds株式会社 Apparatus and method for extracting biomarkers
WO2014149972A1 (en) * 2013-03-15 2014-09-25 The Scripps Research Institute Systems and methods for genomic annotation and distributed variant interpretation
CN105229649A (en) * 2013-03-15 2016-01-06 百世嘉(上海)医疗技术有限公司 For the human genome analysis of variance of disease association and the system and method for report
CN106096331A (en) * 2016-06-12 2016-11-09 中南大学 A kind of method inferring lncRNA and disease contact
CN106650241A (en) * 2016-11-24 2017-05-10 首都医科大学附属北京胸科医院 Prediction and diagnosis method for complex disease based on Multi-Omics
CN106702018A (en) * 2017-03-21 2017-05-24 为朔医学数据科技(北京)有限公司 Single gene inheritance disease detection method and device

Also Published As

Publication number Publication date
CN106960133A (en) 2017-07-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant