WO2020220545A1 - Disease prediction method, device, and computer equipment based on long short-term memory model - Google Patents

Disease prediction method, device, and computer equipment based on long short-term memory model

Info

Publication number
WO2020220545A1
WO2020220545A1 PCT/CN2019/103547 CN2019103547W WO2020220545A1 WO 2020220545 A1 WO2020220545 A1 WO 2020220545A1 CN 2019103547 W CN2019103547 W CN 2019103547W WO 2020220545 A1 WO2020220545 A1 WO 2020220545A1
Authority
WO
WIPO (PCT)
Prior art keywords
term memory
short
disease
long
hidden state
Prior art date
Application number
PCT/CN2019/103547
Other languages
English (en)
French (fr)
Inventor
贾文笑
谭克为
李响
谢国彤
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to US17/264,299 (US11710571B2)
Priority to SG11202008385YA
Publication of WO2020220545A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/0442: Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • This application relates to the computer field, and in particular to a disease prediction method, device, computer equipment, and storage medium based on a long short-term memory model.
  • Disease risk prediction arises from combining artificial intelligence with medicine. Its core task is to predict the probability that a given disease will occur in the future.
  • The main calculation methods include classic regression analysis, traditional machine learning methods, and emerging deep learning methods.
  • Real-world medical data suffers from poor quality, high dimensionality, imbalance, and discontinuous time series, which makes it difficult to predict disease risk accurately.
  • Existing disease risk prediction systems require the patient's physical examination data, but the physical examination process is time-consuming and laborious. Existing products can only analyze the risk of a single disease and cannot consider the relationships between diseases. Their risk prediction can only cover a single future period and cannot take the time information of the input variables into account. Existing disease prediction also considers only the medical data of the tested object and uses only a single prediction model, so its prediction accuracy is insufficient.
  • The main purpose of this application is to provide a disease prediction method, device, computer equipment, and storage medium based on a long short-term memory model, aiming to improve the accuracy of disease prediction.
  • This application proposes a disease prediction method based on a long short-term memory model, which includes the following steps:
  • acquiring first medical data of a target object and second medical data of an associated object, where the target object has a blood relationship with the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
  • inputting the first medical data and the second medical data into the first long short-term memory network in a trained long short-term memory model for operation to obtain a hidden state vector sequence, where the long short-term memory model includes a first long short-term memory network for encoding and a second long short-term memory network for decoding;
  • inputting the hidden state vector sequence into the second long short-term memory network for operation to obtain a disease prediction result, where the disease prediction result includes the predicted disease type and the corresponding incidence probability;
  • screening out from the disease prediction result each predicted disease whose incidence rate is higher than a preset threshold, recording it as a designated disease, and obtaining the associated diseases directly connected to the designated disease according to a preset disease association network, where the network nodes of the association network are different disease types;
  • outputting the disease prediction result and the associated diseases.
  • This application provides a disease prediction device based on a long short-term memory model, including:
  • a medical data acquisition unit for acquiring first medical data of a target object and second medical data of an associated object, where the target object has a blood relationship with the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
  • a hidden state vector sequence acquisition unit for inputting the first medical data and the second medical data into the first long short-term memory network in the trained long short-term memory model to obtain a hidden state vector sequence, where the long short-term memory model includes a first long short-term memory network used for encoding and a second long short-term memory network used for decoding;
  • a disease prediction result obtaining unit configured to input the hidden state vector sequence into the second long short-term memory network for operation to obtain a disease prediction result, where the disease prediction result includes the predicted disease type and the corresponding incidence probability;
  • an associated disease acquisition unit for screening out from the disease prediction result the predicted diseases whose incidence rate is higher than the preset threshold, recording them as designated diseases, and obtaining the associated diseases directly connected with the designated diseases according to the preset disease association network, where the network nodes of the association network are different disease types;
  • an output unit for outputting the disease prediction result and the associated diseases.
  • the present application provides a computer device including a memory and a processor, the memory stores a computer program, and the processor implements the steps of any one of the above methods when the computer program is executed.
  • the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of any of the above methods are implemented.
  • The disease prediction method, device, computer equipment, and storage medium based on the long short-term memory model of the present application obtain the first medical data of the target object and the second medical data of the associated object; input the first medical data and the second medical data into the first long short-term memory network in the trained long short-term memory model for operation to obtain the hidden state vector sequence in the first long short-term memory network; input the hidden state vector sequence into the second long short-term memory network for operation to obtain the disease prediction result; screen out from the disease prediction result the predicted diseases whose incidence rate is higher than the preset threshold and record them as designated diseases; obtain the associated diseases directly connected to the designated diseases according to the preset disease association network; and output the disease prediction result and the associated diseases, thereby improving the accuracy of prediction.
  • FIG. 1 is a schematic flowchart of a disease prediction method based on a long short-term memory model according to an embodiment of the application;
  • FIG. 2 is a schematic block diagram of the structure of a disease prediction device based on a long short-term memory model according to an embodiment of the application;
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • An embodiment of the present application provides a disease prediction method based on a long short-term memory model, including the following steps:
  • the long short-term memory model includes a first long short-term memory network for encoding and a second long short-term memory network for decoding;
  • The first medical data of the target object and the second medical data of the associated object are acquired, where the target object has a blood relationship with the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases.
  • This application predicts the diseases of the target object, and the second medical data of the associated object, which includes the treatment history of genetic diseases, assists that prediction. Because genetic diseases follow blood relationships, a blood relative may carry recessive physiological characteristics (a hidden disease) even without showing obvious signs of the genetic disease. The genetic disease treatment history of the associated object can therefore help predict the target object's diseases.
  • The first medical data includes medication history, disease history, and surgical history. Because these all affect the human body, they can serve as a basis for disease prediction. For example, patients whose medication history includes pioglitazone, captopril, and nitrendipine, used to treat diabetes, hypertension, and atrial fibrillation, may be at risk of myocardial infarction, coronary heart disease, and stroke in the future.
  • Traditional technology adopts only a one-to-one analysis strategy, that is, it considers only the medical data of the target object when predicting the target object's future diseases, whereas this application also uses the second medical data of the associated object as prediction data to increase prediction accuracy.
  • Traditional technology models only a single disease and can therefore predict only that disease, whereas this application can predict multiple diseases over different time intervals.
  • The medical data in this application includes not only the first medical data of the target object but also the second medical data of the associated object. This makes the prediction more accurate and compensates for a target object misreporting medical history or concealing family medical history. Even if the target object's genetic disease is recessive, it still exists at the genetic level, may become dominant in the future, and may affect body functions and other diseases. The second medical data of the associated object therefore provides data such as the target object's genetic diseases, improving the accuracy of disease prediction.
  • The long short-term memory model includes a first long short-term memory network used for encoding and a second long short-term memory network used for decoding.
  • The long short-term memory model is a model that uses long short-term memory networks.
  • A long short-term memory network is a recurrent neural network over time, suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
  • The long short-term memory model used in this application includes a first long short-term memory network for encoding and a second long short-term memory network for decoding, so as to realize time-ordered prediction of multiple diseases.
  • Encoding in this application refers to converting input information into a vector sequence of a specified length, and decoding refers to converting an input vector sequence into a predicted vector sequence.
  • The hidden state vector sequence is input into the second long short-term memory network for operation to obtain a disease prediction result, where the disease prediction result includes the predicted disease type and the corresponding incidence probability.
  • The predicted diseases whose incidence rate is higher than the preset threshold are screened out from the disease prediction result and recorded as designated diseases, and the associated diseases directly connected with the designated diseases are obtained according to the preset disease association network, where the network nodes of the association network are different disease types.
  • The disease association network may be any association network, such as a knowledge graph network.
  • The knowledge graph network is constructed, for example, as follows: a preset knowledge graph construction tool identifies initial entities from designated information collected in advance, where the designated information records at least the designated disease and the initial entities include at least the designated disease; the initial entities are deduplicated to obtain final entities; and the relationships between the final entities are extracted from the designated information to form triplets, from which the knowledge graph network is generated. In this way, further prediction is made on top of the long short-term memory model to further improve prediction accuracy.
  • The disease prediction result and the associated diseases are output.
  • The disease prediction result is the output of the long short-term memory model, and the associated diseases are the output of the disease association network, so combining the two further improves prediction accuracy.
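The screening and lookup described above can be illustrated with a minimal Python sketch. It assumes the disease prediction result is available as a mapping from disease type to incidence probability and the disease association network as an adjacency mapping; the function names, the threshold value, and the sample diseases and edges are hypothetical and are not taken from the application.

```python
from typing import Dict, List, Set


def screen_designated(prediction: Dict[str, float], threshold: float) -> List[str]:
    """Return the predicted diseases whose incidence probability exceeds the threshold."""
    return sorted(name for name, prob in prediction.items() if prob > threshold)


def directly_connected(designated: List[str],
                       network: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each designated disease to the nodes that share an edge with it."""
    return {name: sorted(network.get(name, set())) for name in designated}


# Hypothetical prediction result and association network (illustrative values only).
prediction = {
    "coronary heart disease": 0.62,
    "stroke": 0.35,
    "myocardial infarction": 0.71,
}
network = {
    "coronary heart disease": {"myocardial infarction", "heart failure"},
    "myocardial infarction": {"coronary heart disease", "arrhythmia"},
    "stroke": {"hypertension"},
}

designated = screen_designated(prediction, threshold=0.5)
associated = directly_connected(designated, network)
for disease in designated:
    print(disease, "->", associated[disease])
```

Only direct neighbours are returned here, matching the "directly connected" wording; multi-hop traversal would be a different design.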
  • Step S2 of obtaining the hidden state vector sequence includes:
  • As described above, the hidden state vector sequence in the first long short-term memory network is obtained.
  • The designated influence factor may keep the same value or change dynamically across different time periods (because the degree to which genetic diseases influence other diseases changes over time).
  • The first medical data covers a span of time and is divided into multiple data sequences for different time periods. To bring the second medical data into the long short-term memory model, this application maps the second medical data to a designated influence factor, so that the multiple data sequences and the designated influence factor together serve as the input of the model.
  • A single data sequence together with its designated influence factor yields one high-dimensional vector, so the multiple data sequences and their designated influence factors yield a high-dimensional vector sequence, which serves as the basis for calculation in the trained long short-term memory model.
  • Other diseases refer to diseases other than the genetic diseases.
  • The influence factor is data obtained by quantifying the influence of genetic diseases on other diseases, and is used for calculation in the long short-term memory network.
  • The influence factor can take any form, for example a separate vector.
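As a rough illustration of how the per-period inputs might be assembled, the following Python sketch divides dated records into preset time periods, multi-hot encodes each segment, and appends an influence factor vector. The record layout, the multi-hot encoding, the constant factor value, and all names are assumptions made for this example; the application does not prescribe them.

```python
from datetime import date
from typing import List, Sequence, Tuple

Record = Tuple[date, str]  # hypothetical layout: (event date, event code)


def split_into_periods(records: Sequence[Record], start: date,
                       period_days: int, n_periods: int) -> List[List[str]]:
    """Divide dated records into n_periods consecutive segments of period_days each."""
    segments: List[List[str]] = [[] for _ in range(n_periods)]
    for when, code in sorted(records):
        index = (when - start).days // period_days
        if 0 <= index < n_periods:  # records outside the window are ignored
            segments[index].append(code)
    return segments


def build_inputs(segments: Sequence[Sequence[str]], vocabulary: Sequence[str],
                 factor: Sequence[float]) -> List[List[float]]:
    """Multi-hot encode each segment and append the influence factor vector."""
    inputs = []
    for segment in segments:
        present = set(segment)
        multi_hot = [1.0 if code in present else 0.0 for code in vocabulary]
        inputs.append(multi_hot + list(factor))
    return inputs


# Hypothetical first medical data and a constant one-dimensional influence factor.
records = [
    (date(2018, 2, 1), "drug:pioglitazone"),
    (date(2018, 9, 1), "dx:hypertension"),
    (date(2018, 3, 1), "dx:diabetes"),
]
vocabulary = ["drug:pioglitazone", "dx:diabetes", "dx:hypertension"]
segments = split_into_periods(records, start=date(2018, 1, 1),
                              period_days=180, n_periods=2)
inputs = build_inputs(segments, vocabulary, factor=[0.3])
print(inputs)
```

A factor that changes over time could be supported by passing one factor vector per period instead of a single shared one.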
  • Step S203 of inputting the multi-segment data sequence and the designated influence factor into the first long short-term memory network in the trained long short-term memory model for operation to obtain the hidden state vector sequence includes:
  • obtaining the hidden state vector $h_t$ in the first long short-term memory network according to the formula $h_t = \mathrm{LSTM}_{enc}(x_t, h_{t-1})$, where $t$ denotes the $t$-th time period, $h_t$ is the hidden state vector corresponding to the $t$-th time period, $h_{t-1}$ is the hidden state vector corresponding to the $(t-1)$-th time period, $x_t$ is the input data in the $t$-th time period, and $\mathrm{LSTM}_{enc}$ denotes the encoding operation performed by the first long short-term memory network; $x_t$ includes the first medical data of the $t$-th time period and the designated influence factor of the $t$-th time period;
  • forming the hidden state vectors corresponding to the plurality of preset time periods into a hidden state vector sequence $h_1, h_2, \ldots, h_n$, where there are $n$ time periods in total.
  • As described above, the multi-segment data sequence and the designated influence factor are input into the first long short-term memory network in the trained long short-term memory model, and the resulting hidden state vectors form the hidden state vector sequence $h_1, h_2, \ldots, h_n$ over the $n$ time periods.
  • the first long short-term memory network encodes the multiple data sequences and the designated influence factor into a hidden state vector sequence, which serves as a decoding basis for the second long short-term memory network.
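The encoding recurrence, in which each hidden state is computed from the current input and the previous hidden state, can be sketched in plain Python as follows. The gate equations are those of a standard LSTM cell, which the application does not spell out; the class name `LSTMCell`, the random untrained weights, and the vector sizes are assumptions for illustration. The cell state carried between steps is treated as internal to the encoder.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights: List[Vector], v: Vector, bias: Vector) -> Vector:
    """Compute weights @ v + bias for small dense lists."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


class LSTMCell:
    """Standard LSTM cell with untrained random weights (illustrative assumption)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        width = input_size + hidden_size
        self.hidden_size = hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
        z = list(x) + list(h_prev)  # concatenate input and previous hidden state
        i = [sigmoid(a) for a in affine(self.W["i"], z, self.b["i"])]
        f = [sigmoid(a) for a in affine(self.W["f"], z, self.b["f"])]
        o = [sigmoid(a) for a in affine(self.W["o"], z, self.b["o"])]
        g = [math.tanh(a) for a in affine(self.W["g"], z, self.b["g"])]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def encode(cell: LSTMCell, xs: Sequence[Vector]) -> List[Vector]:
    """Run the cell over all time periods and collect h_1, ..., h_n."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    hidden_states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        hidden_states.append(h)
    return hidden_states


# Two time periods, each a 4-dimensional input (hypothetical values).
xs = [[1.0, 1.0, 0.0, 0.3], [0.0, 0.0, 1.0, 0.3]]
cell = LSTMCell(input_size=4, hidden_size=3)
hidden_states = encode(cell, xs)
print(len(hidden_states), len(hidden_states[0]))
```

In practice a trained network from a deep learning framework would replace this hand-written cell; the sketch only shows the data flow of the recurrence.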
  • Step S2032 of forming the hidden state vectors corresponding to the plurality of preset time periods into the hidden state vector sequence $h_1, h_2, \ldots, h_n$, where there are $n$ time periods in total, includes:
  • calculating $e_{ij} = \mathrm{score}(s_i, h_j)$, where $a_{ij}$ is a weight parameter, there are $n$ time periods in total, $s_i$ is the $i$-th hidden state vector in the second long short-term memory network, and $\mathrm{score}(s_i, h_j)$ is the score calculated from $s_i$ and $h_j$ using a preset score function;
  • forming the hidden state vectors corresponding to the plurality of preset time periods into a hidden state vector sequence $h_1, h_2, \ldots, h_n$.
  • The vector sequence serves as the decoding basis of the second long short-term memory network. Because an attention mechanism is adopted, the weights are allocated more accurately, which helps improve the accuracy of prediction.
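For readers unfamiliar with attention, the sketch below shows one common way such scores are turned into weights and a weighted combination of encoder states. Only the score step appears in the text above; the softmax normalisation, the weighted sum, and the dot-product score function used here are standard attention choices assumed for illustration, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot_score(s: Vector, h: Vector) -> float:
    """Assumed score function: dot product of decoder and encoder states."""
    return sum(a * b for a, b in zip(s, h))


def attend(s_i: Vector, hs: Sequence[Vector],
           score: Callable[[Vector, Vector], float] = dot_score
           ) -> Tuple[List[float], List[float]]:
    """Return the weights a_ij over encoder states and their weighted sum."""
    e = [score(s_i, h_j) for h_j in hs]          # e_ij = score(s_i, h_j)
    peak = max(e)                                # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in e]
    total = sum(exps)
    a = [v / total for v in exps]                # assumed softmax normalisation
    dim = len(hs[0])
    context = [sum(a_j * h_j[k] for a_j, h_j in zip(a, hs)) for k in range(dim)]
    return a, context


# Two encoder states and a zero decoder state give uniform weights.
hs = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([0.0, 0.0], hs)
print(weights, context)
```

With a non-zero decoder state the weights shift toward the encoder states that score highest, which is the sense in which attention redistributes weight across time periods.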
  • Step S3 of inputting the hidden state vector sequence into the second long short-term memory network for operation to obtain a disease prediction result, where the disease prediction result includes the predicted disease type and the corresponding incidence probability, includes:
  • S301. Inputting the hidden state vector sequence into the second long short-term memory network for operation, so as to obtain a high-dimensional vector sequence output by the second long short-term memory network;
  • As described above, the hidden state vector sequence is input into the second long short-term memory network for operation, and the disease prediction result is obtained.
  • The output of the second long short-term memory network is a high-dimensional vector sequence that represents the prediction results for different time periods, and the component vectors of each high-dimensional vector represent the predicted disease types and the corresponding incidence probabilities. According to the preset correspondence between component vectors and the meaning of the prediction result, the predicted disease types and corresponding incidence probabilities in different future time periods can be determined.
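The interpretation step can be pictured with a small Python sketch. It assumes, for illustration only, that each component of an output vector is already an incidence probability and that the preset correspondence is an ordered list of disease names; the function name and sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def interpret_outputs(outputs: Sequence[Sequence[float]],
                      component_meaning: Sequence[str]) -> List[Dict[str, float]]:
    """Translate each period's output vector into {disease type: incidence probability}."""
    results = []
    for vector in outputs:
        if len(vector) != len(component_meaning):
            raise ValueError("output vector does not match the preset correspondence")
        results.append(dict(zip(component_meaning, vector)))
    return results


# Hypothetical decoder outputs for two future time periods.
component_meaning = ["stroke", "coronary heart disease"]
outputs = [[0.1, 0.7], [0.3, 0.8]]
results = interpret_outputs(outputs, component_meaning)
for period, result in enumerate(results, start=1):
    print("period", period, result)
```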
  • After step S3 of inputting the hidden state vector sequence into the second long short-term memory network for operation to obtain a disease prediction result, where the disease prediction result includes the predicted disease type and the corresponding incidence probability, the method includes:
  • screening a final improved disease prediction result from the multiple groups of improved disease prediction results according to preset selection rules, and generating a treatment recommendation plan, where the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result.
  • As described above, treatment recommendations are generated. Since the input of this application is medical data including medication history and surgical history, hypothetical medication or surgery can also be accepted as input, so the long short-term memory model of this application can simulate the effect of a treatment plan. Accordingly, a plurality of input improvement factor groups are received, and each improvement factor group is input, together with the first medical data and the second medical data, into the trained long short-term memory model for calculation, where an improvement factor group includes performing medication or surgery at a specified time point. Multiple groups of improved disease prediction results corresponding to the improvement factor groups are obtained from the output of the model, where each improved disease prediction result includes predicted disease types and the corresponding incidence rates. According to preset selection rules, the final improved disease prediction result is screened from these groups, and a treatment recommendation plan is generated, accompanied by the improvement factor group corresponding to the final improved disease prediction result.
  • The preset selection rule is, for example, that the number of predicted disease types is the smallest, or that the incidence rate of the predicted disease types is the lowest.
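The selection among improvement factor groups can be sketched as follows. The trained model is replaced here by a toy stand-in function, and the ranking (fewest diseases above a threshold, then lowest total incidence rate) is one possible reading of the example selection rules; the function names, the threshold, and the intervention labels and effect sizes are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Prediction = Dict[str, float]   # disease type -> incidence rate
FactorGroup = Sequence[str]     # hypothetical labels: intervention at a time point


def select_plan(groups: Sequence[FactorGroup],
                predict: Callable[[FactorGroup], Prediction],
                threshold: float = 0.5) -> Tuple[FactorGroup, Prediction]:
    """Pick the group with the fewest flagged diseases, then the lowest total rate."""
    def rank(group: FactorGroup) -> Tuple[int, float]:
        result = predict(group)
        flagged = sum(1 for rate in result.values() if rate > threshold)
        return flagged, sum(result.values())

    best = min(groups, key=rank)
    return best, predict(best)


# Toy stand-in for the trained model: fixed baseline risks and made-up effects.
BASELINE = {"stroke": 0.6, "coronary heart disease": 0.7}
EFFECT = {
    "drug A at month 3": {"stroke": 0.2},
    "surgery B at month 6": {"stroke": 0.1, "coronary heart disease": 0.3},
}


def toy_predict(group: FactorGroup) -> Prediction:
    risk = dict(BASELINE)
    for item in group:
        for disease, drop in EFFECT.get(item, {}).items():
            risk[disease] = round(max(0.0, risk[disease] - drop), 6)
    return risk


groups: List[List[str]] = [["drug A at month 3"], ["surgery B at month 6"], []]
best, outcome = select_plan(groups, toy_predict)
print("recommended:", best, "predicted:", outcome)
```

Returning the chosen group alongside its prediction mirrors the requirement that the recommendation plan be accompanied by its improvement factor group.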
  • The disease association network is a knowledge graph network, and before step S4 of obtaining the associated diseases directly connected to the designated disease according to the preset disease association network, where the network nodes of the association network are different disease types, the method includes:
  • S321. Using a preset knowledge graph construction tool to identify initial entities from pre-collected designated information, where the designated information records at least the designated disease, and the initial entities include at least the designated disease;
  • S323. Extracting the relationships between the final entities from the designated information to form triplets, and generating the knowledge graph network according to the triplets.
  • The preset knowledge graph construction tool can be any tool, such as the existing SPSS, Ucinet NetDraw, or VOSviewer. Since these are existing knowledge graph construction tools, they are not described further here.
  • the designated information records disease information, and the relationship between the diseases can be learned from this.
  • An entity is a knowledge node in the knowledge graph, and an initial entity is a knowledge node that has not yet undergone deduplication.
  • the process of identifying the initial entity is, for example, performing word segmentation processing on the specified information to obtain a word sequence composed of multiple words, and inputting the word sequence into a preset sentence structure model, thereby obtaining the initial entity from the word sequence .
  • the process of deduplication processing is, for example: synonymous judgment is performed on all initial entities, and the initial entities belonging to the same synonym group are replaced with a word in the synonym group.
  • The relationships between the final entities are extracted from the designated information to form triplets, and the knowledge graph network is generated according to the triplets.
  • A triplet describes, for example, the relationship between two entities.
  • The method for extracting the relationships between the final entities from the designated information is, for example, fitting the designated information to a preset sentence structure, so that the words expressing the relationship between entities are extracted through the sentence structure. On this basis, the relationships between diseases are expressed in the form of a knowledge graph network, where disease types serve as knowledge nodes in the knowledge graph. Further, the knowledge nodes of the knowledge graph may also include entities of non-disease types.
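The deduplication and triplet steps can be illustrated with a short Python sketch. Entity recognition and relation extraction are not implemented here; the initial entities, synonym groups, and triplets are supplied by hand as hypothetical inputs, and the undirected adjacency mapping is an assumed representation of the knowledge graph network.

```python
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (entity, relationship, entity)


def deduplicate(entities: Sequence[str],
                synonym_groups: Sequence[Sequence[str]]
                ) -> Tuple[List[str], Dict[str, str]]:
    """Replace every entity with the first word of its synonym group."""
    canonical = {word: group[0] for group in synonym_groups for word in group}
    final: List[str] = []
    for entity in entities:
        name = canonical.get(entity, entity)
        if name not in final:
            final.append(name)
    return final, canonical


def build_graph(triplets: Sequence[Triplet],
                canonical: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build an undirected adjacency mapping: node -> {neighbour: relationship}."""
    graph: Dict[str, Dict[str, str]] = {}
    for head, relation, tail in triplets:
        h = canonical.get(head, head)
        t = canonical.get(tail, tail)
        graph.setdefault(h, {})[t] = relation
        graph.setdefault(t, {})[h] = relation
    return graph


# Hypothetical inputs for illustration.
initial_entities = ["myocardial infarction", "heart attack", "coronary heart disease"]
synonym_groups = [["myocardial infarction", "heart attack"]]
triplets = [("coronary heart disease", "may lead to", "heart attack")]

final_entities, canonical = deduplicate(initial_entities, synonym_groups)
graph = build_graph(triplets, canonical)
print(final_entities)
print(sorted(graph["myocardial infarction"]))
```

Looking up the keys of `graph[disease]` then yields the nodes directly connected to a designated disease.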
  • The disease prediction method based on the long short-term memory model of the present application obtains the first medical data of the target object and the second medical data of the associated object; inputs the first medical data and the second medical data into the first long short-term memory network in the trained long short-term memory model for operation to obtain the hidden state vector sequence in the first long short-term memory network; inputs the hidden state vector sequence into the second long short-term memory network for operation to obtain the disease prediction result; screens out from the disease prediction result the predicted diseases whose incidence rate is higher than a preset threshold and records them as designated diseases; obtains the associated diseases directly connected to the designated diseases according to the preset disease association network; and outputs the disease prediction result and the associated diseases, thereby improving the accuracy of prediction.
  • An embodiment of the present application provides a disease prediction device based on a long short-term memory model, including:
  • the medical data acquisition unit 10, used to acquire first medical data of a target object and second medical data of an associated object, where the target object has a blood relationship with the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
  • the hidden state vector sequence acquisition unit 20, used to input the first medical data and the second medical data into the first long short-term memory network in the trained long short-term memory model to obtain a hidden state vector sequence, where the long short-term memory model includes a first long short-term memory network for encoding and a second long short-term memory network for decoding;
  • the disease prediction result obtaining unit 30, configured to input the hidden state vector sequence into the second long short-term memory network for operation to obtain a disease prediction result, where the disease prediction result includes the predicted disease type and the corresponding incidence probability;
  • the associated disease acquisition unit 40, configured to screen out from the disease prediction result the predicted diseases whose incidence rate is higher than a preset threshold, record them as designated diseases, and acquire the associated diseases directly connected to the designated diseases according to the preset disease association network, where the network nodes of the association network are different disease types;
  • the output unit 50, configured to output the disease prediction result and the associated diseases.
  • The hidden state vector sequence acquisition unit 20 includes:
  • a multi-segment data sequence acquisition subunit, configured to divide the first medical data into multiple data sequences according to preset time periods;
  • a designated influence factor acquisition subunit, used to acquire the designated influence factors of the genetic diseases in the second medical data on other diseases according to the preset correspondence between genetic diseases and their influence factors on other diseases;
  • a hidden state vector sequence acquisition subunit, used to input the multiple data sequences and the designated influence factor into the first long short-term memory network in the trained long short-term memory model for operation to obtain the hidden state vector sequence in the first long short-term memory network.
  • the hidden state vector sequence acquiring subunit includes:
  • a hidden state vector sequence acquisition module, configured to form the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.
  • the hidden state vector sequence acquisition module includes:
  • a final hidden state vector acquisition sub-module, configured to obtain the final hidden state vector c_i according to a formula in which e_ij = score(s_i, h_j), where a_ij is a weight parameter, s_i is the i-th hidden state vector in the second long short-term memory network, and score(s_i, h_j) is the score computed from s_i and h_j using a preset score function;
  • a hidden state vector sequence acquisition sub-module, configured to form the hidden state vector sequence c_1, c_2, ..., c_n from the final hidden state vectors corresponding to the multiple preset time periods.
  • the disease prediction result obtaining unit 30 includes:
  • a high-dimensional vector sequence acquisition subunit, configured to input the hidden state vector sequence into the second long short-term memory (LSTM) network for computation, to obtain the high-dimensional vector sequence output by the second LSTM network;
  • a disease prediction result acquisition subunit, configured to interpret the high-dimensional vector sequence according to a preset correspondence between component vectors and the meanings of prediction results, to obtain disease prediction results for different future time periods, wherein the disease prediction results include predicted disease types and the corresponding incidence probabilities.
  • the device includes:
  • an improvement factor group receiving unit, configured to receive multiple input improvement factor groups and to input the improvement factor groups, together with the first medical data and the second medical data, into the trained long short-term memory (LSTM) model for computation, wherein each improvement factor group includes medication or surgery at a specified time point;
  • an improved disease prediction result acquisition unit, configured to obtain the multiple groups of improved disease prediction results that the LSTM model outputs for the respective improvement factor groups, wherein the improved disease prediction results include predicted disease types and the corresponding incidence probabilities;
  • a treatment recommendation plan generating unit, configured to select a final improved disease prediction result from the multiple groups of improved disease prediction results according to a preset selection rule, and to generate a treatment recommendation plan, wherein the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result.
  • the disease association network is a knowledge graph network, and the device includes:
  • an initial entity identification unit, configured to use a preset knowledge graph construction tool to identify initial entities from designated information collected in advance, wherein the designated information at least records the designated diseases, and the initial entities at least include the designated diseases;
  • a final entity acquisition unit, configured to deduplicate the initial entities to obtain final entities;
  • a knowledge graph network generating unit, configured to extract the relationships between the final entities from the designated information to form triples, and to generate the knowledge graph network from the triples.
  • the disease prediction device based on the long short-term memory (LSTM) model of the present application acquires the first medical data of the target object and the second medical data of the associated object; inputs the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network; inputs the hidden state vector sequence into the second LSTM network for computation, to obtain the disease prediction result; selects, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, records them as designated diseases, and obtains the associated diseases directly connected to the designated diseases according to the preset disease association network; and outputs the disease prediction result and the associated diseases, thereby improving prediction accuracy.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in the figure.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer device stores the data used by the disease prediction method based on the long short-term memory (LSTM) model.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program, when executed by the processor, implements a disease prediction method based on the LSTM model.
  • the processor executes the above disease prediction method based on the LSTM model; the steps of the method correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiment and are not repeated here.
  • an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, a disease prediction method based on the LSTM model is implemented; the steps of the method correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiment and are not repeated here.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

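A disease prediction method and apparatus based on a long short-term memory (LSTM) model, a computer device, and a storage medium. The method includes: acquiring first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases (S1); inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding (S2); inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities (S3); selecting, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, recording them as designated diseases, and obtaining, according to a preset disease association network, associated diseases directly connected to the designated diseases (S4); and outputting the disease prediction result and the associated diseases (S5), thereby improving prediction accuracy.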

Description

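Disease prediction method and apparatus based on long short-term memory model, and computer device
This application claims priority to Chinese patent application No. 201910570055.9, filed with the Chinese Patent Office on June 27, 2019 and entitled "Disease prediction method and apparatus based on long short-term memory model, and computer device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computers, and in particular to a disease prediction method and apparatus based on a long short-term memory (LSTM) model, a computer device, and a storage medium.
Background
Disease risk prediction is a product of combining artificial intelligence with medicine. Its core is to predict the probability of developing a given disease within a future period; the main computational approaches include classical regression analysis, traditional machine learning methods, and emerging deep learning methods. However, real-world medical data is of poor quality, high-dimensional, imbalanced, and temporally discontinuous, which makes accurate disease risk prediction very difficult. Existing disease risk prediction systems require the patient's physical examination data, but physical examinations are time-consuming and laborious. Existing products can only analyze the risk of a single disease and cannot take associations between diseases into account. In addition, existing products can only predict the risk of illness over a future period and cannot take the timing information of the input variables into account. Moreover, existing disease prediction considers only the medical data of the subject under test, so its accuracy leaves room for improvement, and existing products use only a single prediction model, so their accuracy is insufficient.
Technical Problem
The main purpose of this application is to provide a disease prediction method and apparatus based on an LSTM model, a computer device, and a storage medium, with the aim of improving the accuracy of disease prediction.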
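Technical Solution
To achieve the above purpose, this application proposes a disease prediction method based on a long short-term memory (LSTM) model, including the following steps:
acquiring first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
selecting, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, recording them as designated diseases, and obtaining, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types; and
outputting the disease prediction result and the associated diseases.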
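This application provides a disease prediction apparatus based on a long short-term memory (LSTM) model, including:
a medical data acquisition unit, configured to acquire first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
a hidden state vector sequence acquisition unit, configured to input the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
a disease prediction result acquisition unit, configured to input the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
an associated disease acquisition unit, configured to select, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, record them as designated diseases, and obtain, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types; and
an output unit, configured to output the disease prediction result and the associated diseases.
This application provides a computer device including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of any of the above methods when executing the computer program.
This application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of any of the above methods when executed by a processor.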
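Beneficial Effects
The disease prediction method and apparatus based on a long short-term memory (LSTM) model, the computer device, and the storage medium of this application acquire first medical data of a target object and second medical data of an associated object; input the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network; input the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result; select, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, record them as designated diseases, and obtain, according to a preset disease association network, associated diseases directly connected to the designated diseases; and output the disease prediction result and the associated diseases, thereby improving prediction accuracy.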
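Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a disease prediction method based on a long short-term memory (LSTM) model according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of a disease prediction apparatus based on an LSTM model according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.
The realization of the purpose, the functional features, and the advantages of this application are further described with reference to the embodiments and the accompanying drawings.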
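Best Mode for Carrying Out the Invention
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it.
Referring to FIG. 1, an embodiment of this application provides a disease prediction method based on a long short-term memory (LSTM) model, including the following steps:
S1. acquiring first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
S2. inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
S3. inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
S4. selecting, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, recording them as designated diseases, and obtaining, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types;
S5. outputting the disease prediction result and the associated diseases.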
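As described in step S1 above, first medical data of a target object and second medical data of an associated object are acquired, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases. This application predicts diseases of the target object, and the second medical data of the associated object assists that prediction. Because genetic diseases are linked to blood relationship, a blood relative who shows no overt signs of a genetic disease may still carry latent physiological traits (a hidden illness), so the associated object's treatment history of genetic diseases helps predict the target object's diseases. The first medical data includes medication history, disease history, and surgical history; because these affect the human body, they can serve as a basis for disease prediction. For example, a patient whose past medications include pioglitazone, captopril, and nitrendipine, used to treat diabetes, hypertension, and atrial fibrillation, may be at risk of myocardial infarction, coronary heart disease, stroke, and similar conditions in the future. Conventional techniques use only a one-to-one analysis strategy, that is, they predict the target object's future diseases from the target object's medical data alone, whereas this application also uses the second medical data of the associated object as prediction input to increase accuracy. Conventional techniques also model only a single disease and can therefore predict only that disease, whereas this application can predict multiple diseases across different time periods. Furthermore, because the medical data considered includes the second medical data of the associated object in addition to the first medical data of the target object, prediction is more precise, and the problem of the target object misreporting their medical history or concealing their family history is overcome. Even if a genetic disease is recessive in the target object, it is present at the genetic level, may be expressed in the future, and may also affect bodily functions and other diseases; the second medical data of the associated object therefore reveals information such as the target object's genetic diseases, which improves the accuracy of disease prediction.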
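As described in step S2 above, the first medical data and the second medical data are input into the first long short-term memory (LSTM) network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding. An LSTM model is a model that uses LSTM networks. An LSTM network is a recurrent neural network over time, suited to processing and predicting important events separated by relatively long intervals and delays in a time series. Compared with an ordinary recurrent neural network, it adds a "processor" that judges whether information is useful: only information that passes this check is retained, and the rest is discarded through the forget gate, which addresses the long-term dependency problem. The LSTM model used in this application includes a first LSTM network for encoding and a second LSTM network for decoding, so as to achieve time-sequenced prediction of multiple diseases. Here, encoding means converting the input information into a vector sequence of a specified length, and decoding means converting the input vector sequence into a predicted vector sequence.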
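As described in step S3 above, the hidden state vector sequence is input into the second long short-term memory (LSTM) network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities. The second LSTM network may perform the computation by any method, for example using the formulas:
Figure PCTCN2019103547-appb-000001
e_ij = score(s_i, h_j),
Figure PCTCN2019103547-appb-000002
where c_i is the final hidden state vector in the first LSTM network, a_ij is a weight parameter, there are n time periods in total, s_i is the i-th hidden state vector in the second LSTM network, score(s_i, h_j) is the score computed from s_i and h_j using a preset score function, W_C is a weight, p is the output probability, y_t is the output of the second LSTM network for the t-th time period, and x is the input (directly related to the first medical data and the second medical data).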
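The two formula images referenced above are not reproduced in this text. As a hedged reading only, the symbol definitions (c_i, a_ij, e_ij, s_i, h_j, W_C, p, y_t, x) are consistent with the standard attention form sketched below. The softmax normalization of a_ij and the concatenation [c_t; s_t] in the output layer are assumptions taken from common encoder-decoder attention models, not from the source.

```latex
\begin{aligned}
c_i &= \sum_{j=1}^{n} a_{ij}\, h_j, \\
a_{ij} &= \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}, \\
e_{ij} &= \operatorname{score}(s_i, h_j), \\
p(y_t \mid y_{<t}, x) &= \operatorname{softmax}\!\left(W_C\,[c_t;\, s_t]\right).
\end{aligned}
```

Under this reading, each a_ij is non-negative and the weights for a fixed i sum to 1, so c_i is a weighted average of the encoder hidden states h_1, ..., h_n.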
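As described in step S4 above, predicted diseases whose incidence probability is higher than a preset threshold are selected from the disease prediction result and recorded as designated diseases, and associated diseases directly connected to the designated diseases are obtained according to a preset disease association network, wherein the network nodes of the association network are different disease types. The disease association network may be any association network, for example a knowledge graph network. The knowledge graph network may be constructed, for example, as follows: a preset knowledge graph construction tool identifies initial entities from designated information collected in advance, wherein the designated information at least records the designated diseases and the initial entities at least include the designated diseases; the initial entities are deduplicated to obtain final entities; and the relationships between the final entities are extracted from the designated information to form triples, from which the knowledge graph network is generated. In this way, a further prediction is made on top of the long short-term memory model, which further improves prediction accuracy.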
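Step S4 can be illustrated with a minimal sketch, assuming the prediction result is a mapping from disease type to incidence probability and the disease association network is an adjacency mapping. The function names `select_designated` and `directly_connected`, the probabilities, and the example network are all hypothetical and do not come from the source.

```python
from typing import Dict, List, Set


def select_designated(predictions: Dict[str, float], threshold: float) -> List[str]:
    """Return the diseases whose predicted probability is strictly above the threshold."""
    return sorted(d for d, p in predictions.items() if p > threshold)


def directly_connected(network: Dict[str, Set[str]], designated: List[str]) -> Dict[str, List[str]]:
    """For each designated disease, list the nodes exactly one edge away in the network."""
    return {d: sorted(network.get(d, set())) for d in designated}


# Illustrative values only.
predictions = {
    "coronary heart disease": 0.62,
    "stroke": 0.35,
    "myocardial infarction": 0.48,
}
network = {
    "coronary heart disease": {"myocardial infarction", "heart failure"},
    "stroke": {"hypertension"},
}

designated = select_designated(predictions, threshold=0.4)
associated = directly_connected(network, designated)
```

Only neighbors one edge away are returned, which matches the "directly connected" wording; a designated disease that is absent from the network simply yields an empty list.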
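As described in step S5 above, the disease prediction result and the associated diseases are output. The disease prediction result is the output of the long short-term memory model, and the associated diseases are the output of the disease association network, so combining the long short-term memory model with the disease association network further improves prediction accuracy.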
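In one embodiment, step S2 of inputting the first medical data and the second medical data into the first long short-term memory (LSTM) network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
S201. dividing the first medical data into multiple data sequence segments according to preset time periods;
S202. obtaining the designated influence factors of the genetic diseases in the second medical data on other diseases, according to a preset correspondence between genetic diseases and their influence factors on other diseases;
S203. inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network.
As described above, the hidden state vector sequence in the first LSTM network is obtained. The designated influence factor either takes the same value in every time period or varies dynamically (because the degree to which a genetic disease affects other diseases changes over time). The first medical data covers a span of time and is divided into multiple data sequence segments for different time periods. To bring the second medical data into the LSTM model, this application expresses the second medical data as designated influence factors, so that the data sequence segments and the designated influence factors together serve as the input to the LSTM model. Specifically, one data segment and its corresponding designated influence factor produce one high-dimensional vector, so the multiple data segments and their respective designated influence factors produce a sequence of high-dimensional vectors, which is the basis for computation in the trained LSTM model. Other diseases are diseases other than the genetic disease. The influence factor is data obtained by quantifying the effect of a genetic disease on other diseases, for use in the LSTM network computation; it may exist in any form, for example as a separate vector.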
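Steps S201 to S203 can be sketched as follows. This is only an illustration under stated assumptions: each record is a dated code, each time period is a fixed number of days, the segment is encoded as a multi-hot vector over a small vocabulary, and the influence factor is appended unchanged to every period. The names `split_into_periods` and `build_inputs`, the vocabulary, and the value 0.3 are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple

Event = Tuple[date, str]  # (date of the record, code of a drug, disease, or surgery)


def split_into_periods(events: Sequence[Event], start: date,
                       period_days: int, n_periods: int) -> List[List[str]]:
    """S201: bucket dated records into consecutive fixed-length time periods."""
    segments: List[List[str]] = [[] for _ in range(n_periods)]
    for day, code in events:
        idx = (day - start).days // period_days
        if 0 <= idx < n_periods:
            segments[idx].append(code)
    return segments


def build_inputs(segments: Sequence[Sequence[str]], vocabulary: Sequence[str],
                 influence: Sequence[float]) -> List[List[float]]:
    """S203 input: one vector per period = multi-hot segment + influence factor."""
    inputs = []
    for codes in segments:
        multi_hot = [1.0 if v in codes else 0.0 for v in vocabulary]
        inputs.append(multi_hot + list(influence))
    return inputs


events = [
    (date(2018, 1, 10), "pioglitazone"),
    (date(2018, 7, 2), "hypertension"),
    (date(2018, 7, 20), "captopril"),
]
segments = split_into_periods(events, start=date(2018, 1, 1), period_days=180, n_periods=2)
vocabulary = ["pioglitazone", "captopril", "hypertension"]
influence = [0.3]  # S202: hypothetical influence factor looked up for a relative's genetic disease
inputs = build_inputs(segments, vocabulary, influence)
```

A time-varying influence factor, which the text also allows, would replace the single `influence` list with one list per period.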
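In one embodiment, step S203 of inputting the multiple data sequence segments and the designated influence factors into the first long short-term memory (LSTM) network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
S2031. obtaining the hidden state vector h_t in the first LSTM network according to the formula h_t = LSTM_enc(x_t, h_{t-1}), where t denotes the t-th time period, h_t is the hidden state vector for the t-th time period, h_{t-1} is the hidden state vector for the (t-1)-th time period, x_t is the input data for the t-th time period, and LSTM_enc denotes the encoding operation performed by the first LSTM network, wherein x_t includes the first medical data of the t-th time period and the designated influence factor of the t-th time period;
S2032. forming the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.
As described above, the multiple data sequence segments and the designated influence factors are input into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network. This application uses the formula h_t = LSTM_enc(x_t, h_{t-1}) to obtain the hidden state vector h_t in the first LSTM network, and forms the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total. The first LSTM network thus encodes the data sequence segments and the designated influence factors into a hidden state vector sequence, which is the basis for decoding by the second LSTM network.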
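The recurrence h_t = LSTM_enc(x_t, h_{t-1}) can be made concrete with a plain-Python sketch of a standard LSTM cell. The source does not specify the cell equations, so the gate layout below (input, forget, output, candidate) is the textbook form and is an assumption, as are the random weights, the sizes, and the names `lstm_step` and `encode`. The cell's internal memory is called `cell` here to avoid confusion with the context vector c_i used elsewhere in the text.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: Vector, h_prev: Vector, cell_prev: Vector,
              W: Matrix, U: Matrix, b: Vector) -> Tuple[Vector, Vector]:
    """One standard LSTM step. W is (4H x D), U is (4H x H), b has length 4H."""
    H = len(h_prev)
    z = [
        sum(w * x for w, x in zip(W[r], x_t))
        + sum(u * h for u, h in zip(U[r], h_prev))
        + b[r]
        for r in range(4 * H)
    ]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]    # output gate
    g = [math.tanh(v) for v in z[3 * H:4 * H]]  # candidate memory
    cell_t = [f[k] * cell_prev[k] + i[k] * g[k] for k in range(H)]
    h_t = [o[k] * math.tanh(cell_t[k]) for k in range(H)]
    return h_t, cell_t


def encode(xs: List[Vector], W: Matrix, U: Matrix, b: Vector, hidden_size: int) -> List[Vector]:
    """Run the recurrence over all periods and collect h_1, ..., h_n."""
    h = [0.0] * hidden_size
    cell = [0.0] * hidden_size
    hidden_states = []
    for x_t in xs:
        h, cell = lstm_step(x_t, h, cell, W, U, b)
        hidden_states.append(h)
    return hidden_states


random.seed(0)
D, H, n = 4, 3, 5  # input size, hidden size, number of time periods (illustrative)
W = [[random.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(4 * H)]
U = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(4 * H)]
b = [0.0] * (4 * H)
xs = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(n)]
hidden_states = encode(xs, W, U, b, H)
```

In a trained model, W, U, and b would come from training rather than a random generator; the loop structure is the part that corresponds to steps S2031 and S2032.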
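In one embodiment, step S2032 of forming the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total, includes:
S20321. according to the formula:
Figure PCTCN2019103547-appb-000003
e_ij = score(s_i, h_j), obtaining the final hidden state vector c_i in the first long short-term memory (LSTM) network, where a_ij is a weight parameter, there are n time periods in total, s_i is the i-th hidden state vector in the second LSTM network, and score(s_i, h_j) is the score computed from s_i and h_j using a preset score function;
S20322. forming the hidden state vector sequence c_1, c_2, ..., c_n from the final hidden state vectors corresponding to the multiple preset time periods.
As described above, the hidden state vector sequence h_1, h_2, ..., h_n is formed from the hidden state vectors corresponding to the multiple preset time periods. This application, according to the formula:
Figure PCTCN2019103547-appb-000004
e_ij = score(s_i, h_j), obtains the final hidden state vector c_i in the first LSTM network; that is, it introduces an attention mechanism that automatically captures the information that matters for the outcome, so that the final hidden state vector sequence serves as the basis for decoding by the second LSTM network. Because the attention mechanism is used, the weights are assigned more accurately, which helps improve prediction accuracy.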
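The attention step can be sketched as below. Because the formula image is not available in this text, the sketch assumes the common form in which the weights a_ij are a softmax over the scores e_ij and c_i is the weighted sum of the encoder states. The dot product used as the score function is only one possible choice of the "preset score function", and the names `dot_score` and `attention_context` are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot_score(s_i: Vector, h_j: Vector) -> float:
    """One possible score function: the dot product of the two vectors."""
    return sum(a * b for a, b in zip(s_i, h_j))


def attention_context(s_i: Vector, hs: List[Vector],
                      score: Callable[[Vector, Vector], float] = dot_score
                      ) -> Tuple[Vector, List[float]]:
    """Return (c_i, weights a_i1..a_in) for one decoder state s_i."""
    e = [score(s_i, h_j) for h_j in hs]
    m = max(e)                                # subtract the max for numerical stability
    exp_e = [math.exp(v - m) for v in e]
    total = sum(exp_e)
    a = [v / total for v in exp_e]            # softmax over the n time periods
    dim = len(hs[0])
    c_i = [sum(a[j] * hs[j][k] for j in range(len(hs))) for k in range(dim)]
    return c_i, a


hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # encoder states h_1..h_3 (illustrative)
s_i = [0.0, 0.0]                              # a decoder state that scores all periods equally
context, weights = attention_context(s_i, hs)
```

With the all-zero decoder state used here every score is 0, so the three weights are equal and the context vector is the plain average of the encoder states; a non-zero s_i would shift weight toward the periods whose hidden states align with it.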
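In one embodiment, step S3 of inputting the hidden state vector sequence into the second long short-term memory (LSTM) network for computation, to obtain a disease prediction result that includes predicted disease types and the corresponding incidence probabilities, includes:
S301. inputting the hidden state vector sequence into the second LSTM network for computation, to obtain the high-dimensional vector sequence output by the second LSTM network;
S302. interpreting the high-dimensional vector sequence according to a preset correspondence between component vectors and the meanings of prediction results, to obtain disease prediction results for different future time periods, wherein the disease prediction results include predicted disease types and the corresponding incidence probabilities.
As described above, the hidden state vector sequence is input into the second LSTM network for computation, to obtain the disease prediction result. The second LSTM network outputs a high-dimensional vector sequence that represents the prediction results for different time periods; the components of each high-dimensional vector represent the predicted disease types and the corresponding incidence probabilities. From the preset correspondence between component vectors and the meanings of prediction results, the predicted disease types and corresponding incidence probabilities for different future time periods can be read off.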
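Step S302 amounts to a lookup from vector components to meanings. A minimal sketch follows, assuming each output vector holds one probability per disease and the preset correspondence is a mapping from component index to disease type; the function name `interpret`, the period labels, and the numbers are hypothetical.

```python
from typing import Dict, List


def interpret(output_sequence: List[List[float]],
              component_meaning: Dict[int, str],
              period_labels: List[str]) -> List[dict]:
    """Turn each output vector into {disease type: incidence probability} for its period."""
    results = []
    for label, vector in zip(period_labels, output_sequence):
        predictions = {
            component_meaning[k]: p
            for k, p in enumerate(vector)
            if k in component_meaning
        }
        results.append({"period": label, "predictions": predictions})
    return results


component_meaning = {0: "myocardial infarction", 1: "coronary heart disease", 2: "stroke"}
output_sequence = [[0.10, 0.40, 0.05], [0.20, 0.55, 0.10]]  # illustrative decoder outputs
results = interpret(output_sequence, component_meaning, ["year 1", "year 2"])
```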
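In one embodiment, after step S3 of inputting the hidden state vector sequence into the second long short-term memory (LSTM) network for computation, to obtain a disease prediction result that includes predicted disease types and the corresponding incidence probabilities, the method includes:
S311. receiving multiple input improvement factor groups, and inputting the improvement factor groups, together with the first medical data and the second medical data, into the trained LSTM model for computation, wherein each improvement factor group includes medication or surgery at a specified time point;
S312. obtaining the multiple groups of improved disease prediction results that the LSTM model outputs for the respective improvement factor groups, wherein the improved disease prediction results include predicted disease types and the corresponding incidence probabilities;
S313. selecting a final improved disease prediction result from the multiple groups of improved disease prediction results according to a preset selection rule, and generating a treatment recommendation plan, wherein the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result.
As described above, a treatment recommendation plan is generated. Because the input of this application is medical data that includes medication history and surgical history, it can also accept hypothetical medication or surgery, so the LSTM model of this application can simulate the effect of a treatment plan. Accordingly, multiple input improvement factor groups are received and input, together with the first medical data and the second medical data, into the trained LSTM model for computation, wherein each improvement factor group includes medication or surgery at a specified time point; the multiple groups of improved disease prediction results that the LSTM model outputs for the respective improvement factor groups are obtained, wherein the improved disease prediction results include predicted disease types and the corresponding incidence probabilities; and a final improved disease prediction result is selected from them according to a preset selection rule and a treatment recommendation plan is generated, wherein the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result. The preset selection rule is, for example, that the fewest disease types are predicted, or that the incidence probabilities of all predicted disease types are below a preset threshold.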
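Steps S311 to S313 can be sketched as a loop over candidate improvement factor groups. In the sketch below the trained model is replaced by a lookup table of made-up outputs (`TOY_OUTPUTS`), and the selection rule is "fewest diseases above the threshold", which is one of the example rules the text mentions. The names `count_above` and `recommend` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

FactorGroup = Tuple[str, ...]
Prediction = Dict[str, float]


def count_above(prediction: Prediction, threshold: float) -> int:
    """Number of predicted diseases whose probability exceeds the threshold."""
    return sum(1 for p in prediction.values() if p > threshold)


def recommend(groups: List[FactorGroup],
              predict: Callable[[FactorGroup], Prediction],
              threshold: float) -> dict:
    """S313: keep the group whose prediction has the fewest diseases above the threshold."""
    scored = [(count_above(predict(g), threshold), idx, g) for idx, g in enumerate(groups)]
    _, _, best = min(scored)  # ties are broken by input order
    return {"improvement_factors": best, "prediction": predict(best)}


# Stand-in for the trained model: made-up outputs per improvement factor group.
TOY_OUTPUTS: Dict[FactorGroup, Prediction] = {
    ("drug A at month 1",): {"stroke": 0.35, "coronary heart disease": 0.45},
    ("drug A at month 1", "surgery B at month 6"): {"stroke": 0.20, "coronary heart disease": 0.30},
}

plan = recommend(list(TOY_OUTPUTS), TOY_OUTPUTS.__getitem__, threshold=0.4)
```

The returned plan carries the chosen improvement factor group alongside its prediction, mirroring the requirement that the recommendation be accompanied by the corresponding group.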
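In one embodiment, the disease association network is a knowledge graph network, and before step S4 of obtaining, according to the preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types, the method includes:
S321. using a preset knowledge graph construction tool to identify initial entities from designated information collected in advance, wherein the designated information at least records the designated diseases, and the initial entities at least include the designated diseases;
S322. deduplicating the initial entities to obtain final entities;
S323. extracting the relationships between the final entities from the designated information to form triples, and generating the knowledge graph network from the triples.
As described above, the knowledge graph containing the designated diseases is constructed. The preset knowledge graph construction tool may be any tool, for example existing tools such as SPSS, UcinetNetDraw, or VOSviewer; because these are existing knowledge graph construction tools, they are not described further. The designated information records disease information, from which the associations between diseases can be learned. An entity is a knowledge node in the knowledge graph, and an initial entity is a knowledge node that has not been deduplicated. Initial entities are identified, for example, by segmenting the designated information into words to obtain a word sequence, and inputting the word sequence into a preset sentence structure model to obtain the initial entities from the word sequence. The initial entities are then deduplicated to obtain the final entities. Deduplication is performed, for example, by checking all initial entities for synonyms and replacing the initial entities that belong to the same synonym group with a single word from that group. The relationships between the final entities are then extracted from the designated information to form triples, and the knowledge graph containing the designated diseases is generated from the triples. A triple here denotes, for example, two entities and the relationship between them. The relationships between the final entities are extracted from the designated information, for example, by fitting the designated information to a preset sentence structure, so that the words expressing the relationships between entities are extracted through that sentence structure. In this way, the relationships between diseases are represented as a knowledge graph network in which each disease type is a knowledge node. Further, the knowledge nodes of the knowledge graph may also include entities that are not disease types.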
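The deduplication and triple steps (S322 and S323) can be sketched without any external tool. This is only an illustration: it assumes the synonym groups and the extracted (head, relation, tail) triples are already available, and it does not reproduce the entity recognition or the sentence structure model. The names `deduplicate` and `build_graph` and all example strings are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def deduplicate(entities: List[str], synonym_groups: List[List[str]]):
    """S322: map every synonym to the first word of its group, then drop repeats."""
    canonical: Dict[str, str] = {}
    for group in synonym_groups:
        for word in group:
            canonical[word] = group[0]
    seen: Set[str] = set()
    final: List[str] = []
    for entity in entities:
        name = canonical.get(entity, entity)
        if name not in seen:
            seen.add(name)
            final.append(name)
    return final, canonical


def build_graph(triples: List[Triple], canonical: Dict[str, str]) -> Dict[str, Set[Tuple[str, str]]]:
    """S323: store each triple as a labelled edge between canonical entities."""
    graph: Dict[str, Set[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        h = canonical.get(head, head)
        t = canonical.get(tail, tail)
        graph.setdefault(h, set()).add((relation, t))
        graph.setdefault(t, set())
    return graph


initial_entities = ["heart attack", "myocardial infarction", "hypertension"]
synonym_groups = [["myocardial infarction", "heart attack"]]
final_entities, canonical = deduplicate(initial_entities, synonym_groups)
graph = build_graph([("hypertension", "increases risk of", "heart attack")], canonical)
```

Edges are stored on the head entity only; if the association network is meant to be undirected, the reverse edge would need to be added as well.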
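The disease prediction method based on a long short-term memory (LSTM) model of this application acquires first medical data of a target object and second medical data of an associated object; inputs the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network; inputs the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result; selects, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, records them as designated diseases, and obtains, according to a preset disease association network, associated diseases directly connected to the designated diseases; and outputs the disease prediction result and the associated diseases, thereby improving prediction accuracy.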
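Referring to FIG. 2, an embodiment of this application provides a disease prediction apparatus based on a long short-term memory (LSTM) model, including:
a medical data acquisition unit 10, configured to acquire first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
a hidden state vector sequence acquisition unit 20, configured to input the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
a disease prediction result acquisition unit 30, configured to input the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
an associated disease acquisition unit 40, configured to select, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, record them as designated diseases, and obtain, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types;
an output unit 50, configured to output the disease prediction result and the associated diseases.
The operations performed by the above units correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.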
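In one embodiment, the hidden state vector sequence acquisition unit 20 includes:
a multi-segment data sequence acquisition subunit, configured to divide the first medical data into multiple data sequence segments according to preset time periods;
a designated influence factor acquisition subunit, configured to obtain the designated influence factors of the genetic diseases in the second medical data on other diseases, according to a preset correspondence between genetic diseases and their influence factors on other diseases;
a hidden state vector sequence acquisition subunit, configured to input the multiple data sequence segments and the designated influence factors into the first long short-term memory (LSTM) network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network.
The operations performed by the above subunits correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
In one embodiment, the hidden state vector sequence acquisition subunit includes:
a hidden state vector calculation module, configured to obtain the hidden state vector h_t in the first LSTM network according to the formula h_t = LSTM_enc(x_t, h_{t-1}), where t denotes the t-th time period, h_t is the hidden state vector for the t-th time period, h_{t-1} is the hidden state vector for the (t-1)-th time period, x_t is the input data for the t-th time period, and LSTM_enc denotes the encoding operation performed by the first LSTM network, wherein x_t includes the first medical data of the t-th time period and the designated influence factor of the t-th time period;
a hidden state vector sequence acquisition module, configured to form the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.
The operations performed by the above modules correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
In one embodiment, the hidden state vector sequence acquisition module includes:
a final hidden state vector acquisition sub-module, configured to, according to the formula:
Figure PCTCN2019103547-appb-000005
e_ij = score(s_i, h_j), obtain the final hidden state vector c_i in the first LSTM network, where a_ij is a weight parameter, there are n time periods in total, s_i is the i-th hidden state vector in the second LSTM network, and score(s_i, h_j) is the score computed from s_i and h_j using a preset score function;
a hidden state vector sequence acquisition sub-module, configured to form the hidden state vector sequence c_1, c_2, ..., c_n from the final hidden state vectors corresponding to the multiple preset time periods.
The operations performed by the above sub-modules correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
In one embodiment, the disease prediction result acquisition unit 30 includes:
a high-dimensional vector sequence acquisition subunit, configured to input the hidden state vector sequence into the second LSTM network for computation, to obtain the high-dimensional vector sequence output by the second LSTM network;
a disease prediction result acquisition subunit, configured to interpret the high-dimensional vector sequence according to a preset correspondence between component vectors and the meanings of prediction results, to obtain disease prediction results for different future time periods, wherein the disease prediction results include predicted disease types and the corresponding incidence probabilities.
The operations performed by the above subunits correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.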
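In one embodiment, the apparatus includes:
an improvement factor group receiving unit, configured to receive multiple input improvement factor groups and to input the improvement factor groups, together with the first medical data and the second medical data, into the trained long short-term memory (LSTM) model for computation, wherein each improvement factor group includes medication or surgery at a specified time point;
an improved disease prediction result acquisition unit, configured to obtain the multiple groups of improved disease prediction results that the LSTM model outputs for the respective improvement factor groups, wherein the improved disease prediction results include predicted disease types and the corresponding incidence probabilities;
a treatment recommendation plan generating unit, configured to select a final improved disease prediction result from the multiple groups of improved disease prediction results according to a preset selection rule, and to generate a treatment recommendation plan, wherein the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result.
The operations performed by the above units correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
In one embodiment, the disease association network is a knowledge graph network, and the apparatus includes:
an initial entity identification unit, configured to use a preset knowledge graph construction tool to identify initial entities from designated information collected in advance, wherein the designated information at least records the designated diseases, and the initial entities at least include the designated diseases;
a final entity acquisition unit, configured to deduplicate the initial entities to obtain final entities;
a knowledge graph network generating unit, configured to extract the relationships between the final entities from the designated information to form triples, and to generate the knowledge graph network from the triples.
The operations performed by the above units correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
The disease prediction apparatus based on the LSTM model of this application acquires first medical data of a target object and second medical data of an associated object; inputs the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network; inputs the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result; selects, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, records them as designated diseases, and obtains, according to a preset disease association network, associated diseases directly connected to the designated diseases; and outputs the disease prediction result and the associated diseases, thereby improving prediction accuracy.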
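Referring to FIG. 3, an embodiment of this application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in the figure. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data used by the disease prediction method based on the long short-term memory (LSTM) model. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a disease prediction method based on the LSTM model.
The processor executes the above disease prediction method based on the LSTM model; the steps of the method correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
Those skilled in the art will understand that the structure shown in the figure is only a block diagram of the part of the structure related to the solution of this application, and does not limit the computer device to which the solution of this application is applied.
An embodiment of this application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, a disease prediction method based on the LSTM model is implemented; the steps of the method correspond one to one to the steps of the disease prediction method based on the LSTM model in the foregoing embodiments and are not repeated here.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).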

Claims (20)

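  1. A disease prediction method based on a long short-term memory (LSTM) model, characterized by including:
    acquiring first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
    inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
    inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
    selecting, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, recording them as designated diseases, and obtaining, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types;
    outputting the disease prediction result and the associated diseases.
  2. The disease prediction method based on an LSTM model according to claim 1, characterized in that the step of inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
    dividing the first medical data into multiple data sequence segments according to preset time periods;
    obtaining the designated influence factors of the genetic diseases in the second medical data on other diseases, according to a preset correspondence between genetic diseases and their influence factors on other diseases;
    inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network.
  3. The disease prediction method based on an LSTM model according to claim 2, characterized in that the step of inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
    obtaining the hidden state vector h_t in the first LSTM network according to the formula h_t = LSTM_enc(x_t, h_{t-1}), where t denotes the t-th time period, h_t is the hidden state vector for the t-th time period, h_{t-1} is the hidden state vector for the (t-1)-th time period, x_t is the input data for the t-th time period, and LSTM_enc denotes the encoding operation performed by the first LSTM network, wherein x_t includes the first medical data of the t-th time period and the designated influence factor of the t-th time period;
    forming the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.
  4. The disease prediction method based on an LSTM model according to claim 3, characterized in that the step of forming the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total, includes:
    according to the formula:
    Figure PCTCN2019103547-appb-100001
    e_ij = score(s_i, h_j), obtaining the final hidden state vector c_i in the first LSTM network, where a_ij is a weight parameter, there are n time periods in total, s_i is the i-th hidden state vector in the second LSTM network, and score(s_i, h_j) is the score computed from s_i and h_j using a preset score function;
    forming the hidden state vector sequence c_1, c_2, ..., c_n from the final hidden state vectors corresponding to the multiple preset time periods.
  5. The disease prediction method based on an LSTM model according to claim 1, characterized in that the step of inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result that includes predicted disease types and the corresponding incidence probabilities, includes:
    inputting the hidden state vector sequence into the second LSTM network for computation, to obtain the high-dimensional vector sequence output by the second LSTM network;
    interpreting the high-dimensional vector sequence according to a preset correspondence between component vectors and the meanings of prediction results, to obtain disease prediction results for different future time periods, wherein the disease prediction results include predicted disease types and the corresponding incidence probabilities.
  6. The disease prediction method based on an LSTM model according to claim 1, characterized in that, after the step of inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result that includes predicted disease types and the corresponding incidence probabilities, the method includes:
    receiving multiple input improvement factor groups, and inputting the improvement factor groups, together with the first medical data and the second medical data, into the trained LSTM model for computation, wherein each improvement factor group includes medication or surgery at a specified time point;
    obtaining the multiple groups of improved disease prediction results that the LSTM model outputs for the respective improvement factor groups, wherein the improved disease prediction results include predicted disease types and the corresponding incidence probabilities;
    selecting a final improved disease prediction result from the multiple groups of improved disease prediction results according to a preset selection rule, and generating a treatment recommendation plan, wherein the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result.
  7. The disease prediction method based on an LSTM model according to claim 1, characterized in that the disease association network is a knowledge graph network, and before the step of obtaining, according to the preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types, the method includes:
    using a preset knowledge graph construction tool to identify initial entities from designated information collected in advance, wherein the designated information at least records the designated diseases, and the initial entities at least include the designated diseases;
    deduplicating the initial entities to obtain final entities;
    extracting the relationships between the final entities from the designated information to form triples, and generating the knowledge graph network from the triples.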
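  8. A disease prediction apparatus based on a long short-term memory (LSTM) model, characterized by including:
    a medical data acquisition unit, configured to acquire first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
    a hidden state vector sequence acquisition unit, configured to input the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
    a disease prediction result acquisition unit, configured to input the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
    an associated disease acquisition unit, configured to select, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, record them as designated diseases, and obtain, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types;
    an output unit, configured to output the disease prediction result and the associated diseases.
  9. The disease prediction apparatus based on an LSTM model according to claim 8, characterized in that the hidden state vector sequence acquisition unit includes:
    a multi-segment data sequence acquisition subunit, configured to divide the first medical data into multiple data sequence segments according to preset time periods;
    a designated influence factor acquisition subunit, configured to obtain the designated influence factors of the genetic diseases in the second medical data on other diseases, according to a preset correspondence between genetic diseases and their influence factors on other diseases;
    a hidden state vector sequence acquisition subunit, configured to input the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network.
  10. The disease prediction apparatus based on an LSTM model according to claim 9, characterized in that the hidden state vector sequence acquisition subunit includes:
    a hidden state vector calculation module, configured to obtain the hidden state vector h_t in the first LSTM network according to the formula h_t = LSTM_enc(x_t, h_{t-1}), where t denotes the t-th time period, h_t is the hidden state vector for the t-th time period, h_{t-1} is the hidden state vector for the (t-1)-th time period, x_t is the input data for the t-th time period, and LSTM_enc denotes the encoding operation performed by the first LSTM network, wherein x_t includes the first medical data of the t-th time period and the designated influence factor of the t-th time period;
    a hidden state vector sequence acquisition module, configured to form the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.
  11. The disease prediction apparatus based on an LSTM model according to claim 10, characterized in that the hidden state vector sequence acquisition module includes:
    a final hidden state vector acquisition sub-module, configured to, according to the formula:
    Figure PCTCN2019103547-appb-100002
    e_ij = score(s_i, h_j), obtain the final hidden state vector c_i in the first LSTM network, where a_ij is a weight parameter, there are n time periods in total, s_i is the i-th hidden state vector in the second LSTM network, and score(s_i, h_j) is the score computed from s_i and h_j using a preset score function;
    a hidden state vector sequence acquisition sub-module, configured to form the hidden state vector sequence c_1, c_2, ..., c_n from the final hidden state vectors corresponding to the multiple preset time periods.
  12. The disease prediction apparatus based on an LSTM model according to claim 8, characterized in that the disease prediction result acquisition unit includes:
    a high-dimensional vector sequence acquisition subunit, configured to input the hidden state vector sequence into the second LSTM network for computation, to obtain the high-dimensional vector sequence output by the second LSTM network;
    a disease prediction result acquisition subunit, configured to interpret the high-dimensional vector sequence according to a preset correspondence between component vectors and the meanings of prediction results, to obtain disease prediction results for different future time periods, wherein the disease prediction results include predicted disease types and the corresponding incidence probabilities.
  13. The disease prediction apparatus based on an LSTM model according to claim 8, characterized in that the apparatus includes:
    an improvement factor group receiving unit, configured to receive multiple input improvement factor groups and to input the improvement factor groups, together with the first medical data and the second medical data, into the trained LSTM model for computation, wherein each improvement factor group includes medication or surgery at a specified time point;
    an improved disease prediction result acquisition unit, configured to obtain the multiple groups of improved disease prediction results that the LSTM model outputs for the respective improvement factor groups, wherein the improved disease prediction results include predicted disease types and the corresponding incidence probabilities;
    a treatment recommendation plan generating unit, configured to select a final improved disease prediction result from the multiple groups of improved disease prediction results according to a preset selection rule, and to generate a treatment recommendation plan, wherein the treatment recommendation plan is accompanied by the improvement factor group corresponding to the final improved disease prediction result.
  14. The disease prediction apparatus based on an LSTM model according to claim 8, characterized in that the disease association network is a knowledge graph network, and the apparatus includes:
    an initial entity identification unit, configured to use a preset knowledge graph construction tool to identify initial entities from designated information collected in advance, wherein the designated information at least records the designated diseases, and the initial entities at least include the designated diseases;
    a final entity acquisition unit, configured to deduplicate the initial entities to obtain final entities;
    a knowledge graph network generating unit, configured to extract the relationships between the final entities from the designated information to form triples, and to generate the knowledge graph network from the triples.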
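  15. A computer device including a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements a disease prediction method based on a long short-term memory (LSTM) model, the disease prediction method based on the LSTM model including:
    acquiring first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
    inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
    inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
    selecting, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, recording them as designated diseases, and obtaining, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types;
    outputting the disease prediction result and the associated diseases.
  16. The computer device according to claim 15, characterized in that the step of inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
    dividing the first medical data into multiple data sequence segments according to preset time periods;
    obtaining the designated influence factors of the genetic diseases in the second medical data on other diseases, according to a preset correspondence between genetic diseases and their influence factors on other diseases;
    inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network.
  17. The computer device according to claim 16, characterized in that the step of inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
    obtaining the hidden state vector h_t in the first LSTM network according to the formula h_t = LSTM_enc(x_t, h_{t-1}), where t denotes the t-th time period, h_t is the hidden state vector for the t-th time period, h_{t-1} is the hidden state vector for the (t-1)-th time period, x_t is the input data for the t-th time period, and LSTM_enc denotes the encoding operation performed by the first LSTM network, wherein x_t includes the first medical data of the t-th time period and the designated influence factor of the t-th time period;
    forming the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.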
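  18. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements a disease prediction method based on a long short-term memory (LSTM) model, the disease prediction method based on the LSTM model including:
    acquiring first medical data of a target object and second medical data of an associated object, wherein the target object is related by blood to the associated object, the first medical data includes medication history, disease history, and surgical history, and the second medical data includes the treatment history of genetic diseases;
    inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, wherein the LSTM model includes a first LSTM network for encoding and a second LSTM network for decoding;
    inputting the hidden state vector sequence into the second LSTM network for computation, to obtain a disease prediction result, wherein the disease prediction result includes predicted disease types and the corresponding incidence probabilities;
    selecting, from the disease prediction result, predicted diseases whose incidence probability is higher than a preset threshold, recording them as designated diseases, and obtaining, according to a preset disease association network, associated diseases directly connected to the designated diseases, wherein the network nodes of the association network are different disease types;
    outputting the disease prediction result and the associated diseases.
  19. The computer-readable storage medium according to claim 18, characterized in that the step of inputting the first medical data and the second medical data into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
    dividing the first medical data into multiple data sequence segments according to preset time periods;
    obtaining the designated influence factors of the genetic diseases in the second medical data on other diseases, according to a preset correspondence between genetic diseases and their influence factors on other diseases;
    inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network.
  20. The computer-readable storage medium according to claim 19, characterized in that the step of inputting the multiple data sequence segments and the designated influence factors into the first LSTM network of the trained LSTM model for computation, to obtain the hidden state vector sequence in the first LSTM network, includes:
    obtaining the hidden state vector h_t in the first LSTM network according to the formula h_t = LSTM_enc(x_t, h_{t-1}), where t denotes the t-th time period, h_t is the hidden state vector for the t-th time period, h_{t-1} is the hidden state vector for the (t-1)-th time period, x_t is the input data for the t-th time period, and LSTM_enc denotes the encoding operation performed by the first LSTM network, wherein x_t includes the first medical data of the t-th time period and the designated influence factor of the t-th time period;
    forming the hidden state vector sequence h_1, h_2, ..., h_n from the hidden state vectors corresponding to the multiple preset time periods, where there are n time periods in total.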
PCT/CN2019/103547 2019-06-27 2019-08-30 基于长短期记忆模型的疾病预测方法、装置和计算机设备 WO2020220545A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/264,299 US11710571B2 (en) 2019-06-27 2019-08-30 Long short-term memory model-based disease prediction method and apparatus, and computer device
SG11202008385YA SG11202008385YA (en) 2019-06-27 2019-08-30 Disease prediction method and apparatus based on long short-term memory model, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910570055.9A CN110459324B (zh) 2019-06-27 2019-06-27 基于长短期记忆模型的疾病预测方法、装置和计算机设备
CN201910570055.9 2019-06-27

Publications (1)

Publication Number Publication Date
WO2020220545A1 true WO2020220545A1 (zh) 2020-11-05

Family

ID=68481784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103547 WO2020220545A1 (zh) 2019-06-27 2019-08-30 基于长短期记忆模型的疾病预测方法、装置和计算机设备

Country Status (4)

Country Link
US (1) US11710571B2 (zh)
CN (1) CN110459324B (zh)
SG (1) SG11202008385YA (zh)
WO (1) WO2020220545A1 (zh)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161880B (zh) * 2019-12-23 2022-12-02 深圳平安医疗健康科技服务有限公司 基于分类模型的医疗信息分类方法、装置和计算机设备
CN111476092B (zh) * 2020-03-05 2023-07-21 平安科技(深圳)有限公司 基于车联网的数据存储方法、装置和计算机设备
CN111785370B (zh) * 2020-07-01 2024-05-17 医渡云(北京)技术有限公司 病历数据处理方法及装置、计算机存储介质、电子设备
CN112002410A (zh) * 2020-08-20 2020-11-27 医渡云(北京)技术有限公司 传染病疾病状态预测方法及装置、存储介质、电子设备
CN111899883B (zh) * 2020-09-29 2020-12-15 平安科技(深圳)有限公司 少样本或零样本的疾病预测设备、方法、装置及存储介质
CN111933303B (zh) * 2020-09-30 2021-01-15 平安科技(深圳)有限公司 事件预测方法、装置、电子设备及存储介质
CN112102950B (zh) * 2020-11-04 2021-02-12 平安科技(深圳)有限公司 一种数据处理系统、方法、装置及存储介质
CN112447298A (zh) * 2020-11-24 2021-03-05 平安科技(深圳)有限公司 基于联邦迁移学习的神经退行性疾病建模装置及相关设备
CN113241178B (zh) * 2021-05-28 2023-06-27 温州康宁医院股份有限公司 一种确定被测者的抑郁症严重程度的装置
CN113270182B (zh) * 2021-07-20 2021-09-28 武汉泰乐奇信息科技有限公司 基于长短期记忆网络的共同医疗访问合约生成方法和系统
CN113688119B (zh) * 2021-08-24 2023-09-12 深圳平安智慧医健科技有限公司 基于人工智能的医疗数据库构建方法及相关设备
CN113679348B (zh) * 2021-08-26 2024-02-06 深圳平安智慧医健科技有限公司 血糖预测方法、血糖预测装置、设备及存储介质
CN114093423A (zh) * 2021-11-29 2022-02-25 竹石生物科技(苏州)有限公司 病变dna识别方法、装置、电子设备及存储介质
CN114022058A (zh) * 2022-01-06 2022-02-08 成都晓多科技有限公司 基于时序知识图谱的中小企业失信风险预测方法
US20230420127A1 (en) * 2022-06-26 2023-12-28 Danika Gupta Multi-modal machine learning medical assessment
CN115886766A (zh) * 2022-11-29 2023-04-04 重庆理工大学 一种基于注意力机制与ctg图像的胎儿、新生儿缺氧无创诊断系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145746A (zh) * 2017-05-09 2017-09-08 北京大数医达科技有限公司 一种病情描述的智能分析方法及系统
CN109147954A (zh) * 2018-07-26 2019-01-04 南京邮电大学 基于知识图谱的患者信息处理装置
US20190138691A1 (en) * 2017-11-08 2019-05-09 International Business Machines Corporation Personalized risk prediction based on intrinsic and extrinsic factors
CN109785971A (zh) * 2019-01-30 2019-05-21 华侨大学 一种基于先验医学知识的疾病风险预测方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11432778B2 (en) * 2017-01-24 2022-09-06 General Electric Company Methods and systems for patient monitoring
CN106980899B (zh) * 2017-04-01 2020-11-17 北京昆仑医云科技有限公司 预测血管树血管路径上的血流特征的深度学习模型和系统
US10636518B2 (en) * 2017-08-08 2020-04-28 Virgo Surgical Video Solutions, Inc. Automated medical note generation system utilizing text, audio and video data
EP3536245A1 (en) * 2018-03-08 2019-09-11 Koninklijke Philips N.V. A system and method of identifying characteristics of ultrasound images
US10765409B2 (en) * 2018-06-28 2020-09-08 Fitbit, Inc. Menstrual cycle tracking
CN109599177B (zh) * 2018-11-27 2023-04-11 华侨大学 一种基于病历的深度学习预测医疗轨迹的方法
CN109754852A (zh) * 2019-01-08 2019-05-14 中南大学 基于电子病历的心血管疾病风险预测方法
US11557380B2 (en) * 2019-02-18 2023-01-17 Merative Us L.P. Recurrent neural network to decode trial criteria
US20200293882A1 (en) * 2019-03-15 2020-09-17 Samsung Electronics Co., Ltd. Near-infrared spectroscopy (nir) based glucose prediction using deep learning
US11366985B2 (en) * 2020-05-15 2022-06-21 Retrace Labs Dental image quality prediction platform using domain specific artificial intelligence

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145746A (zh) * 2017-05-09 2017-09-08 北京大数医达科技有限公司 Intelligent analysis method and system for condition descriptions
US20190138691A1 (en) * 2017-11-08 2019-05-09 International Business Machines Corporation Personalized risk prediction based on intrinsic and extrinsic factors
CN109147954A (zh) * 2018-07-26 2019-01-04 南京邮电大学 Patient information processing apparatus based on knowledge graph
CN109785971A (zh) * 2019-01-30 2019-05-21 华侨大学 Disease risk prediction method based on prior medical knowledge

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
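CN113707323A (zh) * 2021-08-31 2021-11-26 平安科技(深圳)有限公司 Machine-learning-based disease prediction method, apparatus, device, and medium
CN113707323B (zh) * 2021-08-31 2024-05-14 平安科技(深圳)有限公司 Machine-learning-based disease prediction method, apparatus, device, and medium
CN117594241A (zh) * 2024-01-15 2024-02-23 北京邮电大学 Method and apparatus for predicting dialysis hypotension based on temporal knowledge graph neighborhood reasoning
CN117594241B (zh) * 2024-01-15 2024-04-30 北京邮电大学 Method and apparatus for predicting dialysis hypotension based on temporal knowledge graph neighborhood reasoning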

Also Published As

Publication number Publication date
SG11202008385YA (en) 2020-12-30
CN110459324B (zh) 2023-05-23
US11710571B2 (en) 2023-07-25
US20210296002A1 (en) 2021-09-23
CN110459324A (zh) 2019-11-15

Similar Documents

Publication Publication Date Title
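WO2020220545A1 (zh) Disease prediction method, apparatus, and computer device based on long short-term memory model
CN108986908B (zh) Consultation data processing method, apparatus, computer device, and storage medium
JP7305656B2 (ja) Systems and methods for modeling probability distributions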
Bashir et al. BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting
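CN110119775B (zh) Medical data processing method, apparatus, system, device, and storage medium
CN104572583B (zh) Method and system for data densification
WO2021151327A1 (zh) Triage data processing method, apparatus, device, and medium
CN109036545B (zh) Medical information processing method, apparatus, computer device, and storage medium
CN108062556B (zh) Drug-disease relationship identification method, system, and apparatus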
Muhammed Using data mining technique to diagnosis heart disease
Marathe et al. Prediction of heart disease and diabetes using naive Bayes algorithm
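CN113345564B (zh) Method and apparatus for early prediction of patient length of hospital stay based on graph neural network
CN114191665A (zh) Classification method and classification apparatus for patient-ventilator asynchrony during mechanical ventilation
WO2024131026A1 (zh) Model construction method, apparatus, device, and medium
CN112735542A (zh) Data processing method and system based on clinical trial data
CN110473636B (zh) Deep-learning-based intelligent medical order recommendation method and system
CN109493975B (zh) Chronic disease recurrence prediction method, apparatus, and computer device based on XGBoost model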
Pezzuto et al. Learning cardiac activation maps from 12-lead ECG with multi-fidelity Bayesian optimization on manifolds
CN112035361B (zh) Testing method, apparatus, computer device, and storage medium for medical diagnosis model
Su et al. Efficient Bayesian metamodeling for fine-grained and robust fragility analysis of buildings at a regional scale
Kasabe et al. Cardio Vascular ailments prediction and analysis based on deep learning techniques
CN114822849B (zh) Digital-twin-based data monitoring method, apparatus, device, and storage medium
US20240169187A1 (en) Systems and Methods for Supplementing Data With Generative Models
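WO2023178789A1 (zh) Optimization method, apparatus, medium, and device for disease risk estimation network
CN114999649B (zh) Deep-learning-based monitoring and early-warning method and system for vital sign data of the elderly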

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19927446; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19927446; Country of ref document: EP; Kind code of ref document: A1)