CN112233798B - Interpretable disease risk analysis system based on pathological mode and attention mechanism - Google Patents

Info

Publication number
CN112233798B
Authority
CN
China
Prior art keywords
patient
module
pathological
server
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011479766.4A
Other languages
Chinese (zh)
Other versions
CN112233798A (en)
Inventor
吕明琪
王琦晖
曾大建
时毅
李文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Love News Medical Technology Co ltd
Original Assignee
Hangzhou Smart Strategy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Smart Strategy Technology Co ltd
Priority to CN202011479766.4A
Publication of CN112233798A
Application granted
Publication of CN112233798B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses an interpretable disease risk analysis system based on pathological patterns and an attention mechanism. The system comprises an information acquisition device and a server that are in communication connection with each other. The information acquisition device transmits the acquired demographic characteristic data of a plurality of patients, together with their electronic medical record data for a target disease, to the server. The server is provided with a portrait construction module, a graph embedding algorithm module, a medical record information extraction module, a training set construction module, a decision tree ensemble model module, a vectorization module, an attention mechanism module, a characterization vector calculation module, a splicing module and a logistic regression analysis module, and uses these modules to obtain the results required for health early warning. The invention provides big data service support for health early warning and has the advantage of accurate analysis results.

Description

Interpretable disease risk analysis system based on pathological mode and attention mechanism
Technical Field
The invention relates to the field of intelligent medical treatment, and in particular to a disease risk analysis information system based on pathological pattern mining and an attention mechanism.
Background
In recent years, with the accumulation of electronic medical record big data and the development of artificial intelligence technology, data-driven disease risk prediction methods have emerged: a disease risk prediction model is built by analyzing and mining electronic medical record big data, so that the potential risk and development trend of a disease can be predicted early. Disease risk prediction can give early warning of the future risk of certain diseases, helping doctors devise more effective schemes for preventing and controlling them.
According to the trend of technical development, data-driven disease risk prediction methods can be roughly divided into two categories, namely statistical methods and machine learning methods. Early data-driven disease risk prediction methods mainly used statistical methods to perform correlation analysis on a certain disease and multiple risk factors to find out the main risk factors inducing the disease. However, statistical analysis can only analyze the original risk factors, and cannot find hidden risk factors. In addition, the prediction models established based on statistical methods are mostly linear, and the accuracy is generally low.
Machine learning methods automatically learn knowledge from electronic medical record data and make predictions on future data on that basis. They fall into two types: traditional shallow learning methods and deep learning methods. Shallow learning methods (such as logistic regression and decision tree models) offer better interpretability, but their performance depends heavily on domain knowledge (that is, manually defined features) and their generalization capability is weak. Deep learning methods can automatically learn complex, hidden features and achieve high accuracy, but the model is a black box and lacks interpretability.
Prior-art Chinese patent application No. 201510357827.2, "A disease risk adjustment model building method", discloses a method for analyzing disease risk from patient historical data using a computer statistical model. However, that method relies mainly on a traditional statistical model and does not adopt deep learning technology, so the problem of low accuracy remains.
Chinese patent application No. 201610715985.5, "A health prediction system, intelligent terminal and server based on big data cloud analysis", discloses a method for predicting and analyzing disease risk by computer from health condition parameters. However, that method predicts disease risk mainly by building a statistical model on big data analysis, so it is likewise a traditional statistical prediction method and suffers from low accuracy.
Interpretability is important for intelligent medical systems, particularly disease risk prediction systems. For example, when a disease risk prediction system determines that a patient is at risk for a disease, a physician typically needs to know which risk factors, or which test data, the model is based on to make this determination in order to provide a reliable diagnosis and treatment. If a model is difficult to interpret, the utility value is greatly limited.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a disease risk analysis information system based on pathological pattern mining and attention mechanism.
The technical scheme adopted by the invention is as follows:
an interpretable disease risk analysis system based on pathological patterns and an attention mechanism, characterized in that: the system comprises an information acquisition device and a server; the information acquisition device acquires demographic characteristic data of a plurality of patients and electronic medical record data of the target disease of those patients; the information acquisition device is in communication connection with the server and transmits the acquired demographic characteristic data and electronic medical record data to the server; the server stores the demographic characteristic data and the electronic medical record data in its own integrated memory; and the server is provided, in program form, with a portrait construction module, a graph embedding algorithm module, a medical record information extraction module, a training set construction module, a decision tree ensemble model module, a vectorization module, an attention mechanism module, a characterization vector calculation module, a splicing module and a logistic regression analysis module, wherein:
the portrait construction module retrieves the demographic characteristic data of the plurality of patients and constructs a patient portrait graph for the target disease from that data;
the graph embedding algorithm module obtains the patient portrait graph from the portrait construction module, in which each patient entity and each demographic characteristic value is a node and each correspondence between a patient entity and a demographic characteristic value is an edge, and processes the graph with a graph embedding algorithm to obtain a characterization vector for each patient entity node;
the medical record information extraction module retrieves the electronic medical record data of the target disease of each patient and extracts from it various pathological features and the target disease diagnosis result;
the training set construction module obtains the pathological features and target disease diagnosis results of all patients from the medical record information extraction module and constructs a training sample set from them;
the decision tree ensemble model module obtains the training sample set from the training set construction module and uses it to train the decision tree ensemble model; it then obtains the pathological features of each patient from the medical record information extraction module and feeds them into the trained decision tree ensemble model to obtain all pathological patterns of each patient;
the vectorization module obtains all pathological patterns of each patient from the decision tree ensemble model and vectorizes them;
the attention mechanism module obtains all pathological pattern vectors of each patient from the vectorization module and, based on the attention mechanism, calculates the attention weight of each pathological pattern of each patient with respect to that patient;
the characterization vector calculation module obtains these attention weights from the attention mechanism module and all pathological pattern vectors of each patient from the vectorization module, and then calculates an overall characterization vector of all pathological patterns of each patient from each pathological pattern vector and its corresponding attention weight;
the splicing module obtains the overall characterization vector of all pathological patterns of each patient from the characterization vector calculation module and the characterization vector of each patient entity node from the graph embedding algorithm module, and splices the two together to form a new characterization vector;
and the logistic regression analysis module obtains the new characterization vector from the splicing module and analyzes it by logistic regression to obtain the target disease risk probability of each patient.
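The last two modules in the list above (splicing and logistic regression) can be illustrated with a minimal sketch. The vector sizes, the weights `w` and the bias `b` are hypothetical placeholders rather than values from the patent, and the function name `risk_probability` is introduced only for illustration.

```python
import numpy as np

def risk_probability(g_i, y_i, w, b):
    """Splice the two characterization vectors and apply a logistic-regression layer."""
    z = np.concatenate([g_i, y_i])              # spliced (new) characterization vector
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))   # sigmoid gives a probability in (0, 1)

rng = np.random.default_rng(0)
g_i = rng.normal(size=4)   # patient entity node vector (dimension chosen for illustration)
y_i = rng.normal(size=3)   # overall pathological-pattern vector (dimension chosen for illustration)
w = rng.normal(size=7)     # hypothetical trained weights, one per spliced component
b = 0.0                    # hypothetical trained bias

p = risk_probability(g_i, y_i, w, b)
print(f"target disease risk probability: {p:.3f}")
```

In a trained system `w` and `b` would be learned jointly with the upstream modules; here they are random so that the sketch stays self-contained.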
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the portrait construction module in the server constructs the patient portrait graph as follows: the same demographic characteristics related to the target disease are selected for a plurality of patients as patient portrait information, from which a patient portrait conceptual diagram is constructed; continuous characteristics are then converted into discrete characteristics, and the patient portrait graph G = (V, E) is constructed from the discrete characteristics, where V is the set of nodes, representing patient entities and characteristic values, and E is the set of relationships, representing the correspondence between patient entities and characteristic values.
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the graph embedding algorithm module in the server processes the patient portrait graph as follows:
first, the patient portrait graph is processed with a graph embedding algorithm to obtain a d_1-dimensional dense characterization vector for every node, the characterization vector of node n_i being e_i;
then, a trainable weight matrix W_P, a bias vector b_P and a mapping vector h_P are set, and the association weight α_ij between each patient entity node n_i and its neighbor node n_j is calculated based on formula (1):
[Formula (1): equation image not reproduced in this text]
where the neighbor nodes of the patient entity node n_i comprise the value nodes of n_i on each demographic characteristic and n_i itself, σ(…) is the activation function, and e_j is the characterization vector of the neighbor node n_j;
finally, the final characterization vector g_i of each patient entity node n_i is calculated based on formula (2):
[Formula (2): equation image not reproduced in this text]
where A(i) denotes the index set of the neighbor nodes of the patient entity node n_i.
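Because formulas (1) and (2) survive only as image placeholders, the following sketch shows one common way such a neighbor attention could be written, not the patent's exact equations. It assumes that the score is h_P · σ(W_P [e_i ; e_j] + b_P), that the scores are normalized by a softmax over A(i), that σ is tanh, and that g_i is the α-weighted sum of the neighbor vectors; all four choices are assumptions.

```python
import numpy as np

def neighbor_attention(e_i, neighbors, W_P, b_P, h_P):
    """Return (alpha, g_i) for one patient entity node.

    neighbors has shape (k, d1); its rows are e_j for j in A(i), including e_i itself.
    """
    k = len(neighbors)
    pairs = np.hstack([np.tile(e_i, (k, 1)), neighbors])  # (k, 2*d1): [e_i ; e_j]
    hidden = np.tanh(pairs @ W_P.T + b_P)                 # activation assumed to be tanh
    scores = hidden @ h_P                                 # one unnormalized score per neighbor
    scores = scores - scores.max()                        # numerical stability for the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()         # assumed softmax normalization
    g_i = alpha @ neighbors                               # assumed weighted sum of neighbor vectors
    return alpha, g_i

rng = np.random.default_rng(0)
d1, m = 4, 8                                   # illustrative dimensions
e_i = rng.normal(size=d1)                      # embedding of the patient node
value_nodes = rng.normal(size=(3, d1))         # embeddings of three demographic value nodes
neighbors = np.vstack([value_nodes, e_i])      # A(i): value nodes plus the node itself
W_P = rng.normal(size=(m, 2 * d1))             # untrained, random parameters
b_P = np.zeros(m)
h_P = rng.normal(size=m)

alpha, g_i = neighbor_attention(e_i, neighbors, W_P, b_P, h_P)
```

The weights `alpha` sum to one over the neighbor set, so `g_i` stays in the same d_1-dimensional space as the node embeddings.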
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the decision tree ensemble model module in the server obtains the pathological patterns of each patient as follows:
(1) Feature extraction: given a patient data set D, for each patient u_i in D, various pathological features are extracted from the electronic medical record data of u_i, and a training sample set S is formed together with the disease diagnosis results;
(2) Training the decision tree ensemble model: based on the training sample set S, a decision tree ensemble model TM containing N decision trees is trained; each branch of each decision tree in TM then represents a pathological pattern;
(3) Pathological pattern extraction: given a patient u_i, the pathological features of u_i are first input into each decision tree of the ensemble model TM; the features reach at least one leaf node of each decision tree, and the branch corresponding to each reached leaf node is a pathological pattern of patient u_i.
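Step (3) can be sketched in plain Python. The two trees below stand in for an already trained ensemble; their feature names, thresholds and the patient record are invented for illustration, and in this simplified sketch each tree yields exactly one reached leaf.

```python
def extract_pattern(tree, features):
    """Walk one decision tree and return (branch, leaf_id).

    The branch is a tuple of readable conditions; it plays the role of
    a pathological pattern for the given patient.
    """
    path = []
    node = tree
    while "leaf" not in node:
        name, thr = node["feature"], node["threshold"]
        if features[name] <= thr:
            path.append(f"{name} <= {thr}")
            node = node["left"]
        else:
            path.append(f"{name} > {thr}")
            node = node["right"]
    return tuple(path), node["leaf"]

# Two hypothetical, already trained trees (names and thresholds are invented).
tree_1 = {"feature": "fasting_glucose", "threshold": 7.0,
          "left": {"leaf": 0},
          "right": {"feature": "bmi", "threshold": 28.0,
                    "left": {"leaf": 1}, "right": {"leaf": 2}}}
tree_2 = {"feature": "systolic_bp", "threshold": 140,
          "left": {"leaf": 0}, "right": {"leaf": 1}}
ensemble = [tree_1, tree_2]

patient = {"fasting_glucose": 8.1, "bmi": 30.5, "systolic_bp": 128}
patterns = [extract_pattern(tree, patient) for tree in ensemble]
for branch, leaf in patterns:
    print(" AND ".join(branch), "-> leaf", leaf)
# prints:
# fasting_glucose > 7.0 AND bmi > 28.0 -> leaf 2
# systolic_bp <= 140 -> leaf 0
```

Keeping the branch as a list of conditions is what makes each pattern human-readable when it is later used as an explanation.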
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the characterization vector calculation module in the server calculates the overall characterization vector of all pathological patterns of each patient as follows:
(A1) Pathological pattern embedding: each decision tree in the decision tree ensemble model is regarded as a categorical feature, and each pathological pattern is regarded as one value of that categorical feature, so that a pathological pattern p_j can be expressed as a one-hot vector f_j; a multilayer perceptron then converts the one-hot vector f_j of each pathological pattern p_j into a d_2-dimensional dense vector x_j;
(A2) Attention weighting: first, a trainable weight matrix W_A, a bias vector b_A and a mapping vector h_A are set, and the attention weight β_ij of pathological pattern p_j for patient u_i is calculated based on formula (3):
[Formula (3): equation image not reproduced in this text]
where σ(…) is the activation function and g_i is the final characterization vector of the patient entity node n_i.
Then, the overall characterization vector y_i of all pathological patterns extracted for each patient u_i is calculated based on formula (4):
[Formula (4): equation image not reproduced in this text]
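Steps (A1) and (A2) can be sketched as follows. Since formulas (3) and (4) are not available as text, the sketch assumes a score of the form h_A · σ(W_A [g_i ; x_j] + b_A) with σ = tanh, a softmax over the patient's patterns, and y_i as the β-weighted sum of the x_j. The leaf counts, dimensions and random parameters are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
leaves_per_tree = [3, 2]                           # hypothetical ensemble: 3 and 2 leaves
offsets = np.cumsum([0] + leaves_per_tree[:-1])    # start index of each tree's one-hot block
total = sum(leaves_per_tree)
d1, d2, hidden = 4, 3, 8                           # illustrative dimensions

def one_hot(tree_idx, leaf_idx):
    """One-hot vector f_j: each tree is a categorical feature, each leaf one value."""
    f = np.zeros(total)
    f[offsets[tree_idx] + leaf_idx] = 1.0
    return f

# Untrained two-layer perceptron mapping f_j to a d2-dimensional dense vector x_j.
W1 = rng.normal(size=(hidden, total))
W2 = rng.normal(size=(d2, hidden))

def embed(f):
    return W2 @ np.maximum(W1 @ f, 0.0)            # ReLU hidden layer (an assumption)

def attention_pool(g_i, X, W_A, b_A, h_A):
    """Return (beta, y_i): attention weights over patterns and the pooled vector."""
    G = np.tile(g_i, (len(X), 1))
    scores = np.tanh(np.hstack([G, X]) @ W_A.T + b_A) @ h_A
    scores = scores - scores.max()                 # numerical stability
    beta = np.exp(scores) / np.exp(scores).sum()   # assumed softmax normalization
    return beta, beta @ X                          # assumed weighted sum

reached = [(0, 2), (1, 0)]                         # (tree, leaf) pairs reached by one patient
X = np.stack([embed(one_hot(t, l)) for t, l in reached])
g_i = rng.normal(size=d1)                          # patient node vector from the graph step
W_A = rng.normal(size=(hidden, d1 + d2))
b_A = np.zeros(hidden)
h_A = rng.normal(size=hidden)

beta, y_i = attention_pool(g_i, X, W_A, b_A, h_A)
```

Conditioning the score on `g_i` is what lets the same pathological pattern receive different weights for different patients.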
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the server is further provided, in program form, with a result analysis module in which a threshold is set; the result analysis module obtains the target disease risk probability of a patient from the logistic regression analysis module and compares it with the threshold; if the risk probability exceeds the threshold, the result analysis module obtains the attention weights of all pathological patterns of that patient from the attention mechanism module, selects the several pathological patterns with the highest attention weights as the explanation of the reasons for the patient's target disease risk analysis result, and outputs this explanation.
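The thresholding and selection logic of the result analysis module can be sketched as below. The threshold of 0.5, the choice of two patterns and the pattern strings are illustrative assumptions; the patent only specifies "a threshold" and "a plurality of" top-weighted patterns.

```python
def explain(risk, attention_weights, threshold=0.5, top_k=2):
    """Return the top_k highest-weighted patterns if risk exceeds the threshold.

    attention_weights maps a readable pathological pattern to its attention weight.
    An empty list means the patient is not flagged, so no explanation is produced.
    """
    if risk <= threshold:
        return []
    ranked = sorted(attention_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [pattern for pattern, _ in ranked[:top_k]]

weights = {
    "fasting_glucose > 7.0 AND bmi > 28.0": 0.55,   # invented patterns and weights
    "systolic_bp <= 140": 0.15,
    "age_level = 3": 0.30,
}
print(explain(0.82, weights))
# prints ['fasting_glucose > 7.0 AND bmi > 28.0', 'age_level = 3']
print(explain(0.20, weights))
# prints []
```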
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the information acquisition device is a human-computer interaction device that is in wired communication connection with the server through a communication bus and acquires the demographic characteristic data of a plurality of patients and the electronic medical record data of their target disease by manual entry.
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the information acquisition device is a computer, the demographic characteristic data of a plurality of patients and the electronic medical record data of the target disease are recorded and stored in the computer, the computer is in communication connection with the server through a communication module integrated with the computer, and the demographic characteristic data of the patients and the electronic medical record data of the target disease are transmitted to the server by the computer.
The interpretable disease risk analysis system based on pathological patterns and attention mechanism is characterized in that: the information acquisition device is a personal intelligent terminal distributed to each patient, the personal intelligent terminal records and stores demographic characteristic data of each patient and electronic medical record data of target diseases, the personal intelligent terminals are respectively in communication connection with the server through communication modules integrated with the personal intelligent terminals, and the personal intelligent terminals respectively transmit the demographic characteristic data of the corresponding patient and the electronic medical record data of the target diseases to the server.
The invention has the following beneficial effects: 1. The information system makes full use of demographic characteristic data and electronic medical record data for big data analysis, is intelligent, yields highly accurate analysis results, and provides big data service support for health early warning. 2. The server of the information system adopts a deep learning model and an attention mechanism, so that the prediction results can be explained while the prediction accuracy of the model is preserved. 3. The server of the information system uses pathological patterns mined by the decision tree ensemble as the basis for explanation, which improves the reference value of the model's explanations. 4. The server of the information system designs the attention mechanism on the basis of the patient characteristics and the pathological pattern representations, which improves the adaptability of the model's explanations.
Drawings
FIG. 1 is a block diagram of the system architecture of the present invention.
FIG. 2a is a patient portrait conceptual diagram constructed by the portrait construction module in the server of the present invention.
FIG. 2b is the final patient portrait graph constructed by the portrait construction module in the server of the present invention.
FIG. 3 is a schematic diagram of the operation of the attention mechanism module, the characterization vector calculation module, the splicing module and the logistic regression analysis module in the server of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, the interpretable disease risk analysis system based on pathological mode and attention mechanism includes an information acquisition device and a server, where the information acquisition device is a human-computer interaction device, or a computer, or a personal intelligent terminal such as a mobile phone.
When the information acquisition device is a human-computer interaction device, it is in wired communication connection with the server through a communication bus. The demographic characteristic data of a plurality of patients and the electronic medical record data of their target disease are entered manually through the human-computer interaction device, and the server stores these data in its own integrated memory.
When the information acquisition device is a computer, the demographic characteristic data of a plurality of patients and the electronic medical record data of the target disease are entered and stored on the computer. The computer is in communication connection with the server through its own integrated wired or wireless communication module and transmits the data to the server, and the server stores the data in its own integrated memory.
When the information acquisition device is a personal intelligent terminal distributed to each patient, each patient can enter his or her own demographic characteristic data and electronic medical record data of the target disease into the terminal. The terminal then transmits the data of the corresponding patient to the server through its own integrated communication module, and the server stores the demographic characteristic data and electronic medical record data of the plurality of patients in its own integrated memory.
The server is provided, in program form, with a portrait construction module, a graph embedding algorithm module, a medical record information extraction module, a training set construction module, a decision tree ensemble model module, a vectorization module, an attention mechanism module, a characterization vector calculation module, a splicing module, a logistic regression analysis module and a result analysis module, wherein:
the portrait construction module retrieves the demographic characteristic data of a plurality of patients and constructs a patient portrait graph for the target disease from that data. The specific process is as follows:
As shown in FIG. 2a, demographic characteristics such as age, sex and weight are selected for each patient to construct a patient portrait conceptual diagram. Continuous characteristics are then converted into discrete characteristics by the equal-width method, and the patient portrait graph G = (V, E) is constructed from the discrete characteristics, where V is the set of nodes, representing patient entities and characteristic values, and E is the set of relationships, representing the correspondence between patient entities and characteristic values. As shown in FIG. 2b, the patient entity nodes in the final patient portrait graph are Zhang San and Li Si, each connected to the value nodes of his or her discretized characteristics, such as sex, age level and weight level.
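The construction just described can be sketched in a few lines. The patient names follow FIG. 2b, but the ages, weights, value ranges and the number of bins are invented for illustration, and the helper `equal_width_bin` is a hypothetical name.

```python
def equal_width_bin(value, lo, hi, n_bins):
    """Map a continuous value to one of n_bins equal-width intervals over [lo, hi]."""
    width = (hi - lo) / n_bins
    idx = int((value - lo) // width)
    return min(max(idx, 0), n_bins - 1)   # clamp values on or outside the range

# Invented demographic records for the two example patients.
patients = {
    "Zhang San": {"sex": "male", "age": 63, "weight": 82.0},
    "Li Si": {"sex": "female", "age": 41, "weight": 55.5},
}
# Assumed (lo, hi, n_bins) per continuous characteristic.
ranges = {"age": (0, 100, 5), "weight": (30, 130, 5)}

V, E = set(), set()                        # node set and edge set of G = (V, E)
for name, record in patients.items():
    patient_node = ("patient", name)
    V.add(patient_node)
    for feature, value in record.items():
        if feature in ranges:              # discretize continuous characteristics only
            lo, hi, n_bins = ranges[feature]
            value = f"level{equal_width_bin(value, lo, hi, n_bins)}"
        value_node = (feature, value)
        V.add(value_node)
        E.add((patient_node, value_node))  # edge: patient has this characteristic value

print(len(V), "nodes,", len(E), "edges")
# prints: 8 nodes, 6 edges
```

Patients who share a discretized value, for example the same age level, would share that value node, which is what lets a graph embedding algorithm place similar patients close together.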
The graph embedding algorithm module obtains the patient portrait graph from the portrait construction module, in which each patient entity and each demographic characteristic value is a node and each correspondence between a patient entity and a demographic characteristic value is an edge, and processes the graph with a graph embedding algorithm to obtain a characterization vector for each patient entity node. The specific process is as follows:
First, the patient portrait graph is processed with a graph embedding algorithm to obtain a d_1-dimensional dense characterization vector for every node, the characterization vector of node n_i being e_i.
Then, a trainable weight matrix W_P, a bias vector b_P and a mapping vector h_P are set, and the association weight α_ij between each patient entity node n_i and its neighbor node n_j is calculated based on formula (1):
[Formula (1): equation image not reproduced in this text]
where the neighbor nodes of the patient entity node n_i comprise the value nodes of n_i on each demographic characteristic and n_i itself, σ(…) is the activation function, and e_j is the characterization vector of the neighbor node n_j.
Finally, the final characterization vector g_i of each patient entity node n_i is calculated based on formula (2):
[Formula (2): equation image not reproduced in this text]
where A(i) denotes the index set of the neighbor nodes of the patient entity node n_i.
The medical record information extraction module calls the electronic medical record data of the target diseases of each patient and extracts various pathological features and target disease diagnosis results from the electronic medical record data of the target diseases of each patient.
The training set construction module acquires various pathological characteristics and target disease diagnosis results of all patients from the medical record information extraction module, and constructs a training sample set according to the various pathological characteristics and the target disease diagnosis results of all patients.
The decision tree ensemble model module obtains the training sample set from the training set construction module and uses it to train the decision tree ensemble model; it then obtains the pathological features of each patient from the medical record information extraction module and feeds them into the trained decision tree ensemble model to obtain all pathological patterns of each patient. The specific process is as follows:
(1) Feature extraction: given a patient data set D, for each patient u_i in D, various pathological features are extracted from the electronic medical record data of u_i, and a training sample set S is formed together with the disease diagnosis results;
(2) Training the decision tree ensemble model: based on the training sample set S, a decision tree ensemble model TM containing N decision trees is trained; each branch of each decision tree in TM then represents a pathological pattern;
(3) Pathological pattern extraction: given a patient u_i, the pathological features of u_i are first input into each decision tree of the ensemble model TM; the features reach at least one leaf node of each decision tree, and the branch corresponding to each reached leaf node is a pathological pattern of patient u_i.
The vectorization module obtains all pathological patterns of each patient from the decision tree integration model and vectorizes all pathological patterns of each patient.
As shown in FIG. 3, the attention mechanism module obtains all pathological mode vectors of each patient from the vectorization module and calculates attention weights of each pathological mode of each patient for the corresponding patient based on the attention mechanism.
The characterization vector calculation module acquires the attention weight of each pathological mode of each patient to the corresponding patient from the attention mechanism module, simultaneously acquires all pathological mode vectors of each patient from the vectorization module, and then calculates and obtains a total characterization vector of all pathological modes of each patient based on each pathological mode vector of each patient and the attention weight corresponding to each pathological mode, and the specific process is as follows:
(A1) Pathological pattern embedding: each decision tree in the decision tree integration model is regarded as a categorical feature and each pathological mode as a value of that categorical feature, so that each pathological mode p_j is expressed as a one-hot vector f_j; the one-hot vector f_j of each pathological mode p_j is then converted into a d_2-dimensional dense vector x_j by a multilayer perceptron;
(A2) Attention weighting: first, a trainable weight matrix W_A, a bias vector b_A and a mapping vector h_A are set, and the attention weight β_ij of pathological mode p_j for patient u_i is calculated based on formula (3), which is as follows:
[equation image not reproduced in this text]    (3),
In formula (3), σ(…) is the activation function and g_i is the final characterization vector of patient entity node n_i.
Then, the overall characterization vector y_i of all pathological modes extracted for each patient u_i is calculated based on formula (4), which is as follows:
[equation image not reproduced in this text]    (4).
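Formulas (3) and (4) appear only as images in the published text, so the following is a sketch under stated assumptions rather than the patented formulas. It assumes a common additive-attention form: a score h_A · σ(W_A [g_i ; x_j] + b_A) per pathological mode, normalised with a softmax to give β_ij, followed by y_i as the β-weighted sum of the vectors x_j. The choice of tanh for σ, the single linear layer standing in for the multilayer perceptron, the vector sizes and the random (untrained) parameters are all illustrative assumptions.

```python
import math
import random

random.seed(0)

D1, D2, HIDDEN = 3, 4, 5   # illustrative sizes for g_i, x_j and the hidden layer
NUM_PATTERNS = 6           # total number of distinct patterns (one-hot length)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


# Untrained, randomly initialised stand-ins for the trainable parameters.
W_EMBED = rand_matrix(D2, NUM_PATTERNS)  # one linear layer in place of the perceptron
W_A = rand_matrix(HIDDEN, D1 + D2)
B_A = [0.0] * HIDDEN
H_A = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]


def attention_pool(g_i, pattern_ids):
    """Return (beta, y_i) for one patient under the assumed additive-attention form."""
    xs = [matvec(W_EMBED, one_hot(j, NUM_PATTERNS)) for j in pattern_ids]
    scores = []
    for x_j in xs:
        # g_i + x_j is list concatenation, i.e. the vector [g_i ; x_j].
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_A, g_i + x_j), B_A)]
        scores.append(sum(h * v for h, v in zip(H_A, hidden)))
    # Softmax normalisation so that the weights of one patient sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    # Attention-weighted sum of the pattern vectors.
    y_i = [sum(b * x[k] for b, x in zip(beta, xs)) for k in range(D2)]
    return beta, y_i


g_i = [0.2, -0.1, 0.4]     # made-up patient node vector
beta, y_i = attention_pool(g_i, pattern_ids=[0, 3, 5])
print(beta, y_i)
```

Because the weights are normalised per patient, they can be compared directly across that patient's pathological modes, which is what makes them usable as an explanation signal.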
The splicing module acquires the overall characterization vector of all pathological modes of each patient from the characterization vector calculation module, acquires the characterization vector of each patient entity node from the graph embedding algorithm module, and splices (concatenates) the characterization vector of each patient entity node with the overall characterization vector of all pathological modes of the corresponding patient to form a new characterization vector.
The logistic regression analysis module acquires the new characterization vector from the splicing module and then analyzes it by logistic regression to obtain the target disease risk probability of each patient.
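The splicing and logistic regression steps can be sketched as follows. The vectors, coefficients and bias below are made-up values, and the function names are hypothetical; in the described system the coefficients would be learned from the training sample set.

```python
import math


def splice(g_i, y_i):
    """Concatenate the patient node vector and the overall pattern vector."""
    return list(g_i) + list(y_i)


def risk_probability(z_i, weights, bias):
    """Logistic regression: sigmoid of a linear function of the spliced vector."""
    logit = sum(w * z for w, z in zip(weights, z_i)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


g_i = [0.2, -0.1, 0.4]                            # made-up patient node vector
y_i = [0.05, 0.30, -0.20, 0.10]                   # made-up overall pattern vector
z_i = splice(g_i, y_i)                            # new characterization vector
weights = [0.5, -0.3, 0.8, 1.2, 0.4, -0.6, 0.9]   # made-up coefficients
bias = -0.1
p_i = risk_probability(z_i, weights, bias)
print(round(p_i, 4))
```

The output is always strictly between 0 and 1, so it can be compared against a fixed threshold.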
A threshold is set in the result analysis module. The result analysis module obtains the target disease risk probability of a patient from the logistic regression analysis module and compares it with the set threshold. If the probability is greater than the threshold, the result analysis module obtains the attention weights of all pathological modes of the patient from the attention mechanism module, selects the several pathological modes with the highest attention weights as the explanation of the patient's target disease risk analysis result, and outputs this explanation externally.
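A minimal sketch of this thresholding and selection logic is given below. The function name, the default threshold of 0.5, the choice of two patterns and the example pattern strings are assumptions for illustration; the patent text fixes neither the threshold nor the number of patterns selected.

```python
def explain_risk(probability, attention_weights, threshold=0.5, top_k=2):
    """Return the top_k patterns by attention weight if the risk exceeds the threshold."""
    if probability <= threshold:
        return []
    ranked = sorted(attention_weights.items(), key=lambda item: item[1], reverse=True)
    return [pattern for pattern, _ in ranked[:top_k]]


# Made-up attention weights for one patient's pathological modes.
weights = {
    "sbp > 140 AND age > 60": 0.55,
    "glucose <= 7.0": 0.10,
    "bmi > 28": 0.35,
}
explanation = explain_risk(0.68, weights)
print(explanation)
```

A patient whose probability does not exceed the threshold yields an empty explanation, mirroring the text, which outputs an explanation only for patients above the threshold.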
The embodiments described above are merely preferred embodiments of the present invention and do not limit its concept and scope. Various modifications and improvements made to the technical solution of the present invention by those skilled in the art without departing from its design concept shall fall within the protection scope of the present invention. The technical content claimed by the present invention is fully set forth in the claims.

Claims (9)

1. An interpretable disease risk analysis system based on pathological patterns and attention mechanism, characterized in that: the system comprises an information acquisition device and a server, wherein the information acquisition device acquires demographic characteristic data of a plurality of patients and electronic medical record data of the target disease of the plurality of patients; the information acquisition device is in communication connection with the server and transmits the acquired demographic characteristic data of the patients and electronic medical record data of the target disease to the server; the server stores the demographic characteristic data of the patients and the electronic medical record data of the target disease in its integrated memory; and the server is provided, in program form, with an image construction module, a graph embedding algorithm module, a medical record information extraction module, a training set construction module, a decision tree integration model module, a vectorization module, an attention mechanism module, a characterization vector calculation module, a splicing module and a logistic regression analysis module, wherein:
the image construction module calls demographic characteristic data of a plurality of patients and constructs a patient image map based on the demographic characteristic data of the plurality of patients of the target disease;
the graph embedding algorithm module acquires the patient image map of the plurality of patients from the image construction module, in which each patient entity and each demographic characteristic value is a node and the correspondence between a patient entity and a demographic characteristic value is an edge, and processes the patient image map with a graph embedding algorithm to obtain a characterization vector of each patient entity node;
the medical record information extraction module calls the electronic medical record data of each patient target disease and extracts various pathological features and target disease diagnosis results from the electronic medical record data of each patient target disease;
the training set construction module acquires various pathological characteristics and target disease diagnosis results of all patients from the medical record information extraction module, and constructs a training sample set according to the various pathological characteristics and the target disease diagnosis results of all patients;
the decision tree integrated model module acquires a training sample set from the training set construction module and trains the decision tree integrated model by utilizing the training sample set; then acquiring pathological features of each patient from a medical record information extraction module, and inputting the pathological features of each patient as input data into a trained decision tree integrated model to obtain all pathological modes of each patient;
the vectorization module acquires all pathological modes of each patient from the decision tree integrated model and vectorizes all pathological modes of each patient;
the attention mechanism module acquires all pathological mode vectors of each patient from the vectorization module and calculates the attention weight of each pathological mode of each patient to the corresponding patient based on the attention mechanism;
the characterization vector calculation module acquires the attention weight of each pathological mode of each patient to the corresponding patient from the attention mechanism module, simultaneously acquires all pathological mode vectors of each patient from the vectorization module, and then calculates and obtains a total characterization vector of all pathological modes of each patient based on each pathological mode vector of each patient and the attention weight corresponding to each pathological mode;
the splicing module acquires the overall characterization vector of all pathological modes of each patient from the characterization vector calculation module, acquires the characterization vector of each patient entity node from the graph embedding algorithm module, and splices the characterization vector of each patient entity node with the overall characterization vector of all pathological modes of the corresponding patient to form a new characterization vector;
and the logistic regression analysis module acquires the new characterization vector from the splicing module and then analyzes it by logistic regression to obtain the target disease risk probability of each patient.
2. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1, characterized in that: the process by which the image construction module in the server constructs the patient image map is as follows: the same demographic characteristics related to the target disease are selected for the plurality of patients as patient portrait information, thereby constructing a patient portrait conceptual diagram; continuous characteristics are then converted into discrete characteristics, and the patient image map G = (V, E) is constructed based on the discrete characteristics, wherein V is the set of nodes, representing patient entities and characteristic values, and E is the set of relations, representing the correspondence between patient entities and characteristic values.
3. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1 or 2, characterized in that: the graph embedding algorithm module in the server processes the patient image map as follows:
first, the patient image map is processed with a graph embedding algorithm to obtain a d_1-dimensional dense characterization vector for every node, the characterization vector of each node n_i being e_i;
then, a trainable weight matrix W_P, a bias vector b_P and a mapping vector h_P are set, and the association weight α_ij between each patient entity node n_i and its neighbor node n_j is calculated based on formula (1), which is as follows:
[equation image not reproduced in this text]    (1),
wherein the neighbor nodes of patient entity node n_i comprise the value nodes of n_i on each demographic characteristic and n_i itself, σ(…) is the activation function, and e_j denotes the characterization vector of neighbor node n_j;
finally, the final characterization vector g_i of each patient entity node n_i is calculated based on formula (2), which is as follows:
[equation image not reproduced in this text]    (2),
wherein A(i) denotes the index set of the neighbor nodes of patient entity node n_i.
4. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1, characterized in that: the process by which the decision tree integration model module in the server obtains the pathological modes of each patient is as follows:
(1) feature extraction: given a patient data set D, for each patient u_i in D, various pathological features are extracted from the electronic medical record data of u_i and, together with the disease diagnosis results, form the training sample set S;
(2) training a decision tree integration model: based on the training sample set S, a decision tree integration model TM comprising N decision trees is trained; each branch of each decision tree of TM then represents a pathological pattern;
(3) pathological mode extraction: given a patient u_i, the pathological features of u_i are first input into each decision tree of the decision tree integration model TM; they reach at least one leaf node of each decision tree, and the branch corresponding to each reached leaf node is one pathological pattern of patient u_i.
5. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1 or 4, characterized in that: the process by which the characterization vector calculation module in the server calculates the overall characterization vector of all pathological modes of each patient is as follows:
(A1) pathological pattern embedding: each decision tree in the decision tree integration model is regarded as a categorical feature and each pathological mode as a value of that categorical feature, so that each pathological mode p_j is expressed as a one-hot vector f_j; the one-hot vector f_j of each pathological mode p_j is then converted into a d_2-dimensional dense vector x_j by a multilayer perceptron;
(A2) attention weighting: first, a trainable weight matrix W_A, a bias vector b_A and a mapping vector h_A are set, and the attention weight β_ij of pathological mode p_j for patient u_i is calculated based on formula (3), which is as follows:
[equation image not reproduced in this text]    (3),
in formula (3), σ(…) is the activation function and g_i is the final characterization vector of patient entity node n_i;
then, the overall characterization vector y_i of all pathological modes extracted for each patient u_i is calculated based on formula (4), which is as follows:
[equation image not reproduced in this text]    (4).
6. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1, characterized in that: the server is further provided, in program form, with a result analysis module in which a threshold is set; the result analysis module acquires the target disease risk probability of a patient from the logistic regression analysis module and compares it with the set threshold; if the target disease risk probability of the patient is greater than the set threshold, the result analysis module acquires the attention weights of all pathological modes of the patient from the attention mechanism module, selects the several pathological modes with the highest attention weights as the explanation of the patient's target disease risk analysis result, and outputs the explanation externally.
7. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1, characterized in that: the information acquisition device is a human-computer interaction device which is connected to the server by wired communication through a communication bus and which acquires the demographic characteristic data of the plurality of patients and the electronic medical record data of the target disease of the plurality of patients by manual entry.
8. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1, characterized in that: the information acquisition device is a computer in which the demographic characteristic data of the plurality of patients and the electronic medical record data of the target disease are recorded and stored; the computer is in communication connection with the server through its integrated communication module and transmits the demographic characteristic data of the patients and the electronic medical record data of the target disease to the server.
9. The interpretable disease risk analysis system based on pathological patterns and attention mechanism according to claim 1, characterized in that: the information acquisition device comprises personal intelligent terminals distributed to the individual patients; each personal intelligent terminal records and stores the demographic characteristic data of its patient and the electronic medical record data of the target disease, is in communication connection with the server through its integrated communication module, and transmits the demographic characteristic data of the corresponding patient and the electronic medical record data of the target disease to the server.
CN202011479766.4A 2020-12-16 2020-12-16 Interpretable disease risk analysis system based on pathological mode and attention mechanism Active CN112233798B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011479766.4A CN112233798B (en) 2020-12-16 2020-12-16 Interpretable disease risk analysis system based on pathological mode and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011479766.4A CN112233798B (en) 2020-12-16 2020-12-16 Interpretable disease risk analysis system based on pathological mode and attention mechanism

Publications (2)

Publication Number Publication Date
CN112233798A CN112233798A (en) 2021-01-15
CN112233798B true CN112233798B (en) 2021-03-19

Family

ID=74124747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011479766.4A Active CN112233798B (en) 2020-12-16 2020-12-16 Interpretable disease risk analysis system based on pathological mode and attention mechanism

Country Status (1)

Country Link
CN (1) CN112233798B (en)

Also Published As

Publication number Publication date
CN112233798A (en) 2021-01-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230808

Address after: 4th Floor, Building 1, No. 508 Yingxi North Road, Fuxi Street, Deqing County, Huzhou City, Zhejiang Province, 313200

Patentee after: Zhejiang love news Medical Technology Co.,Ltd.

Address before: Room 506-2, Block E, building 1, 1378 Wenyi West Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: Hangzhou smart strategy Technology Co.,Ltd.