CN114639483A

CN114639483A - Electronic medical record retrieval method and device based on graph neural network

Info

Publication number: CN114639483A
Application number: CN202210291079.2A
Authority: CN
Inventors: 吕旭东; 李梦阳; 段会龙; 蔡海领
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-03-23
Filing date: 2022-03-23
Publication date: 2022-06-17

Abstract

The invention discloses an electronic medical record retrieval method based on a graph neural network, comprising the following steps: acquiring a co-occurrence matrix of medical entities in the electronic medical record; adding co-occurrence information between the medical entities and their ancestor medical entities to the matrix to obtain an enhanced medical entity co-occurrence matrix; extracting a vector representation of each medical entity and a vector representation of each patient using a GloVe model; constructing an electronic medical record heterogeneous graph comprising medical entity nodes, patient nodes, the real link relations between medical entities, and the real link relations between patients and medical entities; inputting the heterogeneous graph into a graph neural network to obtain the patient node output vector representations and the medical entity node output vector representations, and from them the probability of a link between a patient and a medical entity and the probability of a link between medical entities; and training the graph neural network with a total loss function and updating its parameters to obtain the final graph neural network. The method can be used to predict the probability of association between a patient and a medical entity.

Description

Electronic medical record retrieval method and device based on graph neural network

Technical Field

The invention relates to the technical field of medical information data processing, in particular to an electronic medical record retrieval method and device based on a graph neural network.

Background

Medical practice depends on large amounts of data: clinicians must constantly access patient information for analysis and decision making. Electronic medical records, currently one of the main information sources, contain abundant information that is of great value for clinical decision support, clinical research and clinical trials, and such work requires the electronic medical record data to be queried effectively. Medical personnel often carry out query tasks without the support of information technology staff, so they must rely on their own knowledge to express a query. The query task is therefore challenging and requires a large amount of browsing and exploration to find the target information, which greatly reduces working efficiency and increases the workload of medical professionals. An automated method is needed to reduce the time cost for clinical staff.

Semantic association can effectively improve query performance in this process. Currently, in real-world query tasks, the medical entities in an electronic medical record are associated using medical ontology knowledge, and the target query entity is expanded in the query through these relations. However, this approach depends excessively on a general medical knowledge ontology and easily ignores related information present in the electronic medical record itself; in addition, entities in the electronic medical record that do not appear in the medical knowledge ontology cannot be expanded, which limits the applicability of the approach.

The electronic medical record contains rich association information that can help optimize the query task. Based on this idea, different kinds of information are used to establish link relations among the electronic medical record data, and the query task is then improved through these associations. With the development of machine learning, deep learning has proved effective in various fields, so modeling the electronic medical record with a neural network is an effective approach. A graph neural network can represent a complex topological structure, and the electronic medical record can be regarded as a complex heterogeneous graph, so its structure can be effectively represented by a graph neural network.

The graph neural network is a neural network structure developed from convolutional neural networks and graph representation learning. Unlike neural networks designed for other data types, it can extract and represent the features of graph-structured data; it is efficient and easily extended, and it is powerful for learning from graph data. Compared with traditional deep learning methods, the constructed graph model can reflect both the entities and the connections between them. A graph neural network first initializes the node descriptions, then repeatedly updates the node states so that each state incorporates neighbor node information and the network topology, and finally outputs the nodes through a specific method to obtain results that can be used in subsequent tasks. It is therefore well suited to modeling heterogeneous electronic medical records.

Owing to its specialization and complexity, the medical field has a great deal of medical ontology knowledge, such as ICD and SNOMED-CT, which can be used to establish relations between different medical entities and to supply association information that does not exist in the electronic medical record, thus enriching the topological structure information in the network.

Disclosure of Invention

The invention discloses an electronic medical record retrieval method based on a graph neural network, which can expand the relationship range between medical entities and between the medical entities and patients so as to prepare for predicting the association probability between the patients and the medical entities.

An electronic medical record retrieval method based on a graph neural network comprises the following steps:

(1) acquiring a co-occurrence matrix of medical entities in the electronic medical record, traversing ICD codes of medical ontology knowledge to acquire a plurality of ancestor medical entities corresponding to the medical entities, adding co-occurrence information of the medical entities and the ancestor medical entities into the medical entity co-occurrence matrix to acquire an enhanced medical entity co-occurrence matrix, extracting vector representation of each medical entity by adopting a GloVe model based on the enhanced medical entity co-occurrence matrix, and taking an aggregation result of a plurality of medical entity vector representations associated with a patient as the patient vector representation;

(2) constructing an electronic medical record heterogeneous graph, wherein the electronic medical record heterogeneous graph comprises medical entity nodes, patient nodes, real linking relations among medical entities and real linking relations between patients and medical entities;

taking each medical entity vector representation as the initial attribute of the corresponding medical entity node and each patient vector representation as the initial attribute of the corresponding patient node, connecting related medical entities to obtain the real link relations between medical entities, and connecting related medical entities and patients to obtain the real link relations between patients and medical entities;

(3) inputting the electronic medical record heterogeneous graph into a GraphSAGE graph neural network to obtain the output vector representation of each patient node and the output vector representation of each medical entity node; based on the patient node output vector representation and the medical entity node output vector representation, obtaining the link probability of the patient and the medical entity using an activation function; based on the medical entity node output vector representations, obtaining the link probability between medical entities using an activation function;

(4) constructing a total loss function, wherein the total loss function comprises a first loss function, a second loss function and a multitask weighted loss function;

constructing the first loss function as the cross entropy between the real link relation of a patient and a medical entity and the predicted link probability of the patient and the medical entity;

constructing the second loss function as the cross entropy between the real link relation between medical entities and the predicted link probability between the medical entities;

constructing the multitask weighted loss function from the loss value of the first loss function and the loss value of the second loss function;

(5) training a GraphSAGE graph neural network by using a total loss function, and updating parameters to obtain a final GraphSAGE graph neural network;

(6) when the method is applied, the medical entity vector representation and the patient vector representation are input into the final GraphSAGE graph neural network to predict the association probability of the medical entity and the patient.

Obtaining a co-occurrence matrix of medical entities in an electronic medical record comprises:

the frequency product of every two medical entities in every visit record of every patient is used as the co-occurrence information of every two medical entities, a co-occurrence matrix of every visit record is constructed based on the co-occurrence information of every two medical entities, the co-occurrence matrices of the multiple times of visit records of every patient are added to obtain an electronic medical record co-occurrence matrix of every patient, and the co-occurrence matrices of the electronic medical records of the multiple patients are added to obtain the co-occurrence matrix of the medical entities in the electronic medical records.

Traversing the ICD codes of the medical ontology knowledge to obtain a plurality of ancestor medical entities corresponding to the medical entities, wherein the method comprises the following steps:

and taking each medical entity as a leaf node, obtaining a plurality of ancestor nodes corresponding to the leaf node by traversing the ICD codes of the medical ontology from bottom to top, extracting the medical entities corresponding to the ancestor nodes to obtain ancestor medical entities, obtaining the co-occurrence information of each medical entity and the ancestor medical entities in the ICD codes of the medical ontology, and adding the co-occurrence information into the medical entity co-occurrence matrix to expand the medical entity co-occurrence matrix.
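The bottom-up traversal and matrix expansion described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the `parent` map, the hierarchy labels, and the rule that a leaf co-occurs with each ancestor once per occurrence of the leaf are all assumptions, since the text does not state how the ancestor co-occurrence count is computed.

```python
import numpy as np

def ancestors(code, parent):
    """Walk bottom-up from a leaf code to the root of the hierarchy."""
    chain = []
    while code in parent:
        code = parent[code]
        chain.append(code)
    return chain

def enhance(M, vocab, parent, occurrences):
    """Add leaf/ancestor co-occurrence counts to a copy of M.

    Assumption: a leaf co-occurs with each of its ancestors once per
    occurrence of the leaf; the source does not spell out the exact count.
    """
    M = M.copy()
    for code, n in occurrences.items():
        for anc in ancestors(code, parent):
            i, j = vocab[code], vocab[anc]
            M[i, j] += n
            M[j, i] += n
    return M

# Hypothetical three-level hierarchy: leaf -> block -> chapter.
parent = {"I10": "I10-I15", "I10-I15": "IX"}
vocab = {"I10": 0, "I10-I15": 1, "IX": 2}
M_enh = enhance(np.zeros((3, 3)), vocab, parent, occurrences={"I10": 4})
print(ancestors("I10", parent))
print(M_enh)
```

Ancestor entities get their own rows and columns, so codes that never appear together in a record can still be related through a shared ancestor.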

Extracting each medical entity vector representation by adopting a GloVe model based on the enhanced medical entity co-occurrence matrix, wherein the extraction comprises the following steps:

setting an initial vector representation of each medical entity, inputting the initial vector representation to a GloVe model, and training an objective function to obtain a vector representation of each medical entity, wherein the objective function J is as follows:

wherein M_ij is the co-occurrence value of the i-th and j-th medical entities in the enhanced medical entity co-occurrence matrix, |D| is the number of medical entities, e_j is the vector representation of the j-th medical entity, e_i is the vector representation of the i-th medical entity, b_i is the bias parameter of the i-th medical entity, and b_j is the bias parameter of the j-th medical entity.
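The symbols defined here match the standard GloVe weighted least-squares objective. As a hedged reconstruction, assuming the usual GloVe form, J can be written as below. The weighting function f is not defined in this text and is taken from the GloVe paper; it may be absent from the original formula.

```latex
J = \sum_{i,j=1}^{|D|} f(M_{ij}) \left( e_i^{\top} e_j + b_i + b_j - \log M_{ij} \right)^{2}
```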

An aggregated result of the plurality of medical entity vector representations associated with the patient is taken as the patient vector representation, the aggregated result comprising a summation, an average, a maximum, or a minimum.

Inputting the electronic medical record heterogeneous graph into the GraphSAGE graph neural network to obtain the patient node output vector representation and the medical entity node output vector representation comprises:

applying the Mean aggregator of GraphSAGE to the current-layer neighbor node vector representation and the previous-layer output vector representation of the patient node to obtain the patient node output vector representation, and applying the Mean aggregator of GraphSAGE to the current-layer neighbor node vector representation and the previous-layer output vector representation of the medical entity node to obtain the medical entity node output vector representation;

wherein the current-layer neighbor node vector representation is given by a formula (not reproduced in this text) in which r is a real link relation between medical entities or between a patient and a medical entity, R is the set of real link relations, u is a neighbor node, v is the current node, N^(r)(v) is the set of neighbor nodes of the current node v under link relation r, the neighbor node vector representations are taken from the previous layer, l is the current layer, and AGGREGATE(·) is the aggregation operation that combines the neighbor information of the current node v;

the medical entity node output vector representation is given by a formula (not reproduced in this text) whose terms include the vector representation of medical entity node d at the previous layer, W_d, the weight parameter of medical entity node d, MEAN(·), the averaging function, and σ(·), the activation function;

the patient node output vector representation is given by a formula (not reproduced in this text) whose terms include the vector representation of patient node p at the previous layer and W_p, the weight parameter of patient node p.
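One form consistent with the definitions above, and with the GraphSAGE mean aggregator, is sketched below. The symbol h for node vectors and the exact placement of the sum over relations are assumptions; only r, R, u, v, N^(r)(v), l, AGGREGATE, W_d, W_p, MEAN and σ are named in the text.

```latex
\begin{aligned}
h_{N(v)}^{l} &= \sum_{r \in R} \mathrm{AGGREGATE}\left( \left\{ h_u^{l-1} : u \in N^{(r)}(v) \right\} \right) \\
h_d^{l} &= \sigma\left( W_d \cdot \mathrm{MEAN}\left( h_d^{l-1},\, h_{N(d)}^{l} \right) \right) \\
h_p^{l} &= \sigma\left( W_p \cdot \mathrm{MEAN}\left( h_p^{l-1},\, h_{N(p)}^{l} \right) \right)
\end{aligned}
```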

The multitask weighted loss function L is given by a formula (not reproduced in this text) in which e^(-η_m) is the exponential term of the m-th loss weight factor η_m, and the other term is the loss value of the m-th loss function.
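A common form of a learned multitask weighting that uses exactly these two ingredients is shown below as an assumption. The text confirms only the factor e^(-η_m) and the m-th loss value, written here as L_m; the additive η_m term is the usual regularizer that keeps the weights from collapsing to zero and may not appear in the original formula.

```latex
L = \sum_{m=1}^{2} \left( e^{-\eta_m} L_m + \eta_m \right)
```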

An electronic medical record prediction device based on a graph neural network, comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory employs the final GraphSAGE graph neural network model according to any one of claims 1-7;

the computer processor, when executing the computer program, performs the steps of:

and inputting the medical entity vector representation and the patient vector representation into the final GraphSAGE graph neural network to predict the association probability of the medical entity and the patient.

Compared with the prior art, the invention has the beneficial effects that:

(1) The medical entity co-occurrence information in the ICD codes of the medical ontology is introduced into the co-occurrence matrix of the medical entities of the electronic medical record. This expands the co-occurrence matrix and enriches the relations among medical entities and between medical entities and patients, so that the trained graph neural network captures the relevance between patients and medical entities more accurately.

(2) The link relations between medical entities and between medical entities and patients are established through the heterogeneous graph, and training with the multitask weighted loss function allows the relevance between medical entities and patients to be determined accurately.

Drawings

Fig. 1 is a flowchart of an electronic medical record retrieval method based on a graph neural network according to an embodiment of the present invention.

Fig. 2 is a flowchart of a multitask weighted loss function optimization graph neural network model according to an embodiment of the present invention.

Detailed Description

The following clearly and completely describes an implementation of the knowledge-fused electronic medical record link prediction method based on a graph neural network, with reference to the accompanying drawings.

An electronic medical record prediction method based on a graph neural network is disclosed, as shown in fig. 1, and specifically comprises the following steps:

S1: The product of the frequencies of every two medical entities in each visit record of each patient is used as the co-occurrence information of the two medical entities; a co-occurrence matrix is constructed for each visit record from this information; the co-occurrence matrices of each patient's visit records are added to obtain that patient's electronic medical record co-occurrence matrix; and the co-occurrence matrices of all patients are added to obtain the co-occurrence matrix of the medical entities in the electronic medical record.

The co-occurrence information co-occurrence(c_i, c_j, p) of every two medical entities is:

co-occurrence(c_i, c_j, p) = count(c_i, p) × count(c_j, p)

where count(c_i, p) is the number of occurrences of the i-th medical entity for the p-th patient in each visit record, and count(c_j, p) is the number of occurrences of the j-th medical entity for the p-th patient in each visit record.
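Step S1 can be sketched in a few lines of Python. The record layout, the entity codes and the decision to skip self-pairs are assumptions made for illustration; the text specifies only the count product and the summation over visits and patients.

```python
import numpy as np
from collections import Counter

def cooccurrence_matrix(records, vocab):
    """Sum per-visit co-occurrence matrices over all visits of all patients.

    records: {patient_id: [visit, ...]}, each visit a list of entity codes.
    vocab:   {entity code: row/column index}.
    """
    M = np.zeros((len(vocab), len(vocab)))
    for visits in records.values():
        for visit in visits:
            counts = Counter(visit)          # count(c, p) within this visit
            for ci, ni in counts.items():
                for cj, nj in counts.items():
                    if ci != cj:             # assumption: skip self-pairs
                        M[vocab[ci], vocab[cj]] += ni * nj
    return M

# Hypothetical toy data: two patients, three visits in total.
vocab = {"I10": 0, "E11": 1, "N18": 2}
records = {
    "p1": [["I10", "E11", "I10"], ["E11", "N18"]],
    "p2": [["I10", "E11"]],
}
M = cooccurrence_matrix(records, vocab)
print(M)
```

In the first visit of `p1`, `I10` occurs twice and `E11` once, so that visit contributes 2 × 1 = 2 to the `I10`/`E11` entry.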

After the co-occurrence matrix of the medical entities in the electronic medical record is acquired, each medical entity is taken as a leaf node, and the ICD codes of the medical ontology are traversed from bottom to top to obtain the ancestor nodes of that leaf node. The medical entities corresponding to the ancestor nodes are extracted as ancestor medical entities. The co-occurrence information of each medical entity and its ancestor medical entities in the ICD codes is then added to the medical entity co-occurrence matrix, which expands it into the enhanced medical entity co-occurrence matrix. An initial vector representation is set for each medical entity and input into the GloVe model, and the vector representation of each medical entity is obtained by training the objective function J, which is as follows:

wherein M_ij is the co-occurrence value of the i-th and j-th medical entities in the enhanced medical entity co-occurrence matrix, |D| is the number of medical entities, e_j is the vector representation of the j-th medical entity, e_i is the vector representation of the i-th medical entity, b_i is the bias parameter of the i-th medical entity, and b_j is the bias parameter of the j-th medical entity.
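As an illustration of how such an objective is evaluated, the sketch below computes a GloVe-style loss with NumPy from the quantities defined above. It assumes the standard GloVe weighted least-squares form; the weighting function and its defaults (`x_max`, `alpha`) come from the GloVe paper and are not given in this text.

```python
import numpy as np

def glove_loss(E, b, M, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe loss over the non-zero entries of M.

    E: (|D|, k) entity vectors, b: (|D|,) biases, M: enhanced co-occurrence.
    x_max and alpha are the weighting defaults from the GloVe paper, not
    values given in the source.
    """
    i, j = np.nonzero(M)
    x = M[i, j]
    weight = np.minimum(1.0, (x / x_max) ** alpha)
    err = np.sum(E[i] * E[j], axis=1) + b[i] + b[j] - np.log(x)
    return np.sum(weight * err ** 2)

# Toy check: with zero vectors and b_i + b_j = log(M_ij), the loss vanishes.
M_toy = np.array([[0.0, np.e], [np.e, 0.0]])
E_toy = np.zeros((2, 3))
b_toy = np.array([0.5, 0.5])
print(glove_loss(E_toy, b_toy, M_toy))
```

Training would minimize this value over `E` and `b` with any gradient-based optimizer; only non-zero co-occurrence entries contribute, which avoids taking the logarithm of zero.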

An aggregated result of the medical entity vector representations associated with a patient is taken as the patient vector representation; the aggregation is a summation, an average, a maximum, or a minimum.
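The four aggregation options can be illustrated with NumPy. The function and dictionary names are hypothetical, and the aggregation is assumed to be element-wise over the vector dimensions.

```python
import numpy as np

AGGREGATORS = {"sum": np.sum, "mean": np.mean, "max": np.max, "min": np.min}

def patient_vector(entity_vectors, how="mean"):
    """Aggregate the (n_entities, k) vectors linked to one patient into (k,)."""
    return AGGREGATORS[how](np.asarray(entity_vectors), axis=0)

vecs = [[1.0, 4.0], [3.0, 0.0]]
print(patient_vector(vecs, "sum"))   # [4. 4.]
print(patient_vector(vecs, "mean"))  # [2. 2.]
print(patient_vector(vecs, "max"))   # [3. 4.]
print(patient_vector(vecs, "min"))   # [1. 0.]
```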

S2: Each medical entity vector representation is taken as the initial input of the corresponding medical entity node, and each patient vector representation as the initial input of the corresponding patient node; related medical entities are connected to obtain the real link relations between medical entities, and related medical entities and patients are connected to obtain the real link relations between patients and medical entities, so as to construct the electronic medical record heterogeneous graph.
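A heterogeneous graph of this kind can be held in plain Python structures, as in the sketch below. The node keys, the relation names and the undirected treatment of links are illustrative assumptions; a graph library could be used instead.

```python
from collections import defaultdict

def build_hetero_graph(entity_vecs, patient_vecs, entity_links, patient_links):
    """Return node attributes and per-relation adjacency lists.

    entity_links:  iterable of (entity, entity) pairs that are related.
    patient_links: iterable of (patient, entity) pairs that are related.
    The relation names below are illustrative labels, not from the source.
    """
    nodes = {("entity", d): v for d, v in entity_vecs.items()}
    nodes.update({("patient", p): v for p, v in patient_vecs.items()})
    adj = {"entity-entity": defaultdict(set), "patient-entity": defaultdict(set)}
    for a, b in entity_links:
        adj["entity-entity"][("entity", a)].add(("entity", b))
        adj["entity-entity"][("entity", b)].add(("entity", a))
    for p, d in patient_links:
        adj["patient-entity"][("patient", p)].add(("entity", d))
        adj["patient-entity"][("entity", d)].add(("patient", p))
    return nodes, adj

# Hypothetical toy graph: two entities, one patient.
nodes, adj = build_hetero_graph(
    entity_vecs={"a": [1.0, 0.0], "b": [0.0, 1.0]},
    patient_vecs={"p": [0.5, 0.5]},
    entity_links=[("a", "b")],
    patient_links=[("p", "a")],
)
print(sorted(nodes))
```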

S3: The electronic medical record heterogeneous graph is input into the GraphSAGE graph neural network. The current-layer neighbor node vector representation is obtained from the real link relations between medical entities and between patients and medical entities, using the GraphSAGE aggregator and summation. It is given by a formula (not reproduced in this text) in which r is a real link relation between medical entities or between a patient and a medical entity, R is the set of real link relations, u is a neighbor node, v is the current node, N^(r)(v) is the set of neighbor nodes of the current node v under link relation r, the neighbor node vector representations are taken from the previous layer, l is the current layer, and AGGREGATE(·) is the aggregation operation that combines the neighbor information of the current node v.

The medical entity node output vector representation and the patient node output vector representation are each given by a formula (not reproduced in this text).
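One layer of the update described in S3 can be sketched with NumPy as follows. This is an interpretation under stated assumptions, not the patented network: neighbors are averaged within each relation, the per-relation results are summed, the result is averaged with the node's own previous-layer vector, and one weight matrix per node type is applied. The `tanh` activation and all names are placeholders.

```python
import numpy as np

def sage_mean_layer(h, neighbors_by_relation, W, activation=np.tanh):
    """One mean-aggregator layer over a multi-relation graph.

    h: {node: previous-layer vector}; neighbors_by_relation: {relation:
    {node: set of neighbors}}; W: {node type: weight matrix}, where a node
    is a (type, name) tuple. tanh is a placeholder activation.
    """
    out = {}
    for v, h_v in h.items():
        h_nbr = np.zeros_like(h_v)
        for adj in neighbors_by_relation.values():   # sum over relations r
            nbrs = adj.get(v, ())
            if nbrs:                                  # mean within a relation
                h_nbr += np.mean([h[u] for u in nbrs], axis=0)
        combined = np.mean([h_v, h_nbr], axis=0)      # MEAN of self and neighbors
        out[v] = activation(W[v[0]] @ combined)
    return out

h = {
    ("entity", "a"): np.array([2.0, 0.0]),
    ("entity", "b"): np.array([0.0, 2.0]),
    ("patient", "p"): np.array([1.0, 1.0]),
}
relations = {
    "entity-entity": {("entity", "a"): {("entity", "b")},
                      ("entity", "b"): {("entity", "a")}},
    "patient-entity": {("patient", "p"): {("entity", "a")},
                       ("entity", "a"): {("patient", "p")}},
}
W = {"entity": np.eye(2), "patient": np.eye(2)}
out = sage_mean_layer(h, relations, W)
print(out[("patient", "p")])
```

Stacking several such layers lets each node see neighbors more than one hop away, which is how a patient node can pick up information from entities it is not directly linked to.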

Based on the patient node output vector representation and the medical entity node output vector representation, the probability of a link between a medical entity and a patient is obtained with an activation function (formula not reproduced in this text), where z_d is the output vector representation of the medical entity node, z_p is the output vector representation of the patient node, and δ(·) is the activation function.

Based on the medical entity node output vector representations, the probability of a link between two medical entities is obtained with an activation function (formula not reproduced in this text), where z_d′ is the output vector representation of another medical entity node.
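A common way to turn two output vectors into a link probability is to pass their inner product through a sigmoid. The sketch below assumes that form; the text states only that an activation function δ(·) is applied to z_d and z_p (or to z_d and z_d′), so both the inner product and the sigmoid are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def link_probability(z_a, z_b):
    """Score a candidate link from two output vectors.

    Assumption: the inner product is squashed by a sigmoid; the source only
    says that an activation function delta(.) is applied.
    """
    return sigmoid(np.dot(z_a, z_b))

z_d = np.array([1.0, 2.0])
z_p = np.array([0.5, 0.5])
print(link_probability(z_d, z_p))    # patient / medical entity link
print(link_probability(z_d, -z_d))   # opposing vectors give a low score
```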

S4: Constructing the total loss function. As shown in fig. 2, the node information of the heterogeneous graph G is calculated by the graph neural network. A first loss function is constructed as the cross entropy between the real patient-medical entity link relation (1 if the link exists, 0 if it does not) and the predicted patient-medical entity link probability, to train the patient-medical entity link prediction task L₁.

A second loss function is constructed as the cross entropy between the real link relation between medical entities and the predicted link probability between medical entities, to train the medical entity-medical entity link prediction task L₂.

A multitask weighted loss function is constructed from the loss value of the first loss function and the loss value of the second loss function, and the weight factors η are learned.

Training with the multitask weighted loss function combines the two loss functions and optimizes them simultaneously. The multitask weighted loss function L is given by a formula (not reproduced in this text) in which e^(-η_m) is the exponential term of the m-th loss weight factor η_m, and the other term is the loss value of the m-th loss function. If the loss has converged, training ends and the weight factors η are obtained; if it has not converged, the node information of the heterogeneous graph G continues to be calculated by the graph neural network.

Training a GraphSAGE graph neural network by using a total loss function, and updating parameters to obtain a final GraphSAGE graph neural network;
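The two cross-entropy losses and their weighted combination can be sketched as follows. The combined form, the sum over m of exp(-η_m) · L_m + η_m, is an assumption based on a common learned-weighting scheme; the text confirms only the e^(-η_m) factor and the per-task loss values. In a real training loop η would be a trainable parameter updated together with the network weights.

```python
import numpy as np

def bce(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy between 0/1 link labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def multitask_loss(losses, eta):
    """Assumed form: sum_m exp(-eta_m) * L_m + eta_m, with eta_m learnable."""
    losses, eta = np.asarray(losses, dtype=float), np.asarray(eta, dtype=float)
    return np.sum(np.exp(-eta) * losses + eta)

# Hypothetical labels and predictions for the two link prediction tasks.
loss_1 = bce([1, 0, 1], [0.9, 0.2, 0.7])   # patient / medical entity links
loss_2 = bce([0, 1], [0.1, 0.8])           # medical entity / medical entity links
print(multitask_loss([loss_1, loss_2], eta=[0.0, 0.0]))
```

With all η_m equal to zero the combination reduces to the plain sum of the two losses; raising η_m down-weights the m-th task.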

S5: When the method is applied, the medical entity vector representation and the patient vector representation are input into the final GraphSAGE graph neural network to predict the association probability of the medical entity and the patient.

Based on the method, the relationship range between the medical entities and the relationship range between the medical entities and the patients are expanded, so that the association degree between the medical entities and the patients can be accurately predicted.

Claims

1. An electronic medical record retrieval method based on a graph neural network is characterized by comprising the following steps:

(1) acquiring a co-occurrence matrix of medical entities in the electronic medical record, traversing the ICD codes of medical ontology knowledge to obtain a plurality of ancestor medical entities corresponding to the medical entities, adding co-occurrence information of the medical entities and the ancestor medical entities to the medical entity co-occurrence matrix to obtain an enhanced medical entity co-occurrence matrix, extracting a vector representation of each medical entity using a GloVe model based on the enhanced medical entity co-occurrence matrix, and taking an aggregation result of a plurality of medical entity vector representations associated with a patient as the patient vector representation;

each medical entity vector representation is taken as the initial attribute of the corresponding medical entity node, each patient vector representation is taken as the initial attribute of the corresponding patient node, related medical entities are connected to obtain the real link relations between medical entities, and related medical entities and patients are connected to obtain the real link relations between patients and medical entities;

(3) inputting the electronic medical record heterogeneous graph into a GraphSAGE graph neural network to obtain the output vector representation of each patient node and the output vector representation of each medical entity node; obtaining the link probability of the patient and the medical entity using an activation function based on the patient node output vector representation and the medical entity node output vector representation; obtaining the link probability between medical entities using an activation function based on the medical entity node output vector representations;

2. The method of claim 1, wherein obtaining a co-occurrence matrix of medical entities in the electronic medical record comprises:

3. The method for retrieving the electronic medical record based on the graph neural network as claimed in claim 1, wherein traversing the ICD code to obtain a plurality of ancestor medical entities corresponding to the medical entity comprises:

4. The electronic medical record retrieval method based on the graph neural network as claimed in claim 1, wherein the extracting of each medical entity vector representation using a GloVe model based on the enhanced medical entity co-occurrence matrix comprises:

5. The method of claim 1, wherein an aggregated result of the plurality of medical entity vector representations associated with the patient is used as the patient vector representation, and the aggregation operation comprises summing, averaging, taking the maximum or taking the minimum.

6. The method for retrieving the electronic medical record based on the graph neural network as claimed in claim 1, wherein the step of inputting the electronic medical record heterogeneous graph into the graph neural network to obtain the output vector representation of the patient node and the output vector representation of the medical entity node comprises:

applying the Mean aggregator of GraphSAGE to the current-layer neighbor node vector representation and the previous-layer output vector representation of the patient node to obtain the patient node output vector representation, and applying the Mean aggregator of GraphSAGE to the current-layer neighbor node vector representation and the previous-layer output vector representation of the medical entity node to obtain the medical entity node output vector representation;

wherein the current-layer neighbor node vector representation is given by a formula (not reproduced in this text) in which r is a real link relation between medical entities or between a patient and a medical entity, R is the set of real link relations, u is a neighbor node, v is the current node, N^(r)(v) is the set of neighbor nodes of the current node v under link relation r, the neighbor node vector representations are taken from the previous layer, l is the current layer, and AGGREGATE(·) is the aggregation operation that combines the neighbor information of the current node v;

and the medical entity node output vector representation and the patient node output vector representation are each given by a formula (not reproduced in this text).

7. The method for retrieving the electronic medical record based on the graph neural network as claimed in claim 1, wherein the multitask weighted loss function L is given by a formula (not reproduced in this text) in which one term is the exponential term of the m-th loss weight factor η_m and the other term is the loss value of the m-th loss function.

8. An electronic medical record retrieval device based on a graph neural network, comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory adopts the final graph neural network model of any one of claims 1 to 7;

and inputting the medical entity vector representation and the patient vector representation into the final GraphSAGE graph neural network to predict the association probability of the medical entity and the patient.