CN113380360B - Similar medical record retrieval method and system based on multi-mode medical record map - Google Patents


Info

Publication number
CN113380360B
Authority
CN
China
Prior art keywords
medical record
graph
similarity
data
feature
Legal status
Active
Application number
CN202110629894.0A
Other languages
Chinese (zh)
Other versions
CN113380360A (en)
Inventor
王晓黎
罗峰
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Application filed by Xiamen University
Priority to CN202110629894.0A
Publication of CN113380360A
Application granted
Publication of CN113380360B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a similar medical record retrieval method and system based on a multi-modal medical record map. The method comprises the following steps. First, medical record graph structural features are acquired for a target medical record and for sample medical records. The medical record graph structural features comprise a plurality of entity graph structural features, and each entity graph structural feature comprises the graph relation features and node attribute features of two entities in a medical record association relation topological graph. This topological graph is constructed from the corresponding medical record: its entities include the diseases, medicines and operations in the record, each association relation is a medical relation between two entities in the record, and the node attribute features are multi-dimensional attribute features of the entities. Second, a similarity calculation model based on a graph neural network calculates the similarity between the medical record graph structural features of the target medical record and those of each sample medical record. Finally, the sample medical records similar to the target medical record are determined according to the similarity. The invention offers high retrieval precision.

Description

Similar medical record retrieval method and system based on multi-mode medical record map
Technical Field
The invention relates to the field of similar medical record retrieval, in particular to a similar medical record retrieval method and system based on a multi-mode medical record map.
Background
Similar medical record retrieval is used to screen a collection of medical records for those that are similar to a given medical record.
At present, one similar medical record retrieval approach combines features with a deep neural network: medical records are represented as sets of medical concepts, feature representations of the medical concepts and of the medical records are learned by a deep neural network, and the similarity between medical records is calculated from these feature representations.
However, this feature-based approach only serializes the medical concepts in each medical record, so the similarity between medical records is determined solely by the similarity of their serialized medical concepts; the relations between the medical concepts within a record are not taken into account.
Disclosure of Invention
The invention aims to provide a similar medical record retrieval method and system based on a multi-modal medical record map that maintain retrieval precision without requiring a large amount of medical record data for feature learning.
To achieve the above object, the invention provides the following solution:
A similar medical record retrieval method based on a multi-modal medical record map comprises the following steps:
acquiring medical record graph structural features corresponding to a target medical record and medical record graph structural features corresponding to a sample medical record, wherein the medical record graph structural features comprise a plurality of entity graph structural features; each entity graph structural feature comprises graph relation features and node attribute features of two entities in a medical record association relation topological graph; the medical record association relation topological graph is constructed from the corresponding medical record; the entities in the topological graph comprise the diseases, medicines and operations in the corresponding medical record; the association relations in the topological graph comprise medical relations between two entities of the corresponding medical record; and the node attribute features comprise multi-dimensional attribute features of the entities;
calculating, by a similarity calculation model based on a graph neural network, the similarity between the medical record graph structural features corresponding to the target medical record and the medical record graph structural features corresponding to each sample medical record;
and determining a sample medical record similar to the target medical record according to the similarity.
Optionally, the node attribute feature includes multi-dimensional feature fusion data of the entity, and the multi-dimensional feature fusion data fuses at least two of image feature data, text feature data, and ontology feature data of the entity.
Optionally, before acquiring the structural features of the medical record map, the method further includes:
determining the correlation between every two entities in the medical record;
and when the two entities have correlation, creating an edge between the two entities in the medical record association relation topological graph corresponding to the medical record.
Optionally, the determining the correlation between two entities in the medical record specifically includes:
taking a medical record as a document, taking entities in the medical record as words, and determining the correlation between every two words in the document by adopting a PMI algorithm to obtain the correlation between every two entities.
Optionally, the similarity calculation model includes a gated graph neural network module and a similarity determination module;
the gated graph neural network module is configured to output first data according to the medical record graph structural features corresponding to the target medical record, and to output second data according to the medical record graph structural features corresponding to the sample medical record;
the similarity determination module is configured to determine the similarity between the target medical record and the sample medical record according to the first data and the second data.
Optionally, the gated graph neural network module includes a plurality of graph neural networks and a gating layer located between adjacent graph neural networks; the gating layer is used for filtering the data output by the preceding graph neural network.
Optionally, the similarity determination module includes a global similarity calculation unit, a local similarity calculation unit, and a fully connected layer;
the global similarity calculation unit is configured to calculate, by an attention mechanism, a first graph feature vector of the first data and a second graph feature vector of the second data, and to calculate, by a neural tensor network, the global similarity between the target medical record and the sample medical record from the first graph feature vector and the second graph feature vector;
the local similarity calculation unit is configured to calculate, by cosine similarity, the layer-wise similarity between the first data and the second data from the characterization data of the first data and of the second data at each feature extraction layer, obtaining a local similarity matrix of the target medical record and the sample medical record;
the fully connected layer is configured to calculate the similarity between the target medical record and the sample medical record from the global similarity and the local similarity matrix.
The invention also provides a similar medical record retrieval system based on the multi-modal medical record map, which comprises the following modules:
the medical record graph structural feature acquisition module is used for acquiring medical record graph structural features corresponding to a target medical record and medical record graph structural features corresponding to a sample medical record, wherein the medical record graph structural features comprise a plurality of entity graph structural features; each entity graph structural feature comprises graph relation features and node attribute features of two entities in a medical record association relation topological graph; the medical record association relation topological graph is constructed from the corresponding medical record; the entities in the topological graph comprise the diseases, medicines and operations in the corresponding medical record; and the association relations in the topological graph comprise medical relations between two entities of the corresponding medical record;
the similarity calculation module is used for calculating the similarity between the medical record graph structural characteristics corresponding to the target medical record and the medical record graph structural characteristics corresponding to each sample medical record by adopting a similarity calculation model based on a graph neural network;
and the similar medical record determining module is used for determining the sample medical record similar to the target medical record according to the similarity.
Optionally, the node attribute feature includes multi-dimensional feature fusion data of the entity, and the multi-dimensional feature fusion data fuses at least two of image feature data, text feature data, and ontology feature data of the entity.
Optionally, the similarity calculation model includes a gated graph neural network module and a similarity determination module;
the gated graph neural network module is configured to output first data according to the medical record graph structural features corresponding to the target medical record, and to output second data according to the medical record graph structural features corresponding to the sample medical record;
the similarity determination module is configured to determine the similarity between the target medical record and the sample medical record according to the first data and the second data;
the gated graph neural network module comprises a plurality of graph neural networks and a gating layer located between adjacent graph neural networks; the gating layer is used for filtering the data output by the preceding graph neural network;
the similarity determination module comprises a global similarity calculation unit, a local similarity calculation unit and a fully connected layer;
the global similarity calculation unit is configured to calculate, by an attention mechanism, a first graph feature vector of the first data and a second graph feature vector of the second data, and to calculate, by a neural tensor network, the global similarity between the target medical record and the sample medical record from the first graph feature vector and the second graph feature vector;
the local similarity calculation unit is configured to calculate, by cosine similarity, the layer-wise similarity between the first data and the second data from the characterization data of the first data and of the second data at each feature extraction layer, obtaining a local similarity matrix of the target medical record and the sample medical record;
the fully connected layer is configured to calculate the similarity between the target medical record and the sample medical record from the global similarity and the local similarity matrix.
According to the specific embodiments provided by the invention, the following technical effects are achieved: the similarity between a target medical record and a sample medical record is calculated from their medical record graph structural features, which contain not only the attribute features of all entity concepts in the medical record but also the multi-dimensional attribute features of the entities and the relation features between them.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a similar medical record retrieval method based on a multi-modal medical record map according to Embodiment 1 of the present invention;
Fig. 2 is a structural diagram of the multi-modal representation learning module in Embodiment 1 of the present invention;
Fig. 3 is a structural diagram of the similarity calculation model in Embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram of a similar medical record retrieval system based on a multi-modal medical record map according to Embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
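The object of the invention is to provide a similar medical record retrieval method and system based on a multi-modal medical record map that maintain retrieval precision without requiring a large amount of medical record data for feature learning.
To make the above objects, features and advantages of the present invention more readily understood, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.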
Embodiment 1
Referring to Fig. 1, this embodiment provides a similar medical record retrieval method based on a multi-modal medical record map, which includes the following steps:
Step 101: acquire the medical record graph structural features corresponding to a target medical record and the medical record graph structural features corresponding to a sample medical record. The medical record graph structural features comprise a plurality of entity graph structural features; each entity graph structural feature comprises the graph relation features and node attribute features of two entities in a medical record association relation topological graph; the topological graph is constructed from the corresponding medical record; its entities comprise the diseases, medicines and operations in the corresponding medical record; its association relations comprise medical relations between two entities of the corresponding medical record; and the node attribute features comprise multi-dimensional attribute features of the entities.
Step 102: calculate, by a similarity calculation model based on a graph neural network, the similarity between the medical record graph structural features corresponding to the target medical record and those corresponding to each sample medical record.
Step 103: determine a sample medical record similar to the target medical record according to the similarity.
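Steps 102 and 103 amount to scoring every sample record against the target and ranking the scores. The sketch below is a minimal illustration under stated assumptions: `retrieve_similar` and `jaccard` are hypothetical names, and the Jaccard overlap of entity sets is only a toy stand-in for the graph-neural-network similarity model, which would be passed in as the `similarity` callable.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

RecordGraph = TypeVar("RecordGraph")  # placeholder for whatever graph features the model consumes


def retrieve_similar(
    target: RecordGraph,
    samples: Dict[Hashable, RecordGraph],
    similarity: Callable[[RecordGraph, RecordGraph], float],
    top_k: int = 5,
) -> List[Tuple[Hashable, float]]:
    """Score each sample record against the target and keep the top_k (id, score) pairs."""
    scored = [(record_id, similarity(target, graph)) for record_id, graph in samples.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # most similar first
    return scored[:top_k]


def jaccard(a: set, b: set) -> float:
    """Toy stand-in for the GNN similarity model: overlap of two entity sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


samples = {"r1": {"diabetes", "insulin"}, "r2": {"diabetes", "metformin"}, "r3": {"fracture"}}
print(retrieve_similar({"diabetes", "metformin"}, samples, jaccard, top_k=2))
```

Keeping the similarity function pluggable separates the ranking step from the model, so the trained similarity calculation model can replace `jaccard` without changing the retrieval logic.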
In this embodiment, the node attribute features include multi-dimensional feature fusion data of the entity, which fuses at least two of the entity's image feature data, text feature data and ontology feature data. These data may come from the corresponding medical record itself, or they may be extracted from sources other than the corresponding medical record, such as other medical records, files or websites. For example, when the corresponding medical record lacks the image, text or ontology feature data of an entity, the related data can be crawled from external sources such as websites.
The extraction of the image feature data, text feature data and ontology feature data is described in detail below.
In one example, the medical record association relation topological graph of Step 101 is constructed as follows: the entities in the medical record are extracted, the association relations between the entities are determined, and the topological graph corresponding to the medical record is built from these association relations. Specifically:
a medical record is treated as a document and its entities as words, and the PMI algorithm is used to determine the correlation between every two words in the document, which gives the correlation between every two entities. When two entities are correlated, an edge is created between them in the medical record association relation topological graph corresponding to the medical record.
The specific process can be as follows:
1. Constructing a "treatment knowledge graph" (TKG): the TKG consists of (medical concept entity, medical relation, medical concept entity) triples. It mainly covers three kinds of medical concept entities, namely diseases, surgeries and drugs, and two medical relations, namely "surgical treatment" and "drug treatment". The Pointwise Mutual Information (PMI) algorithm, which is common in natural language processing, is used to mine reliable medical concept entity relation triples: following PMI, medical records are regarded as documents and the medical concept entities in them as words, and the correlation between medical concept entities is calculated.
Suppose two medical concept entities of different kinds are denoted $v_i$ and $v_j$. Their correlation $\mathrm{PMI}(v_i, v_j)$ is calculated as follows:

$$\mathrm{PMI}(v_i, v_j) = \log \frac{p(v_i, v_j)}{p(v_i)\,p(v_j)}, \qquad p(v_i, v_j) = \frac{\#D(v_i, v_j)}{\#D}, \qquad p(v_i) = \frac{\#D(v_i)}{\#D}$$

where $\#D$ is the total number of medical records, $\#D(v_i)$ is the number of records containing $v_i$, and $\#D(v_i, v_j)$ is the number of records containing both $v_i$ and $v_j$. Entity pairs $(v_i, v_j)$ with a positive $\mathrm{PMI}(v_i, v_j)$ are retained as reliable dependencies and stored in the TKG as relation triples $(v_i, r_{ij}, v_j)$, where $r_{ij}$ is the relation between the entity pair $(v_i, v_j)$.
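A minimal Python sketch of the TKG mining step, assuming document-level co-occurrence counts for the PMI probabilities (each record is one document) and hypothetical relation labels `surgical_treatment` and `drug_treatment`; the function name `mine_tkg_triples` and the entity-type strings are illustrative, not taken from the patent.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical labels for the two TKG relation types, keyed by the non-disease entity type.
RELATION_BY_TAIL_TYPE = {"surgery": "surgical_treatment", "drug": "drug_treatment"}


def mine_tkg_triples(
    records: List[Set[str]], entity_type: Dict[str, str]
) -> Dict[Tuple[str, str, str], float]:
    """Treat each record as a document and each entity as a word; keep pairs with PMI > 0."""
    n_docs = len(records)
    doc_freq: Counter = Counter()   # #D(v): number of records containing v
    pair_freq: Counter = Counter()  # #D(v_i, v_j): number of records containing both
    for entities in records:
        doc_freq.update(entities)
        pair_freq.update(combinations(sorted(entities), 2))

    triples: Dict[Tuple[str, str, str], float] = {}
    for (a, b), n_ab in pair_freq.items():
        # Orient the pair as (disease, surgery/drug); skip pairs outside the TKG schema.
        if entity_type[a] == "disease" and entity_type[b] in RELATION_BY_TAIL_TYPE:
            head, tail = a, b
        elif entity_type[b] == "disease" and entity_type[a] in RELATION_BY_TAIL_TYPE:
            head, tail = b, a
        else:
            continue
        # log(p(a, b) / (p(a) p(b))) with p estimated as record counts / total records
        pmi = math.log(n_ab * n_docs / (doc_freq[a] * doc_freq[b]))
        if pmi > 0:
            triples[(head, RELATION_BY_TAIL_TYPE[entity_type[tail]], tail)] = pmi
    return triples


types = {"d1": "disease", "d2": "disease", "m1": "drug", "p1": "surgery"}
print(mine_tkg_triples([{"d1", "m1"}, {"d1", "m1"}, {"d2", "p1"}, {"d1", "p1"}], types))
```

In this toy input, `d1` and `p1` co-occur less often than chance would predict, so their PMI is negative and the pair is not stored.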
2. Constructing the medical record association relation topological graph: based on the constructed TKG, this step builds a multi-modal medical record graph $G = (\mathcal{V}, \mathcal{E})$ for each medical record. First, the structured disease entities $\mathcal{V}_d$, drug entities $\mathcal{V}_m$ and surgical entities $\mathcal{V}_p$ are extracted from the medical record as the graph nodes $\mathcal{V} = \mathcal{V}_d \cup \mathcal{V}_m \cup \mathcal{V}_p$. Second, each disease node $v_d \in \mathcal{V}_d$ in the graph is traversed, and the surgical nodes $v_p \in \mathcal{V}_p$ and drug nodes $v_m \in \mathcal{V}_m$ that have a medical relation with $v_d$ in the TKG are looked up. Connecting all such disease-surgery pairs $(v_d, v_p)$ and disease-drug pairs $(v_d, v_m)$ yields the edge set $\mathcal{E}$ of the medical record graph. Finally, for every graph node, the collected multi-modal data, i.e. the multi-dimensional attribute features of the entity (image feature data, text feature data, ontology feature data, etc.), are attached to further enrich the node content.
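The edge-construction rule of step 2 can be sketched as follows, assuming the record's entities have already been extracted into disease, drug and surgery sets and the TKG is held as a set of (disease, relation, tail) triples; `build_record_graph` is a hypothetical helper, and attaching the multi-modal node features is left out.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (disease, relation, surgery_or_drug)


def build_record_graph(
    diseases: Set[str], drugs: Set[str], surgeries: Set[str], tkg: Set[Triple]
) -> Tuple[Set[str], Set[Triple]]:
    """Nodes are V = V_d | V_m | V_p; an edge joins a disease to a drug or surgery of the
    same record only if the TKG holds a medical relation between them."""
    nodes = diseases | drugs | surgeries
    by_head: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for head, relation, tail in tkg:
        by_head[head].add((relation, tail))

    edges: Set[Triple] = set()
    for d in diseases:  # traverse the disease nodes of this record
        for relation, tail in by_head.get(d, ()):
            if tail in drugs or tail in surgeries:  # partner must also occur in this record
                edges.add((d, relation, tail))
    return nodes, edges


tkg = {("d1", "drug_treatment", "m1"), ("d1", "surgical_treatment", "p1"), ("d2", "drug_treatment", "m2")}
print(build_record_graph({"d1"}, {"m1", "m2"}, {"p1"}, tkg))
```

Here `m2` stays an isolated node: its only TKG relation is with `d2`, which does not appear in this record.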
The image, text and ontology feature data in the multi-modal data can be extracted and fused as follows.
Referring to Fig. 2, a multi-modal representation learning module performs fused representation learning on the multi-modal data of each node.
Let $v_n$ denote a medical concept entity node in the medical record graph. As shown in Fig. 2, the multi-modal representation learning module (1) first obtains, from the node's multi-modal data, the feature representations of its image data, text data and medical ontology data, namely the image representation, the text representation and the ontology representation; and (2) then fuses these three representations into a multi-modal representation that is assigned to the node, thereby enhancing the node's feature representation. Each representation is learned as follows:
Image representation: a pre-trained ResNet50 model performs feature learning on the image data, and the 2048-dimensional vector at the model's last fully connected layer (in ResNet50, the pooled feature vector that is fed into the final classification layer) is extracted as the image representation $\mathbf{e}$.
Text representation: the static word-vector pre-training model Word2Vec learns word-vector representations from medical text data, and the word vector corresponding to the medical concept entity is extracted as the text representation $\mathbf{t}$.
Ontology representation: the ontology representation is obtained as follows. (1) According to the hierarchical division of the ontology data, the ontology knowledge is expressed as a hierarchical tree; tree nodes are converted into graph nodes and subtree branches into bidirectional edges, yielding a directed acyclic graph (DAG) that can be traversed both bottom-up and top-down, in which the leaf nodes represent the related medical concept entities. (2) Based on the graph attention network, a bottom-up learning strategy and an ancestor-to-leaf learning strategy are then used to learn the feature representations of all nodes in the graph step by step.
Bottom-up learning strategy: suppose each node $c_*$ in the graph is initialized to a vector $\mathbf{h}_*$. The bottom-up strategy combines the child nodes $ch(c_i)$ of a node $c_i$ into that node's feature representation, as shown in Eq. (1):

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(\mathbf{a}^\top[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j])\big)}{\sum_{k \in ch(c_i)} \exp\big(\mathrm{LeakyReLU}(\mathbf{a}^\top[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_k])\big)}, \qquad \mathbf{h}_i = \sum_{j \in ch(c_i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j \qquad (1)$$

where $\mathbf{W}$ is the input transformation matrix, $\mathbf{a}$ is a weight vector, $\Vert$ denotes concatenation, and LeakyReLU is a nonlinear activation function.
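Ancestor-to-leaf learning strategy: building on this, for each leaf node $c_{i'}$ the ancestor-to-leaf strategy integrates the representations of its ancestor nodes $anc(c_{i'})$, as shown in Eq. (2), where $\mathbf{W}'$ and $\mathbf{a}'$ are learnable parameters:

$$\beta_{i'j} = \frac{\exp\big(\mathrm{LeakyReLU}(\mathbf{a}'^\top[\mathbf{W}'\mathbf{h}_{i'} \,\Vert\, \mathbf{W}'\mathbf{h}_j])\big)}{\sum_{k \in anc(c_{i'})} \exp\big(\mathrm{LeakyReLU}(\mathbf{a}'^\top[\mathbf{W}'\mathbf{h}_{i'} \,\Vert\, \mathbf{W}'\mathbf{h}_k])\big)}, \qquad \mathbf{o}_{i'} = \sum_{j \in anc(c_{i'})} \beta_{i'j}\,\mathbf{W}'\mathbf{h}_j \qquad (2)$$

where $\mathbf{o}_{i'}$ is the ontology representation of the medical concept entity at leaf $c_{i'}$.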
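The two ontology learning strategies can be sketched with NumPy on a toy hierarchy. This assumes the standard graph-attention scoring for Eqs. (1) and (2): LeakyReLU of a weight vector applied to the concatenated transformed features, normalized with a softmax. It also assumes that a parent attends only over its children and a leaf only over its ancestors. The hierarchy, the helper names `attend` and `ancestors`, the dimensions and the random parameters are all hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def attend(center, neighbors, W, a):
    """GAT-style aggregation: returns (sum_j alpha_j * W h_j, alpha)."""
    wc = W @ center
    wn = np.stack([W @ h for h in neighbors])  # (k, d) transformed neighbour features
    scores = leaky_relu(np.array([a @ np.concatenate([wc, wj]) for wj in wn]))
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ wn, weights


# Hypothetical three-level hierarchy: root -> group -> two leaf concepts.
parent = {"root": None, "group": "root", "leaf_a": "group", "leaf_b": "group"}
children = {n: [c for c, p in parent.items() if p == n] for n in parent}


def ancestors(node):
    chain, p = [], parent[node]
    while p is not None:
        chain.append(p)
        p = parent[p]
    return chain


rng = np.random.default_rng(0)
d = 4
h = {n: rng.normal(size=d) for n in parent}  # initial node vectors h_*
W_up, a_up = rng.normal(size=(d, d)), rng.normal(size=2 * d)
W_down, a_down = rng.normal(size=(d, d)), rng.normal(size=2 * d)

# Bottom-up (Eq. 1): deepest nodes first, so children are final before their parent.
for node in sorted(parent, key=lambda n: len(ancestors(n)), reverse=True):
    if children[node]:
        h[node], _ = attend(h[node], [h[c] for c in children[node]], W_up, a_up)

# Ancestor-to-leaf (Eq. 2): each leaf attends over its ancestors' bottom-up vectors.
ontology = {}
for leaf in (n for n in parent if not children[n]):
    ontology[leaf], _ = attend(h[leaf], [h[p] for p in ancestors(leaf)], W_down, a_down)
```

The square transformation matrices keep every node vector the same size, which lets the bottom-up pass feed updated parents into the next level without extra projections.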
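Finally, a linear transformation function maps the image, text and ontology representations into the same vector space, and they are fused into a multi-modal representation $\mathbf{m}$:

$$\mathbf{e}' = f(\mathbf{e}, \mathbf{W}_1, \mathbf{b}_1), \quad \mathbf{t}' = f(\mathbf{t}, \mathbf{W}_2, \mathbf{b}_2), \quad \mathbf{o}' = f(\mathbf{o}, \mathbf{W}_3, \mathbf{b}_3) \qquad (3)$$

$$\mathbf{m} = [\mathbf{e}', \mathbf{t}', \mathbf{o}'] \qquad (4)$$

where $\mathbf{e}'$, $\mathbf{t}'$ and $\mathbf{o}'$ are the transformed image, text and ontology representations.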
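Eqs. (3) and (4) can be illustrated as a per-modality linear projection followed by concatenation, assuming that f(x, W, b) = Wx + b and that [e', t', o'] denotes concatenation. The 2048-dimensional image vector matches the ResNet50 size stated above; the 300-dimensional text vector, 16-dimensional ontology vector, shared size of 8 and random weights are hypothetical placeholders.

```python
import numpy as np


def project(x, W, b):
    """Linear transformation f(x, W, b) = W x + b into the shared space."""
    return W @ x + b


rng = np.random.default_rng(0)
d_shared = 8
e = rng.normal(size=2048)  # image representation (ResNet50-sized)
t = rng.normal(size=300)   # text representation (hypothetical Word2Vec size)
o = rng.normal(size=16)    # ontology representation (hypothetical size)

# One (W_k, b_k) pair per modality, mapping each input size to d_shared.
params = {
    name: (rng.normal(size=(d_shared, x.size)) * 0.01, np.zeros(d_shared))
    for name, x in (("image", e), ("text", t), ("ontology", o))
}
e_p = project(e, *params["image"])     # e'
t_p = project(t, *params["text"])      # t'
o_p = project(o, *params["ontology"])  # o'
m = np.concatenate([e_p, t_p, o_p])    # Eq. (4): m = [e', t', o']
print(m.shape)
```

Projecting first and concatenating afterwards gives every modality the same number of slots in m, so the 2048-dimensional image vector does not dominate the fused representation by size alone.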
In this embodiment, the similarity calculation model includes a gated graph neural network module and a similarity determination module. The gated graph neural network module is configured to output first data according to the medical record graph structural features corresponding to the target medical record, and to output second data according to the medical record graph structural features corresponding to the sample medical record. The similarity determination module is configured to determine the similarity between the target medical record and the sample medical record according to the first data and the second data.
The gated graph neural network module comprises a plurality of graph neural networks and a gating layer located between adjacent graph neural networks; the gating layer is used for filtering the data output by the preceding graph neural network.
Referring to Fig. 3, the graph structural features of the target medical record and of the sample medical record are used as the initial node representations $\mathbf{H}_1^{(0)}$ and $\mathbf{H}_2^{(0)}$ of graphs $G_1$ and $G_2$. We first use $L$ layers of Gated Graph Neural Networks (GGNNs) to learn the neighborhood structure information of each node and update the node representations. GGNNs adopt the idea of the gated recurrent unit (GRU). For a graph node $v_n$ with initial representation $\mathbf{h}_n^{(0)}$ and representation $\mathbf{h}_n^{(l)}$ at layer $l$ of the GGNNs, the representation $\mathbf{h}_n^{(l+1)}$ at layer $l+1$ is updated as follows:

$$\mathbf{a}_n^{(l)} = \sum_{n' \in N(n)} \mathbf{h}_{n'}^{(l)}$$

$$\mathbf{z}_n^{(l)} = \sigma\big(\mathbf{W}_z[\mathbf{a}_n^{(l)} \,\Vert\, \mathbf{h}_n^{(l)}] + \mathbf{b}_z\big)$$

$$\mathbf{r}_n^{(l)} = \sigma\big(\mathbf{W}_r[\mathbf{a}_n^{(l)} \,\Vert\, \mathbf{h}_n^{(l)}] + \mathbf{b}_r\big)$$

$$\tilde{\mathbf{h}}_n^{(l)} = \tanh\big(\mathbf{W}_h[\mathbf{a}_n^{(l)} \,\Vert\, \mathbf{r}_n^{(l)} \odot \mathbf{h}_n^{(l)}] + \mathbf{b}_h\big)$$

$$\mathbf{h}_n^{(l+1)} = (1 - \mathbf{z}_n^{(l)}) \odot \mathbf{h}_n^{(l)} + \mathbf{z}_n^{(l)} \odot \tilde{\mathbf{h}}_n^{(l)}$$

where $N(n)$ is the set of neighbor nodes of $v_n$, $\mathbf{a}_n^{(l)}$ is the hidden state that combines the feature representations of the neighbor nodes, $\mathbf{z}_n^{(l)}$ and $\mathbf{r}_n^{(l)}$ are the update-gate and reset-gate vectors respectively, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $\mathbf{W}_z, \mathbf{W}_r, \mathbf{W}_h, \mathbf{b}_z, \mathbf{b}_r, \mathbf{b}_h$ are trainable parameters.
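A NumPy sketch of one GGNN layer applied to all nodes at once, assuming GRU-style gating in which the update gate, reset gate and candidate state are computed from the concatenation of the summed neighbour state and the node's own state, with trainable matrices `Wz`, `Wr`, `Wh` and biases `bz`, `br`, `bh`. The toy path graph, the helper name `ggnn_layer` and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ggnn_layer(H, adj, Wz, bz, Wr, br, Wh, bh):
    """One GRU-style GGNN update for every node.

    H: (n, d) node states at layer l; adj: (n, n) 0/1 adjacency matrix.
    Each W*: (d, 2d), each b*: (d,).
    """
    A = adj @ H                                           # a_n: sum of neighbour states
    AH = np.hstack([A, H])
    z = sigmoid(AH @ Wz.T + bz)                           # update gate
    r = sigmoid(AH @ Wr.T + br)                           # reset gate
    h_tilde = np.tanh(np.hstack([A, r * H]) @ Wh.T + bh)  # candidate state
    return (1.0 - z) * H + z * h_tilde                    # gated interpolation


rng = np.random.default_rng(0)
n, d, L = 3, 4, 2
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0-1-2
H = rng.normal(size=(n, d))                                     # stands in for H^(0)
params = [rng.normal(scale=0.1, size=s) for s in [(d, 2 * d), (d,)] * 3]  # Wz, bz, Wr, br, Wh, bh
layers = [H]
for _ in range(L):
    layers.append(ggnn_layer(layers[-1], adj, *params))
```

The sketch keeps every layer's output in `layers`, since the local similarity step compares node representations layer by layer rather than only at the last layer.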
Referring to Fig. 3, in one example, to improve the accuracy of graph similarity calculation, a global interactive learning strategy and a local interactive learning strategy are proposed to obtain the global graph similarity and the local graph similarity respectively, and the final graph similarity is calculated by combining the two. Specifically, the similarity determination module in the similarity calculation model comprises a global similarity calculation unit, a local similarity calculation unit and a fully connected layer.
The global similarity calculation unit is configured to calculate, by an attention mechanism, a first graph feature vector of the first data and a second graph feature vector of the second data, and to calculate, by a neural tensor network, the global similarity between the target medical record and the sample medical record from the first graph feature vector and the second graph feature vector.
The local similarity calculation unit is configured to calculate, by cosine similarity, the layer-wise similarity between the first data and the second data from the characterization data of the first data and of the second data at each feature extraction layer, obtaining a local similarity matrix of the target medical record and the sample medical record.
The fully connected layer is configured to calculate the similarity between the target medical record and the sample medical record from the global similarity and the local similarity matrix.
Global similarity calculation unit: the weight of each node is learned with an attention mechanism, and the graph representation vector is computed as the weighted sum of the node representations. Taking medical record graph $G_1$ as an example, let node $v_n$ have initial representation $m_n$ and last-layer GGNN representation $h_n^{(L)}$. The graph representation $g_1$ in the global interactive learning strategy is calculated as follows:

$$g_1 = \sum_{n} \alpha_n h_n^{(L)}$$

where $\alpha_n$ is the attention weight of node $v_n$, computed with learnable attention parameters. Let the graph feature vectors of $G_1$ and $G_2$ be $g_1$ and $g_2$. The global interactive learning strategy uses a Neural Tensor Network to obtain the global graph similarity $S_1(G_1,G_2)$ at $K$ granularities:
$$S_1(G_1,G_2) = f\left(g_1^{\top} W^{[1:K]} g_2 + V \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} + b\right)$$

where $W^{[1:K]}$ and $V$ are parameter matrices, $b$ is a bias vector, and $f(\cdot)$ is an activation function.
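The sketch below illustrates the global branch under stated assumptions: the attention weight of each node is taken to be $\alpha_n = \sigma(w \cdot [h_n^{(L)}; m_n])$, and the NTN activation $f$ is taken to be ReLU. Neither choice is specified in the text, and the function names and toy values are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def attention_readout(h_last, m_init, w_att):
    """Graph vector g = sum_n alpha_n * h_n^(L).

    Assumed attention form: alpha_n = sigmoid(w_att . [h_n^(L); m_n]).
    h_last / m_init: dict node -> list[float]; w_att: list of length 2d.
    """
    d = len(next(iter(h_last.values())))
    g = [0.0] * d
    for n, h_n in h_last.items():
        alpha = sigmoid(sum(w * f for w, f in zip(w_att, h_n + m_init[n])))
        g = [gi + alpha * hi for gi, hi in zip(g, h_n)]
    return g


def neural_tensor_network(g1, g2, W, V, b):
    """K-dimensional global similarity S1(G1, G2).

    W: K matrices of shape d x d (the bilinear slices W^[1:K])
    V: K rows of length 2d; b: K biases. ReLU is an assumed choice of f.
    """
    concat = g1 + g2
    scores = []
    for Wk, Vk, bk in zip(W, V, b):
        bilinear = sum(g1[i] * Wk[i][j] * g2[j]
                       for i in range(len(g1)) for j in range(len(g2)))
        linear = sum(v * c for v, c in zip(Vk, concat))
        scores.append(max(0.0, bilinear + linear + bk))
    return scores


# Toy usage: zero attention weights give alpha_n = 0.5 for every node.
g1 = attention_readout({"a": [1.0, 2.0], "b": [3.0, 4.0]},
                       {"a": [0.0, 0.0], "b": [0.0, 0.0]}, [0.0] * 4)
s1 = neural_tensor_network([1.0, 2.0], [3.0, 4.0],
                           W=[[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]],
                           V=[[0.0] * 4, [0.0] * 4], b=[0.0, -1.0])
```

Each of the K slices contributes one score, so $S_1$ is a K-dimensional vector rather than a scalar; that vector is what the fully connected layer later fuses with the local histograms.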
Local similarity calculation unit: first, the initial node representations of the disease nodes and their updated representations at each of the L layers are extracted; second, cosine similarity is computed over every pair of disease nodes at each level, giving the multi-level local node similarity matrices $\{P_0, P_1, \ldots, P_L\}$:

$$P_l(i,j) = \cos\left(h_{1,i}^{(l)},\, h_{2,j}^{(l)}\right), \quad l = 0, 1, \ldots, L$$

where $h_{1,i}^{(l)}$ is the layer-$l$ representation of the $i$-th disease node of $G_1$ and $h_{2,j}^{(l)}$ is that of the $j$-th disease node of $G_2$.
Suppose a similarity matrix is $P^*$. The matching histogram algorithm groups all elements of $P^*$ into different buckets according to their values, counts the number of elements in each bucket, and applies a logarithm function to obtain the similarity distribution vector $q^*$. From the similarity distribution vectors, the multi-level local graph similarity $S_2(G_1,G_2)$ is calculated as follows:

$$S_2(G_1,G_2) = [q_0, q_1, \ldots, q_L] \qquad (13)$$
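A minimal sketch of the local branch, assuming cosine values are bucketed into equal-width bins over [-1, 1] and that the logarithm step is log(1 + count), which avoids log(0) for empty buckets. The number of bins (5) and all function names are hypothetical; the text does not state them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_matrix(nodes1, nodes2):
    """P[i][j] = cos(disease node i of G1, disease node j of G2) for one layer."""
    return [[cosine(u, v) for v in nodes2] for u in nodes1]


def matching_histogram(P, num_bins=5):
    """Bucket every entry of P by value over [-1, 1], then log-scale the counts."""
    counts = [0] * num_bins
    for row in P:
        for s in row:
            idx = int((s + 1.0) / 2.0 * num_bins)
            idx = min(max(idx, 0), num_bins - 1)  # clamp s == 1.0 and rounding noise
            counts[idx] += 1
    return [math.log1p(c) for c in counts]  # log(1 + count)


def local_graph_similarity(layers1, layers2, num_bins=5):
    """S2 = [q_0, ..., q_L]; layers*[l] lists the disease-node vectors at layer l."""
    s2 = []
    for nodes1, nodes2 in zip(layers1, layers2):
        s2.extend(matching_histogram(similarity_matrix(nodes1, nodes2), num_bins))
    return s2


# Toy usage: one disease node in G1 against three in G2.
P = similarity_matrix([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
q = matching_histogram(P, num_bins=5)
```

The histogram discards which node pairs matched and keeps only how many pairs fall at each similarity level, so $S_2$ has a fixed length of (L + 1) times the number of bins regardless of how many disease nodes each record contains.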
Finally, based on the global graph similarity $S_1(G_1,G_2)$ and the local graph similarity $S_2(G_1,G_2)$, the graph similarity module calculates the final graph similarity $S(G_1,G_2)$ with a fully connected layer:

$$S(G_1,G_2) = \mathrm{Sigmoid}\left(W_s\,[S_1(G_1,G_2), S_2(G_1,G_2)] + b_s\right) \qquad (14)$$
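Equation (14) amounts to a single-output logistic layer over the concatenation of the global and local features. A minimal sketch, assuming $S_1$ and $S_2$ arrive as plain lists; the function name and argument layout are hypothetical.

```python
import math


def final_graph_similarity(s1, s2, w_s, b_s):
    """S(G1, G2) = Sigmoid(W_s [S1; S2] + b_s) for a single output unit.

    s1: K global scores; s2: concatenated histograms [q_0, ..., q_L];
    w_s: weights of length len(s1) + len(s2); b_s: scalar bias.
    """
    features = list(s1) + list(s2)
    if len(features) != len(w_s):
        raise ValueError("weight vector length must match [S1; S2]")
    logit = sum(w * f for w, f in zip(w_s, features)) + b_s
    # Numerically stable sigmoid keeps the score in (0, 1) without overflow.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

Because the output lies in (0, 1), it can be compared directly against the fixed retrieval threshold used when ranking similar medical records.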
The step 103 of this embodiment can be implemented as follows:
Ranking similar medical records: for a target medical record uploaded by a user, the medical record graph structural features of the target record are first constructed. The medical record graph structural features of every sample record in the medical record database are then traversed, and the similarity between each sample record's features and the target record's features is calculated. Finally, the sample records whose similarity exceeds 0.5 are selected as similar records of the target record, sorted by similarity from high to low, and returned as the retrieval result. In this way, similar medical records can be retrieved effectively for the uploaded record, helping the user find comparable cases to support subsequent diagnosis and treatment.
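The ranking step can be sketched as below. The 0.5 threshold and the high-to-low ordering come from the description; `similarity_fn` stands in for the trained graph similarity model, and the Jaccard overlap on entity sets used in the toy example is a hypothetical placeholder, not the patent's model.

```python
def retrieve_similar(target, database, similarity_fn, threshold=0.5):
    """Return (record_id, score) pairs with score > threshold, highest first."""
    results = []
    for record_id, sample in database.items():
        score = similarity_fn(target, sample)
        if score > threshold:
            results.append((record_id, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def jaccard(a, b):
    """Hypothetical stand-in for the trained graph similarity model."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


target = {"type 2 diabetes", "metformin"}
database = {
    "record_1": {"type 2 diabetes", "metformin", "insulin"},
    "record_2": {"hypertension", "amlodipine"},
    "record_3": {"type 2 diabetes", "metformin"},
}
hits = retrieve_similar(target, database, jaccard)
```

This is a linear scan over the whole database, matching the traversal described above; for large databases an index or candidate pre-filter would be needed, which the description does not cover.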
To verify that the proposed method (SMCR) improves similar medical record retrieval, the experiments first compare it with seven baseline methods: Auto-Diagnosis, Deep Embedding, MiME, GCT, GRAM, Graph2vec and SimGNN. The results are shown in the table below, where '√' marks a model that uses a graph structure.
TABLE 1
Model | Graph structure | Accuracy | F1 score
Auto-Diagnosis | √ | 0.4085 | 0.4567
MiME | √ | 0.5068 | 0.4973
GRAM | × | 0.5423 | 0.5842
Graph2vec | √ | 0.6512 | 0.5855
Deep Embedding | × | 0.7295 | 0.7628
SimGNN | √ | 0.8452 | 0.8509
GCT | √ | 0.8682 | 0.8636
SMCR | √ | 0.8773 | 0.8842
Table 1 shows that, compared with GCT, the strongest baseline, SMCR improves accuracy and F1 score by 1.0% and 2.4% (relative) respectively. This indicates that, through its multimodal learning module and graph similarity learning module, SMCR effectively learns both the multimodal information and the graph-structure similarity information in the medical record graphs, which together improve the accuracy of similar medical record retrieval.
To verify the contribution of each component of SMCR, an ablation study removes one component at a time and compares the effect on overall performance. Five configurations are used: removing the text representation, the image representation, the ontology representation, the global interaction information, and the local interaction information. The results of the first to third configurations show that multimodal representation learning, which fuses text, image and ontology representations, effectively improves similar medical record retrieval. In addition, the graph similarity learning module computes similarity more accurately by capturing both the global interaction information and the more important local interaction information, which further improves retrieval performance.
TABLE 2
Model | Accuracy | F1 score
SMCR | 0.8773 | 0.8842
Configuration 1: text representation removed | 0.8527 (-2.8%) | 0.8497 (-3.9%)
Configuration 2: image representation removed | 0.8605 (-1.9%) | 0.8674 (-1.9%)
Configuration 3: ontology representation removed | 0.8643 (-1.5%) | 0.8679 (-1.8%)
Configuration 4: global interaction information removed | 0.835 (-5%) | 0.8555 (-3.2%)
Configuration 5: local interaction information removed | 0.7905 (-9.9%) | 0.8032 (-9.2%)
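The percentages in Tables 1 and 2 are consistent with relative changes, (new - baseline) / baseline. The short script below recomputes them from the tabulated values as a sanity check; it is an illustrative calculation, not part of the patented method. It reproduces +1.0% accuracy and +2.4% F1 for SMCR over GCT, and gives -4.8% accuracy for configuration 4, which Table 2 reports rounded to -5%.

```python
def relative_change(value, baseline):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (accuracy, F1) pairs copied from Tables 1 and 2.
smcr = (0.8773, 0.8842)
gct = (0.8682, 0.8636)
ablations = {
    "text representation removed": (0.8527, 0.8497),
    "image representation removed": (0.8605, 0.8674),
    "ontology representation removed": (0.8643, 0.8679),
    "global interaction removed": (0.835, 0.8555),
    "local interaction removed": (0.7905, 0.8032),
}

print("SMCR vs GCT: %+.1f%% accuracy, %+.1f%% F1"
      % (relative_change(smcr[0], gct[0]), relative_change(smcr[1], gct[1])))
for name, (acc, f1) in ablations.items():
    print("%s: %+.1f%% accuracy, %+.1f%% F1"
          % (name, relative_change(acc, smcr[0]), relative_change(f1, smcr[1])))
```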
For the structured medical concept entities in medical records, the similar medical record retrieval method based on the multimodal medical record graph first uses a knowledge graph containing the relations between medical concept entities, combined with multimodal data (images, text and ontologies), to construct the medical record association topology graph corresponding to each medical record. A similarity calculation model based on a graph neural network is then used to predict medical record similarity. By combining the associations among medical entities with the multimodal data of each entity, the method enriches the entity features and thereby improves the accuracy of similar medical record retrieval.
Example 2
Referring to fig. 4, the embodiment provides a similar medical record retrieval system based on a multi-modal medical record map, and the system includes:
a medical record graph structural feature obtaining module 401, configured to obtain medical record graph structural features corresponding to a target medical record and medical record graph structural features corresponding to a sample medical record, where the medical record graph structural features include multiple entity graph structural features, the entity graph structural features include graph relationship features and node attribute features of two entities in a medical record association relationship topological graph, the medical record association relationship topological graph is constructed according to a corresponding medical record, entities in the medical record association relationship topological graph include diseases, medicines and operations in a corresponding medical record, and an association relationship in the medical record association relationship topological graph includes a medical relationship between the two entities in the medical record association relationship topological graph corresponding to the medical record;
a similarity calculation module 402, configured to calculate, using a similarity calculation model based on a graph neural network, a similarity between the medical record graph structural feature corresponding to the target medical record and the medical record graph structural feature corresponding to each sample medical record;
and a similar medical record determining module 403, configured to determine, according to the similarity, a sample medical record similar to the target medical record.
The node attribute features comprise multi-dimensional feature fusion data of the entity, which fuses at least two of the image feature data, text feature data and ontology feature data of the entity.
The similarity calculation model comprises a gated graph neural network module and a similarity determination module.
The gated graph neural network module is configured to output first data according to the medical record graph structural feature corresponding to the target medical record, and to output second data according to the medical record graph structural feature corresponding to the sample medical record.
The similarity determination module is configured to determine the similarity between the target medical record and the sample medical record according to the first data and the second data.
The gated graph neural network module comprises a plurality of graph neural networks and a gating layer located between adjacent graph neural networks; the gating layer filters the data output by the preceding graph neural network.
The similarity determination module comprises a global similarity calculation unit, a local similarity calculation unit and a fully connected layer.
The global similarity calculation unit is configured to: calculate a first graph feature vector of the first data and a second graph feature vector of the second data by using an attention mechanism; and calculate the global similarity between the target medical record and the sample medical record by using a neural tensor network according to the first graph feature vector and the second graph feature vector.
The local similarity calculation unit is configured to: calculate, by cosine similarity, the similarity between the first data and the second data at each layer according to the representation data of the first data and of the second data in each feature extraction layer, to obtain a local similarity matrix of the target medical record and the sample medical record.
The fully connected layer is configured to: calculate the similarity between the target medical record and the sample medical record according to the global similarity and the local similarity matrix.
The embodiments in this description are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiment corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principle and embodiments of the present invention; the above description of the embodiments is only intended to help understand the method and core idea of the invention. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, modify the specific embodiments and the scope of application. In view of the foregoing, this description should not be construed as limiting.

Claims (6)

1. A similar medical record retrieval method based on a multi-mode medical record map is characterized by comprising the following steps:
acquiring medical record graph structural features corresponding to a target medical record and medical record graph structural features corresponding to a sample medical record, wherein the medical record graph structural features comprise a plurality of entity graph structural features, the entity graph structural features comprise graph relationship features and node attribute features of two entities in a medical record association relationship topological graph, the medical record association relationship topological graph is constructed according to the corresponding medical record, the entities in the medical record association relationship topological graph comprise diseases, medicines and operations in the corresponding medical record, the association relationship in the medical record association relationship topological graph comprises medical relationships between the two entities in the medical record association relationship topological graph corresponding to the medical record, and the node attribute features comprise multi-dimensional attribute features of the entities;
calculating the similarity of the medical record graph structural feature corresponding to the target medical record and the medical record graph structural feature corresponding to each sample medical record by adopting a similarity calculation model based on a graph neural network;
the similarity calculation model comprises a gated graph neural network module and a similarity determination module;
the gated graph neural network module is configured to output first data according to the medical record graph structural feature corresponding to the target medical record, and to output second data according to the medical record graph structural feature corresponding to the sample medical record;
the gated graph neural network module comprises a plurality of graph neural networks and a gating layer located between adjacent graph neural networks, the gating layer being used to filter the data output by the preceding graph neural network;
the similarity determination module comprises a global similarity calculation unit, a local similarity calculation unit and a fully connected layer;
the global similarity calculation unit: learning the weight of the node through an attention mechanism, calculating a graph feature vector of the target medical record and a graph feature vector of the sample case based on the weight of the node, and calculating the global similarity of the target medical record and the sample case according to the graph feature vector of the target medical record and the graph feature vector of the sample case;
wherein the graph feature vector $g$ of a medical record graph is calculated as follows:

$$g = \sum_{n} \alpha_n h_n^{(L)}$$

wherein a node $v_n$ in the medical record graph has representation $m_n$, the representation of the node in the last GGNN layer is $h_n^{(L)}$, $\alpha_n$ is the attention weight of node $v_n$, and the attention weights are computed with learnable attention parameters;
the global similarity $S_1(G_1,G_2)$ at $K$ granularities is calculated with a neural tensor network as follows:

$$S_1(G_1,G_2) = f\left(g_1^{\top} W^{[1:K]} g_2 + V \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} + b\right)$$

wherein $W^{[1:K]}$ and $V$ are parameter matrices, $b$ is a bias vector, $f(\cdot)$ is an activation function, and $g_1$ and $g_2$ are the graph feature vectors of the target medical record $G_1$ and the sample medical record $G_2$, respectively;
the global similarity calculation unit is configured to: calculate a first graph feature vector of the first data and a second graph feature vector of the second data by using an attention mechanism; and calculate the global similarity between the target medical record and the sample medical record by using a neural tensor network according to the first graph feature vector and the second graph feature vector;
the local similarity calculation unit is configured to: calculate, by cosine similarity, the similarity between the first data and the second data at each layer according to the representation data of the first data and of the second data in each feature extraction layer, to obtain a local similarity matrix of the target medical record and the sample medical record;
the fully connected layer is configured to: calculate the similarity between the target medical record and the sample medical record according to the global similarity and the local similarity matrix;
and determining a sample medical record similar to the target medical record according to the similarity.
2. The method for retrieving similar medical records based on multi-modal medical record chart according to claim 1, wherein the node attribute feature comprises multi-dimensional feature fusion data of the entity, the multi-dimensional feature fusion data fusing at least two of image feature data, text feature data and ontology feature data of the entity.
3. The method for retrieving similar medical records based on multi-modal medical record map as claimed in claim 1, further comprising, before obtaining the structural features of the medical record map:
determining the correlation between every two entities in the medical record;
and when the two entities have correlation, creating an edge between the two entities in the medical record association relation topological graph corresponding to the medical record.
4. The method for retrieving similar medical records based on multi-modal medical record map as claimed in claim 3, wherein the determining the correlation between two entities in the medical record specifically comprises:
taking a medical record as a document, taking entities in the medical record as words, and determining the correlation between every two words in the document by adopting a PMI algorithm to obtain the correlation between every two entities.
5. A system for retrieving similar medical records based on a multi-modal medical record map, comprising:
the medical record graph structure characteristic acquisition module is used for acquiring medical record graph structure characteristics corresponding to a target medical record and medical record graph structure characteristics corresponding to a sample medical record, the medical record graph structure characteristics comprise a plurality of entity graph structure characteristics, the entity graph structure characteristics comprise graph relation characteristics and node attribute characteristics of two entities in a medical record association relation topological graph, the medical record association relation topological graph is constructed according to the corresponding medical record, the entities in the medical record association relation topological graph comprise diseases, medicines and operations in the corresponding medical record, and the association relation in the medical record association relation topological graph comprises a medical relation between the two entities in the medical record association relation topological graph corresponding to the medical record;
the similarity calculation module is used for calculating the similarity between the medical record graph structural characteristics corresponding to the target medical record and the medical record graph structural characteristics corresponding to each sample medical record by adopting a similarity calculation model based on a graph neural network;
the similarity calculation model comprises a gated graph neural network module and a similarity determination module;
the gated graph neural network module is configured to output first data according to the medical record graph structural feature corresponding to the target medical record, and to output second data according to the medical record graph structural feature corresponding to the sample medical record;
the similarity determination module is configured to determine the similarity between the target medical record and the sample medical record according to the first data and the second data;
the gated graph neural network module comprises a plurality of graph neural networks and a gating layer located between adjacent graph neural networks, the gating layer being used to filter the data output by the preceding graph neural network;
the similarity determination module comprises a global similarity calculation unit, a local similarity calculation unit and a fully connected layer;
the global similarity calculation unit: learning the weight of each node through an attention mechanism, calculating the graph feature vectors of the target medical record and of the sample case as weighted sums based on the node weights, and calculating the global similarity of the target medical record and the sample case according to the graph feature vector of the target medical record and the graph feature vector of the sample case;
wherein the graph feature vector $g$ of a medical record graph is calculated as follows:

$$g = \sum_{n} \alpha_n h_n^{(L)}$$

wherein a node $v_n$ in the medical record graph has representation $m_n$, the representation of the node in the last GGNN layer is $h_n^{(L)}$, $\alpha_n$ is the attention weight of node $v_n$, and the attention weights are computed with learnable attention parameters;
the global similarity $S_1(G_1,G_2)$ at $K$ granularities is calculated with a neural tensor network as follows:

$$S_1(G_1,G_2) = f\left(g_1^{\top} W^{[1:K]} g_2 + V \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} + b\right)$$

wherein $W^{[1:K]}$ and $V$ are parameter matrices, $b$ is a bias vector, $f(\cdot)$ is an activation function, and $g_1$ and $g_2$ are the graph feature vectors of the target medical record $G_1$ and the sample medical record $G_2$, respectively;
the global similarity calculation unit is configured to: calculate a first graph feature vector of the first data and a second graph feature vector of the second data by using an attention mechanism; and calculate the global similarity between the target medical record and the sample medical record by using a neural tensor network according to the first graph feature vector and the second graph feature vector;
the local similarity calculation unit is configured to: calculate, by cosine similarity, the similarity between the first data and the second data at each layer according to the representation data of the first data and of the second data in each feature extraction layer, to obtain a local similarity matrix of the target medical record and the sample medical record;
the fully connected layer is configured to: calculate the similarity between the target medical record and the sample medical record according to the global similarity and the local similarity matrix;
and the similar medical record determining module is used for determining the sample medical record similar to the target medical record according to the similarity.
6. The system of claim 5, wherein the node attribute features comprise multi-dimensional feature fusion data of the entity, the multi-dimensional feature fusion data fusing at least two of image feature data, text feature data, and ontology feature data of the entity.
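Claims 3 and 4 build the graph edges by treating each medical record as a document and its entities as words, then scoring entity pairs with pointwise mutual information (PMI). A minimal sketch, assuming document-level co-occurrence counts and that an edge is created when PMI is positive; the threshold, function name and toy records are hypothetical, since the claims state neither the probability estimate nor the cut-off.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(records, threshold=0.0):
    """Return {(entity_a, entity_b): PMI} for pairs whose PMI exceeds threshold.

    records: list of medical records, each an iterable of entity names
    (diseases, drugs, surgeries); every record is treated as one document.
    """
    n_docs = len(records)
    doc_freq = Counter()   # number of records containing each entity
    pair_freq = Counter()  # number of records containing both entities
    for entities in records:
        unique = sorted(set(entities))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    edges = {}
    for (a, b), n_ab in pair_freq.items():
        # PMI = log p(a, b) / (p(a) p(b)), probabilities estimated as record frequencies.
        pmi = math.log(n_ab * n_docs / (doc_freq[a] * doc_freq[b]))
        if pmi > threshold:
            edges[(a, b)] = pmi
    return edges


records = [
    ["diabetes", "metformin"],
    ["diabetes", "metformin"],
    ["hypertension", "amlodipine"],
    ["diabetes", "hypertension"],
]
edges = pmi_edges(records)
```

In this toy corpus, "diabetes" and "hypertension" co-occur less often than independence would predict (PMI below zero), so no edge is created between them, while the two consistently paired disease and drug entities are linked.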
CN202110629894.0A 2021-06-07 2021-06-07 Similar medical record retrieval method and system based on multi-mode medical record map Active CN113380360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110629894.0A CN113380360B (en) 2021-06-07 2021-06-07 Similar medical record retrieval method and system based on multi-mode medical record map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110629894.0A CN113380360B (en) 2021-06-07 2021-06-07 Similar medical record retrieval method and system based on multi-mode medical record map

Publications (2)

Publication Number Publication Date
CN113380360A CN113380360A (en) 2021-09-10
CN113380360B true CN113380360B (en) 2022-07-22

Family

ID=77576263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629894.0A Active CN113380360B (en) 2021-06-07 2021-06-07 Similar medical record retrieval method and system based on multi-mode medical record map

Country Status (1)

Country Link
CN (1) CN113380360B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628709B (en) * 2021-10-09 2022-02-11 腾讯科技(深圳)有限公司 Similar object determination method, device, equipment and storage medium
CN114048340B (en) * 2021-11-15 2023-04-21 电子科技大学 Hierarchical fusion combined query image retrieval method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11687577B2 (en) * 2018-07-27 2023-06-27 drchrono inc. Identifying missing questions by clustering and outlier detection
CN109830303A (en) * 2019-02-01 2019-05-31 上海众恒信息产业股份有限公司 Clinical data mining analysis and aid decision-making method based on internet integration medical platform
CN111415740B (en) * 2020-02-12 2024-04-19 东北大学 Method and device for processing inquiry information, storage medium and computer equipment
CN111613339B (en) * 2020-05-15 2021-07-09 山东大学 Similar medical record searching method and system based on deep learning
CN111916207B (en) * 2020-08-07 2023-08-08 杭州深睿博联科技有限公司 Disease identification method and device based on multi-mode fusion
CN112489740A (en) * 2020-12-17 2021-03-12 北京惠及智医科技有限公司 Medical record detection method, training method of related model, related equipment and device
CN112542223A (en) * 2020-12-21 2021-03-23 西南科技大学 Semi-supervised learning method for constructing medical knowledge graph from Chinese electronic medical record

Also Published As

Publication number Publication date
CN113380360A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
Kim et al. Transparency and accountability in AI decision support: Explaining and visualizing convolutional neural networks for text information
CN108319686B (en) Antagonism cross-media retrieval method based on limited text space
CN110866124B (en) Medical knowledge graph fusion method and device based on multiple data sources
CN113254648A (en) Text emotion analysis method based on multilevel graph pooling
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN110659723A (en) Data processing method, device, medium and electronic equipment based on artificial intelligence
El Mohadab et al. Predicting rank for scientific research papers using supervised learning
CN113380360B (en) Similar medical record retrieval method and system based on multi-mode medical record map
EP3940582A1 (en) Method for disambiguating between authors with same name on basis of network representation and semantic representation
CN112307351A (en) Model training and recommending method, device and equipment for user behavior
CN113221882B (en) Image text aggregation method and system for curriculum field
CN107369098A (en) The treating method and apparatus of data in social networks
CN111858940A (en) Multi-head attention-based legal case similarity calculation method and system
Concolato et al. Data science: A new paradigm in the age of big-data science and analytics
Hong et al. Selective residual learning for visual question answering
CN115775349A (en) False news detection method and device based on multi-mode fusion
CN114880427A (en) Model based on multi-level attention mechanism, event argument extraction method and system
Guo et al. Matching visual features to hierarchical semantic topics for image paragraph captioning
CN114373554A (en) Drug interaction relation extraction method using drug knowledge and syntactic dependency relation
CN113569018A (en) Question and answer pair mining method and device
Rao et al. Deep learning-based image retrieval system with clustering on attention-based representations
CN113821610A (en) Information matching method, device, equipment and storage medium
MeshuWelde et al. Counting-based visual question answering with serial cascaded attention deep learning
Tang Analysis of English multitext reading comprehension model based on deep belief neural network
Zhang et al. Bi-directional capsule network model for chinese biomedical community question answering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant