CN114722216A - Entity alignment method based on Chinese electronic medical record knowledge graph - Google Patents

Info

Publication number
CN114722216A
Authority
CN
China
Prior art keywords
entity
network
knowledge
view
graph
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210413638.2A
Other languages
Chinese (zh)
Inventor
李丽双
董姜媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology

Classifications

    • G06F16/367 Ontology
    • G06F16/288 Entity relationship models
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F40/295 Named entity recognition
    • G06N3/045 Combinations of networks
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides an entity alignment method based on a Chinese electronic medical record knowledge graph. The method comprises: constructing a training set and a test set for entity alignment over heterogeneous medical knowledge graphs; reasoning over the medical knowledge graphs with rules to supply missing relations and reduce the structural heterogeneity between the graphs; and constructing a dual-view neural network model based on a gating mechanism to align and fuse the heterogeneous medical knowledge graphs. Balancing accuracy against labor cost, the method reaches a Hits@5 of 85.4%, effectively integrates existing medical resources, and promotes the development of intelligent healthcare.

Description

Entity alignment method based on Chinese electronic medical record knowledge graph
Technical Field
The invention belongs to the field of natural language processing and relates to entity alignment in the construction of a knowledge graph (KG) from Chinese electronic medical records (EMR), in particular to a medical knowledge graph alignment method based on a neural network and a gating mechanism.
Background
The Chinese electronic medical record is a product of healthcare informatization and contains a large number of medical facts. As electronic medical records accumulate in China, using natural language processing to automatically acquire and integrate useful medical information from them has become highly significant. Constructing a knowledge graph from electronic medical records is one of the most effective ways to present and use the medical information they contain. A knowledge graph is a form of knowledge representation that presents information elements as a graph in a structured, normalized, and clear way. A knowledge graph based on electronic medical records is a vertical-domain knowledge graph and helps to establish a scientific, intelligent medical knowledge base and knowledge network. However, because knowledge sources and construction purposes differ, different knowledge graphs overlap and complement one another, and this is especially common among medical knowledge graphs. Building a large-scale medical knowledge graph through knowledge fusion helps to improve data quality and thereby promotes the development of intelligent healthcare. The key technique in knowledge fusion is entity alignment, whose goal is to determine whether two entities in different knowledge graphs refer to the same real-world thing.
In the general domain, early entity alignment used similarity measures, deciding whether two entities are aligned by computing their string similarity and structural similarity. However, such methods typically align entities according to manually designed rules, which makes them complex and difficult to implement. Later, as knowledge graph representation learning developed, more and more researchers began to use learned representations instead of symbolic forms to solve the problem. The best-known knowledge graph representation learning models are the translation-based models, the most classical of which is TransE. Although translation-based methods do not depend on manually constructed rules, they capture only the first-order neighborhood information of an entity, so the learned entity embeddings represent only local features and cannot make full use of the structural information of the knowledge graph.
With the development of graph neural networks in recent years, some studies use graph neural networks to model the structure of knowledge graphs and enhance entity embeddings with neighbor entities, i.e., they learn the representation of a central entity by recursively aggregating the representations of its neighbors through graph convolution. Thus, the more similar the neighborhood structures of two entities, the closer the representations that a graph neural network model learns for them. However, because knowledge sources and construction purposes differ, knowledge graphs are structurally heterogeneous, which seriously limits such models. Wang et al. (Cross-lingual knowledge graph alignment via graph convolutional networks. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 349-357 (2018)) first attempted to use graph convolutional networks for entity alignment. Ye et al. (A vectorized relational graph convolutional network for multi-relational network alignment. In: IJCAI, pp. 4135-4141 (2019)) introduced the translation property of the TransE model into the convolution process, merging the neighbors of an entity with the corresponding relation vectors when computing its representation. However, neither considers the impact of knowledge graph heterogeneity on entity alignment.
At present, entity alignment methods for the medical domain are scarce, and no systematic and efficient solution is available. Entities in the medical domain appear in complex and diverse forms. For example, the disease entity "chronic viral hepatitis B" may appear in a knowledge graph as the entity "hepatitis B" or "HBV infection", so the similarity-measure methods of the general domain are not suitable for the medical domain. In addition, the structural differences between medical knowledge graphs are pronounced. A translation-based model fuses only the first-order neighborhood of an entity, cannot make full use of the structural information of the knowledge graph, and has limited ability to mitigate knowledge graph heterogeneity, so it is not suitable for entity alignment in the medical domain either. Many researchers have tried to solve entity alignment in the general domain with graph neural networks, but because medical knowledge is complicated, the graph structures generated from different electronic medical records differ significantly, and structural heterogeneity is even more pronounced in medical knowledge graphs. Research on medical entity alignment using graph neural networks is almost nonexistent.
Disclosure of Invention
The invention provides a dual-view graph neural network entity alignment model that addresses the fusion of medical knowledge graphs built from Chinese electronic medical records. It helps to improve the data quality of existing medical knowledge graphs and thereby promotes the development of intelligent healthcare.
The invention mainly comprises three parts: 1. constructing a training set and a test set for entity alignment over the heterogeneous medical knowledge graphs; 2. reasoning over the medical knowledge graphs with rules to supply missing relations and reduce the structural heterogeneity between the graphs; 3. constructing a dual-view neural network model based on a gating mechanism to align and fuse the heterogeneous medical knowledge graphs.
The technical scheme adopted by the invention is as follows:
An entity alignment method based on a Chinese electronic medical record knowledge graph: the entity sets of two heterogeneous medical knowledge graphs are obtained, and seed pairs of entities are labeled. For example, two entities that both denote AIDS, one from knowledge graph MeKG_1 and one from MeKG_2, form one seed pair. The successfully labeled entity seed pairs are used as the training set and the test set. The method further comprises the following steps:
Step 1: complete the knowledge graphs using rules.
The invention completes the medical knowledge graphs by filling in missing relations, which reduces the structural differences between the heterogeneous medical knowledge graphs. The rules preset for each heterogeneous medical knowledge graph are combined into a rule set $K$. The rule set $K$ is applied to each heterogeneous medical knowledge graph: for a given rule $k \in K$, all premise triples that satisfy the rule are found and the conclusion triple is inferred from them; if the conclusion triple does not already exist in the original knowledge graph, it is added, which achieves knowledge graph completion. This yields two completed heterogeneous medical knowledge graphs $G = (E, R, T)$ and $G' = (E', R', T')$, where $E$ and $E'$ denote the first and second entity sets, $R$ and $R'$ the first and second relation sets, and $T$ and $T'$ the first and second triple sets; $e \in E$, $r \in R$, and $t \in T$ denote any entity, relation, and triple in $G$.
Step 2: construct a dual-view neural network model based on a gating mechanism.
The ultimate goal of entity alignment is to find all pairs of entities in $E$ and $E'$ that refer to the same thing. At present, most neural-network-based entity alignment models cannot handle the problems caused by the heterogeneity of medical knowledge graphs. Building on existing research, the invention proposes a new dual-view neural network model based on a gating mechanism, DvGNet, to solve the entity alignment problem. The model mitigates the heterogeneity of medical knowledge graphs from two views, and the entity representation of each view is obtained by aggregating multi-hop neighborhood information through a gating mechanism.
The model takes the two medical knowledge graphs $G$ and $G'$ as input and outputs an embedded representation of each entity; whether two entities should be aligned is decided by measuring the distance between their embedded representations. The model considers the structural heterogeneity of the knowledge graphs from an entity interaction view and a relation interaction view. The details of the model are as follows:
Step 2.1: construct the entity interaction view network.
Ideally, if the central entities to be aligned have the same neighborhood structure, a graph-neural-network-based approach can learn very similar representations for them. However, because construction purposes and sources differ, medical knowledge graphs are often heterogeneous in schema and incomplete, so the neighborhoods of the central entities differ. This is an important cause of structural heterogeneity and is referred to in the invention as entity neighborhood heterogeneity.
The entity interaction view iteratively learns an accurate self-attention score for each neighbor of an entity through a self-attention mechanism, so that important neighbors receive higher weights during training, which mitigates entity neighborhood heterogeneity. For any entity $e_i$ in the completed heterogeneous medical knowledge graphs, the self-attention scores are computed from the embedded representations of the entity and its neighbor entities, and the neighbor feature vectors of the entity are aggregated with these scores to obtain the representation of $e_i$ at the $l$-th layer in the entity interaction view.
Step 2.2: construct the relation interaction view network.
Most current research aims at entity neighborhood heterogeneity. However, the relations of different medical knowledge graphs are defined independently, which leads to relation heterogeneity between the graphs. For example, a relation present in knowledge graph $G$ may not be present in knowledge graph $G'$, and this is also an important cause of the structural heterogeneity of medical knowledge graphs. The entity interaction view is limited to a single knowledge graph and ignores the relation heterogeneity between medical knowledge graphs, so the invention proposes relation interaction to solve this problem.
The relation feature matrices of the completed heterogeneous medical knowledge graphs $G$ and $G'$ interact to produce a relation similarity matrix, and a max pooling operation on this matrix yields a relation matching vector. Information from neighbors is then aggregated using the cross-graph matching scores derived from the relation matching vector, giving the representation of any entity $e_i$ of the completed heterogeneous medical knowledge graphs at the $l$-th layer in the relation interaction view.
Step 2.3: gated aggregation.
Current research holds that the output-layer representation of a multi-layer network aggregates the multi-hop neighborhood information of an entity, so researchers take the output-layer representation as the final embedding of the entity. However, as the number of network layers increases, the number of neighbors aggregated by the central entity grows exponentially, so the output-layer representation introduces considerable noise into the central entity.
To obtain a more accurate entity representation, the invention proposes a new gating mechanism that aggregates the embeddings of the hidden layers and the output layer, and applies it to both views. The gating mechanism captures the multi-hop neighborhood information of an entity to enhance its embedding while removing redundant noise from each layer, which addresses the noise caused by multi-layer convolution.
Gated aggregation is applied separately to the entity interaction view network constructed in step 2.1 and the relation interaction view network constructed in step 2.2, giving the output $h_{i,1}$ of the entity interaction view network and the output $h_{i,2}$ of the relation interaction view network.
The gated outputs of the two views are then aggregated once more by a gating mechanism to obtain the final representation $h_i$ of entity $e_i$.
Step 3: calculate the embedding distance.
Based on the final entity representations obtained in step 2.3, the embedding distance between entities of the two heterogeneous medical knowledge graphs is calculated with $d(\cdot)$, where a smaller distance indicates more similar entities and $d(\cdot)$ is the L2 norm. The model is trained under supervision with the seed pairs in the training set, so that the distance between aligned entities gradually decreases and the distance between non-aligned entities gradually increases.
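As an illustration of step 3, the sketch below computes the L2 distance between two embeddings and a margin-based hinge loss that pulls aligned pairs together and pushes non-aligned pairs apart. The text specifies only the L2 distance and the training goal; the hinge form of the loss, the `margin` value, and the pairing of one negative pair with each positive pair are assumptions made here for illustration.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_loss(pos_pairs, neg_pairs, margin=1.0):
    """Hinge loss: an aligned pair should be closer than its negative pair by `margin`.

    pos_pairs and neg_pairs are equal-length lists of (embedding, embedding) tuples.
    The loss form and the margin are assumptions, not taken from the patent text.
    """
    total = 0.0
    for (p1, p2), (n1, n2) in zip(pos_pairs, neg_pairs):
        total += max(0.0, l2_distance(p1, p2) - l2_distance(n1, n2) + margin)
    return total / len(pos_pairs)


# Hypothetical 2-dimensional embeddings.
h_a = [0.0, 0.0]
h_b = [3.0, 4.0]
h_c = [0.1, 0.0]

print(l2_distance(h_a, h_b))                     # 5.0
print(margin_loss([(h_a, h_c)], [(h_a, h_b)]))   # 0.0, the aligned pair is already much closer
```

A loss of zero means the aligned pair is already closer than the non-aligned pair by at least the margin, so that pair contributes no gradient.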
Further, the self-attention score [symbol: Figure BDA0003597554360000041] in the entity interaction view network constructed in step 2.1 is calculated as follows:
[Formula (1) image: Figure BDA0003597554360000042]
[Formula (2) image: Figure BDA0003597554360000043]
where [symbol: Figure BDA0003597554360000044] is the self-attention coefficient, representing the importance of entity $e_j$ to $e_i$; [symbol: Figure BDA0003597554360000045] denotes the neighbors of the entity, including the entity itself; "||" denotes vector concatenation; LeakyReLU(·) is the activation function; and $W_1$, $W_2$, and $p$ are trainable parameters.
In a graph neural network, the representation of a node is learned by recursively aggregating the feature vectors of its neighbors. Different aggregation strategies result in different graph neural network variants. Graph attention networks are a very popular variant of graph neural networks.
Further, in constructing the entity interaction view network in step 2.1, the self-attention score [symbol: Figure BDA0003597554360000046] given by formula (1) is used to aggregate the neighbors of entity $e_i$ and obtain the representation of $e_i$ at the $l$-th layer in the entity interaction view, calculated as follows:
[Formula (3) image: Figure BDA0003597554360000048]
where [symbol: Figure BDA0003597554360000047] is the weight parameter of the $l$-th layer in this view network, and $\sigma(\cdot)$ is the activation function, chosen as ReLU(·).
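The formula images for the attention score and the aggregation did not survive extraction. The sketch below shows one form that is consistent with the symbols defined above, assuming the standard graph attention pattern: a coefficient LeakyReLU(p · [W1 e_i || W2 e_j]) per neighbor, a softmax over the neighborhood (the entity itself included), and a ReLU over the score-weighted sum of transformed neighbor vectors. The exact formulas of the patent may differ; the function names, the LeakyReLU slope, and the toy values are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_scores(e_i, neighbors, W1, W2, p):
    """Softmax-normalised scores over the neighbours of e_i (e_i itself included)."""
    q = matvec(W1, e_i)
    # Assumed coefficient: LeakyReLU(p . [W1 e_i || W2 e_j]); `+` concatenates the lists.
    logits = [
        leaky_relu(sum(pk * zk for pk, zk in zip(p, q + matvec(W2, e_j))))
        for e_j in neighbors
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(c - m) for c in logits]
    z = sum(exps)
    return [x / z for x in exps]


def aggregate(neighbors, scores, W):
    """ReLU of the score-weighted sum of transformed neighbour vectors."""
    out = [0.0] * len(W)
    for a, e_j in zip(scores, neighbors):
        for k, v in enumerate(matvec(W, e_j)):
            out[k] += a * v
    return [max(0.0, v) for v in out]


# Hypothetical toy data: identity weights and three neighbours (the first is e_i itself).
identity = [[1.0, 0.0], [0.0, 1.0]]
e_i = [1.0, 0.0]
neighbors = [e_i, [0.0, 1.0], [1.0, 1.0]]
p = [0.5, 0.5, 0.5, 0.5]

scores = attention_scores(e_i, neighbors, identity, identity, p)
h_i = aggregate(neighbors, scores, identity)
print(scores, h_i)
```

With these toy values the third neighbor receives the largest score because its coefficient is the highest before normalization.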
Further, in constructing the relation interaction view network in step 2.2, the relation matching vector is calculated as follows:
$$M = f_M(f_S(R, R')) \tag{4}$$
where $f_S(\cdot)$ denotes the relation similarity function, defined as $f_S(R, R') = R^{\top} R'$; $R$ and $R'$ denote the relation feature matrices of the two medical knowledge graphs to be aligned; and $f_M(\cdot)$ denotes the max pooling operation. The cross-graph matching score [symbol: Figure BDA0003597554360000051] is calculated from the relation matching vector $M$ as follows:
[Formula (5) image: Figure BDA0003597554360000052]
[Formula (6) image: Figure BDA0003597554360000053]
where $M[\cdot]$ denotes the indexing operation on the relation matching degree, and $T$ denotes the triple set of the knowledge graph.
Using the cross-graph matching score [symbol: Figure BDA0003597554360000054], the representation of entity $e_i$ at the $l$-th layer in the relation interaction view is calculated as:
[Formula (7) image: Figure BDA0003597554360000055]
where [symbol: Figure BDA0003597554360000056] is the weight parameter of the $l$-th layer in this view network, and $\sigma(\cdot)$ is the activation function, chosen as ReLU(·).
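Formula (4) can be sketched as follows. Each relation is stored as one vector, which corresponds to treating relation vectors as the columns of $R$ and $R'$, so that entry (a, b) of $R^{\top} R'$ is the inner product of relation a in $G$ with relation b in $G'$. Pooling row-wise, so that each relation of $G$ keeps its best match in $G'$, is an assumption, as are the function names and toy vectors. The later steps that turn $M$ into cross-graph matching scores are not sketched because their formulas are not recoverable from the text.

```python
def similarity_matrix(rels_g, rels_g2):
    """S[a][b] = <r_a, r'_b>, i.e. R^T R' with relation vectors as columns of R and R'."""
    return [
        [sum(x * y for x, y in zip(ra, rb)) for rb in rels_g2]
        for ra in rels_g
    ]


def matching_vector(rels_g, rels_g2):
    """Row-wise max pooling: the best match in G' for each relation of G (assumed axis)."""
    return [max(row) for row in similarity_matrix(rels_g, rels_g2)]


# Hypothetical relation feature vectors for G (three relations) and G' (two relations).
rels_g = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
rels_g2 = [[1.0, 0.0], [0.0, -1.0]]

M = matching_vector(rels_g, rels_g2)
print(M)  # [1.0, 0.0, 0.6]
```

A low entry in `M` marks a relation of $G$ that has no close counterpart in $G'$, which is the kind of relation heterogeneity this view is meant to expose.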
Further, in constructing the relation interaction view network in step 2.2, the initial representation of a relation $r$ is obtained from the embedded representations of the head and tail entities of all triples whose relation is $r$, calculated as follows:
[Formula (8) image: Figure BDA0003597554360000057]
where $T_r$ denotes the set of triples whose relation is $r$, and $e_h$ and $e_t$ denote the corresponding head and tail entity embeddings.
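The formula image for the relation initialization is missing, so the exact way head and tail embeddings are combined is unknown. The sketch below shows one plausible reading, assuming that the head embeddings and the tail embeddings are each averaged over $T_r$ and then concatenated. The function name `init_relation` and the toy data are hypothetical.

```python
def init_relation(triples, emb, r):
    """Average head and tail embeddings over all triples with relation r, then concatenate.

    The mean-then-concatenate combination is an assumption; the patent text only says
    the representation is derived from the head and tail entity embeddings.
    """
    pairs = [(emb[h], emb[t]) for h, rel, t in triples if rel == r]
    if not pairs:
        raise ValueError(f"no triples use relation {r!r}")
    dim = len(pairs[0][0])
    head_mean = [sum(h[k] for h, _ in pairs) / len(pairs) for k in range(dim)]
    tail_mean = [sum(t[k] for _, t in pairs) / len(pairs) for k in range(dim)]
    return head_mean + tail_mean


# Hypothetical entities, relations, and 2-dimensional embeddings.
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0], "d": [4.0, 0.0]}
triples = [("a", "r1", "c"), ("b", "r1", "d"), ("a", "r2", "b")]

print(init_relation(triples, emb, "r1"))  # [0.5, 0.5, 3.0, 1.0]
```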
Further, in step 2.3, the gated aggregation formula of a $\xi$-layer network is
[Formula (9) image: Figure BDA0003597554360000058]
where $\rho_\xi(\alpha, \beta) = g_\xi \cdot \alpha + (1 - g_\xi) \cdot \beta$, $g_\xi$ is a trainable parameter that controls the output of each network layer, and $\tau$ denotes the view category, i.e., the relation interaction view or the entity interaction view.
Gated aggregation is applied with formula (9) to the entity interaction view network constructed in step 2.1 and to the relation interaction view network constructed in step 2.2, giving the output $h_{i,1}$ of the entity interaction view network and the output $h_{i,2}$ of the relation interaction view network.
The final representation of entity $e_i$ is obtained by aggregating the gated outputs of the two views once more through a gating mechanism, calculated as follows:
$$h_i = g_1 \cdot h_{i,1} + (1 - g_1) \cdot h_{i,2} \tag{10}$$
where $g_1$ is a trainable parameter that controls the aggregation of the two views.
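The gate $\rho_\xi$ and formula (10) can be sketched as below. The gate itself follows the definition in the text. Because the image of formula (9) is missing, the way the gate is chained across layers (a left-to-right fold over the layer outputs) is an assumption, and the gate values are fixed numbers here rather than trained parameters.

```python
def gate(g, alpha, beta):
    """rho(alpha, beta) = g * alpha + (1 - g) * beta, applied element-wise."""
    return [g * a + (1.0 - g) * b for a, b in zip(alpha, beta)]


def gated_layers(layer_outputs, gates):
    """Fold the per-layer embeddings of one view into a single vector.

    The left-to-right folding order is an assumption made for this sketch.
    """
    h = layer_outputs[0]
    for g, nxt in zip(gates, layer_outputs[1:]):
        h = gate(g, h, nxt)
    return h


def fuse_views(h_entity_view, h_relation_view, g1):
    """Formula (10): h_i = g1 * h_{i,1} + (1 - g1) * h_{i,2}."""
    return gate(g1, h_entity_view, h_relation_view)


# Hypothetical two-layer outputs (hidden layer, output layer) for each view.
h_i1 = gated_layers([[1.0, 0.0], [0.0, 1.0]], gates=[0.5])    # entity interaction view
h_i2 = gated_layers([[2.0, 2.0], [0.0, 4.0]], gates=[0.25])   # relation interaction view
h_i = fuse_views(h_i1, h_i2, g1=0.5)
print(h_i1, h_i2, h_i)  # [0.5, 0.5] [0.5, 3.5] [0.5, 2.0]
```

A gate close to 1 keeps more of the earlier layer, so the model can limit how much multi-hop noise from deeper layers reaches the final embedding.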
The beneficial effects of the invention are as follows: the constructed training set is used to train the proposed dual-view neural network model based on a gating mechanism, and the trained model is used to mine unknown aligned entity pairs in the medical knowledge graphs, which enables the fusion of heterogeneous medical knowledge graphs. Balancing accuracy against labor cost, Hits@5 reaches 85.4%, so existing medical resources are effectively integrated and the development of intelligent healthcare is promoted.
Drawings
FIG. 1 is a framework diagram of medical knowledge graph alignment.
FIG. 2 shows the dual-view neural network model based on a gating mechanism.
Detailed Description
The technical solution of the invention is described in detail below with reference to the accompanying drawings.
FIG. 1 is the framework diagram of medical knowledge graph alignment, and Table 1 gives detailed statistics of the heterogeneous medical knowledge graphs.
1. Corpus pre-processing
The heterogeneous medical knowledge graphs MeKG_1 and MeKG_2, constructed from electronic medical records, are labeled manually, and the manually labeled seed pairs are used as the training set and the test set. The training set contains 227 seed pairs and the test set contains 136 seed pairs.
2. Completing the knowledge graphs using rules
First, experts derive rules from each medical knowledge graph and then transfer the rules between the medical knowledge graphs under the assumption of knowledge invariance to form a unified rule set. Rule matching is then performed in each medical knowledge graph. Finally, to ensure the quality of the triples obtained by rule inference, experts screen the inferred triples.
(1) Inference and transfer of rules
Experts design the rules that hold in each medical knowledge graph and combine all the rules to obtain a rule set $K$. For example,
[Formula image: Figure BDA0003597554360000061]
(2) Rule grounding
The rule set $K$ obtained in step (1) is applied to each medical knowledge graph. For a given rule $k \in K$, all premise triples that satisfy the rule are found and the conclusion triple is inferred from them; if the conclusion triple does not already exist in the original knowledge graph, it is added, which achieves knowledge graph completion. For example, according to the above rules:
[Formula image: Figure BDA0003597554360000062]
[Formula image: Figure BDA0003597554360000063]
Suppose that in FIG. 1 entity e is "imbalance syndrome", entity g is "hemodialysis", and entity k is "coma". Then relation r5 is "disease-indicating symptom" and relation r6 is "treatment-causing disease", since the treatment "hemodialysis" may cause the complication "imbalance syndrome". From the relation between e and k, it can be inferred that the relation r7 between g and k is "treatment-causing symptom".
[Formula image: Figure BDA0003597554360000064]
[Formula image: Figure BDA0003597554360000065]
Suppose that in FIG. 1 entity e is "nephrotic syndrome", entity g is "prednisone", and entity k is "proteinuria". Then relation r5 is "disease-indicating symptom" and relation r6 is "treatment-improving disease", since "prednisone" can treat "nephrotic syndrome". From the relation between e and k, it can be inferred that the relation r7 between g and k is "symptomatic treatment".
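The rule grounding step can be sketched as a search for two premise triples that share a middle entity. The rule images are not recoverable, so the two-premise chain form used here, (g, r6, e) and (e, r5, k) implying (g, r7, k), and the direction of each triple are assumptions read off the hemodialysis example. The function name `apply_chain_rule` is hypothetical.

```python
def apply_chain_rule(triples, rel_ab, rel_bc, rel_ac):
    """Infer (a, rel_ac, c) whenever (a, rel_ab, b) and (b, rel_bc, c) are both present.

    Only conclusion triples that are missing from the graph are returned.
    """
    known = set(triples)
    by_head = {}
    for h, r, t in known:
        by_head.setdefault((h, r), set()).add(t)
    inferred = set()
    for a, r, b in known:
        if r != rel_ab:
            continue
        for c in by_head.get((b, rel_bc), ()):
            candidate = (a, rel_ac, c)
            if candidate not in known:
                inferred.add(candidate)
    return inferred


kg = [
    ("hemodialysis", "treatment-causing disease", "imbalance syndrome"),
    ("imbalance syndrome", "disease-indicating symptom", "coma"),
]
new_triples = apply_chain_rule(
    kg,
    "treatment-causing disease",
    "disease-indicating symptom",
    "treatment-causing symptom",
)
print(new_triples)  # {('hemodialysis', 'treatment-causing symptom', 'coma')}
```

Running the rule a second time on the completed graph returns an empty set, which matches the condition that a conclusion triple is added only if it does not already exist.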
(3) Expert screening
Experts screen the triples inferred in step (2) to ensure their quality. The numbers of inferred triples obtained for the two medical knowledge graphs after expert screening are 11668 and 11820, respectively, and detailed data for the completed knowledge graphs are shown in Table 1.
Table 1: Statistics of the heterogeneous knowledge graphs
Figure BDA0003597554360000071
3. Model training
(1) Obtaining the entity interaction view embedding
For any entity $e_i$ in $G$, this view uses the attention score [symbol: Figure BDA0003597554360000072] calculated from formula (1) to iteratively aggregate neighbor information into the entity interaction view embedding of the central entity $e_i$, and a gating mechanism controls the aggregation of the outputs of the network layers. In this embodiment the number of network layers is 2, and the embedded representation of entity $e_i$ in the view of category $\tau$ is given by:
[Formula (11) image: Figure BDA0003597554360000073]
where $g_0$ is a trainable parameter that controls the aggregation of the hidden-layer and output-layer representations of the network. With the number of network layers set to 2, [symbol: Figure BDA0003597554360000074] is the hidden-layer embedding of the network and [symbol: Figure BDA0003597554360000075] is the output-layer embedding of the network. This yields the embedding $h_{i,1}$ of entity $e_i$ in the entity interaction view.
(2) Obtaining the relation interaction view embedding
Similarly to (1), for any entity $e_i$ in $G$, this view uses the cross-graph matching score [symbol: Figure BDA0003597554360000076] calculated from formula (5) to iteratively aggregate neighbor information into the relation interaction view embedding of the central entity $e_i$. A gating mechanism controls the aggregation of the outputs of the network layers, and formula (11) yields the embedding $h_{i,2}$ of entity $e_i$ in the relation interaction view.
(3) Gated aggregation
Formula (10) is used to aggregate the entity interaction view embedding $h_{i,1}$ and the relation interaction view embedding $h_{i,2}$ obtained in steps (1) and (2), giving the final embedded representation $h_i$ of entity $e_i$.
All entities in $G'$ are processed in the same way as in steps (1), (2), and (3).
(4) Calculating the embedding distance
Based on the entity embeddings obtained in step (3), the embedding distance between entities is calculated with $d(\cdot)$, where a smaller distance indicates more similar entities and $d(\cdot)$ is defined as the L2 norm. The model is trained under supervision with the seed pairs in the training set, so that the distance between aligned entities gradually decreases and the distance between non-aligned entities gradually increases.
4. Model quality assessment
In this embodiment, Hits@k and MRR are used as the evaluation metrics of the model. Hits@k is the percentage of test entities whose correctly aligned entity appears among the top k candidate entities, with k set to 1, 5, 10, and 50, and MRR (mean reciprocal rank) is the average of the reciprocal ranks of the correctly aligned entities. Higher Hits@k and MRR values indicate better model performance. The results of the model on the test set are shown in Table 2.
Table 2: Results of the DvGNet model
Model | Hits@1 | Hits@5 | Hits@10 | Hits@50 | MRR
DvGNet | 75.9% | 85.4% | 86.9% | 95.6% | 80.6%
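The two metrics can be computed from the rank of the correct counterpart for each test entity. The sketch below assumes 1-based ranks are already available; the rank values are toy data and are unrelated to the results in Table 2.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test entities whose correct counterpart is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all test entities (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy ranks of the correct counterpart for four test entities.
ranks = [1, 2, 1, 8]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 5), hits_at_k(ranks, 10))
print(mean_reciprocal_rank(ranks))
```

Hits@k is non-decreasing in k, which matches the pattern in Table 2, while MRR rewards placing the correct entity near the top of the candidate list.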
5. Mining unknown aligned entity pairs using DvGNet
Because only a small number of seed pairs were manually labeled as the training set and test set, a large number of entity pairs in the knowledge graphs remain to be aligned. The invention therefore uses DvGNet to mine the remaining entity pairs to be aligned in the medical knowledge graphs MeKG_1 and MeKG_2. As shown in Table 2, Hits@5 reaches 85.4%. To balance accuracy against labor cost, DvGNet recommends the 5 candidate entities with the highest similarity for each entity to be aligned, and an expert labels the candidates to obtain the final alignment result. As shown in Table 3, the aligned entity of "acute lymphocytic leukemia" is "ALL", the aligned entity of "Alzheimer's disease" is "senile dementia", and the aligned entity of "Bartter syndrome" is "Bart syndrome". In total, 546 aligned entity pairs were identified with the model.
Table 3: Examples of entity alignment candidates
Entity to be aligned | Top1 | Top2 | Top3 | Top4 | Top5
Acute lymphocytic leukemia | ALL | Leukemia | Acute leukemia | Central nervous system leukemia | CNSL
Alzheimer's disease | Senile dementia | Memory disorder | Dementia | ADS | Cognitive dysfunction
Bartter syndrome | Hypokalemic alkalosis | Bart syndrome | Hypokalemia | Gitelman syndrome | Juxtaglomerular apparatus hyperplasia
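The candidate recommendation step can be sketched as a nearest-neighbor search over the learned embeddings: for each entity to be aligned, the entities of the other graph are sorted by L2 distance and the closest k are passed to the expert. The entity names and embedding values below are hypothetical placeholders, and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def top_k_candidates(query: Sequence[float],
                     candidates: Dict[str, Sequence[float]],
                     k: int = 5) -> List[str]:
    """Return the names of the k candidates closest to the query embedding."""
    ranked = sorted(candidates,
                    key=lambda name: l2_distance(query, candidates[name]))
    return ranked[:k]


# Hypothetical embeddings of entities from the second knowledge graph.
other_graph = {
    "cand_a": [0.0, 1.0],
    "cand_b": [5.0, 5.0],
    "cand_c": [0.1, 0.0],
    "cand_d": [2.0, 0.0],
}
recommended = top_k_candidates([0.0, 0.0], other_graph, k=3)
print(recommended)
```

Returning a short ranked list rather than a single match reflects the trade-off described above: the expert reviews only k candidates per entity, and the correct counterpart need not be ranked first.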

Claims (6)

1. An entity alignment method based on a Chinese electronic medical record knowledge graph, characterized in that the entity sets of two heterogeneous medical knowledge graphs are obtained, entity seed pairs are labeled, and the successfully labeled entity seed pairs are used as a training set and a test set; the method further comprises the following steps:
step 1, completing the knowledge graphs using rules;
the preset rules of each heterogeneous medical knowledge graph are combined to obtain a rule set K; the rule set K is applied to each heterogeneous medical knowledge graph: for a given rule k ∈ K, all premise triples satisfying the rule are retrieved, a conclusion triple is inferred according to the rule, and the conclusion triple is added to the original knowledge graph if it does not already exist there, thereby completing the knowledge graph; two completed heterogeneous medical knowledge graphs G = (E, R, T) and G′ = (E′, R′, T′) are obtained, wherein E and E′ denote the first and second entity sets, R and R′ denote the first and second relation sets, T and T′ denote the first and second triple sets, and e ∈ E, r ∈ R, and t ∈ T denote any entity, relation, and triple in G, respectively;
step 2, constructing a dual-view neural network model based on a gating mechanism;
step 2.1, constructing an entity interaction view network;
for any entity e_i in the completed heterogeneous medical knowledge graph, a self-attention score is calculated from the embedded representations of the entity and its neighbor entities, and the neighbor feature vectors of the entity are aggregated using the self-attention score to obtain the representation of entity e_i at the l-th layer of the entity interaction view;
step 2.2, constructing a relation interaction view network;
the relation feature matrices of the completed heterogeneous medical knowledge graphs G and G′ interact to obtain a relation similarity matrix, on which a max pooling operation is performed to obtain a relation matching vector; finally, information from the neighbors is aggregated using the cross-graph matching score obtained from the relation matching vector, yielding the representation of any entity e_i in the completed heterogeneous medical knowledge graph at the l-th layer of the relation interaction view;
step 2.3, gated aggregation;
gated aggregation is performed separately on the entity interaction view network constructed in step 2.1 and the relation interaction view network constructed in step 2.2, yielding the output h_{i,1} of the entity interaction view network and the output h_{i,2} of the relation interaction view network;
the gated outputs of the two views are then aggregated again through a gating mechanism to obtain the final representation h_i of entity e_i;
step 3, calculating the embedding distance;
based on the final entity representations obtained in step 2.3, the embedding distance between entities of the two heterogeneous medical knowledge graphs is calculated using d(·), wherein the smaller the distance, the more similar the entities; d(·) is the L2 norm; the model is trained in a supervised manner with the seed pairs in the training set, so that the distance between aligned entities gradually decreases and the distance between non-aligned entities gradually increases.
2. The entity alignment method based on the Chinese electronic medical record knowledge graph according to claim 1, characterized in that, in constructing the entity interaction view network in step 2.1, the self-attention score
Figure FDA0003597554350000021
is calculated as follows:
Figure FDA0003597554350000022
Figure FDA0003597554350000023
wherein
Figure FDA0003597554350000024
is the self-attention coefficient, representing the importance of entity e_j to e_i;
Figure FDA0003597554350000025
denotes the set of neighbors of the entity, including the entity itself; "||" denotes vector concatenation; LeakyReLU(·) is the activation function; W_1, W_2, and p are trainable parameters.
3. The entity alignment method based on the Chinese electronic medical record knowledge graph according to claim 2, characterized in that, in constructing the entity interaction view network in step 2.1, the self-attention score of formula (1)
Figure FDA0003597554350000026
is used to aggregate the neighbor feature vectors of entity e_i to obtain the representation of entity e_i at the l-th layer of the entity interaction view, calculated as follows:
Figure FDA0003597554350000027
wherein
Figure FDA0003597554350000028
is the weight parameter of the l-th layer in the view network, and σ(·) is the activation function, chosen as ReLU(·).
4. The entity alignment method based on the Chinese electronic medical record knowledge graph according to claim 1, characterized in that, in constructing the relation interaction view network in step 2.2, the relation matching vector is calculated as follows:
M = f_M(f_S(R, R′))    (4)
wherein f_S(·) denotes the relation similarity function, defined as f_S(R, R′) = R^T R′, and R and R′ denote the relation feature matrices of the two medical knowledge graphs to be aligned; f_M(·) denotes the max pooling function; the cross-graph matching score
Figure FDA0003597554350000029
is calculated from the relation matching vector M as follows:
Figure FDA00035975543500000210
Figure FDA00035975543500000211
wherein M[·] denotes the indexing operation on the relation matching vector; T denotes the triple set of the knowledge graph;
using the cross-graph matching score
Figure FDA00035975543500000212
the representation of entity e_i at the l-th layer of the relation interaction view is calculated as follows:
Figure FDA00035975543500000213
wherein
Figure FDA00035975543500000214
is the weight parameter of the l-th layer in the view network, and σ(·) is the activation function, chosen as ReLU(·).
5. The entity alignment method based on the Chinese electronic medical record knowledge graph according to claim 1, characterized in that, in constructing the relation interaction view network in step 2.2, the initial representation of a relation r is obtained from the embedded representations of the head and tail entities of all triples whose relation is r, calculated as follows:
Figure FDA00035975543500000215
wherein T_r denotes the set of triples whose relation is r, and e_h and e_t denote the corresponding head and tail entity embeddings, respectively.
6. The entity alignment method based on the Chinese electronic medical record knowledge graph according to claim 1, characterized in that, in the gated aggregation of step 2.3, the gated aggregation formula of an ε-layer network is as follows:
Figure FDA0003597554350000031
where ρ_ξ(α, β) = g_ξ·α + (1 − g_ξ)·β, and g_ξ is a trainable parameter used to control the output of each network layer; τ is the view category, namely the relation interaction view or the entity interaction view;
gated aggregation is performed with formula (9) separately on the entity interaction view network constructed in step 2.1 and the relation interaction view network constructed in step 2.2, yielding the output h_{i,1} of the entity interaction view network and the output h_{i,2} of the relation interaction view network;
the final representation of entity e_i is obtained by aggregating the gated outputs of the two views again through a gating mechanism, calculated as follows:
h_i = g_1·h_{i,1} + (1 − g_1)·h_{i,2}    (10)
wherein g_1 is a trainable parameter for controlling the aggregation of the two views.
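The relation matching vector of formula (4) in claim 4 can be illustrated with a small sketch. It assumes that each relation is stored as a column of the feature matrix (R is d × m and R′ is d × m′), so that R^T R′ is an m × m′ similarity matrix, and that max pooling is taken over the relations of the other graph, giving one matching degree per relation of the first graph. The storage layout, the pooling axis, and the function names are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def relation_similarity(R: Matrix, R_prime: Matrix) -> Matrix:
    """f_S(R, R') = R^T R', with relations stored as columns of each matrix."""
    d = len(R)
    if len(R_prime) != d:
        raise ValueError("both matrices must have the same number of rows")
    m, m_prime = len(R[0]), len(R_prime[0])
    return [[sum(R[t][a] * R_prime[t][b] for t in range(d))
             for b in range(m_prime)]
            for a in range(m)]


def relation_matching_vector(R: Matrix, R_prime: Matrix) -> List[float]:
    """M = f_M(f_S(R, R')): row-wise max pooling of the similarity matrix."""
    return [max(row) for row in relation_similarity(R, R_prime)]


# Toy features: two relations in G and three in G', each of dimension 2.
R = [[1.0, 0.0],
     [0.0, 1.0]]
R_prime = [[1.0, 0.0, 0.5],
           [0.0, 2.0, 0.5]]
M = relation_matching_vector(R, R_prime)
print(M)
```

Each entry of M is the highest similarity between one relation of the first graph and any relation of the second graph, which is the quantity the indexing operation M[·] would look up when scoring a neighbor.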
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210413638.2A CN114722216A (en) 2022-04-15 2022-04-15 Entity alignment method based on Chinese electronic medical record knowledge graph

Publications (1)

Publication Number Publication Date
CN114722216A true CN114722216A (en) 2022-07-08

Family

ID=82244392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210413638.2A Pending CN114722216A (en) 2022-04-15 2022-04-15 Entity alignment method based on Chinese electronic medical record knowledge graph

Country Status (1)

Country Link
CN (1) CN114722216A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502129A (en) * 2023-06-21 2023-07-28 之江实验室 Unbalanced clinical data classification system driven by knowledge and data in cooperation
CN116502129B (en) * 2023-06-21 2023-09-22 之江实验室 Unbalanced clinical data classification system driven by knowledge and data in cooperation
CN116610820A (en) * 2023-07-21 2023-08-18 智慧眼科技股份有限公司 Knowledge graph entity alignment method, device, equipment and storage medium
CN116610820B (en) * 2023-07-21 2023-10-20 智慧眼科技股份有限公司 Knowledge graph entity alignment method, device, equipment and storage medium
CN117391092A (en) * 2023-12-12 2024-01-12 中南大学 Electronic medical record multi-mode medical semantic alignment method based on contrast learning
CN117391092B (en) * 2023-12-12 2024-03-08 中南大学 Electronic medical record multi-mode medical semantic alignment method based on contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination