CN116821375A - Cross-institution medical knowledge graph representation learning method and system - Google Patents
Cross-institution medical knowledge graph representation learning method and system
- Publication number
- CN116821375A (application CN202311092562.9A)
- Authority
- CN
- China
- Prior art keywords
- medical
- knowledge graph
- institution
- entity
- medical knowledge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a cross-institution medical knowledge graph representation learning method. Each medical institution homomorphically encrypts its local medical knowledge graph and sends it to a third-party server; the third-party server matches medical concepts across the local medical knowledge graphs while they remain encrypted, yielding a global medical knowledge graph that expands the entities and relations of the existing medical knowledge graph. The invention also provides a cross-institution medical knowledge graph representation learning system. The method addresses the differences between medical knowledge graphs that, in the prior art, arise from institution-specific medical entities, and thereby obtains medical entity embedded representations that are semantically consistent across medical institutions.
Description
Technical Field
The invention belongs to the technical field of medical information, and particularly relates to a method and a system for learning cross-institution medical knowledge graph representation.
Background
A medical knowledge graph captures the reasoning patterns and specialized knowledge of physicians; it comprises the medical entities, their number, and the professional logical relations among them. Reasoning over these medical entities makes it possible to interpret and assess disease information and so reach correct decisions. Using knowledge graph embedding representation learning to obtain semantically rich representation vectors for the entities and relations in a medical knowledge graph is therefore important for clinical decision support modeling. To obtain a computable representation of the medical knowledge graph that is better suited to clinical disease diagnosis and medication recommendation, a medical institution may train embedded representations of the medical entities in the graph using medical entity relations and clinical data.
A medical institution generates institution-specific medical entities according to its local medical scenarios and uses them to expand a general medical knowledge graph into a local medical knowledge graph. A local medical knowledge graph expanded from local clinical data matches the logic among medical concepts at that institution. However, when the graph is trained on the clinical data of a single center, some medical entities occur only rarely in that data, so their final embedded representations carry little semantic information. In addition, because clinical data and terminology are heterogeneous across medical institutions, the local medical knowledge graphs and entity embedded representations that the institutions build differ considerably.
Patent document CN113434626A discloses a multi-center medical diagnosis knowledge graph representation learning method and system. That method constructs the knowledge graph only from the hierarchical relations among diagnostic entities and does not consider the complex relations among diagnosis, symptom, examination, medication and other entities. It also assumes that different medical institutions use the same medical knowledge graph, ignoring each local institution's expansion of the general medical knowledge graph.
Patent document CN111767411A discloses a knowledge graph representation learning optimization method, a device and a readable storage medium. Its training sample set consists of the local knowledge graphs of the institutions together with the entities and relations in a part of the extended triplets. That scheme performs knowledge graph representation learning using only the triplet information formed by entities and relations and ignores how frequently the entities and relations occur in real scenarios; the real-world data corresponding to the knowledge graph takes no part in the representation learning.
Disclosure of Invention
The invention aims to provide a cross-institution medical knowledge graph representation learning method and system that obtain medical entity embedded representations with semantically consistent expression across institutions, thereby providing better guidance for related modeling tasks such as clinical decision support.
To achieve the first object, the invention provides a technical solution comprising the following steps:
The main server generates a global medical knowledge graph from the local medical knowledge graphs of the medical institutions.
A set of clinical data is acquired at the local medical institution, each piece of clinical data comprising a plurality of medical entities.
A medical entity co-occurrence matrix is constructed from how often each pair of medical entities occurs in the same piece of clinical data, and its dimension is reduced by principal component analysis to obtain an initial embedded representation of each medical entity.
A triplet set is constructed from the global medical knowledge graph and, together with the initial embedded representations, forms a data set.
The data set is input into a pre-built graph embedding representation model to obtain the medical entity embedded representations, and a loss function is constructed in combination with the triplet set.
The graph embedding model gradient of the graph embedding representation model is obtained by back propagation of the loss function.
The graph embedding model gradients of all medical institutions are homomorphically encrypted and sent to the main server, which aggregates them into a global model gradient; the global model gradient is fed back to each medical institution to update its graph embedding representation model, until the loss function converges and the optimal medical entity embedded representations are obtained.
Specifically, the global medical knowledge graph is fused and updated based on the similarity between the medical entity embedded representations of the medical institutions; if the structure of the global medical knowledge graph changes, the graph embedding model is trained again, so that the medical entity embedded representations are trained on the new global medical knowledge graph.
Specifically, the local medical knowledge graph is constructed from common medical entities drawn from knowledge sources including ICD10, CCS and HPO, together with institution-specific medical entities generated from local clinical data and clinical business scenarios.
Specifically, the construction process of the global medical knowledge graph is as follows:
The main server generates a set of homomorphic encryption keys for the medical institutions; each medical institution uses them to encrypt its local medical knowledge graph into ciphertext triplets, which it sends to the main server.
The main server matches all received ciphertext triplets and feeds the result back to the medical institutions, so that each medical institution generates the global medical knowledge graph locally.
Specifically, the medical entities comprise diagnoses, symptoms, examinations and medications. The relations among these entities are fully integrated, and possible relations among the institution-specific medical entities of different medical institutions are mined, so as to complete the global medical knowledge graph.
Specifically, when the co-occurrence matrix is constructed, noise drawn from a normal distribution with mean 1 and variance 0.1 is added to each element of the matrix, which serves as a simple encryption operation.
Specifically, the triplet set comprises positive sampling triplets and negative sampling triplets. A positive sampling triplet consists, in order, of a head entity, a tail entity and a relation; a negative sampling triplet is constructed by randomly replacing the head entity or the tail entity of a positive sampling triplet.
Specifically, the expression of the loss function is as follows:
$$L = \sum_{(h,t,r) \in S} \sum_{(h',t',r) \in S'} \left[ \gamma + d(\mathbf{h} + \mathbf{r}, \mathbf{t}) - d(\mathbf{h}' + \mathbf{r}, \mathbf{t}') \right]_+$$
wherein $\gamma$ represents the margin, whose value is greater than 0; $[x]_+$ represents the positive-part function, with $[x]_+ = x$ when $x > 0$ and $[x]_+ = 0$ when $x \le 0$; $d(\cdot,\cdot)$ represents the distance function; $(h,t,r)$ represents a positive sampling triplet from the positive set $S$ and $(h',t',r)$ a negative sampling triplet from the negative set $S'$; $h$ represents the head entity, $t$ the tail entity and $r$ the relation; and the embedded representations of $h$, $t$, $r$, $h'$ and $t'$ are $\mathbf{h}$, $\mathbf{t}$, $\mathbf{r}$, $\mathbf{h}'$ and $\mathbf{t}'$, respectively.
Specifically, the global model gradient is obtained by aggregating the graph embedding model gradients in their encrypted state, expressed as follows:
$$E_{pk}(g) = \sum_{i=1}^{H} E_{pk}(g_i)$$
wherein $g_i$ represents the graph embedding model gradient of the $i$-th medical institution, $H$ the total number of medical institutions, $pk$ the homomorphic encryption key, $E$ the homomorphic encryption algorithm, and $g$ the global model gradient.
Specifically, each medical institution performs one iterative update of the medical entity embedded representations using a local Adam optimizer according to the received global model gradient.
To achieve the second object, the invention also provides a cross-institution medical knowledge graph representation learning system, which carries out the above cross-institution medical knowledge graph representation learning method and comprises a local medical knowledge graph construction module, a global medical knowledge graph construction module, a federated graph embedding model training module and a medical entity fusion module.
The local medical knowledge graph construction module adds the institution-specific medical entities generated from local clinical data and clinical business scenarios to the medical knowledge graph at the local medical institution, generating the corresponding local medical knowledge graph.
The global medical knowledge graph construction module performs encrypted matching on the main server over the local medical knowledge graphs of the medical institutions to generate the global medical knowledge graph, and sends it to each medical institution.
The federated graph embedding model training module uses homomorphic encryption to train the graph embedding models of the medical institutions without clinical data leaving the local medical institution.
The medical entity fusion module calculates the similarity among the medical entities of the medical institutions so as to fuse and update the global medical knowledge graph.
Compared with the prior art, the invention has the beneficial effects that:
(1) Institution-specific medical entities arising in each institution's particular medical scenarios are added to the local medical knowledge graph, so that a more complete global medical knowledge graph is constructed.
(2) The global medical knowledge graph is constructed without revealing the local medical knowledge graphs.
(3) The proposed federated graph embedding model fuses information from the clinical data of multiple medical institutions during training and can provide medical entity embedded representations with richer semantics.
Drawings
Fig. 1 is a flowchart of the cross-institution medical knowledge graph representation learning method provided in the present embodiment;
Fig. 2 is a schematic flow chart of the global medical knowledge graph construction provided in the present embodiment;
Fig. 3 is a flowchart of the federated graph embedding representation learning provided in the present embodiment;
Fig. 4 is a block diagram of the cross-institution medical knowledge graph representation learning system provided in the present embodiment.
Detailed Description
As shown in Fig. 1, the cross-institution medical knowledge graph representation learning method provided in this embodiment includes the following steps:
First, each medical institution constructs an initial medical knowledge graph from knowledge sources commonly used in the medical field, such as ICD10, CCS and HPO. The institution-specific medical entities that the local medical institution generates from its local clinical data and clinical business scenarios are then added to this medical knowledge graph at the institution, together with the relations between the institution-specific medical entities and the original medical entities in the graph, so that the institution-specific medical entities are integrated into the medical knowledge graph to form the local medical knowledge graph.
The global medical knowledge graph construction module utilizes homomorphic encryption technology, and on the premise of ensuring that the local medical knowledge graph of each medical institution is not revealed to the third party server, the third party server matches according to the medical concept ciphertext of the local medical knowledge graph to obtain the global medical knowledge graph, and then the global medical knowledge graph is sent to each medical institution.
As shown in Fig. 2, the specific construction process is as follows:
A pair of keys, comprising an encryption key and a decryption key, is first created by a computing server and distributed to the individual medical institutions. Each medical institution encrypts the medical entities in its local medical knowledge graph.
Encryption process: the local medical knowledge graph consists of a plurality of triplets, each triplet consisting of a head entity, a tail entity and a relationship. Wherein the head entity and the tail entity are both medical entities. Each medical institution encrypts the head entity and the tail entity in the triplet by using the encryption key to obtain the ciphertext of the head entity and the ciphertext of the tail entity. The ciphertext of a head entity, the ciphertext of a tail entity and a relationship form a ciphertext triplet. All ciphertext triplets are sent to the matching server by each medical institution, and ciphertext matching is carried out by the matching server.
Ciphertext matching process: after receiving the ciphertext triplets uploaded by the medical institutions, the matching server compares the head-entity and tail-entity ciphertexts of each ciphertext triplet with those of the other ciphertext triplets. Two entities with identical ciphertexts are regarded as the same entity, and duplicate triplets are removed. The triplet information unique to each medical institution is thereby added to the knowledge-source-based medical knowledge graph triplets, forming the global medical knowledge graph.
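The matching step can be illustrated with a minimal Python sketch. It is not the patent's encryption scheme: a deterministic keyed HMAC token stands in for the entity ciphertext (an assumption made so that equal plaintexts give equal ciphertexts, and unlike the described scheme it cannot be decrypted). The function names, the shared key and the sample triplets are all hypothetical.

```python
import hashlib
import hmac

def encrypt_entity(key: bytes, name: str) -> str:
    """Stand-in for entity encryption: a deterministic keyed token (assumption)."""
    return hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()

def encrypt_triplets(key, triplets):
    """Encrypt head and tail entities; the relation stays in plaintext."""
    return [(encrypt_entity(key, h), encrypt_entity(key, t), r) for h, t, r in triplets]

def match_ciphertext_triplets(uploads):
    """Server side: merge uploads, treating identical ciphertext triplets as one."""
    seen, merged = set(), []
    for triplets in uploads:
        for triplet in triplets:
            if triplet not in seen:
                seen.add(triplet)
                merged.append(triplet)
    return merged

SHARED_KEY = b"demo-key-shared-by-institutions"  # hypothetical key
hospital_a = [("pneumonia", "cough", "has_symptom"),
              ("pneumonia", "ward_A_protocol", "treated_by")]
hospital_b = [("pneumonia", "cough", "has_symptom"),
              ("pneumonia", "clinic_B_bundle", "treated_by")]

uploads = [encrypt_triplets(SHARED_KEY, hospital_a),
           encrypt_triplets(SHARED_KEY, hospital_b)]
global_ciphertext_triplets = match_ciphertext_triplets(uploads)
print(len(global_ciphertext_triplets))  # 3: the shared triplet is kept once
```

The server only ever compares opaque tokens, which is the property the matching process relies on; the shared triplet collapses to one entry while each institution's unique triplet survives.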
After ciphertext matching is completed, the matching server sends the global medical knowledge graph to each medical institution. Each medical institution uses the decryption key to decrypt the ciphertext triplet of the global medical knowledge graph, and as each medical institution uses the same encryption key and decryption key, the plaintext of the non-local medical entity and the relationship with other medical entities can be directly obtained to form the global medical knowledge graph.
As shown in Fig. 3, the specific process of federated graph embedding representation learning in this embodiment is as follows:
The local medical institution builds an $N \times N$ co-occurrence matrix, where $N$ is the number of medical entities in the global medical knowledge graph. The value in row $i$, column $j$ of the co-occurrence matrix is the number of visit records in the clinical data in which the $i$-th and the $j$-th medical entities of the global medical knowledge graph occur together. Noise drawn from a normal distribution with mean 1 and variance 0.1 is then added to each element of the co-occurrence matrix.
Principal component analysis is used to reduce the co-occurrence matrix to an $N \times K$ initial embedded representation matrix, where the value of $K$ is set according to the number of medical entities and may be, for example, 256 or 512. Each row of the initial embedded representation matrix is the initial embedded representation of one medical entity.
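The initialization described above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the embodiment's implementation: the diagonal of the co-occurrence matrix is left at zero (the text does not say how an entity's co-occurrence with itself is counted), PCA is computed through a singular value decomposition of the centered matrix, and the entity names, records and the tiny target dimension `k=2` are hypothetical.

```python
import numpy as np

def cooccurrence_matrix(records, entity_index):
    """Count, for each entity pair, the visit records in which both appear."""
    n = len(entity_index)
    counts = np.zeros((n, n))
    for record in records:
        ids = sorted({entity_index[e] for e in record if e in entity_index})
        for a in ids:
            for b in ids:
                if a != b:  # assumption: diagonal entries stay at zero
                    counts[a, b] += 1
    return counts

def add_noise(counts, rng, mean=1.0, variance=0.1):
    """Add normal noise with the stated mean and variance to every element."""
    return counts + rng.normal(mean, np.sqrt(variance), size=counts.shape)

def pca_reduce(matrix, k):
    """Project the rows onto the first k principal components (via SVD)."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

entity_index = {"pneumonia": 0, "cough": 1, "chest_xray": 2, "amoxicillin": 3}
records = [
    ["pneumonia", "cough", "chest_xray"],
    ["pneumonia", "cough", "amoxicillin"],
    ["cough", "amoxicillin"],
]

rng = np.random.default_rng(0)
counts = cooccurrence_matrix(records, entity_index)
initial_embeddings = pca_reduce(add_noise(counts, rng), k=2)
print(initial_embeddings.shape)  # (4, 2): one row per medical entity
```

With real data, `k` would be set to a value such as 256 or 512, and each row of `initial_embeddings` would serve as the starting point of one entity's embedding.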
The input to the federated graph embedding model is the initial embedded representation matrix of the medical entities, and the output is the embedded representations of all medical entities. The federated graph embedding model treats the medical entity embedded representations as learnable parameters and updates them directly through back propagation of the model gradient. A medical entity embedded representation is a vector whose dimension can be set according to the number of medical entities: the more medical entities there are, the larger the dimension must be to distinguish the semantics of the different medical entities sufficiently.
The set of positive sampling triplets in the global medical knowledge graph is denoted $S$, and a positive sampling triplet is denoted $(h, t, r)$, where $h$ represents the head entity, $t$ the tail entity and $r$ the relation.
The positive sampling triplets are negatively sampled, i.e. the head entity or the tail entity of each triplet is randomly replaced.
The set of negative sampling triplets is denoted $S'$, and a negative sampling triplet is denoted $(h', t', r)$, where $h'$ represents the head entity of the negative sampling triplet, $t'$ its tail entity and $r$ the relation. The embedded representations of $h$, $t$, $r$, $h'$ and $t'$ are denoted $\mathbf{h}$, $\mathbf{t}$, $\mathbf{r}$, $\mathbf{h}'$ and $\mathbf{t}'$, respectively.
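Negative sampling as described can be sketched as follows. The sketch adds one assumption that the text does not state: a corrupted triplet that happens to be a known positive triplet is rejected and redrawn. The function name, the entity list and the sample triplets are hypothetical.

```python
import random

def negative_sample(triplet, entities, known, rng, max_tries=100):
    """Randomly replace the head or the tail entity of a positive triplet."""
    head, tail, relation = triplet
    for _ in range(max_tries):
        replacement = rng.choice(entities)
        if rng.random() < 0.5:
            corrupted = (replacement, tail, relation)   # replace the head
        else:
            corrupted = (head, replacement, relation)   # replace the tail
        if corrupted not in known:  # assumption: skip known positive triplets
            return corrupted
    raise RuntimeError("no valid negative triplet found")

entities = ["amoxicillin", "chest_xray", "cough", "pneumonia"]
positives = [
    ("pneumonia", "cough", "has_symptom"),
    ("pneumonia", "amoxicillin", "treated_with"),
]
known = set(positives)

rng = random.Random(0)
negatives = [negative_sample(p, entities, known, rng) for p in positives]
for positive, negative in zip(positives, negatives):
    print(positive, "->", negative)
```

Each negative triplet keeps the relation of its positive triplet and differs from it in exactly one of the two entity positions.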
In each training round of the federated graph embedding model, each medical institution locally uses the medical entity embedded representations, the triplet set of the global medical knowledge graph and the negative sampling triplet set to calculate the loss function:
$$L = \sum_{(h,t,r) \in S} \sum_{(h',t',r) \in S'} \left[ \gamma + d(\mathbf{h} + \mathbf{r}, \mathbf{t}) - d(\mathbf{h}' + \mathbf{r}, \mathbf{t}') \right]_+$$
wherein $\gamma$ represents the margin, whose value is greater than 0; $[x]_+$ represents the positive-part function, with $[x]_+ = x$ when $x > 0$ and $[x]_+ = 0$ when $x \le 0$; and $d(\cdot,\cdot)$ represents the distance function, for which either the L1 or the L2 norm may be used.
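A margin-based loss of this kind can be evaluated numerically as in the sketch below. It assumes a translation-style distance, $d(\mathbf{h} + \mathbf{r}, \mathbf{t}) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$, and pairs each positive triplet with a single negative triplet; both choices are assumptions for illustration, as are the array layout and the toy embeddings.

```python
import numpy as np

def distance(a, b, norm=2):
    """Row-wise L1 or L2 distance between two batches of vectors."""
    return np.linalg.norm(a - b, ord=norm, axis=-1)

def margin_loss(entity_emb, relation_emb, positives, negatives, gamma=1.0, norm=2):
    """Sum of [gamma + d(h + r, t) - d(h' + r, t')]_+ over paired triplets.

    positives / negatives: integer arrays with columns (head, tail, relation).
    """
    d_pos = distance(entity_emb[positives[:, 0]] + relation_emb[positives[:, 2]],
                     entity_emb[positives[:, 1]], norm)
    d_neg = distance(entity_emb[negatives[:, 0]] + relation_emb[negatives[:, 2]],
                     entity_emb[negatives[:, 1]], norm)
    return np.maximum(0.0, gamma + d_pos - d_neg).sum()

# Toy embeddings: three entities and one relation in two dimensions.
entity_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
relation_emb = np.array([[1.0, 0.0]])

positives = np.array([[0, 1, 0], [2, 1, 0]])  # (head, tail, relation)
negatives = np.array([[0, 2, 0], [2, 0, 0]])  # tail entity replaced

loss = margin_loss(entity_emb, relation_emb, positives, negatives, gamma=1.0)
print(loss)
```

In this toy case the first pair contributes nothing, because its positive distance is 0 and its negative distance already exceeds the margin, while the second pair contributes $1 + 1 - \sqrt{2}$.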
After the loss function is calculated, the graph embedding model gradient is obtained locally at the medical institution by back propagation, and the medical entity embedded representations are updated locally at the medical institution.
The graph embedding model gradients of the medical institutions are homomorphically encrypted and sent to the main server for aggregation. A pair of homomorphic encryption keys is first created by the matching server and sent to each medical institution. Suppose there are $H$ medical institutions in total, and denote the graph embedding model gradient of the $i$-th medical institution by $g_i$. Each medical institution locally uses the encryption key $pk$ of the homomorphic encryption key pair and the homomorphic encryption algorithm $E$ to homomorphically encrypt $g_i$, obtaining the ciphertext $E_{pk}(g_i)$, which is sent to the computing server.
After receiving the ciphertexts of the graph embedding model gradients of the medical institutions, the computing server aggregates them using the additive homomorphism of the homomorphic encryption to obtain the ciphertext $E_{pk}(g)$ of the global graph embedding model gradient $g$:
$$E_{pk}(g) = \sum_{i=1}^{H} E_{pk}(g_i)$$
wherein $g_i$ represents the graph embedding model gradient of the $i$-th medical institution, $H$ the total number of medical institutions, $pk$ the homomorphic encryption key, $E$ the homomorphic encryption algorithm, and $g$ the global model gradient.
Using the decryption key $sk$ of the homomorphic encryption key pair and the homomorphic decryption algorithm $D$, $E_{pk}(g)$ is decrypted to obtain $g$:
$$g = D_{sk}\left(E_{pk}(g)\right)$$
The medical institution then locally uses the Adam optimizer to update the medical entity embedded representations according to $g$, completing one round of iteration.
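The additive aggregation can be illustrated with a toy Paillier cryptosystem, a well-known additively homomorphic scheme. The patent does not name its scheme, so Paillier is an assumption here, as are the small fixed primes (far too small for real use), the fixed-point scaling of gradients and all function names. The sketch needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

# Toy Paillier parameters: two Mersenne primes, insecure and for illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                               # inverse of LAM modulo N
SCALE = 10**6                                      # fixed-point scale for floats

def encrypt(value, rng):
    """Encrypt a float as (1 + N)^m * r^N mod N^2, with m a fixed-point integer."""
    m = round(value * SCALE) % N
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def add_ciphertexts(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2

def decrypt(ciphertext):
    """Recover the float; residues above N / 2 are read as negative numbers."""
    u = pow(ciphertext, LAM, N2)
    m = (u - 1) // N * MU % N
    if m > N // 2:
        m -= N
    return m / SCALE

rng = random.Random(0)
institution_gradients = [[0.5, -1.25], [0.25, 0.75], [-0.1, 0.2]]  # H = 3

# Each institution encrypts its gradient; the server multiplies ciphertexts.
encrypted = [[encrypt(x, rng) for x in grad] for grad in institution_gradients]
aggregated = encrypted[0]
for grad in encrypted[1:]:
    aggregated = [add_ciphertexts(a, b) for a, b in zip(aggregated, grad)]

global_gradient = [decrypt(c) for c in aggregated]
print(global_gradient)  # approximately [0.65, -0.3]
```

The server never sees a plaintext gradient: it only multiplies ciphertexts, and the holder of the decryption key recovers the sum of the per-institution gradients.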
The present embodiment also provides a cross-institution medical knowledge graph representation learning system, which carries out the cross-institution medical knowledge graph representation learning method provided in the above embodiment and, as shown in Fig. 4, includes:
The local medical knowledge graph construction module, which adds the institution-specific medical entities generated from local clinical data and clinical business scenarios to the medical knowledge graph at the local medical institution, generating the corresponding local medical knowledge graph.
The global medical knowledge graph construction module, which performs encrypted matching on the main server over the local medical knowledge graphs of the medical institutions to generate the global medical knowledge graph, and sends it to each medical institution.
The federated graph embedding model training module, which comprises medical entity embedding initialization, federated graph embedding model training, model gradient aggregation and embedded representation aggregation.
The medical entity embedding initialization specifically uses clinical data at the local medical institution to initialize the medical entity embeddings.
The federated graph embedding model training specifically trains the federated graph embedding model of each medical institution.
The model gradient aggregation specifically receives the encrypted graph embedding model gradients of the medical institutions and aggregates them.
The embedded representation aggregation is specifically as follows: because the initial medical entity embedded representations derived from clinical data differ between medical institutions, the embedded representations obtained by local training at each institution also differ. The embedded representation aggregation submodule uses homomorphic encryption to calculate, for each medical entity, the mean of the embedded representations obtained at the medical institutions, and this mean is the medical entity embedding output by the federated graph embedding representation learning module.
Each medical institution locally uses the encryption key of the homomorphic encryption key pair and the homomorphic encryption algorithm to homomorphically encrypt its medical entity embedded representations, and sends the resulting ciphertexts to the computing server. After receiving the ciphertexts of the embedded representations of the medical institutions, the computing server aggregates them using the additive homomorphism of the homomorphic encryption to obtain the ciphertext of the mean embedded representation, and sends this ciphertext to each medical institution. Each medical institution locally uses the decryption key of the homomorphic encryption key pair and the homomorphic decryption algorithm to obtain the mean of the medical entity embedded representations over the medical institutions, which is the final embedded representation of the medical entity.
The medical entity fusion module, which calculates the similarity among the medical entities of the medical institutions so as to fuse and update the global medical knowledge graph.
More specifically, in a global medical knowledge graph, there are cases where institution-specific medical entities from different medical institutions are connected to the same medical entity. The function of the medical entity fusion module is to determine whether the institution-specific medical entities are the same concept using the embedded representation of the medical entities, and if so, to fuse the institution-specific medical entities into the same medical entity.
After the embedded representations of the medical entities are obtained, if two or more institution-specific medical entities are connected to the same medical entity, the similarity of their embedded representations is calculated as the Euclidean distance between the embedded representations. If this distance between two institution-specific medical entities is smaller than a preset threshold, the two are considered to be the same concept and are fused into a single medical entity, and the structure of the global medical knowledge graph is updated after the fusion.
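The fusion test can be sketched as a pairwise distance check over the institution-specific entities attached to one shared medical entity. Merging transitively through a small union-find structure is an assumption of this sketch, and the entity names and embeddings are hypothetical; the 0.01 threshold is the value used in the embodiment.

```python
import numpy as np

def fuse_entities(embeddings, candidates, threshold=0.01):
    """Map each candidate to a representative; close embeddings share one."""
    parent = {entity: entity for entity in candidates}

    def find(entity):
        # Follow parent links to the representative, compressing the path.
        while parent[entity] != entity:
            parent[entity] = parent[parent[entity]]
            entity = parent[entity]
        return entity

    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if np.linalg.norm(embeddings[a] - embeddings[b]) < threshold:
                parent[find(b)] = find(a)  # treat a and b as the same concept
    return {entity: find(entity) for entity in candidates}

# Hypothetical institution-specific entities linked to the same medical entity.
embeddings = {
    "hospital_A:fever_bundle": np.array([0.100, 0.200]),
    "hospital_B:fever_bundle": np.array([0.100, 0.205]),
    "hospital_C:rehab_plan": np.array([0.900, 0.100]),
}
candidates = list(embeddings)

mapping = fuse_entities(embeddings, candidates, threshold=0.01)
print(mapping)
```

Entities that map to the same representative would be merged into one node, after which the graph structure changes and the embeddings are retrained.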
If the structure of the global medical knowledge graph is updated, the federated graph embedding model training module is entered again, and the medical entity embedded representations are trained on the new global medical knowledge graph. Once the embedded representations have been updated, the operation of the medical entity fusion module is performed again. This iteration continues until the structure of the global medical knowledge graph no longer changes.
The finally output medical entity embedded representations can be used to calculate the similarity between medical entities and, combined with clinical data and integrated into a deep learning model, to complete prediction tasks, so as to provide more accurate guidance for formulating patients' treatment plans.
In order to better illustrate the technical method of the present embodiment, the present embodiment further provides a specific implementation procedure:
5 medical institutions participate in multi-center medical knowledge graph representation learning. Each medical institution constructs a complete medical knowledge graph according to the knowledge source ICD10, then adds the institution-specific medical entity generated by the local medical institution according to the local clinical data and the clinical business scene into the medical knowledge graph, and adds the relationship between the institution-specific medical entity and the original medical entity in the medical knowledge graph to obtain respective local medical knowledge graph.
In the global medical knowledge graph construction stage, a group of keys are firstly generated by a computing server, each medical institution encrypts head entities and tail entities of all triplets in the local medical knowledge graph by using the encryption keys to form ciphertext triplets, and the ciphertext triplets are sent to a matching server. In the matching process, the matching server compares the ciphertexts of the head entity and the tail entity of the cipher text by two, and the two entities with the same cipher text are regarded as the same entity. After the matching is completed, the matching server sends the ciphertext triplets to each medical institution. And each medical institution decrypts the ciphertext triplet by using the decryption key to obtain the plaintext of the triplet, thereby forming a global medical knowledge graph.
During the federal graph embedding model training stage. The medical institutions firstly count co-occurrence information of medical entities in the global medical knowledge graph on the visit data of local clinical data, construct a co-occurrence matrix, and add a normal distributed random noise with the mean value of 1 and the variance of 0.1 to each element. And reducing the dimension of the embedded representation of each medical entity into 256 dimensions by using a principal component analysis method to obtain an initial embedded representation matrix of the medical entity. And randomly replacing a head entity or a tail entity in the triples in the global medical knowledge graph to obtain the negatively sampled triples. In this embodiment, the model iteration number is set to 100, the edge loss parameter is set to 1, and the distance function is set to L2 norm. In each iteration, a loss function is calculated from the triples and the negatively sampled triples, and a model gradient is found. Then, the matching server generates homomorphic encryption key pairs and sends the homomorphic encryption key pairs to each medical institution. And each medical institution encrypts the model gradient homomorphism and then sends the model gradient homomorphism to a calculation server, and the calculation server directly carries out aggregation operation on the ciphertext to obtain the ciphertext of the global gradient and sends the ciphertext to each medical institution. The medical institution decrypts the ciphertext of the global gradient to obtain the global gradient, and updates the embedded representation of the medical entity according to the global gradient. And after the iterative training of the graph embedding model is completed, obtaining the embedding representation of the medical entity. The length of the embedded representation of the medical entity is 128 dimensions. 
After the federated graph embedding model has been trained, the average of each medical entity's embedded representation across the medical institutions is computed using homomorphic encryption and taken as the training result for that entity's embedded representation.
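The encrypted averaging can be sketched with an additively homomorphic scheme. The source does not say which scheme is used; the example below assumes a textbook Paillier cryptosystem with toy-sized primes and a fixed-point encoding for real values, both chosen for illustration only. All names are hypothetical, and the key material is far too small for real use.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1   # two Mersenne primes; toy key size only
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)          # valid because the generator is g = N + 1
SCALE = 10**6                 # fixed-point scale for real-valued embeddings

def encrypt(x: float, rng: random.Random) -> int:
    """Paillier encryption of a fixed-point value; negatives wrap modulo N."""
    m = round(x * SCALE) % N
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return (1 + m * N) * pow(r, N, N2) % N2   # (N + 1)^m = 1 + m*N mod N^2

def add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2

def decrypt(c: int) -> float:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:            # map the upper half of the range back to negatives
        m -= N
    return m / SCALE

def secure_average(vectors, rng):
    """Sum ciphertexts coordinate-wise, then decrypt and divide by the count."""
    encrypted = [[encrypt(x, rng) for x in v] for v in vectors]  # at each institution
    total = encrypted[0]
    for enc in encrypted[1:]:
        total = [add(a, b) for a, b in zip(total, enc)]          # at the server
    return [decrypt(c) / len(vectors) for c in total]            # at each institution

rng = random.Random(0)
average = secure_average([[0.5, -1.0], [1.5, 2.0], [1.0, -4.0]], rng)
```

In this sketch the server handles only ciphertexts; the division by the number of institutions happens after decryption, which avoids multiplying ciphertexts by a fraction.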
After the embedded representations of the medical entities are obtained, for the institution-specific medical entities connected to the same medical entity, the Euclidean distances between their embedded representations are calculated as the similarity. If the similarity of two institution-specific medical entities is smaller than the set threshold (0.01 in this embodiment), they are considered to be the same concept and are fused into one medical entity, and the structure of the global medical knowledge graph is updated after the fusion. Once the structure has been updated, the process returns to the federated graph embedding model training module, which trains the embedded representations of the medical entities on the new global medical knowledge graph. After the embedded representations are updated, the operation of the medical entity fusion module is performed again. This iteration continues until the structure of the global medical knowledge graph no longer changes.
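The fusion rule can be sketched as follows. The example assumes the candidates are institution-specific entities already known to be connected to the same medical entity, and it groups them with a union-find structure, which makes fusion transitive; the source does not say how chains of near-identical entities are handled, so that choice is an assumption. Entity names and embeddings are made up for illustration.

```python
import math

def fuse(candidates, embeddings, threshold=0.01):
    """Group entities whose pairwise Euclidean distance is below the threshold."""
    parent = {e: e for e in candidates}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]   # path halving keeps the trees shallow
            e = parent[e]
        return e

    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            if math.dist(embeddings[a], embeddings[b]) < threshold:
                parent[find(b)] = find(a)   # merge the two groups

    groups = {}
    for e in candidates:
        groups.setdefault(find(e), []).append(e)
    return list(groups.values())

embeddings = {
    "A:chest_pain": [0.100, 0.200],
    "B:chest pain": [0.105, 0.203],   # about 0.0058 away from A:chest_pain
    "C:rash": [0.900, 0.100],
}
groups = fuse(list(embeddings), embeddings)
changed = any(len(g) > 1 for g in groups)   # True means retrain and fuse again
```

Each group with more than one member would be replaced by a single entity in the global graph; the outer loop repeats training and fusion until `changed` is false.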
Claims (10)
1. A cross-institution medical knowledge graph representation learning method, characterized by comprising the following steps:
the main server generates a corresponding global medical knowledge graph according to the local medical knowledge graph of each medical institution;
acquiring a set of clinical data in a local medical facility, each piece of clinical data comprising a plurality of medical entities;
constructing a corresponding medical entity co-occurrence matrix according to the occurrence frequency of every two medical entities in the same piece of clinical data, and performing dimension reduction on the medical entity co-occurrence matrix by using a principal component analysis method so as to obtain an initial embedded representation of each medical entity;
constructing a triplet set based on a global medical knowledge graph, and forming a data set with the initial embedded representation;
inputting the data set into a pre-constructed graph embedded representation model to obtain a medical entity embedded representation, and constructing a loss function by combining the triplet set;
obtaining a graph embedding model gradient of a graph embedding representation model by adopting back propagation based on the loss function;
sending the graph embedding model gradients of all medical institutions, under homomorphic encryption, to the main server for aggregation to obtain a global model gradient, and feeding the global model gradient back to each medical institution to update the graph embedded representation model until the loss function converges, so as to obtain the optimal medical entity embedded representation.
2. The cross-institution medical knowledge graph representation learning method of claim 1, wherein the local medical knowledge graph is constructed from common medical entities, including ICD10, CCS and HPO, and institution-specific medical entities generated based on local clinical data and clinical business scenarios.
3. The cross-institution medical knowledge graph representation learning method of claim 1 or 2, wherein the global medical knowledge graph construction process is as follows:
the main server generates a set of homomorphic encryption keys for all medical institutions, and each medical institution encrypts its local medical knowledge graph with the homomorphic encryption keys to obtain ciphertext triplets and sends them to the main server;
the main server matches all the received ciphertext triplets and feeds them back to all the medical institutions, so that each medical institution generates its local copy of the global medical knowledge graph.
4. The method of claim 1, wherein the medical entity comprises a diagnosis, a symptom, an examination, and a medication.
5. The method for learning the cross-institution medical knowledge graph representation as claimed in claim 1, wherein noise is added to each element in the matrix when the co-occurrence matrix is constructed.
6. The cross-institution medical knowledge graph representation learning method of claim 1, wherein the triplet set comprises positive sampling triplets and negative sampling triplets, a positive sampling triplet sequentially comprises a head entity, a tail entity and a relationship, and a negative sampling triplet is constructed by randomly replacing the head entity or the tail entity of a positive sampling triplet.
7. The cross-institution medical knowledge graph representation learning method of claim 1, wherein the expression of the loss function is as follows:
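$$L = \sum_{(h,t,r)\in S}\ \sum_{(h',t',r)\in S'} \left[\gamma + d(\mathbf{h}+\mathbf{r},\,\mathbf{t}) - d(\mathbf{h}'+\mathbf{r},\,\mathbf{t}')\right]_{+}$$
wherein $\gamma$ represents the margin (edge loss), with a value larger than 0; $[x]_{+}$ represents the positive-part function, with $[x]_{+} = x$ when $x > 0$ and $[x]_{+} = 0$ when $x \le 0$; $d$ represents the distance function; $S$ represents the positive sampling triplets; $S'$ represents the negative sampling triplets; $h$ represents the head entity, $t$ represents the tail entity, and $r$ represents the relationship; the embedded representation of $h$ is expressed as $\mathbf{h}$, that of $t$ as $\mathbf{t}$, that of $r$ as $\mathbf{r}$, that of $h'$ as $\mathbf{h}'$, and that of $t'$ as $\mathbf{t}'$.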
8. The cross-institution medical knowledge graph representation learning method of claim 1, wherein the global model gradient is aggregated from the graph embedding model gradients in their encrypted state, with the following expression:
wherein $g_i$ represents the graph embedding model gradient of the $i$-th medical institution, $N$ represents the total number of medical institutions, $k$ represents the homomorphic encryption key, $E$ represents the homomorphic encryption algorithm, and $G$ represents the global model gradient.
9. The method of claim 1, wherein each medical institution iteratively updates the embedded representation of the medical entities using a local Adam optimizer based on the received global model gradient.
10. A cross-institution medical knowledge graph representation learning system, characterized in that the system carries out the cross-institution medical knowledge graph representation learning method according to any one of claims 1 to 9 and comprises a local medical knowledge graph construction module, a global medical knowledge graph construction module, a federated graph embedding model training module and a medical entity fusion module;
the local medical knowledge graph construction module is used for adding, within the local medical institution, the institution-specific medical entities generated from local clinical data and clinical business scenarios to the medical knowledge graph, so as to generate the corresponding local medical knowledge graph;
the global medical knowledge graph construction module is used for carrying out encryption matching according to the local medical knowledge graph of each medical institution in the main server so as to generate a global medical knowledge graph and sending the global medical knowledge graph to each medical institution;
the federated graph embedding model training module is used for training the graph embedding models of all medical institutions by means of homomorphic encryption, without clinical data leaving the local medical institution;
the medical entity fusion module is used for calculating the similarity among the medical entities in each medical institution so as to fuse and update the global medical knowledge graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311092562.9A CN116821375B (en) | 2023-08-29 | 2023-08-29 | Cross-institution medical knowledge graph representation learning method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116821375A (en) | 2023-09-29 |
CN116821375B (en) | 2023-12-22 |
Family
ID=88116987
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311092562.9A Active CN116821375B (en) | 2023-08-29 | 2023-08-29 | Cross-institution medical knowledge graph representation learning method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116821375B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108986871A (en) * | 2018-08-27 | 2018-12-11 | 东北大学 | A kind of construction method of intelligent medical treatment knowledge mapping |
CN110008959A (en) * | 2019-03-26 | 2019-07-12 | 北京博瑞彤芸文化传播股份有限公司 | A kind of medical data processing method and system |
CN111767411A (en) * | 2020-07-01 | 2020-10-13 | 深圳前海微众银行股份有限公司 | Knowledge graph representation learning optimization method and device and readable storage medium |
CN112200321A (en) * | 2020-12-04 | 2021-01-08 | 同盾控股有限公司 | Inference method, system, device and medium based on knowledge federation and graph network |
CN113434626A (en) * | 2021-08-27 | 2021-09-24 | 之江实验室 | Multi-center medical diagnosis knowledge map representation learning method and system |
WO2023025255A1 (en) * | 2021-08-27 | 2023-03-02 | 之江实验室 | Multi-center medical diagnosis knowledge graph representation learning method and system |
CN113886598A (en) * | 2021-09-27 | 2022-01-04 | 浙江大学 | Knowledge graph representation method based on federal learning |
CN114639483A (en) * | 2022-03-23 | 2022-06-17 | 浙江大学 | Electronic medical record retrieval method and device based on graph neural network |
CN116386805A (en) * | 2023-04-13 | 2023-07-04 | 新理(深圳)科技有限公司 | Intelligent guided diagnosis report generation method |
Non-Patent Citations (2)
Title |
---|
ZIHENG ZHANG: "An Industry Evaluation of Embedding-based Entity Alignment", ARXIV, pages 1 - 11 * |
任益锋: "肺结节/早期肺癌预测模型的知识图谱与可视化分析", 中国胸心血管外科临床杂志, pages 1 - 15 * |
Also Published As
Publication number | Publication date |
---|---|
CN116821375B (en) | 2023-12-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||