CN113420551A - Biomedical entity relation extraction method for modeling entity similarity - Google Patents

Biomedical entity relation extraction method for modeling entity similarity

Info

Publication number
CN113420551A
CN113420551A
Authority
CN
China
Prior art keywords
entity
node
biomedical
type
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110788351.3A
Other languages
Chinese (zh)
Inventor
Weizhong Zhao (赵卫中)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202110788351.3A priority Critical patent/CN113420551A/en
Publication of CN113420551A publication Critical patent/CN113420551A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a biomedical entity relation extraction method that models entity similarity, comprising the following steps: S1, an input module obtains an initial representation of the text; S2, the given entity-labeled biomedical text is assumed to be a sequence of sentences, where each sentence is represented as a sequence of word vector representations (S1, ..., Si, ..., SL); S3, considering the rich semantic information carried by each word, the vector representation of each word is composed of a word embedding, a position embedding and an entity type embedding; S4, on the basis of S3, similarity information between the biomedical entities in each document is modeled through a relational heterogeneous graph module. The invention has the advantage that it can learn richer entity representations: the relational heterogeneous graph module adopts an end-to-end neural network that can automatically learn meaningful features from large-scale biomedical text, avoiding the time-consuming, labor-intensive and extremely complex feature engineering of traditional methods.

Description

Biomedical entity relation extraction method for modeling entity similarity
Technical Field
The invention relates to the technical field of biomedicine, in particular to a biomedical entity relation extraction method for modeling entity similarity.
Background
Biomedical texts contain a large number and a wide variety of information entities, and these entities are linked by complex relationships. According to the entity types involved, four types of relationships are commonly distinguished: Protein-Protein Interactions (PPIs), Genotype-Phenotype Associations (GPAs), Drug-Drug Interactions (DDIs) and Chemical-Induced Disease (CID) relations. Many studies have addressed the extraction of these four types of biomedical entity relationships. Traditional rule-based methods use rule templates (usually in the form of regular expressions) generated by domain experts to extract matching relationships from biomedical text; statistical learning-based methods identify relationships between biomedical entities by means of co-occurrence probabilities; NLP-based methods parse sentences to decompose the text into grammatical structures from which relationships between entities can be extracted more easily.
However, research shows that existing biomedical entity relationship extraction methods do not model the similarity information between entities in biomedical texts, even though this information plays a key role in relationship extraction. Taking chemical-induced disease (CID) relations as an example, suppose three CID entity pairs have already been successfully predicted: <ethambutol, bilateral optic neuropathy>, <isoniazid, bilateral optic neuropathy> and <ethambutol, dark spot>. This information greatly helps in predicting whether the entity pair <isoniazid, dark spot> also expresses a CID relation: from <ethambutol, bilateral optic neuropathy> and <isoniazid, bilateral optic neuropathy> it can be inferred that the chemical entities "ethambutol" and "isoniazid" share a certain similarity, and since <ethambutol, dark spot> is known to be a CID pair, <isoniazid, dark spot> can likewise be judged to be a CID pair. Therefore, it is necessary to fully model the similarity information between entities in biomedical text and to obtain better entity representations for efficient relationship extraction.
Disclosure of Invention
The invention aims to provide a biomedical entity relation extraction method for modeling entity similarity, which aims to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: a biomedical entity relation extraction method for modeling entity similarity comprises the following steps:
S1, obtaining an initial representation of the text through the input module;
S2, assuming that the given entity-labeled biomedical text is a sequence of sentences, where each sentence is represented as a sequence of word vector representations (S1, ..., Si, ..., SL);
S3, considering the rich semantic information carried by each word, composing the vector representation of each word from a word embedding, a position embedding and an entity type embedding;
S4, on the basis of S3, modeling the similarity information between the biomedical entities in each document through a relational heterogeneous graph module, and learning richer entity representations;
S5, classifying the candidate relation nodes and storing the entity relations in a structured form.
As a preferred embodiment of the present invention: the word embedding adopts a pre-training model BioBERT as a word embedding model;
the position embedding adopts sine and cosine functions with different frequencies to model different position information in sentences;
the entity type in the entity type embedding comprises an O type, namely the entity type is not an entity, a vector is randomly initialized to represent information contained in the entity type, the entity type embedding vector is used as a parameter of a model, and the optimization is carried out through a training process, namely the Fine-tuning is obtained.
As a preferred embodiment of the present invention: the relationship heterogeneous graph module specifically comprises heterogeneous graph construction and a double-layer attention mechanism.
As a preferred embodiment of the present invention: the heteromorphic image construction specifically comprises the following steps:
a1: assuming that the heterogeneous graph is represented as HG ═ (HV, HE), where HV represents the set of nodes and HE is the set of edges;
a2: given a processed entityThe labeled biomedical text D, the node set of the heterogeneous graph constructed based on the text D is composed of a plurality of subsets: node set E of various biomedical entities1,E2,...,ENWhere N denotes the number of categories of biomedical entities in a given text, and a candidate relationship node R, consisting of pairs of biomedical entities, formalized as HV ═ (E)1,E2,...,EN)UR;
A3: initializing and expressing the biomedical entity nodes as vector expression obtained by an input module; in addition, for the candidate relation nodes, firstly, the vector representation of the corresponding biomedical entity is spliced, then, after a full connection layer is used for activation, the activated vector is finally used as the initialization representation of the candidate relation nodes in the heteromorphic graph.
As a preferred embodiment of the present invention: the construction of the edges between the nodes of the candidate relationship in the heteromorphic graph comprises the following two steps:
b1: each candidate relationship node is formed by pairing a chemical entity and a disease entity, so that an edge is constructed for each candidate relationship node and the corresponding biomedical entity;
b2: in order to take the similarity between the entities into consideration, similarity calculation is carried out between the entities, and if the similarity between two entities is large enough, an edge is constructed between the two entity nodes.
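The patent does not specify which similarity measure or threshold is used in B2; the sketch below assumes cosine similarity over the entity vectors and a hypothetical threshold of 0.8.

    import torch
    import torch.nn.functional as F

    def build_entity_similarity_edges(entity_vectors, threshold=0.8):
        """Sketch: add an edge between two entity nodes whose cosine similarity is high enough."""
        edges = []
        n = len(entity_vectors)
        for i in range(n):
            for j in range(i + 1, n):
                sim = F.cosine_similarity(entity_vectors[i], entity_vectors[j], dim=0)
                if sim.item() >= threshold:
                    edges.append((i, j))
        return edges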
As a preferred embodiment of the present invention: the double-layer attention mechanism specifically comprises the following steps:
c1: giving a certain candidate relation node, and firstly collecting 1-hop and 2-hop neighbor nodes of the node;
c2: then all neighbor nodes are divided into groups according to the node types: and updating the vector representation of the given candidate relation node by using two levels of attention mechanisms of various biomedical entity neighbor nodes and candidate relation neighbor nodes.
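A rough sketch of steps C1 and C2 using networkx, which is an assumption of convenience (the patent names no library); each node is assumed to carry a hypothetical "type" attribute.

    import networkx as nx
    from collections import defaultdict

    def collect_and_group_neighbors(graph: nx.Graph, rel_node):
        """Sketch: gather 1-hop/2-hop neighbors of a candidate relation node, grouped by type."""
        # Distances up to 2 hops; drop the node itself (distance 0).
        dist = nx.single_source_shortest_path_length(graph, rel_node, cutoff=2)
        groups = defaultdict(list)
        for node, d in dist.items():
            if d == 0:
                continue
            # Assumed node attribute, e.g. "chemical", "disease" or "relation".
            groups[graph.nodes[node]["type"]].append(node)
        return groups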
As a preferred embodiment of the present invention: the double-layer attention mechanism consists of two parts: the node-level attention mechanism aggregates information of neighbors of the same type and the type-level attention mechanism aggregates information of neighbor nodes of different types.
As a preferred embodiment of the present invention: the node-level attention mechanism is used for fully modeling the importance of different neighbor nodes having the same type, and specifically comprises the following steps:
d1: given a certain candidate relation node, all the v-class biomedical entity neighbor node sets are assumed to be represented as
Figure BDA0003160022270000041
Wherein any class v biomedical entity neighbors
Figure BDA0003160022270000042
For all the V-class biomedical entity neighbors, selective information aggregation is carried out through a node level attention mechanism to obtain a neighbor vector representation representing the V-class biomedical entity type
Figure BDA0003160022270000043
Figure BDA0003160022270000044
D2: neighbor vector representations for other types of entities and candidate relationship types through D1 node-level attention
Figure BDA0003160022270000045
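The patent's node-level attention formulas are published only as images, so their exact form cannot be reproduced here. The following is a rough, assumption-laden sketch of an attention-weighted aggregation over same-type neighbors in the spirit of the description; the scoring function, dimensions and class name are illustrative choices, not the patent's.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NodeLevelAttention(nn.Module):
        """Sketch: aggregate same-type neighbor vectors with learned attention weights."""
        def __init__(self, dim=768):
            super().__init__()
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)

        def forward(self, rel_node_vec, neighbor_vecs):
            # rel_node_vec: (dim,) candidate relation node representation
            # neighbor_vecs: (num_neighbors, dim) neighbors of one biomedical entity type
            scores = self.key(neighbor_vecs) @ self.query(rel_node_vec)   # (num_neighbors,)
            alpha = F.softmax(scores, dim=0)                              # importance per neighbor
            return (alpha.unsqueeze(-1) * neighbor_vecs).sum(dim=0)       # type-specific neighbor vector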
As a preferred embodiment of the present invention: the type level attention specifically comprises the following steps:
e1: the type-level attention is based on the node-level attention, the type-level attention learns the weights of different types of neighbors of a given candidate relationship node, and through a process similar to the node-level attention, the type-level attention is formally expressed as:
Figure BDA0003160022270000046
wherein
Figure BDA0003160022270000047
Representing multiple types of neighbors;
e2: vector representations to be obtained taking into account different neighbor nodes and different types of neighbors in the same type
Figure BDA0003160022270000048
E3: updating the original candidate relation node by using a full-connection network, and carrying out relation reasoning:
Figure BDA0003160022270000049
where σ denotes a Sigmoid activation function, the output value of which is between 0 and 1, so that
Figure BDA00031600222700000410
And finally, storing the extracted entity relationship in a structured form.
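Again as an assumption-based sketch (the patent's type-level formulas are only available as images), type-level attention can be viewed as a second attention step over the per-type neighbor vectors produced by node-level attention, followed by a fully connected update and a Sigmoid that scores the candidate relation; all names and dimensions below are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TypeLevelAttentionAndScore(nn.Module):
        """Sketch: weigh per-type neighbor vectors, update the relation node, score with Sigmoid."""
        def __init__(self, dim=768):
            super().__init__()
            self.type_query = nn.Linear(dim, dim)
            self.type_key = nn.Linear(dim, dim)
            self.update = nn.Linear(2 * dim, dim)   # fully connected update of the relation node
            self.classifier = nn.Linear(dim, 1)

        def forward(self, rel_node_vec, per_type_vecs):
            # per_type_vecs: (num_types, dim) one aggregated vector per neighbor type
            scores = self.type_key(per_type_vecs) @ self.type_query(rel_node_vec)  # (num_types,)
            beta = F.softmax(scores, dim=0)                                         # weight per neighbor type
            context = (beta.unsqueeze(-1) * per_type_vecs).sum(dim=0)               # (dim,)
            updated = torch.relu(self.update(torch.cat([rel_node_vec, context], dim=-1)))
            return torch.sigmoid(self.classifier(updated))                          # score in (0, 1)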
Compared with the prior art, the invention has the following beneficial effects. The node-level attention mechanism aggregates the information of neighbors of the same type and fully models the importance of the different neighbor nodes within that type; the type-level attention mechanism aggregates the information of neighbor nodes of different types and, on top of node-level attention, learns the weights of the different neighbor types of a given candidate relation node, taking the importance of each type into account. Through the relational heterogeneous graph module based on this double-layer attention mechanism, the similarity information between entities in biomedical text can be fully modeled, yielding better candidate relation node representations for efficient relationship extraction. Moreover, the relational heterogeneous graph module adopts an end-to-end neural network that can automatically learn meaningful features from large-scale biomedical text, avoiding the time-consuming, labor-intensive and extremely complex feature engineering of traditional methods.
Drawings
Fig. 1 is an overall flow diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a biomedical entity relation extraction method for modeling entity similarity comprises the following steps:
S1, obtaining an initial representation of the text through the input module;
S2, assuming that the given entity-labeled biomedical text is a sequence of sentences, where each sentence is represented as a sequence of word vector representations (S1, ..., Si, ..., SL);
S3, considering the rich semantic information carried by each word, composing the vector representation of each word from a word embedding, a position embedding and an entity type embedding;
S4, on the basis of S3, modeling the similarity information between the biomedical entities in each document through a relational heterogeneous graph module, and learning richer entity representations;
S5, classifying the candidate relation nodes and storing the entity relations in a structured form.
The word embedding adopts the pre-trained model BioBERT as the word embedding model;
the position embedding adopts sine and cosine functions of different frequencies to model the different position information within a sentence;
the entity types in the entity type embedding include an O type, i.e., not an entity. A vector is randomly initialized to represent the information carried by each entity type; these entity type embedding vectors are treated as model parameters and are optimized during training, i.e., obtained by fine-tuning.
Through the input module, each sentence Si in a given text can be represented as a matrix Xi, where the j-th row of the matrix is the vector representation of the j-th word. The initial representation of the biomedical text obtained by the input module is then fed into the relational heterogeneous graph module to learn richer entity representations.
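For illustration, the word-embedding part of such a sentence matrix could be obtained with the Hugging Face transformers library roughly as follows; the specific BioBERT checkpoint name is an assumption and is not named in the patent.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed BioBERT checkpoint; any BioBERT weights compatible with transformers would do.
    tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
    model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

    sentence = "Ethambutol may cause bilateral optic neuropathy."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One row per (sub)word token: the word-embedding component of the sentence matrix X.
    X = outputs.last_hidden_state.squeeze(0)   # shape: (num_tokens, hidden_dim)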
The heterogeneous graph construction specifically comprises the following steps:
A1: assuming that the heterogeneous graph is represented as HG = (HV, HE), where HV denotes the set of nodes and HE the set of edges;
A2: given an entity-labeled biomedical text D, the node set of the heterogeneous graph constructed from D consists of several subsets: the node sets E1, E2, ..., EN of the various biomedical entity types, where N denotes the number of biomedical entity categories in the given text, and the candidate relation nodes R formed by pairs of biomedical entities, formalized as HV = (E1, E2, ..., EN) ∪ R;
A3: the biomedical entity nodes are initialized with the vector representations obtained from the input module; for the candidate relation nodes, the vector representations of the corresponding biomedical entities are first concatenated and then passed through a fully connected layer with an activation function, and the activated vector is finally used as the initialization of the candidate relation node in the heterogeneous graph.
The edges involving candidate relation nodes in the heterogeneous graph are constructed in the following two steps:
B1: each candidate relation node is formed by pairing a chemical entity and a disease entity, so an edge is constructed between each candidate relation node and its corresponding biomedical entity nodes;
B2: in order to take the similarity between entities into account, similarity is computed between entities, and if the similarity between two entities is large enough, an edge is constructed between the two entity nodes.
The double-layer attention mechanism specifically comprises the following steps:
C1: given a candidate relation node, first collect its 1-hop and 2-hop neighbor nodes;
C2: then divide all neighbor nodes into groups according to node type, namely the various biomedical entity neighbor nodes and the candidate relation neighbor nodes, and update the vector representation of the given candidate relation node using the two levels of attention mechanisms.
The double-layer attention mechanism consists of two parts: the node-level attention mechanism aggregates the information of neighbors of the same type, and the type-level attention mechanism aggregates the information of neighbor nodes of different types. In the constructed heterogeneous graph, relation reasoning is based mainly on the vector representations of the candidate relation nodes, so information related to the candidate relation nodes should be taken into account as much as possible to improve the performance of relation extraction. Since every candidate relation node has edges connecting it to its corresponding entity nodes, and since different neighbor nodes within the same type as well as different types of neighbors have different importance for the representation learning of a candidate relation node, the influence of the neighbor nodes on learning the candidate relation node's vector representation is considered at two levels.
The node-level attention mechanism is used to fully model the importance of the different neighbor nodes of the same type, and specifically comprises the following steps:
D1: given a candidate relation node, assume the set of all its class-v biomedical entity neighbor nodes is given (the corresponding notation and formulas appear only as images in the original publication); selective information aggregation over all class-v neighbors is then performed through the node-level attention mechanism to obtain a neighbor vector representation for the class-v biomedical entity type;
D2: through the node-level attention of D1, neighbor vector representations are likewise obtained for the other entity types and for the candidate relation type.
The type-level attention specifically comprises the following steps:
E1: type-level attention builds on node-level attention; it learns the weights of the different types of neighbors of a given candidate relation node and, through a process similar to node-level attention, aggregates the neighbor representations of the multiple types (the formal expression appears only as an image in the original publication);
E2: this yields a vector representation that takes into account both the different neighbor nodes within the same type and the different types of neighbors;
E3: the original candidate relation node is updated with a fully connected network and relation reasoning is carried out, where σ denotes the Sigmoid activation function, whose output value lies between 0 and 1 and is used as the relation prediction score (the corresponding formulas appear only as images in the original publication).
Finally, the extracted entity relations are stored in a structured form.
In particular, in use, an initial representation of the text is obtained by the input module. The given entity-labeled biomedical text is assumed to be a sequence of sentences, where each sentence is represented as a sequence of word vector representations (S1, ..., Si, ..., SL). Considering the rich semantic information carried by each word, the vector representation of each word is composed of a word embedding, a position embedding and an entity type embedding. The word embedding adopts the pre-trained model BioBERT as the word embedding model; the position embedding adopts sine and cosine functions of different frequencies to model the different position information within a sentence; the entity types in the entity type embedding include an O type, i.e., not an entity, a vector is randomly initialized to represent the information carried by each entity type, and these entity type embedding vectors are treated as model parameters and optimized during training, i.e., obtained by fine-tuning.

The similarity information between the biomedical entities in each document is then modeled through the relational heterogeneous graph module, which learns richer entity representations and specifically comprises heterogeneous graph construction and a double-layer attention mechanism. Assume the heterogeneous graph is represented as HG = (HV, HE), where HV denotes the set of nodes and HE the set of edges. Given an entity-labeled biomedical text D, the node set of the heterogeneous graph constructed from D consists of several subsets: the node sets E1, E2, ..., EN of the various biomedical entity types, where N denotes the number of biomedical entity categories in the given text, and the candidate relation nodes R formed by pairs of biomedical entities, formalized as HV = (E1, E2, ..., EN) ∪ R. The biomedical entity nodes are initialized with the vector representations obtained from the input module; for the candidate relation nodes, the vector representations of the corresponding biomedical entities are first concatenated and then passed through a fully connected layer with an activation function, and the activated vector is finally used as the initialization of the candidate relation node in the heterogeneous graph. The candidate relation nodes are classified and the entity relations are stored in a structured form.

Since each candidate relation node is formed by pairing a chemical entity and a disease entity, an edge is constructed between each candidate relation node and its corresponding biomedical entity nodes. To take the similarity between entities into account, similarity is computed between entities, and if the similarity between two entities is large enough, an edge is constructed between the two entity nodes.

For the double-layer attention mechanism, given a candidate relation node, its 1-hop and 2-hop neighbor nodes are first collected; all neighbor nodes are then divided into groups according to node type, namely the various biomedical entity neighbor nodes and the candidate relation neighbor nodes, and the vector representation of the given candidate relation node is updated using the two levels of attention mechanisms. The double-layer attention mechanism consists of two parts: the node-level attention mechanism aggregates the information of neighbors of the same type, and the type-level attention mechanism aggregates the information of neighbor nodes of different types.

The node-level attention mechanism is used to fully model the importance of the different neighbor nodes of the same type. Given a candidate relation node, the set of all its class-v biomedical entity neighbor nodes is assumed to be given; selective information aggregation over all class-v neighbors is performed through the node-level attention mechanism to obtain a neighbor vector representation for the class-v biomedical entity type, and neighbor vector representations are likewise obtained for the other entity types and for the candidate relation type (the corresponding formulas appear only as images in the original publication).

Type-level attention builds on node-level attention; it learns the weights of the different types of neighbors of a given candidate relation node and, through a process similar to node-level attention, aggregates the neighbor representations of the multiple types, yielding a vector representation that takes into account both the different neighbor nodes within the same type and the different types of neighbors. The original candidate relation node is then updated with a fully connected network and relation reasoning is carried out, where σ denotes the Sigmoid activation function, whose output value lies between 0 and 1 and is used as the relation prediction score (these formulas also appear only as images in the original publication). Finally, the extracted entity relations are stored in a structured form.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A biomedical entity relation extraction method for modeling entity similarity is characterized by comprising the following steps:
S1, obtaining an initial representation of the text through the input module;
S2, assuming that the given entity-labeled biomedical text is a sequence of sentences, where each sentence is represented as a sequence of word vector representations (S1, ..., Si, ..., SL);
S3, considering the rich semantic information carried by each word, composing the vector representation of each word from a word embedding, a position embedding and an entity type embedding;
S4, on the basis of S3, modeling the similarity information between the biomedical entities in each document through a relational heterogeneous graph module, and learning richer entity representations;
S5, classifying the candidate relation nodes and storing the entity relations in a structured form.
2. The method according to claim 1, characterized in that: the word embedding adopts the pre-trained model BioBERT as the word embedding model;
the position embedding adopts sine and cosine functions of different frequencies to model the different position information within a sentence;
the entity types in the entity type embedding include an O type, i.e., not an entity; a vector is randomly initialized to represent the information carried by each entity type, and these entity type embedding vectors are treated as model parameters and optimized during training, i.e., obtained by fine-tuning.
3. The method according to claim 1, characterized in that: the relational heterogeneous graph module specifically comprises heterogeneous graph construction and a double-layer attention mechanism.
4. The method according to claim 3, characterized in that: the heterogeneous graph construction specifically comprises the following steps:
A1: assuming that the heterogeneous graph is represented as HG = (HV, HE), where HV denotes the set of nodes and HE the set of edges;
A2: given an entity-labeled biomedical text D, the node set of the heterogeneous graph constructed from D consists of several subsets: the node sets E1, E2, ..., EN of the various biomedical entity types, where N denotes the number of biomedical entity categories in the given text, and the candidate relation nodes R formed by pairs of biomedical entities, formalized as HV = (E1, E2, ..., EN) ∪ R;
A3: the biomedical entity nodes are initialized with the vector representations obtained from the input module; for the candidate relation nodes, the vector representations of the corresponding biomedical entities are first concatenated and then passed through a fully connected layer with an activation function, and the activated vector is finally used as the initialization of the candidate relation node in the heterogeneous graph.
5. The method according to claim 4, characterized in that: the edges involving candidate relation nodes in the heterogeneous graph are constructed in the following two steps:
B1: each candidate relation node is formed by pairing a chemical entity and a disease entity, so an edge is constructed between each candidate relation node and its corresponding biomedical entity nodes;
B2: in order to take the similarity between entities into account, similarity is computed between entities, and if the similarity between two entities is large enough, an edge is constructed between the two entity nodes.
6. The method according to claim 3, characterized in that: the double-layer attention mechanism specifically comprises the following steps:
C1: given a candidate relation node, first collect its 1-hop and 2-hop neighbor nodes;
C2: then divide all neighbor nodes into groups according to node type, namely the various biomedical entity neighbor nodes and the candidate relation neighbor nodes, and update the vector representation of the given candidate relation node using the two levels of attention mechanisms.
7. The method according to claim 3, characterized in that: the double-layer attention mechanism consists of two parts: the node-level attention mechanism aggregates the information of neighbors of the same type, and the type-level attention mechanism aggregates the information of neighbor nodes of different types.
8. The method according to claim 7, characterized in that: the node-level attention mechanism is used to fully model the importance of the different neighbor nodes of the same type, and specifically comprises the following steps:
D1: given a candidate relation node, assume the set of all its class-v biomedical entity neighbor nodes is given (the corresponding notation and formulas appear only as images in the original publication); selective information aggregation over all class-v neighbors is then performed through the node-level attention mechanism to obtain a neighbor vector representation for the class-v biomedical entity type;
D2: through the node-level attention of D1, neighbor vector representations are likewise obtained for the other entity types and for the candidate relation type.
9. The method according to claim 8, characterized in that: the type-level attention specifically comprises the following steps:
E1: type-level attention builds on node-level attention; it learns the weights of the different types of neighbors of a given candidate relation node and, through a process similar to node-level attention, aggregates the neighbor representations of the multiple types (the formal expression appears only as an image in the original publication);
E2: this yields a vector representation that takes into account both the different neighbor nodes within the same type and the different types of neighbors;
E3: the original candidate relation node is updated with a fully connected network and relation reasoning is carried out, where σ denotes the Sigmoid activation function, whose output value lies between 0 and 1 and is used as the relation prediction score (the corresponding formulas appear only as images in the original publication);
and finally, the extracted entity relations are stored in a structured form.
CN202110788351.3A 2021-07-13 2021-07-13 Biomedical entity relation extraction method for modeling entity similarity Pending CN113420551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110788351.3A CN113420551A (en) 2021-07-13 2021-07-13 Biomedical entity relation extraction method for modeling entity similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110788351.3A CN113420551A (en) 2021-07-13 2021-07-13 Biomedical entity relation extraction method for modeling entity similarity

Publications (1)

Publication Number Publication Date
CN113420551A true CN113420551A (en) 2021-09-21

Family

ID=77720777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110788351.3A Pending CN113420551A (en) 2021-07-13 2021-07-13 Biomedical entity relation extraction method for modeling entity similarity

Country Status (1)

Country Link
CN (1) CN113420551A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284396A (en) * 2018-09-27 2019-01-29 北京大学深圳研究生院 Medical knowledge map construction method, apparatus, server and storage medium
WO2020147594A1 (en) * 2019-01-16 2020-07-23 阿里巴巴集团控股有限公司 Method, system, and device for obtaining expression of relationship between entities, and advertisement retrieval system
CN110083838A (en) * 2019-04-29 2019-08-02 西安交通大学 Biomedical relation extraction method based on multilayer neural network Yu external knowledge library
CN111710428A (en) * 2020-06-19 2020-09-25 华中师范大学 Biomedical text representation method for modeling global and local context interaction
CN111859935A (en) * 2020-07-03 2020-10-30 大连理工大学 Method for constructing cancer-related biomedical event database based on literature
CN111881256A (en) * 2020-07-17 2020-11-03 中国人民解放军战略支援部队信息工程大学 Text entity relation extraction method and device and computer readable storage medium equipment
CN112200317A (en) * 2020-09-28 2021-01-08 西南电子技术研究所(中国电子科技集团公司第十研究所) Multi-modal knowledge graph construction method
CN112271001A (en) * 2020-11-17 2021-01-26 中山大学 Medical consultation dialogue system and method applying heterogeneous graph neural network
CN112818113A (en) * 2021-01-26 2021-05-18 山西三友和智慧信息技术股份有限公司 Automatic text summarization method based on heteromorphic graph network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WEIZHONG ZHAO: "Document-Level Chemical-Induced Disease Relation Extraction via Hierarchical Representation Learning", IEEE, pages 2782-2793 *
KONG DEQIANG (孔德强): "Entity Correlation Mining Based on Heterogeneous Graphs" (基于异构图的实体关联性挖掘), China Master's Theses Full-text Database, pages 138-1025 *
XIONG JIANG (熊江): "NoSQL Database Principles and Applications" (《NoSQL数据库原理与应用》), 31 January 2020, Zhejiang Science and Technology Press, page 15 *
WANG YIFAN; LI JIYUN (王亦凡; 李继云): "Similar Medical Record Recommendation Based on Heterogeneous Graph Embedding Learning" (基于异构图嵌入学习的相似病案推荐), Computer Systems & Applications (计算机系统应用), no. 10, pages 232-238 *

Similar Documents

Publication Publication Date Title
CN112241481B (en) Cross-modal news event classification method and system based on graph neural network
US8775341B1 (en) Intelligent control with hierarchical stacked neural networks
CN108197294A (en) A kind of text automatic generation method based on deep learning
CN111274800A (en) Inference type reading understanding method based on relational graph convolution network
CN111046907A (en) Semi-supervised convolutional network embedding method based on multi-head attention mechanism
Chen et al. Visual and textual sentiment analysis using deep fusion convolutional neural networks
CN110059181A (en) Short text stamp methods, system, device towards extensive classification system
CN113849653B (en) Text classification method and device
CN113779220A (en) Mongolian multi-hop question-answering method based on three-channel cognitive map and graph attention network
CN113378913A (en) Semi-supervised node classification method based on self-supervised learning
CN109960732B (en) Deep discrete hash cross-modal retrieval method and system based on robust supervision
WO2024036840A1 (en) Open-domain dialogue reply method and system based on topic enhancement
CN110781271A (en) Semi-supervised network representation learning model based on hierarchical attention mechanism
CN105975497A (en) Automatic microblog topic recommendation method and device
Aurangzeb et al. Aspect based multi-labeling using SVM based ensembler
CN112256870A (en) Attribute network representation learning method based on self-adaptive random walk
CN116403730A (en) Medicine interaction prediction method and system based on graph neural network
CN112685609A (en) Knowledge graph complementing method combining translation mechanism and convolutional neural network
CN114780691A (en) Model pre-training and natural language processing method, device, equipment and storage medium
CN114969367B (en) Cross-language entity alignment method based on multi-aspect subtask interaction
Biswas et al. Cat2type: Wikipedia category embeddings for entity typing in knowledge graphs
CN115496072A (en) Relation extraction method based on comparison learning
Chen et al. Sparse Boltzmann machines with structure learning as applied to text analysis
CN117131933A (en) Multi-mode knowledge graph establishing method and application
CN116522165B (en) Public opinion text matching system and method based on twin structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination