CN110032649A - Method and device for extracting relationships between entities in traditional Chinese medicine documents
- Publication number
- CN110032649A; CN201910293263.9A
- Authority
- CN
- China
- Prior art keywords
- entities
- relationship
- entity
- type
- chinese medicine
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a method and a device for extracting relationships between entities in traditional Chinese medicine (TCM) documents, which can improve the accuracy of extracting relationship types between entities. The method includes: for a TCM document to be processed, obtaining the entity types and inter-entity relationship types labeled on part of its content; training a named entity recognition model according to the labeled entity types; performing named entity recognition on the TCM document to be processed using the trained named entity recognition model and, according to the recognition result, obtaining candidate entity pairs that have a relationship and a feature table; according to the obtained candidate entity pairs and the feature table, performing statistical inference of graph probabilities with a factor graph model and globally learning entity relationship features to obtain the probability that a relationship exists between entities; and, according to that probability, determining the type of relationship between entities by combining a dependency-analysis method for extracting fact triples with the labeled inter-entity relationship types. The present invention relates to the field of knowledge engineering.
Description
Technical Field
The invention relates to the field of knowledge engineering, in particular to a method and a device for extracting relationships between entities in traditional Chinese medicine documents.
Background
A large body of ancient books and documents in the field of traditional Chinese medicine has been handed down in China, and these works are the foundation for learning traditional Chinese medicine. However, most of them are written in classical Chinese and are unstructured text, which makes them very time-consuming to use. If the entities and the relationships between them can be extracted from traditional Chinese medicine literature, the extracted relationships can be used for efficient information retrieval, knowledge mining, and the like.
Entity relationship extraction methods in the prior art have difficulty accurately extracting the relationships between entities from unstructured text.
Disclosure of Invention
The invention aims to provide a method and a device for extracting relationships between entities in traditional Chinese medicine documents, so as to solve the problem that the relationships between the entities are difficult to accurately extract from unstructured texts in the prior art.
In order to solve the above technical problems, an embodiment of the present invention provides a method for extracting relationships between entities in a traditional Chinese medicine document, including:
for a traditional Chinese medicine document to be processed, acquiring the entity types and inter-entity relationship types that have been labeled on part of its content;
training a named entity recognition model according to the labeled entity types;
performing named entity recognition on the traditional Chinese medicine document to be processed using the trained named entity recognition model, and obtaining candidate entity pairs that have a relationship and a feature table according to the named entity recognition result;
according to the obtained candidate entity pairs that have a relationship and the feature table, performing statistical inference of graph probabilities with a factor graph model and globally learning entity relationship features to obtain the probability that a relationship exists between entities;
and, according to the obtained probability that a relationship exists between entities, determining the type of the relationship between the entities by combining a method of extracting fact triples through dependency analysis with the labeled inter-entity relationship types.
Further, the training of the named entity recognition model according to the labeled entity types includes:
according to the marked entity type, using a natural language processing tool to train a named entity recognition model to obtain the named entity recognition model suitable for the traditional Chinese medicine literature;
and integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
Further, the conducting named entity recognition on the traditional Chinese medicine document to be processed by utilizing the trained named entity recognition model, and obtaining the candidate entity pair and the feature table with the relationship according to the named entity recognition result comprises:
carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model;
carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
extracting text features of entities in the candidate entity pair to obtain a named entity recognition result of the context of the candidate entities to form a feature table;
and determining whether a relationship exists between the two entities in the partial candidate entity pair.
Further, the determining the type of the relationship between the entities according to the obtained probability of the relationship between the entities by combining the method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities includes:
acquiring entity pairs with the relation probability between the entities larger than a preset threshold, analyzing sentences of the entity pairs with the relation probability between the entities larger than the preset threshold by using a dependency analysis method, and extracting fact triples with verbs as cores;
constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and determining the type of the relationship between the entities according to the predicate verbs between the entity pairs and the marked relationship type between the entities.
The embodiment of the present invention further provides a device for extracting relationships between entities in a traditional Chinese medicine document, including:
the acquisition module is used for acquiring entity types and relationship types among entities which are labeled to partial contents of traditional Chinese medicine documents to be processed;
the training module is used for training the named entity recognition model according to the marked entity type;
the recognition module is used for recognizing the named entities of the traditional Chinese medicine documents to be processed by utilizing the trained named entity recognition model and obtaining a candidate entity pair and a feature table with a relationship according to the recognition result of the named entities;
the determining module is used for carrying out statistical reasoning on graph probability by using the factor graph model according to the obtained candidate entity pair with the existing relationship and the feature table, and learning the relationship features of the entities globally to obtain the probability of the existing relationship between the entities;
and the extraction module is used for determining the type of the relationship between the entities by combining the method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities according to the obtained probability of the relationship between the entities.
Further, the training module comprises:
the training unit is used for carrying out named entity recognition model training by using a natural language processing tool according to the marked entity types to obtain a named entity recognition model suitable for the traditional Chinese medicine literature;
and the replacing unit is used for integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
Further, the identification module includes:
the recognition unit is used for carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing the trained named entity recognition model;
the operation unit is used for carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
the forming unit is used for extracting text characteristics of the entities in the candidate entity pair to obtain a named entity recognition result of the contexts of the candidate entities and form a characteristic table;
and the first determining unit is used for determining whether a relationship exists between two entities in the partial candidate entity pair.
Further, the extraction module comprises:
the analysis unit is used for acquiring entity pairs with the relation probability between the entities larger than a preset threshold, analyzing sentences of the entity pairs with the relation probability between the entities larger than the preset threshold by using a dependency analysis method, and extracting fact triples with verbs as cores;
the construction unit is used for constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and the second determining unit is used for determining the type of the relationship between the entities by combining the marked relationship type between the entities according to the verb predicates between the entity pairs.
The technical scheme of the invention has the following beneficial effects:
in the scheme, aiming at the traditional Chinese medicine document to be processed, the entity type and the relationship type between the entities marked on part of the content of the traditional Chinese medicine document are obtained; training a named entity recognition model according to the marked entity type; carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model, and obtaining a candidate entity pair and a feature table with a relationship according to a named entity recognition result; according to the obtained candidate entity pair and the feature table with the relationship, a factor graph model is used for carrying out statistical reasoning on graph probability, entity relationship features are learned globally, and the probability of the relationship between the entities is obtained; and determining the type of the relationship between the entities by combining a method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities according to the obtained probability of the relationship between the entities. Therefore, the probability of the relationship existing between the entities and the dependency analysis method of natural language processing are combined, and the relationship type between the entities is determined according to the extracted fact triples and the labeled relationship type between the entities, so that the accuracy of the extraction of the relationship type between the entities is improved, and the content of the traditional Chinese medicine literature can be clearly and structurally expressed.
Drawings
FIG. 1 is a schematic flow chart of a method for extracting relationships between entities in a traditional Chinese medicine document according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an entity recognition result according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of candidate entity pairs according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a feature table according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of labels indicating whether a relationship exists between the entities of a candidate entity pair according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the probabilities of relationships between entities according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the finally formed relationships between entities according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a device for extracting relationships between entities in a traditional Chinese medicine document according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a method and a device for extracting relationships between entities in traditional Chinese medicine documents, aiming at the problem that the relationships between the entities are difficult to extract from unstructured texts.
Example one
As shown in fig. 1, the method for extracting relationships between entities in a traditional chinese medicine document according to an embodiment of the present invention includes:
S101, for a traditional Chinese medicine document to be processed, acquiring the entity types and inter-entity relationship types that have been labeled on part of its content;
S102, training a named entity recognition model according to the labeled entity types;
S103, performing named entity recognition on the traditional Chinese medicine document to be processed using the trained named entity recognition model, and obtaining candidate entity pairs that have a relationship and a feature table according to the named entity recognition result;
S104, according to the obtained candidate entity pairs that have a relationship and the feature table, performing statistical inference of graph probabilities with a factor graph model and globally learning entity relationship features to obtain the probability that a relationship exists between entities;
S105, according to the obtained probability that a relationship exists between entities, determining the type of the relationship between the entities by combining a method of extracting fact triples through dependency analysis with the labeled inter-entity relationship types.
The method for extracting the relationship between the entities of the traditional Chinese medicine literature, disclosed by the embodiment of the invention, aims at the traditional Chinese medicine literature to be processed, and obtains the entity type and the relationship type between the entities, which are marked on part of the contents of the traditional Chinese medicine literature; training a named entity recognition model according to the marked entity type; carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model, and obtaining a candidate entity pair and a feature table with a relationship according to a named entity recognition result; according to the obtained candidate entity pair and the feature table with the relationship, a factor graph model is used for carrying out statistical reasoning on graph probability, entity relationship features are learned globally, and the probability of the relationship between the entities is obtained; and determining the type of the relationship between the entities by combining a method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities according to the obtained probability of the relationship between the entities. Therefore, the probability of the relationship existing between the entities and the dependency analysis method of natural language processing are combined, and the relationship type between the entities is determined according to the extracted fact triples and the labeled relationship type between the entities, so that the accuracy of the extraction of the relationship type between the entities is improved, and the content of the traditional Chinese medicine literature can be clearly and structurally expressed.
In the embodiment, the extraction of the relationship between the entities also lays a foundation for the construction of the knowledge graph and the intelligent auxiliary diagnosis and treatment system in the field of traditional Chinese medicine, and is an indispensable important link.
In this embodiment, before S101, according to the specific content of the to-be-processed chinese medical literature, the main chinese medical entity type and the inter-entity relationship type of the to-be-processed chinese medical literature may be determined, and 20% of the content may be labeled with the entity type and the inter-entity relationship type.
In a specific implementation of the method for extracting relationships between entities in the foregoing chinese medical literature, further, the training a named entity recognition model according to the labeled entity types includes:
according to the marked entity type, using a natural language processing tool to train a named entity recognition model to obtain the named entity recognition model suitable for the traditional Chinese medicine literature;
and integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
In this embodiment, according to the labeled entity types, the Stanford natural language processing tool DeepDive may be used to train the named entity recognition model, so as to obtain a named entity recognition model suitable for traditional Chinese medicine literature; this model is integrated into DeepDive, replaces DeepDive's original named entity recognition model, and is then packaged and compiled.
In this embodiment, DeepDive is an information extraction framework for Stanford natural language processing; it is mainly used for extracting information from modern texts, such as relationships among people, organizations and places.
In a specific implementation of the method for extracting relationships between entities in the literature of traditional Chinese medicine, the step of performing named entity recognition on the literature of traditional Chinese medicine to be processed by using the trained named entity recognition model and obtaining a candidate entity pair and a feature table having relationships according to a result of the named entity recognition includes:
carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model;
carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
extracting text features of entities in the candidate entity pair to obtain a named entity recognition result of the context of the candidate entities to form a feature table;
and determining whether a relationship exists between the two entities in the partial candidate entity pair.
In this embodiment, S103 mainly performs data preparation and produces three pieces of data: the candidate entity pairs, the feature table, and, for part of the candidate entity pairs, whether a relationship exists between the two entities. Specifically:
S1031, performing named entity recognition on the traditional Chinese medicine document to be processed using DeepDive integrated with the new named entity recognition model, and performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
in this embodiment, an entity pair is a pair of two entities; for example, entity A and entity B form the entity pair (A, B).
S1032, extracting text features of the entities in the candidate entity pairs and obtaining the named entity recognition results of the candidate entities' context to form a feature table;
S1033, labeling part (e.g., 20%) of the candidate entity pairs: a candidate entity pair with a relationship is labeled true, and one without a relationship is labeled false. Rules can also be specified to assist annotation; for example, if A and B have a relationship, then B and A also have a relationship. Such rules reduce the workload of manual annotation. The labeled data serves as prior knowledge for learning the probabilistic model. At this point the required data is prepared, which provides the basis for constructing the probabilistic model later.
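The data preparation in S1031 and S1033 can be sketched roughly as follows. This is a minimal illustration, not the DeepDive implementation: the function names, the example entities, and the choice to drop self-pairs from the Cartesian product are all assumptions made here for clarity.

```python
from itertools import product

def candidate_pairs(entities):
    """Cartesian product of the recognized entities with themselves.

    Dropping (A, A) self-pairs is an assumption of this sketch.
    """
    return [(a, b) for a, b in product(entities, repeat=2) if a != b]

def apply_symmetry_rule(manual_labels):
    """Annotation rule from the text: if (A, B) has a relationship,
    then (B, A) is labeled as having one too."""
    completed = dict(manual_labels)
    for (a, b), has_relation in manual_labels.items():
        if has_relation:
            completed.setdefault((b, a), True)
    return completed

# Hypothetical named entity recognition output for one passage.
entities = ["wind-cold", "lung", "phlegm"]
pairs = candidate_pairs(entities)

# Hypothetical manual labels for a small part of the candidate pairs.
manual = {("wind-cold", "lung"): True, ("phlegm", "wind-cold"): False}
labels = apply_symmetry_rule(manual)
```

Only pairs labeled true are mirrored; a pair labeled false says nothing about the reverse direction in this sketch, so the reverse pair stays unlabeled.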
In the embodiment, a factor graph model is used for learning the probability of the relationship between the entities to construct a probability model; specifically, the method comprises the following steps: and according to the obtained candidate entity pair with the relationship and the feature table, carrying out statistical reasoning on graph probability by using a factor graph model, and globally learning entity relationship features to form a probability model of the relationship between the entities, wherein the probability model is used for determining the probability of the relationship between the entities.
In this embodiment, a factor graph is a bipartite graph, with variable nodes on one side and local-function (factor) nodes on the other, obtained by factorizing a global function of multiple variables into a product of several local functions.
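To make the idea of a global function factorized into local functions concrete, the following toy sketch computes the probability that a relationship exists for two candidate pairs. It is an assumption-laden illustration: the variable names, the two factors, and their weights are hypothetical, and exact enumeration of all assignments stands in for whatever inference procedure the actual tool uses.

```python
import math
from itertools import product

def marginals(variables, factors):
    """Exact P(v = True) for binary variables by enumerating every assignment.

    factors: list of (scope, fn); scope is a tuple of variable names and
    fn maps their boolean values to a non-negative weight.
    """
    totals = {v: 0.0 for v in variables}
    z = 0.0  # normalizing constant: sum of the global function over all assignments
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = 1.0
        for scope, fn in factors:  # global function = product of local functions
            weight *= fn(*(assignment[v] for v in scope))
        z += weight
        for v in variables:
            if assignment[v]:
                totals[v] += weight
    return {v: totals[v] / z for v in variables}

W_FEATURE = 1.5   # hypothetical weight of a text feature supporting r_ab
W_SYMMETRY = 2.0  # hypothetical weight of the rule "(A, B) related => (B, A) related"

def feature_factor(r_ab):
    return math.exp(W_FEATURE) if r_ab else 1.0

def symmetry_factor(r_ab, r_ba):
    return math.exp(W_SYMMETRY) if r_ab == r_ba else 1.0

probs = marginals(
    ["r_ab", "r_ba"],
    [(("r_ab",), feature_factor), (("r_ab", "r_ba"), symmetry_factor)],
)
```

Here `r_ba` has no feature evidence of its own, yet its probability rises above 0.5 because the shared factor couples it to `r_ab`. That coupling is what learning the relationship features globally, rather than pair by pair, buys.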
In a specific implementation of the method for extracting relationships between entities in the foregoing traditional Chinese medicine literature, further, the determining the type of relationships between entities, according to the obtained probability that relationships exist between entities, by combining a method for extracting fact triples by dependency analysis and a labeled type of relationships between entities, includes:
acquiring entity pairs with the relation probability between the entities larger than a preset threshold, analyzing sentences of the entity pairs with the relation probability between the entities larger than the preset threshold by using a dependency analysis method, and extracting fact triples with verbs as cores;
constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and determining the type of the relationship between the entities according to the predicate verbs between the entity pairs and the marked relationship type between the entities.
In this embodiment, after the probability that a relationship exists between entities is obtained, determining the type of the relationship between the entities by combining the method of extracting fact triples through dependency analysis with the labeled inter-entity relationship types may specifically include the following steps: for entity pairs whose probability of having a relationship is higher than a preset threshold (for example, 0.8), analyzing the sentences in which the entity pairs occur using dependency analysis, and extracting fact triples with verbs as cores; constructing fact triples with a predicate verb as the core by analyzing grammatical relations of the sentence, such as the subject-predicate-object structure or a subject-predicate-complement structure containing an object relation; and determining the type of the relationship between the entities according to the predicate verb between the entity pair and the inter-entity relationship types labeled in step S101, this type being the final result of the relationship between the entities.
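The thresholding and triple-extraction steps can be sketched as below. The dependency parse is supplied by hand in a hypothetical format of (head index, label, dependent index) edges, with the labels `nsubj` and `obj` assumed for subject and object; a real system would obtain these edges from a dependency parser, and the probabilities shown are invented for illustration.

```python
def extract_triples(tokens, edges):
    """Build (subject, predicate verb, object) triples from dependency edges.

    edges: iterable of (head_index, label, dependent_index); the labels
    "nsubj" and "obj" are an assumption of this sketch.
    """
    subjects, objects = {}, {}
    for head, label, dep in edges:
        if label == "nsubj":
            subjects[head] = tokens[dep]
        elif label == "obj":
            objects[head] = tokens[dep]
    # A verb yields a triple only if it has both a subject and an object.
    return [(subjects[v], tokens[v], objects[v])
            for v in sorted(subjects) if v in objects]

THRESHOLD = 0.8  # the example threshold given in the text

# Hypothetical relationship probabilities from the factor graph step.
pair_probs = {("wind-cold", "lung"): 0.93, ("lung", "phlegm"): 0.41}
confident = {pair for pair, p in pair_probs.items() if p > THRESHOLD}

# Hypothetical tokenized sentence and its hand-written dependency edges.
tokens = ["wind-cold", "invades", "lung"]
edges = [(1, "nsubj", 0), (1, "obj", 2)]

triples = [t for t in extract_triples(tokens, edges)
           if (t[0], t[2]) in confident]
```

Only triples whose subject and object form a pair above the threshold are kept, so the dependency analysis is spent on pairs the probabilistic model already considers related.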
In the embodiment, the dependency analysis method is used for decomposing the sentences into triples, namely, the entities and the relationship between the entities are used for expressing one sentence, the meaning of the sentence can be structurally expressed, and a foundation is laid for constructing a knowledge graph in the future.
In summary, this embodiment adapts the Stanford natural language processing tool DeepDive into an information extraction method suitable for traditional Chinese medicine literature and combines it with dependency analysis, thereby providing a method for extracting relationships between entities in traditional Chinese medicine literature that can analyze unstructured traditional Chinese medicine literature, structure its content, and improve the accuracy of extracting relationship types between entities.
To better explain the method for extracting relationships between entities in traditional Chinese medicine literature according to the embodiment of the present invention, the method is described in detail below using the book "Dialectics of Pathogenesis of Traditional Chinese Medicine" as an example. It specifically includes the following steps:
First, the entity types and inter-entity relationship types are labeled for part (for example, 20%) of the content of "Dialectics of Pathogenesis of Traditional Chinese Medicine", yielding the labeled entity types and inter-entity relationship types.
In this embodiment, the entity types include etiology (by), disease location (bw), and manifestation (bx). Etiology entities include wind, cold, fire, heat and yin; disease location entities include lung, collaterals, stomach, spleen, intestinal tract and small intestine; manifestation entities include the loss of lung qi, unclear lung qi, loss of lung clear and moist, and phlegm-heat in the interior.
In this embodiment, the relationships between entities in the disease evolution can be classified into six categories, namely, a combination (between etiologies), an invasion (between etiologies and disease positions), an invaded relationship, a change (disease positions and etiologies), an appearance relationship and a cause-effect relationship; wherein,
the combination relationship (between etiologies) is mainly signaled by verbs such as combination, hold, clip, meet, and beat;
the invasion relationship (of an etiology on a disease location) is mainly signaled by verbs such as invasion, consumption, diffusion, burning, decoction, invasion, injury, middle-jiao, disturbance, impact, obstruction, flow and injury;
the invaded relationship is mainly signaled by passive markers such as "be subjected to" and "by", and by other verbs;
the change relationship of a disease location is mainly signaled by verbs such as depression, loss, stagnation, coagulation, clear, adverse, blockage, stasis, disorder, movement and closure; the change relationship of an etiology is mainly signaled by verbs such as paranoid, exuberance, congestion, coagulation, exuberance, depression, and tenuation;
the occurrence relationship is mainly signaled by verbs such as transformation, generation, transformation, expression, formation, seeing, transfer, implication, and brewing;
the cause-effect relationship is mainly signaled by verbs such as cause, then, become, be, have, cause, even, and appear.
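The labeling scheme described above can be pictured as a small lookup structure. The following is a minimal sketch only: the names `ENTITY_TYPES`, `RELATION_SCHEMA`, and `relation_for_verb` are hypothetical, the English verbs are illustrative stand-ins for a few of the Chinese trigger verbs listed above, and argument types are filled in only where the text states them (`None` otherwise). The invaded relationship is omitted because it is signaled by passive markers rather than by a verb list.

```python
from typing import Optional

# Entity type codes used in this embodiment.
ENTITY_TYPES = {"by": "etiology", "bw": "disease location", "bx": "manifestation"}

# relation name -> (subject type, object type, sample trigger verbs).
# None means the description does not state the argument type.
RELATION_SCHEMA = {
    "combination": ("by", "by", {"combine", "hold", "clip", "meet", "beat"}),
    "invasion": ("by", "bw", {"invade", "consume", "burn", "injure", "obstruct"}),
    "change": (None, None, {"stagnate", "coagulate", "block"}),
    "occurrence": (None, None, {"transform", "generate", "form"}),
    "cause-effect": (None, None, {"cause", "become"}),
}


def relation_for_verb(verb: str, subj_type: str, obj_type: str) -> Optional[str]:
    """Return the first relation whose trigger verbs and argument types match."""
    for name, (want_subj, want_obj, verbs) in RELATION_SCHEMA.items():
        if verb not in verbs:
            continue
        if want_subj is not None and want_subj != subj_type:
            continue
        if want_obj is not None and want_obj != obj_type:
            continue
        return name
    return None
```

For instance, `relation_for_verb("invade", "by", "bw")` yields `"invasion"`, while the same verb with the argument types swapped yields `None`.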
Second, the named entity recognition model is trained based on the labeled entity types.
Third, the newly trained named entity recognition model is used to recognize entities in "Dialectics of Pathogenesis of Traditional Chinese Medicine". For example, disease location entities such as heart, lung and stomach, etiology entities such as wind and cold, and manifestation entities such as phlegm reducing are identified; partial recognition results are shown in FIG. 2. A Cartesian product operation is performed on the identified entities to obtain candidate entity pairs; for example, jin and phlegm form a candidate entity pair, and partial results are shown in FIG. 3. Text features are then extracted for the candidate entity pairs. For example, if the original sentence states that wind-cold causes the lung to lose its clarity, and wind-cold is identified as an etiology, the words immediately to the left and right of the entity in the original text, together with their named entity recognition results (here o and o), form a feature table, as shown in FIG. 4, where o indicates that the entity type is "other". Finally, whether a relationship exists between the two entities is determined for part of the candidate entity pairs, for example for 20% of the candidate entity pairs according to a preset rule, where true indicates that a relationship exists and false indicates that no relationship exists. The preset rule may be, for example, that if A has a relationship with B, then B also has a relationship with A; partial results are shown in FIG. 5.
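The third step can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the tagged sentence is a hypothetical English rendering of the wind-cold example, the function names are invented for this sketch, dropping self-pairs from the Cartesian product is an assumption, and the `<pad>` token for a missing neighbor is likewise an assumption.

```python
from itertools import product

# Hypothetical NER output for one sentence: (token, tag); "o" marks non-entities.
tagged = [("wind-cold", "by"), ("causes", "o"), ("lung", "bw"),
          ("to", "o"), ("lose", "o"), ("clarity", "o")]


def candidate_pairs(tagged):
    """Cartesian product of recognized entity positions, dropping self-pairs."""
    ents = [i for i, (_, tag) in enumerate(tagged) if tag != "o"]
    return [(a, b) for a, b in product(ents, repeat=2) if a != b]


def context_features(tagged, idx):
    """Left and right neighbor words and their NER tags for the entity at idx."""
    def at(i):
        return tagged[i] if 0 <= i < len(tagged) else ("<pad>", "o")
    (left_word, left_tag), (right_word, right_tag) = at(idx - 1), at(idx + 1)
    return {"entity": tagged[idx][0], "type": tagged[idx][1],
            "left_word": left_word, "left_tag": left_tag,
            "right_word": right_word, "right_tag": right_tag}


def apply_symmetry(labels):
    """Preset rule from the text: if (A, B) is related, then (B, A) is related."""
    out = dict(labels)
    for (a, b), related in labels.items():
        if related:
            out[(b, a)] = True
    return out


pairs = candidate_pairs(tagged)
feature_table = [(context_features(tagged, a), context_features(tagged, b))
                 for a, b in pairs]
labels = apply_symmetry({(0, 2): True})
```

Here the entity at position 0 (wind-cold) has neighbor tags o and o, matching the feature-table example in the text, and labeling the pair (0, 2) as related also labels (2, 0).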
Fourth, based on the obtained candidate entity pairs having a relationship and the feature table, a factor graph model is used to perform statistical inference on graph probabilities and to learn entity relationship features globally, forming a probability model of the relationships between entities; the probability model is then used to determine the probability that a relationship exists between entities, and the result is shown in FIG. 6.
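The text does not specify the factor structure, the weights, or the inference algorithm, so the following is only a toy sketch of what inference over a factor graph of relationship variables might look like. It assumes one binary variable per candidate pair, a unary factor per variable (standing in for learned feature weights), and a pairwise factor that rewards agreement between (A, B) and (B, A). Marginals are computed by brute-force enumeration, which is feasible only for tiny graphs; a practical system would likely use approximate inference instead. All names and numbers are hypothetical.

```python
import math
from itertools import product


def marginals(unary, pairwise, evidence):
    """Exact marginals P(var = True) for a tiny binary factor graph.

    unary:    {var: score}; factor value exp(score) if var is True, else 1
    pairwise: {(u, v): weight}; factor value exp(weight) if u == v, else 1
    evidence: {var: bool}; variables clamped to their labeled values
    """
    names = sorted(unary)
    totals = {n: 0.0 for n in names}
    z = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[v] != val for v, val in evidence.items()):
            continue  # skip assignments that contradict the labeled pairs
        log_w = sum(unary[n] for n in names if world[n])
        log_w += sum(w for (u, v), w in pairwise.items() if world[u] == world[v])
        weight = math.exp(log_w)
        z += weight
        for n in names:
            if world[n]:
                totals[n] += weight
    return {n: totals[n] / z for n in names}


# Made-up scores for three candidate pairs; "wind->lung" is a labeled pair.
unary = {"cold->stomach": -1.0, "lung->wind": 0.0, "wind->lung": 0.5}
pairwise = {("wind->lung", "lung->wind"): 2.0}
evidence = {"wind->lung": True}
probs = marginals(unary, pairwise, evidence)
```

In this toy setting the labeled pair pulls its mirror pair toward a high probability through the agreement factor, while the unrelated pair keeps the probability implied by its own unary score.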
Fifth, entity pairs with a higher probability of having a relationship are obtained, fact triples are extracted by means of dependency analysis, and the specific relationship between the entities is determined according to the inter-entity relationship types labeled in the first step; for example, the expression "wind attacks the lung" is identified as an invasion relationship of an etiology on a disease location, and partial results are shown in FIG. 7.
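The fifth step can be illustrated with a hand-written dependency parse. This sketch makes several assumptions not found in the text: the parse is supplied manually rather than produced by a parser, the labels `nsubj` and `obj` follow the Universal Dependencies convention, and the threshold 0.5 and the probability 0.93 are invented for illustration. The names `Token`, `extract_triples`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    pos: str   # coarse part of speech, e.g. "NOUN" or "VERB"
    head: int  # index of the head token; -1 for the root
    dep: str   # dependency label, e.g. "nsubj" or "obj"


def extract_triples(tokens):
    """Build (subject, verb, object) triples with each verb as the core."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep == "obj"]
        triples.extend((s, tok.text, o) for s in subjects for o in objects)
    return triples


def classify(pair_probs, triples, entity_types, verb_to_relation, threshold=0.5):
    """Keep triples whose pair probability exceeds the threshold and map the verb."""
    results = []
    for subj, verb, obj in triples:
        if pair_probs.get((subj, obj), 0.0) <= threshold:
            continue
        key = (verb, entity_types.get(subj), entity_types.get(obj))
        relation = verb_to_relation.get(key)
        if relation is not None:
            results.append((subj, relation, obj))
    return results


# Hand-written parse of "wind attacks lung".
tokens = [Token("wind", "NOUN", 1, "nsubj"),
          Token("attacks", "VERB", -1, "root"),
          Token("lung", "NOUN", 1, "obj")]
triples = extract_triples(tokens)
results = classify(pair_probs={("wind", "lung"): 0.93},
                   triples=triples,
                   entity_types={"wind": "by", "lung": "bw"},
                   verb_to_relation={("attacks", "by", "bw"): "invasion"})
```

With these inputs the single triple (wind, attacks, lung) is mapped to the invasion relationship; raising the threshold above 0.93 would discard it.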
Example two
The present invention further provides a specific embodiment of an apparatus for extracting relationships between entities in traditional Chinese medicine literature. The apparatus corresponds to the method embodiment described above and achieves the object of the present invention by executing the process steps of that method embodiment. The explanations given for the method embodiment therefore also apply to the apparatus embodiment and are not repeated below.
As shown in FIG. 8, an embodiment of the present invention further provides an apparatus for extracting relationships between entities in traditional Chinese medicine literature, including:
an obtaining module 11, configured to obtain, for a to-be-processed traditional Chinese medicine document, an entity type and an inter-entity relationship type that are labeled for part of contents of the to-be-processed traditional Chinese medicine document;
a training module 12, configured to train a named entity recognition model according to the labeled entity type;
the recognition module 13 is configured to perform named entity recognition on the to-be-processed traditional Chinese medicine documents by using the trained named entity recognition model, and obtain a candidate entity pair and a feature table having a relationship according to a named entity recognition result;
a determining module 14, configured to perform statistical inference on graph probability by using a factor graph model according to the obtained candidate entity pair and the feature table with the existing relationship, and learn the entity relationship features globally to obtain the probability of the existing relationship between entities;
and the extraction module 15 is configured to determine the type of the relationship between the entities according to the obtained probability that the relationship exists between the entities, in combination with the method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities.
The device for extracting the relationship between the entities of the traditional Chinese medicine literature, disclosed by the embodiment of the invention, aims at the traditional Chinese medicine literature to be processed, and obtains the entity type and the relationship type between the entities which are marked on part of the contents of the traditional Chinese medicine literature; training a named entity recognition model according to the marked entity type; carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model, and obtaining a candidate entity pair and a feature table with a relationship according to a named entity recognition result; according to the obtained candidate entity pair and the feature table with the relationship, a factor graph model is used for carrying out statistical reasoning on graph probability, entity relationship features are learned globally, and the probability of the relationship between the entities is obtained; and determining the type of the relationship between the entities by combining a method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities according to the obtained probability of the relationship between the entities. Therefore, the probability of the relationship existing between the entities and the dependency analysis method of natural language processing are combined, and the relationship type between the entities is determined according to the extracted fact triples and the labeled relationship type between the entities, so that the accuracy of the extraction of the relationship type between the entities is improved, and the content of the traditional Chinese medicine literature can be clearly and structurally expressed.
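The data flow between the five modules can be sketched as a simple pipeline. This is an architectural illustration only: the class name `RelationExtractionDevice`, the argument lists passed between modules, and the stub functions are assumptions made for this sketch, and each stub merely records that it was called and returns a fixed value.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RelationExtractionDevice:
    obtain: Callable[..., Any]     # obtaining module 11
    train: Callable[..., Any]      # training module 12
    recognize: Callable[..., Any]  # recognition module 13
    determine: Callable[..., Any]  # determining module 14
    extract: Callable[..., Any]    # extraction module 15

    def run(self, document):
        """Pass the document through the five modules in order."""
        entity_labels, relation_labels = self.obtain(document)
        ner_model = self.train(entity_labels)
        pairs, feature_table = self.recognize(ner_model, document)
        probabilities = self.determine(pairs, feature_table)
        return self.extract(probabilities, relation_labels)


calls = []


def stub(name, result):
    """Create a placeholder module that logs its name and returns `result`."""
    def module(*args):
        calls.append(name)
        return result
    return module


device = RelationExtractionDevice(
    obtain=stub("obtain", (["by", "bw"], ["invasion"])),
    train=stub("train", "ner-model"),
    recognize=stub("recognize", ([("wind", "lung")], [{}])),
    determine=stub("determine", {("wind", "lung"): 0.93}),
    extract=stub("extract", [("wind", "invasion", "lung")]),
)
output = device.run("document text")
```

Replacing each stub with a real implementation would leave the `run` method unchanged, which is the main point of separating the modules this way.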
In an embodiment of the foregoing apparatus for extracting relationships between entities in traditional Chinese medicine literature, the training module further includes:
the training unit is used for carrying out named entity recognition model training by using a natural language processing tool according to the marked entity types to obtain a named entity recognition model suitable for the traditional Chinese medicine literature;
and the replacing unit is used for integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
In an embodiment of the foregoing apparatus for extracting relationships between entities in traditional Chinese medicine literature, the recognition module further includes:
the recognition unit is used for carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing the trained named entity recognition model;
the operation unit is used for carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
the forming unit is used for extracting text characteristics of the entities in the candidate entity pair to obtain a named entity recognition result of the contexts of the candidate entities and form a characteristic table;
and the first determining unit is used for determining whether a relationship exists between two entities in the partial candidate entity pair.
In an embodiment of the foregoing apparatus for extracting relationships between entities in traditional Chinese medicine literature, the extraction module further includes:
the analysis unit is used for acquiring entity pairs with the relation probability between the entities larger than a preset threshold, analyzing sentences of the entity pairs with the relation probability between the entities larger than the preset threshold by using a dependency analysis method, and extracting fact triples with verbs as cores;
the construction unit is used for constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and the second determining unit is used for determining the type of the relationship between the entities by combining the marked relationship type between the entities according to the verb predicates between the entity pairs.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. A method for extracting relationships between entities in traditional Chinese medicine documents is characterized by comprising the following steps:
aiming at the traditional Chinese medicine document to be processed, acquiring entity types and relationship types among entities which are labeled on partial contents of the traditional Chinese medicine document;
training a named entity recognition model according to the marked entity type;
carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model, and obtaining a candidate entity pair and a feature table with a relationship according to a named entity recognition result;
according to the obtained candidate entity pair and the feature table with the relationship, a factor graph model is used for carrying out statistical reasoning on graph probability, entity relationship features are learned globally, and the probability of the relationship between the entities is obtained;
and determining the type of the relationship between the entities by combining a method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities according to the obtained probability of the relationship between the entities.
2. The method of extracting relationships between entities of the TCM literature according to claim 1, wherein said training of the named entity recognition model based on the labeled entity types comprises:
according to the marked entity type, using a natural language processing tool to train a named entity recognition model to obtain the named entity recognition model suitable for the traditional Chinese medicine literature;
and integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
3. The method of claim 1, wherein the step of performing named entity recognition on the TCM document to be processed by using the trained named entity recognition model and obtaining the candidate entity pair and the feature table having the relationship according to the recognition result of the named entity comprises:
carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model;
carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
extracting text features of entities in the candidate entity pair to obtain a named entity recognition result of the context of the candidate entities to form a feature table;
and determining whether a relationship exists between the two entities in the partial candidate entity pair.
4. The method of claim 1, wherein determining the type of the inter-entity relationship according to the obtained probability of the existence of the relationship between the entities, in combination with the method of extracting fact triples by dependency analysis and the labeled inter-entity relationship type, comprises:
acquiring entity pairs with the relation probability between the entities larger than a preset threshold, analyzing sentences of the entity pairs with the relation probability between the entities larger than the preset threshold by using a dependency analysis method, and extracting fact triples with verbs as cores;
constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and determining the type of the relationship between the entities according to the predicate verbs between the entity pairs and the marked relationship type between the entities.
5. An apparatus for extracting relationships between entities in a traditional Chinese medicine document, comprising:
the acquisition module is used for acquiring entity types and relationship types among entities which are labeled to partial contents of traditional Chinese medicine documents to be processed;
the training module is used for training the named entity recognition model according to the marked entity type;
the recognition module is used for recognizing the named entities of the traditional Chinese medicine documents to be processed by utilizing the trained named entity recognition model and obtaining a candidate entity pair and a feature table with a relationship according to the recognition result of the named entities;
the determining module is used for carrying out statistical reasoning on graph probability by using the factor graph model according to the obtained candidate entity pair with the existing relationship and the feature table, and learning the relationship features of the entities globally to obtain the probability of the existing relationship between the entities;
and the extraction module is used for determining the type of the relationship between the entities by combining the method for extracting the fact triple through dependency analysis and the labeled type of the relationship between the entities according to the obtained probability of the relationship between the entities.
6. The apparatus for extracting relationships between entities of TCM literature according to claim 5, wherein said training module comprises:
the training unit is used for carrying out named entity recognition model training by using a natural language processing tool according to the marked entity types to obtain a named entity recognition model suitable for the traditional Chinese medicine literature;
and the replacing unit is used for integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
7. The apparatus for extracting relationships between entities of TCM literature according to claim 5, wherein said identification module comprises:
the recognition unit is used for carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing the trained named entity recognition model;
the operation unit is used for carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
the forming unit is used for extracting text characteristics of the entities in the candidate entity pair to obtain a named entity recognition result of the contexts of the candidate entities and form a characteristic table;
and the first determining unit is used for determining whether a relationship exists between two entities in the partial candidate entity pair.
8. The apparatus for extracting relationships between entities of TCM literature according to claim 5, wherein said extraction module comprises:
the analysis unit is used for acquiring entity pairs with the relation probability between the entities larger than a preset threshold, analyzing sentences of the entity pairs with the relation probability between the entities larger than the preset threshold by using a dependency analysis method, and extracting fact triples with verbs as cores;
the construction unit is used for constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and the second determining unit is used for determining the type of the relationship between the entities by combining the marked relationship type between the entities according to the verb predicates between the entity pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910293263.9A CN110032649B (en) | 2019-04-12 | 2019-04-12 | Method and device for extracting relationships between entities in traditional Chinese medicine literature |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110032649A (en) | 2019-07-19 |
CN110032649B (en) | 2021-10-01 |
Family
ID=67238140
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910293263.9A Active CN110032649B (en) | 2019-04-12 | 2019-04-12 | Method and device for extracting relationships between entities in traditional Chinese medicine literature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032649B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169079A (en) * | 2017-05-10 | 2017-09-15 | 浙江大学 | A kind of field text knowledge abstracting method based on Deepdive |
CN107247739A (en) * | 2017-05-10 | 2017-10-13 | 浙江大学 | A kind of financial publication text knowledge extracting method based on factor graph |
CN108280062A (en) * | 2018-01-19 | 2018-07-13 | 北京邮电大学 | Entity based on deep learning and entity-relationship recognition method and device |
CN108874878A (en) * | 2018-05-03 | 2018-11-23 | 众安信息技术服务有限公司 | A kind of building system and method for knowledge mapping |
CN108875051A (en) * | 2018-06-28 | 2018-11-23 | 中译语通科技股份有限公司 | Knowledge mapping method for auto constructing and system towards magnanimity non-structured text |
CN109062894A (en) * | 2018-07-19 | 2018-12-21 | 南京源成语义软件科技有限公司 | The automatic identification algorithm of Chinese natural language Entity Semantics relationship |
US20190005020A1 (en) * | 2017-06-30 | 2019-01-03 | Elsevier, Inc. | Systems and methods for extracting funder information from text |
CN109190113A (en) * | 2018-08-10 | 2019-01-11 | 北京科技大学 | A kind of knowledge mapping construction method of theory of traditional Chinese medical science ancient books and records |
CN109241538A (en) * | 2018-09-26 | 2019-01-18 | 上海德拓信息技术股份有限公司 | Based on the interdependent Chinese entity relation extraction method of keyword and verb |
Non-Patent Citations (3)
Title |
---|
HUAIYU WAN等: ""Extracting relations from traditional Chinese medicine literature via heterogeneous entity networks"", 《JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION》 * |
朱玲等: ""基于关键动词的中医古籍概念实体间予以关系发现研究"", 《中国数字医学》 * |
林伟贇: ""基于海量网页的同类命名实体共现统计规律的研究"", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543571A (en) * | 2019-08-07 | 2019-12-06 | 北京市天元网络技术股份有限公司 | knowledge graph construction method and device for water conservancy informatization |
CN112989032A (en) * | 2019-12-17 | 2021-06-18 | 医渡云(北京)技术有限公司 | Entity relationship classification method, apparatus, medium and electronic device |
CN112329440A (en) * | 2020-09-01 | 2021-02-05 | 浪潮云信息技术股份公司 | Relation extraction method and device based on two-stage screening and classification |
CN112036171A (en) * | 2020-09-04 | 2020-12-04 | 平安科技(深圳)有限公司 | Method, system and device for extracting specific medical names and relationships thereof |
WO2021169354A1 (en) * | 2020-09-04 | 2021-09-02 | 平安科技(深圳)有限公司 | Method and system for extracting specific medical references and relationship thereof, and apparatus |
CN112036171B (en) * | 2020-09-04 | 2024-06-25 | 平安科技(深圳)有限公司 | Extraction method, system and device for medical specific references and relation thereof |
CN112036151A (en) * | 2020-09-09 | 2020-12-04 | 平安科技(深圳)有限公司 | Method and device for constructing gene disease relation knowledge base and computer equipment |
CN112036151B (en) * | 2020-09-09 | 2024-04-05 | 平安科技(深圳)有限公司 | Gene disease relation knowledge base construction method, device and computer equipment |
CN112599211A (en) * | 2020-12-25 | 2021-04-02 | 中电云脑(天津)科技有限公司 | Medical entity relationship extraction method and device |
CN112599211B (en) * | 2020-12-25 | 2023-03-21 | 中电云脑(天津)科技有限公司 | Medical entity relationship extraction method and device |
CN112766485A (en) * | 2020-12-31 | 2021-05-07 | 平安科技(深圳)有限公司 | Training method, device, equipment and medium for named entity model |
WO2022142123A1 (en) * | 2020-12-31 | 2022-07-07 | 平安科技(深圳)有限公司 | Training method and apparatus for named entity model, device, and medium |
CN112766485B (en) * | 2020-12-31 | 2023-10-24 | 平安科技(深圳)有限公司 | Named entity model training method, device, equipment and medium |
CN114139610A (en) * | 2021-11-15 | 2022-03-04 | 中国中医科学院中医药信息研究所 | Traditional Chinese medicine clinical literature data structuring method and device based on deep learning |
CN114139610B (en) * | 2021-11-15 | 2024-04-26 | 中国中医科学院中医药信息研究所 | Deep learning-based traditional Chinese medicine clinical literature data structuring method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110032649B (en) | 2021-10-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||