CN110032649B - Method and device for extracting relationships between entities in traditional Chinese medicine literature


Info

Publication number
CN110032649B
Authority
CN
China
Legal status
Active
Application number
CN201910293263.9A
Other languages
Chinese (zh)
Other versions
CN110032649A
Inventor
张德政
付雅慧
谢永红
阿孜古丽
刘宏岚
栗辉
田款阳
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Application filed by University of Science and Technology Beijing USTB
Priority to CN201910293263.9A
Publication of CN110032649A
Application granted
Publication of CN110032649B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The invention provides a method and a device for extracting relationships between entities in traditional Chinese medicine documents, which improve the accuracy of extracting the types of relationships between entities. The method comprises the following steps: for a traditional Chinese medicine document to be processed, acquiring the entity types and inter-entity relationship types labeled on part of its content; training a named entity recognition model according to the labeled entity types; performing named entity recognition on the document with the trained model, and obtaining, from the recognition results, candidate entity pairs, a feature table, and labels indicating whether a relationship exists for part of the candidate pairs; using a factor graph model to perform statistical inference over the graph probabilities on the basis of these data, learning entity relationship features globally, and obtaining the probability that a relationship exists between entities; and, according to the obtained probabilities, determining the type of relationship between entities by combining fact-triple extraction through dependency analysis with the labeled inter-entity relationship types. The invention relates to the field of knowledge engineering.

Description

Method and device for extracting relationships between entities in traditional Chinese medicine literature
Technical Field
The invention relates to the field of knowledge engineering, in particular to a method and a device for extracting relationships between entities in traditional Chinese medicine documents.
Background
A large number of ancient books and documents on traditional Chinese medicine have been handed down in China, and they are the basic foundation for studying traditional Chinese medicine. However, most of these documents are written in classical Chinese and are unstructured texts, which makes them very time-consuming to use. If entities and the relationships between them can be extracted from traditional Chinese medicine literature, the extracted relationships can be used to perform information retrieval, knowledge mining and similar tasks efficiently.
Entity relationship extraction methods in the prior art have difficulty accurately extracting relationships between entities from unstructured text.
Disclosure of Invention
The invention aims to provide a method and a device for extracting relationships between entities in traditional Chinese medicine documents, so as to solve the prior-art problem that relationships between entities are difficult to extract accurately from unstructured text.
In order to solve the above technical problems, an embodiment of the present invention provides a method for extracting relationships between entities in a traditional Chinese medicine document, including:
for the traditional Chinese medicine document to be processed, acquiring the entity types and inter-entity relationship types labeled on part of its content;
training a named entity recognition model according to the labeled entity types;
performing named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model, and obtaining, from the recognition results, candidate entity pairs, a feature table, and labels indicating whether a relationship exists for part of the candidate pairs;
using a factor graph model to perform statistical inference over the graph probabilities on the basis of the candidate entity pairs, the relationship labels and the feature table, learning entity relationship features globally, and obtaining the probability that a relationship exists between entities;
and, according to the obtained probabilities, determining the type of relationship between entities by combining fact-triple extraction through dependency analysis with the labeled inter-entity relationship types.
Further, training the named entity recognition model according to the labeled entity types includes:
training, with a natural language processing tool, a named entity recognition model on the labeled entity types to obtain a named entity recognition model suited to traditional Chinese medicine literature;
and integrating the obtained model into the natural language processing tool in place of its original named entity recognition model, and then packaging and compiling the tool.
Further, performing named entity recognition on the traditional Chinese medicine document to be processed with the trained model, and obtaining the candidate entity pairs, the feature table and the relationship labels from the recognition results, includes:
performing named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model;
performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
extracting text features of the entities in each candidate entity pair, including the named entity recognition results of their context, to form a feature table;
and determining, for part of the candidate entity pairs, whether a relationship exists between the two entities.
Further, determining the type of relationship between entities, according to the obtained probabilities, by combining fact-triple extraction through dependency analysis with the labeled inter-entity relationship types includes:
acquiring entity pairs whose relationship probability is greater than a preset threshold, parsing the sentences containing these pairs with a dependency analysis method, and extracting verb-centered fact triples;
constructing fact triples centered on the predicate verb by analyzing the grammatical relations of the sentence;
and determining the type of relationship between the entities according to the predicate verb between each entity pair and the labeled inter-entity relationship types.
The embodiment of the present invention further provides a device for extracting relationships between entities in a traditional Chinese medicine document, including:
the acquisition module is configured to acquire the entity types and inter-entity relationship types labeled on part of the content of a traditional Chinese medicine document to be processed;
the training module is configured to train a named entity recognition model according to the labeled entity types;
the recognition module is configured to perform named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model, and to obtain candidate entity pairs, a feature table and relationship labels for part of the candidate pairs from the recognition results;
the determining module is configured to use a factor graph model to perform statistical inference over the graph probabilities on the basis of the candidate entity pairs, the relationship labels and the feature table, learning entity relationship features globally to obtain the probability that a relationship exists between entities;
and the extraction module is configured to determine, according to the obtained probabilities, the type of relationship between entities by combining fact-triple extraction through dependency analysis with the labeled inter-entity relationship types.
Further, the training module comprises:
the training unit is configured to train, with a natural language processing tool, a named entity recognition model on the labeled entity types to obtain a named entity recognition model suited to traditional Chinese medicine literature;
and the replacing unit is configured to integrate the obtained model into the natural language processing tool in place of its original named entity recognition model, and then package and compile the tool.
Further, the recognition module includes:
the recognition unit is configured to perform named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model;
the operation unit is configured to perform a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
the forming unit is configured to extract text features of the entities in each candidate entity pair, including the named entity recognition results of their context, to form a feature table;
and the first determining unit is configured to determine, for part of the candidate entity pairs, whether a relationship exists between the two entities.
Further, the extraction module comprises:
the analysis unit is configured to acquire entity pairs whose relationship probability is greater than a preset threshold, parse the sentences containing these pairs with a dependency analysis method, and extract verb-centered fact triples;
the construction unit is configured to construct fact triples centered on the predicate verb by analyzing the grammatical relations of the sentence;
and the second determining unit is configured to determine the type of relationship between the entities according to the predicate verb between each entity pair and the labeled inter-entity relationship types.
The technical scheme of the invention has the following beneficial effects:
In this scheme, for the traditional Chinese medicine document to be processed, the entity types and inter-entity relationship types labeled on part of its content are acquired; a named entity recognition model is trained according to the labeled entity types; named entity recognition is performed on the document with the trained model, and candidate entity pairs, a feature table and relationship labels for part of the candidate pairs are obtained from the recognition results; a factor graph model performs statistical inference over the graph probabilities on the basis of these data, learning entity relationship features globally, to obtain the probability that a relationship exists between entities; and, according to these probabilities, the type of relationship between entities is determined by combining fact-triple extraction through dependency analysis with the labeled relationship types. By combining the probability that a relationship exists between entities with the dependency analysis method of natural language processing, and determining the relationship type from the extracted fact triples and the labeled relationship types, the scheme improves the accuracy of extracting relationship types between entities and allows the content of traditional Chinese medicine literature to be expressed clearly and in structured form.
Drawings
FIG. 1 is a schematic flow chart of a method for extracting relationships between entities in traditional Chinese medicine literature according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an entity recognition result according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of candidate entity pairs according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the feature table provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of labels indicating whether a relationship exists between the entities of candidate entity pairs according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the probabilities of relationships between entities according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the finally formed relationships between entities according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for extracting relationships between entities in traditional Chinese medicine literature according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a method and a device for extracting relationships between entities in traditional Chinese medicine documents, to address the problem that relationships between entities are difficult to extract from unstructured text.
Example one
As shown in FIG. 1, the method for extracting relationships between entities in a traditional Chinese medicine document according to an embodiment of the present invention includes:
S101, for the traditional Chinese medicine literature to be processed, acquiring the entity types and inter-entity relationship types labeled on part of its content;
S102, training a named entity recognition model according to the labeled entity types;
S103, performing named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model, and obtaining candidate entity pairs, a feature table and relationship labels for part of the candidate pairs from the recognition results;
S104, using a factor graph model to perform statistical inference over the graph probabilities on the basis of the candidate entity pairs, the relationship labels and the feature table, and learning entity relationship features globally to obtain the probability that a relationship exists between entities;
and S105, according to the obtained probabilities, determining the type of relationship between entities by combining fact-triple extraction through dependency analysis with the labeled inter-entity relationship types.
In the method for extracting relationships between entities in traditional Chinese medicine literature according to this embodiment, for the literature to be processed, the entity types and inter-entity relationship types labeled on part of its content are acquired; a named entity recognition model is trained according to the labeled entity types; named entity recognition is performed on the document with the trained model, and candidate entity pairs, a feature table and relationship labels for part of the candidate pairs are obtained from the recognition results; a factor graph model performs statistical inference over the graph probabilities on the basis of these data, learning entity relationship features globally, to obtain the probability that a relationship exists between entities; and, according to these probabilities, the type of relationship between entities is determined by combining fact-triple extraction through dependency analysis with the labeled relationship types. By combining the probability that a relationship exists between entities with the dependency analysis method of natural language processing, and determining the relationship type from the extracted fact triples and the labeled relationship types, the method improves the accuracy of extracting relationship types between entities and allows the content of traditional Chinese medicine literature to be expressed clearly and in structured form.
In this embodiment, extracting relationships between entities also lays a foundation for building knowledge graphs and intelligent diagnosis-and-treatment support systems in the field of traditional Chinese medicine, and is an indispensable link in that work.
In this embodiment, before S101, the main entity types and inter-entity relationship types of the traditional Chinese medicine literature to be processed may be determined according to its specific content, and 20% of the content may be labeled with these entity types and relationship types.
In a specific implementation of the foregoing method for extracting relationships between entities in traditional Chinese medicine literature, training the named entity recognition model according to the labeled entity types further includes:
training, with a natural language processing tool, a named entity recognition model on the labeled entity types to obtain a named entity recognition model suited to traditional Chinese medicine literature;
and integrating the obtained model into the natural language processing tool in place of its original named entity recognition model, and then packaging and compiling the tool.
In this embodiment, according to the labeled entity types, a Stanford natural language processing tool (DeepDive) may be used to train the named entity recognition model, yielding a model suited to traditional Chinese medicine literature; this model is integrated into DeepDive in place of DeepDive's original named entity recognition model, and DeepDive is then packaged and compiled.
In this embodiment, DeepDive is an information extraction framework developed at Stanford; it is mainly used to extract information from modern texts, such as relationships among people, organizations and places.
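Before a sequence-labeling named entity recognition model can be trained, the labeled entity spans are typically converted into per-token tags. The sketch below is a minimal illustration under stated assumptions: character-level BIO tagging, a hypothetical (start, end, label) span format, and the type codes by (etiology), bw (disease location) and bx (manifestation) from the worked example of this description. It is not DeepDive's own input format.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label), character offsets, end exclusive.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Convert labeled entity spans into per-character BIO tags.

    Chinese text is tagged character by character here; "O" marks
    characters outside every labeled entity.
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters
    return list(zip(text, tags))


# 风寒犯肺, "wind-cold invades the lung": 风寒 is an etiology, 肺 a disease location.
print(spans_to_bio("风寒犯肺", [(0, 2, "by"), (3, 4, "bw")]))
```

For this phrase the tags come out as B-by, I-by, O, B-bw, which is the kind of per-character training data a conditional-random-field style recognizer consumes.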
In a specific implementation of the foregoing method, performing named entity recognition on the traditional Chinese medicine literature to be processed with the trained named entity recognition model, and obtaining candidate entity pairs, a feature table and relationship labels from the recognition results, includes:
performing named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model;
performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
extracting text features of the entities in each candidate entity pair, including the named entity recognition results of their context, to form a feature table;
and determining, for part of the candidate entity pairs, whether a relationship exists between the two entities.
In this embodiment, S103 mainly performs data preparation and produces three kinds of data: the candidate entity pairs, the feature table, and labels indicating whether a relationship exists between the two entities of part of the candidate pairs. Specifically:
S1031, performing named entity recognition on the traditional Chinese medicine document to be processed with DeepDive integrated with the new named entity recognition model, and performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
In this embodiment, an entity pair consists of two entities; for example, entity A and entity B form the entity pair (A, B).
S1032, extracting text features of the entities in each candidate entity pair, including the named entity recognition results of their context, to form a feature table;
S1033, labeling part (e.g., 20%) of the candidate entity pairs: pairs with a relationship are labeled true, and pairs without one are labeled false. Rules can also be specified to assist labeling; for example, if A has a relationship with B, then B also has a relationship with A. Such rules reduce the manual labeling workload. The labeled data serves as prior knowledge for learning the probabilistic model. This completes the data preparation and provides the basis for constructing the probabilistic model later.
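Steps S1031 and S1033 can be sketched as follows. This is a hedged illustration, not the embodiment's implementation: the (sentence id, entity text) mention format and the function names are hypothetical, and pairs are assumed to be formed only within a sentence. The symmetry rule is applied only to pairs labeled true, as the text states it.

```python
from itertools import product
from typing import Dict, List, Tuple

Mention = Tuple[int, str]          # (sentence id, entity text), hypothetical format
Pair = Tuple[Mention, Mention]


def candidate_pairs(mentions: List[Mention]) -> List[Pair]:
    """Ordered pairs from the Cartesian product of the mentions in each sentence.

    Assumptions: pairs never cross sentence boundaries, and a mention is
    never paired with itself (compared by position, not by text).
    """
    by_sentence: Dict[int, List[Mention]] = {}
    for m in mentions:
        by_sentence.setdefault(m[0], []).append(m)
    pairs: List[Pair] = []
    for group in by_sentence.values():
        for i, j in product(range(len(group)), repeat=2):
            if i != j:
                pairs.append((group[i], group[j]))
    return pairs


def apply_symmetry(labels: Dict[Pair, bool]) -> Dict[Pair, bool]:
    """Rule from the text: if (A, B) has a relationship, so does (B, A).

    Only true labels are propagated; a reverse pair already labeled
    false is treated as a labeling conflict.
    """
    out = dict(labels)
    for (a, b), value in labels.items():
        if not value:
            continue
        reverse = (b, a)
        if out.get(reverse) is False:
            raise ValueError(f"conflicting labels for {(a, b)} and {reverse}")
        out[reverse] = True
    return out


mentions = [(0, "wind-cold"), (0, "lung"), (1, "phlegm")]
pairs = candidate_pairs(mentions)
print(pairs)
print(apply_symmetry({pairs[0]: True}))
```

A single manual true label on (wind-cold, lung) is enough to label (lung, wind-cold) as well, which is how the rule reduces annotation effort.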
In this embodiment, a factor graph model is used to learn the probability that a relationship exists between entities and thereby construct a probabilistic model. Specifically, on the basis of the candidate entity pairs, the relationship labels and the feature table, the factor graph model performs statistical inference over the graph probabilities and learns entity relationship features globally, forming a probabilistic model of the relationships between entities that is used to determine the probability that a relationship exists between them.
In this embodiment, a factor graph is a bipartite graph that represents the factorization of a global function of many variables into a product of local functions.
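The factorization can be made concrete with a toy example. If x denotes the binary relation variables and f_j the local factors, the factor graph encodes p(x) = (1/Z) ∏_j f_j(x_j), where Z sums that product over all assignments. The sketch assumes log-linear factors with hand-picked, hypothetical weights (DeepDive learns its weights from the labeled pairs and performs sampling-based inference) and computes exact marginals by enumerating every assignment, which is feasible only for a handful of variables.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A factor: the variables it touches and a nonnegative function of their values.
Factor = Tuple[Tuple[str, ...], Callable[..., float]]


def marginals(variables: Sequence[str], factors: List[Factor]) -> Dict[str, float]:
    """Exact P(v = 1) for each binary variable, by brute-force enumeration."""
    z = 0.0                                 # partition function Z
    mass = {v: 0.0 for v in variables}      # unnormalized mass where v = 1
    for values in itertools.product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        score = 1.0
        for scope, fn in factors:           # product of local factors
            score *= fn(*(assignment[v] for v in scope))
        z += score
        for v in variables:
            if assignment[v]:
                mass[v] += score
    return {v: mass[v] / z for v in variables}


def feature_factor(var: str, weight: float) -> Factor:
    """Unary log-linear factor: exp(weight) if the relation holds, else 1."""
    return ((var,), lambda x: math.exp(weight * x))


def symmetry_factor(a: str, b: str, weight: float) -> Factor:
    """Pairwise factor rewarding R(A, B) and R(B, A) taking the same value."""
    return ((a, b), lambda x, y: math.exp(weight * (x == y)))


# Hypothetical weights: one feature supports R(A,B); a symmetry factor ties it to R(B,A).
variables = ["R(A,B)", "R(B,A)"]
factors = [feature_factor("R(A,B)", 2.0), symmetry_factor("R(A,B)", "R(B,A)", 3.0)]
print(marginals(variables, factors))
```

With these weights the marginals are about 0.88 for R(A,B) and 0.84 for R(B,A): evidence observed for one direction raises the probability of the other through the shared factor, which is the sense in which features are learned "globally" rather than pair by pair.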
In a specific implementation of the foregoing method, determining the type of relationship between entities, according to the obtained probabilities, by combining fact-triple extraction through dependency analysis with the labeled inter-entity relationship types further includes:
acquiring entity pairs whose relationship probability is greater than a preset threshold, parsing the sentences containing these pairs with a dependency analysis method, and extracting verb-centered fact triples;
constructing fact triples centered on the predicate verb by analyzing the grammatical relations of the sentence;
and determining the type of relationship between the entities according to the predicate verb between each entity pair and the labeled inter-entity relationship types.
In this embodiment, after the probability that a relationship exists between entities is obtained, determining the relationship type by combining fact-triple extraction through dependency analysis with the labeled relationship types may specifically include the following. For entity pairs whose relationship probability is higher than a preset threshold (for example, 0.8), the sentences containing the pairs are parsed with a dependency analysis method, and verb-centered fact triples are extracted. Fact triples centered on the predicate verb are constructed by analyzing grammatical relations of the sentence, such as subject-verb-object relations or subject-verb-complement relations that contain a preposition-object relation. The relationship type between the entities is then determined according to the predicate verb between each entity pair and the relationship types labeled in step S101, and serves as the final result of the relationship between the entities.
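The threshold filter and the verb-centered triple extraction can be sketched over a dependency parse that has already been produced. The text does not name the parser, so this sketch assumes LTP-style labels (SBV subject, VOB object, CMP complement, POB preposition object) and a hypothetical, hand-built Token format; the default threshold of 0.8 follows the example value above.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    word: str
    pos: str   # part-of-speech tag; "v" marks a verb (LTP-style tag set, assumed)
    head: int  # index of the head token, -1 for the root
    rel: str   # dependency label (LTP-style: SBV, VOB, CMP, POB, HED, ...), assumed


def _children(tokens: List[Token], head: int, rel: str) -> List[int]:
    return [i for i, t in enumerate(tokens) if t.head == head and t.rel == rel]


def verb_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Verb-centered (subject, predicate, object) triples from one parse.

    Covers subject-verb-object (SBV + VOB) and subject-verb-complement
    whose complement governs a preposition-object (SBV + CMP -> POB).
    """
    triples = []
    for v, tok in enumerate(tokens):
        if tok.pos != "v":
            continue
        for s in _children(tokens, v, "SBV"):
            subj = tokens[s].word
            for o in _children(tokens, v, "VOB"):
                triples.append((subj, tok.word, tokens[o].word))
            for c in _children(tokens, v, "CMP"):
                for o in _children(tokens, c, "POB"):
                    # Verb and complement are concatenated, as in Chinese (传 + 至).
                    triples.append((subj, tok.word + tokens[c].word, tokens[o].word))
    return triples


def confident_triples(tokens: List[Token],
                      probs: Dict[Tuple[str, str], float],
                      threshold: float = 0.8) -> List[Tuple[str, str, str]]:
    """Keep triples whose (subject, object) pair has probability > threshold."""
    confident: Set[Tuple[str, str]] = {pair for pair, p in probs.items() if p > threshold}
    return [t for t in verb_triples(tokens) if (t[0], t[2]) in confident]


# Hand-built parse of "wind attacks lung" (English glosses for readability).
parse = [Token("wind", "n", 1, "SBV"), Token("attacks", "v", -1, "HED"), Token("lung", "n", 1, "VOB")]
print(confident_triples(parse, {("wind", "lung"): 0.93}))
```

Here the subject-verb-object rule yields ('wind', 'attacks', 'lung'), and the pair survives the filter because its probability exceeds 0.8; a pair below the threshold would be discarded before its predicate is ever examined.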
In this embodiment, the dependency analysis method decomposes sentences into triples, that is, each sentence is expressed by entities and the relationship between them. This expresses the meaning of the sentence in structured form and lays a foundation for constructing a knowledge graph in the future.
In summary, this embodiment adapts the Stanford natural language processing tool into an information extraction method suited to traditional Chinese medicine literature and combines it with dependency analysis. The resulting method for extracting relationships between entities can analyze unstructured traditional Chinese medicine literature, give it structure, and improve the accuracy of extracting relationship types between entities.
To better illustrate the method for extracting relationships between entities in traditional Chinese medicine literature according to the embodiment of the present invention, the method is described in detail below using "Dialectics of Pathogenesis of Traditional Chinese Medicine" as an example. It specifically includes the following steps:
First, label part of the content of "Dialectics of Pathogenesis of Traditional Chinese Medicine" (for example, 20%) with entity types and inter-entity relationship types, and obtain the labeled entity types and relationship types.
In this embodiment, the entity types include etiology (by), disease location (bw) and manifestation (bx). Etiology entities include wind, cold, fire, heat and yin; disease-location entities include the lung, collaterals, stomach, spleen, intestinal tract and small intestine; manifestation entities include loss of lung qi, unclear lung qi, loss of the lung's clarity and moisture, and internal phlegm-heat.
In this embodiment, the relationships between entities in the evolution of a disease fall into six categories: combination (between etiologies), invasion (of a disease location by an etiology), being invaded, change (of a disease location or of an etiology), occurrence, and causality. Specifically:
the combination relationship (between etiologies) is mainly indicated by verbs such as combine, hold, clip, meet and beat;
the invasion relationship (of an etiology on a disease location) is mainly indicated by verbs such as invade, consume, diffuse, burn, decoct, injure, strike, disturb, impact, obstruct and flow;
the being-invaded relationship is mainly indicated by passive verbs and markers such as "suffer" and "be subjected to";
the change relationship (of a disease location) is mainly indicated by verbs such as depression, loss, stagnation, coagulation, clearing, counterflow, blockage, stasis, disorder, movement and closure; the change relationship (of an etiology) is mainly indicated by verbs such as predominance, exuberance, congestion, coagulation, depression and attenuation;
the occurrence relationship is mainly indicated by verbs such as transformation, generation, expression, formation, appearance, transfer, implication and brewing;
the causality relationship is mainly indicated by words such as cause, then, become, be, have, even and appear.
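Mapping the predicate verb of a triple to one of the relationship categories listed above can be sketched as a lexicon lookup with argument-type checks. The lexicon below is a small hypothetical subset using English glosses of verbs from these lists (plus "attack" from the worked example below); a real lexicon would hold the original Chinese verbs, and the being-invaded relationship, which depends on passive markers, is omitted.

```python
from typing import Dict, Optional, Tuple

# Hypothetical trigger lexicon: English glosses of a few verbs from the lists above.
TRIGGERS: Dict[str, str] = {
    "combine": "combination", "hold": "combination", "meet": "combination",
    "invade": "invasion", "attack": "invasion", "injure": "invasion", "obstruct": "invasion",
    "stagnate": "change", "block": "change", "coagulate": "change",
    "generate": "occurrence", "transform": "occurrence", "form": "occurrence",
    "cause": "causality", "become": "causality",
}

# Argument-type constraints stated in the text: combination links two
# etiologies (by, by); invasion links an etiology to a disease location (by, bw).
ARG_TYPES: Dict[str, Tuple[str, str]] = {
    "combination": ("by", "by"),
    "invasion": ("by", "bw"),
}


def relation_type(verb_lemma: str, head_type: str, tail_type: str) -> Optional[str]:
    """Relationship category for (head, verb, tail), or None if no rule applies."""
    relation = TRIGGERS.get(verb_lemma)
    if relation is None:
        return None
    expected = ARG_TYPES.get(relation)
    if expected is not None and expected != (head_type, tail_type):
        return None                      # verb matches but entity types do not
    return relation


print(relation_type("attack", "by", "bw"))   # wind attacks the lung
```

Checking entity types as well as the verb keeps, for instance, a reversed (disease location, etiology) pair from being labeled as invasion merely because an invasion verb appears between the two entities.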
Second, train the named entity recognition model on the labeled entity types.
Third, use the newly trained named entity recognition model to recognize entities in "Dialectics of Pathogenesis of Traditional Chinese Medicine". For example, disease-location entities such as the heart, lung and stomach, etiology entities such as wind and cold, and manifestation entities such as phlegm reducing are recognized; partial recognition results are shown in FIG. 2. Perform a Cartesian product operation on the recognized entities to obtain candidate entity pairs; for example, jin and phlegm form a candidate entity pair, and partial results are shown in FIG. 3. Then extract text features for the candidate entity pairs. For example, if the original sentence states that wind-cold makes the lung lose its clarity, wind-cold is recognized as an etiology, the words immediately to its left and right in the original text are taken as context words, and their named entity recognition results are both o; these form the feature table shown in FIG. 4, where o denotes the entity type "other". Finally, determine whether a relationship exists between the two entities of part of the candidate entity pairs, for example 20% of them, according to a preset rule, where true means that a relationship exists and false means that none exists. The preset rule may be, for example, that if A has a relationship with B, then B also has a relationship with A. Partial labeling results are shown in FIG. 5.
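The context-window features described in the third step can be sketched as follows: for each mention of a candidate pair, the words immediately to its left and right and their named entity recognition tags become columns of the feature table, with O meaning "other". The token and tag format, the sentence-boundary markers and the column names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token offsets [start, end) of one mention, hypothetical format


def window_features(tokens: List[str], ner: List[str], span: Span, prefix: str) -> Dict[str, str]:
    """Left/right neighbor word and its NER tag for one mention.

    Sentence boundaries are marked with "<S>" and "</S>" (an assumption).
    """
    start, end = span
    has_left = start > 0
    has_right = end < len(tokens)
    return {
        f"{prefix}_left_word": tokens[start - 1] if has_left else "<S>",
        f"{prefix}_left_ner": ner[start - 1] if has_left else "<S>",
        f"{prefix}_right_word": tokens[end] if has_right else "</S>",
        f"{prefix}_right_ner": ner[end] if has_right else "</S>",
    }


def feature_row(tokens: List[str], ner: List[str], span_a: Span, span_b: Span) -> Dict[str, str]:
    """One feature-table row for the candidate pair (A, B)."""
    row = window_features(tokens, ner, span_a, "e1")
    row.update(window_features(tokens, ner, span_b, "e2"))
    return row


# Placeholder tokens: E1 is an etiology (by), E2 a disease location (bw).
tokens = ["w0", "E1", "w2", "E2"]
ner = ["O", "by", "O", "bw"]
print(feature_row(tokens, ner, (1, 2), (3, 4)))
```

Each row records, per entity, which words surround it and whether those neighbors are themselves entities; these are the observations that the factor graph weighs when estimating whether the pair is related.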
Fourth, using the candidate entity pairs, the feature table and the partial relationship labels, a factor graph model performs statistical inference on the graph probability and learns entity relationship features globally. The result is a probabilistic model that gives the probability that a relationship exists between two entities. The results are shown in Fig. 6.
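To make the factor graph idea concrete, the sketch below computes exact marginal probabilities for a few binary "relation exists" variables by enumerating every assignment. It assumes hand-set log-linear weights; in practice these would be learned from the labeled pairs. It also encodes the A-B/B-A rule as a soft symmetry factor. That encoding is an assumption, since the source uses the rule to assist annotation. Enumeration is exponential in the number of variables, so real systems such as DeepDive rely on sampling-based inference instead.

```python
import itertools
import math
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]


def marginals(
    unary: Dict[Pair, float],
    symmetry_weight: float,
    evidence: Optional[Dict[Pair, bool]] = None,
) -> Dict[Pair, float]:
    """Exact P(relation exists) for each candidate pair by brute-force enumeration.

    unary: log-potential added when a pair's "relation exists" variable is true
    (assumed to be a learned sum of feature weights).
    symmetry_weight: log-potential added when (A, B) and (B, A) take the same value.
    evidence: labeled pairs (which must also appear in unary) whose values are clamped.
    """
    evidence = evidence or {}
    pairs = list(unary)
    totals = {p: 0.0 for p in pairs}
    z = 0.0  # partition function over worlds consistent with the evidence
    for values in itertools.product([False, True], repeat=len(pairs)):
        world = dict(zip(pairs, values))
        if any(world[p] != v for p, v in evidence.items()):
            continue  # skip worlds that contradict a labeled pair
        score = sum(unary[p] for p in pairs if world[p])
        for a, b in pairs:
            # count each symmetry factor once, for the lexicographically smaller ordering
            if a < b and (b, a) in world and world[(a, b)] == world[(b, a)]:
                score += symmetry_weight
        weight = math.exp(score)
        z += weight
        for p in pairs:
            if world[p]:
                totals[p] += weight
    return {p: totals[p] / z for p in pairs}


# Hypothetical weights for three candidate pairs.
unary = {("wind", "lung"): 1.0, ("lung", "wind"): -0.5, ("phlegm", "heart"): -1.0}
prior = marginals(unary, symmetry_weight=2.0)
posterior = marginals(unary, symmetry_weight=2.0, evidence={("wind", "lung"): True})
print(round(prior[("lung", "wind")], 3), round(posterior[("lung", "wind")], 3))
```

When the (wind, lung) pair is labeled true, the symmetry factor raises the probability of (lung, wind). This is the kind of global propagation that independent per-pair classifiers do not provide.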
Fifth, entity pairs with a high relationship probability are selected, fact triples are extracted by dependency analysis, and the specific relationship between the entities is determined from the relationship types labeled in the first step. For example, the expression "wind attacks the lung" yields an invasion relationship from an etiology (wind) to a disease location (lung). Partial results are shown in Fig. 7.
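Here is a minimal sketch of the fifth step. It assumes the dependency parse is already available as (word, head index, label) tuples with Stanford-style `nsubj`/`dobj` labels. It extracts a verb-centered fact triple, keeps only pairs whose relationship probability exceeds a threshold, and maps the predicate verb to a labeled relation type. The parses, the probability values, the 0.8 threshold and `RELATION_BY_VERB` are all hypothetical, and no real parser is called.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str
    head: int    # index of the head token; -1 marks the root
    deprel: str  # dependency label (Stanford-style labels assumed)


def fact_triple(parse: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate verb, object) centered on the root predicate."""
    root = next((i for i, t in enumerate(parse) if t.head == -1), None)
    if root is None:
        return None
    subj = next((t.text for t in parse if t.head == root and t.deprel == "nsubj"), None)
    obj = next((t.text for t in parse if t.head == root and t.deprel == "dobj"), None)
    if subj is None or obj is None:
        return None
    return subj, parse[root].text, obj


# Hypothetical mapping from predicate verbs to labeled relation types.
RELATION_BY_VERB: Dict[str, str] = {"attacks": "invasion"}


def typed_relations(
    parses: List[List[Token]],
    pair_probs: Dict[Tuple[str, str], float],
    threshold: float = 0.8,
) -> List[Tuple[str, str, str]]:
    """Keep pairs above the threshold and type them by their predicate verb."""
    results = []
    for parse in parses:
        triple = fact_triple(parse)
        if triple is None:
            continue
        subj, verb, obj = triple
        if pair_probs.get((subj, obj), 0.0) <= threshold:
            continue  # not greater than the preset threshold
        relation = RELATION_BY_VERB.get(verb)
        if relation is not None:
            results.append((subj, relation, obj))
    return results


parses = [
    [Token("wind", 1, "nsubj"), Token("attacks", -1, "root"), Token("lung", 1, "dobj")],
    [Token("phlegm", 1, "nsubj"), Token("attacks", -1, "root"), Token("heart", 1, "dobj")],
]
probs = {("wind", "lung"): 0.93, ("phlegm", "heart"): 0.41}
print(typed_relations(parses, probs))  # [('wind', 'invasion', 'lung')]
```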
Example two
The present invention further provides a specific embodiment of an apparatus for extracting relationships between entities in traditional Chinese medicine literature. The apparatus corresponds to the method embodiment described above and achieves the object of the invention by executing the process steps of that method. The explanations given for the method embodiment therefore also apply to the apparatus embodiment and are not repeated below.
As shown in Fig. 8, an embodiment of the present invention further provides an apparatus for extracting relationships between entities in traditional Chinese medicine literature, including:
an obtaining module 11, configured to obtain, for a traditional Chinese medicine document to be processed, the entity types and inter-entity relationship types labeled on part of its content;
a training module 12, configured to train a named entity recognition model according to the labeled entity types;
a recognition module 13, configured to perform named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model, and to obtain candidate entity pairs and a feature table according to the named entity recognition result;
a determining module 14, configured to perform statistical inference on the graph probability with a factor graph model according to the obtained candidate entity pairs, the feature table and the relationship labels, learning entity relationship features globally to obtain the probability that a relationship exists between entities;
and an extraction module 15, configured to determine the type of the relationship between the entities according to the obtained probability, by combining a method for extracting fact triples through dependency analysis with the labeled relationship types between the entities.
The apparatus for extracting relationships between entities in traditional Chinese medicine literature disclosed in this embodiment performs five operations. It obtains, for a traditional Chinese medicine document to be processed, the entity types and inter-entity relationship types labeled on part of its content. It trains a named entity recognition model according to the labeled entity types. It performs named entity recognition on the document with the trained model and obtains candidate entity pairs and a feature table from the recognition result. It uses a factor graph model to perform statistical inference on the graph probability, learning entity relationship features globally to obtain the probability that a relationship exists between entities. Finally, it determines the relationship type by combining fact triple extraction through dependency analysis with the labeled relationship types. Because the apparatus combines the relationship probability with the dependency analysis method of natural language processing, and determines the relationship type from the extracted fact triples and the labeled relationship types, it improves the accuracy of relationship type extraction. It also allows the content of traditional Chinese medicine literature to be expressed clearly and in structured form.
In an embodiment of the foregoing apparatus for extracting relationships between entities in traditional Chinese medicine literature, the training module further includes:
the training unit is used for carrying out named entity recognition model training by using a natural language processing tool according to the marked entity types to obtain a named entity recognition model suitable for the traditional Chinese medicine literature;
and the replacing unit is used for integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling.
In an embodiment of the foregoing apparatus for extracting relationships between entities in traditional Chinese medicine literature, the recognition module further includes:
the recognition unit is used for carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing the trained named entity recognition model;
the operation unit is used for carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
the forming unit is used for extracting text features of the entities in the candidate entity pairs to obtain the named entity recognition results of the context of the candidate entities and form a feature table;
and the first determining unit is used for determining whether a relationship exists between the two entities of each pair in a subset of the candidate entity pairs.
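The first determining unit's partial labeling can be sketched as below. A random 20% of the candidate pairs is sent to an annotator, and the A-B/B-A rule then propagates each positive label to the reversed pair. The `annotate` callback stands in for a human or rule-based annotator and is hypothetical. Letting the rule overwrite a conflicting label is also an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def label_subset(
    pairs: List[Pair],
    annotate: Callable[[Pair], bool],
    fraction: float = 0.2,
    seed: int = 0,
) -> Dict[Pair, bool]:
    """Label a random fraction of candidate pairs, then apply the A-B => B-A rule."""
    rng = random.Random(seed)
    k = max(1, round(len(pairs) * fraction))
    labels = {p: annotate(p) for p in rng.sample(pairs, k)}
    candidates = set(pairs)
    for (a, b), related in list(labels.items()):
        if related and (b, a) in candidates:
            labels[(b, a)] = True  # preset symmetry rule from the text
    return labels


entities = ["wind", "cold", "lung", "stomach", "phlegm"]
pairs = [(a, b) for a in entities for b in entities if a != b]  # 20 ordered pairs
gold = {("wind", "lung"), ("cold", "stomach")}  # hypothetical annotator knowledge
labels = label_subset(pairs, annotate=lambda p: p in gold)
print(len(pairs), labels)
```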
In an embodiment of the foregoing apparatus for extracting relationships between entities in traditional Chinese medicine literature, the extraction module further includes:
the analysis unit is used for acquiring entity pairs whose inter-entity relationship probability is greater than a preset threshold, analyzing the sentences containing these entity pairs by a dependency analysis method, and extracting fact triples with verbs as the core;
the construction unit is used for constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
and the second determining unit is used for determining the type of the relationship between the entities by combining the marked relationship type between the entities according to the verb predicates between the entity pairs.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (2)

1. A method for extracting relationships between entities in traditional Chinese medicine documents is characterized by comprising the following steps:
aiming at the traditional Chinese medicine document to be processed, acquiring entity types and relationship types among entities which are labeled on partial contents of the traditional Chinese medicine document;
training a named entity recognition model according to the marked entity type;
carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model, and obtaining a candidate entity pair and a feature table with a relationship according to a named entity recognition result; the method comprises the following steps:
carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing a trained named entity recognition model;
carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
extracting text features of entities in the candidate entity pair to obtain a named entity recognition result of the context of the candidate entities to form a feature table;
determining whether a relationship exists between the two entities of each pair in a subset of the candidate entity pairs;
according to the obtained candidate entity pairs, the feature table and the relationship labels, a factor graph model is used to carry out statistical inference on the graph probability, entity relationship features are learned globally, and the probability that a relationship exists between entities is obtained;
determining the type of the relationship between the entities according to the obtained probability that a relationship exists between the entities, by combining a method for extracting fact triples through dependency analysis with the labeled relationship types between the entities; the method comprises the following steps:
acquiring entity pairs whose inter-entity relationship probability is greater than a preset threshold, analyzing the sentences containing these entity pairs by a dependency analysis method, and extracting fact triples with verbs as the core;
constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
determining the type of the relationship between the entities according to the predicate verbs between the entity pairs and the marked relationship type between the entities;
the training of the named entity recognition model according to the labeled entity types comprises the following steps:
according to the marked entity type, using a natural language processing tool to train a named entity recognition model to obtain the named entity recognition model suitable for the traditional Chinese medicine literature;
integrating the named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling;
in this step, according to the labeled entity types, the Stanford natural language processing tool DeepDive is used to train a named entity recognition model suitable for traditional Chinese medicine documents; the model is integrated into DeepDive, replacing the original named entity recognition model in DeepDive, and DeepDive is repackaged and recompiled;
in the step of performing named entity recognition on the traditional Chinese medicine document to be processed with the trained named entity recognition model and obtaining the candidate entity pairs and the feature table according to the named entity recognition result, data preparation is performed to prepare three parts of data: the candidate entity pairs, the feature table, and labels indicating whether a relationship exists between the two entities of a subset of the candidate entity pairs; this specifically comprises the following steps:
performing named entity recognition on the traditional Chinese medicine document to be processed using DeepDive with the new named entity recognition model integrated, and performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
each entity pair consists of two entities; for example, entity A and entity B form an entity pair;
extracting text features of entities in the candidate entity pair to obtain a named entity recognition result of the context of the candidate entities to form a feature table;
labeling 20% of the candidate entity pairs, marking pairs that have a relationship as true and pairs that do not as false; meanwhile, rules are specified to assist the annotation, including: if there is a relationship between A and B, then there is also a relationship between B and A.
2. An apparatus for extracting relationships between entities in a traditional Chinese medicine document, comprising:
the acquisition module is used for acquiring entity types and relationship types among entities which are labeled to partial contents of traditional Chinese medicine documents to be processed;
the training module is used for training the named entity recognition model according to the marked entity type;
the recognition module is used for recognizing the named entities of the traditional Chinese medicine documents to be processed by utilizing the trained named entity recognition model and obtaining a candidate entity pair and a feature table with a relationship according to the recognition result of the named entities;
the recognition module comprises:
the recognition unit is used for carrying out named entity recognition on the traditional Chinese medicine document to be processed by utilizing the trained named entity recognition model;
the operation unit is used for carrying out Cartesian product operation on the identified entities to obtain candidate entity pairs;
the forming unit is used for extracting text features of the entities in the candidate entity pairs to obtain the named entity recognition results of the context of the candidate entities and form a feature table;
a first determining unit, configured to determine whether a relationship exists between the two entities of each pair in a subset of the candidate entity pairs;
the determining module is used for carrying out statistical inference on the graph probability with a factor graph model according to the obtained candidate entity pairs, the feature table and the relationship labels, and learning entity relationship features globally to obtain the probability that a relationship exists between entities;
the extraction module is used for determining the type of the relationship between the entities according to the obtained probability that a relationship exists between the entities, by combining a method for extracting fact triples through dependency analysis with the labeled relationship types between the entities;
the extraction module comprises:
the analysis unit is used for acquiring entity pairs whose inter-entity relationship probability is greater than a preset threshold, analyzing the sentences containing these entity pairs by a dependency analysis method, and extracting fact triples with verbs as the core;
the construction unit is used for constructing a fact triple with a predicate verb as a core by analyzing the grammatical relation of the sentence;
the second determining unit is used for determining the type of the relationship between the entities according to the verb predicates between the entity pairs and the marked relationship type between the entities;
the training module comprises:
the training unit is used for carrying out named entity recognition model training by using a natural language processing tool according to the marked entity types to obtain a named entity recognition model suitable for the traditional Chinese medicine literature;
the replacing unit is used for integrating the obtained named entity recognition model suitable for the traditional Chinese medicine literature into a natural language processing tool, replacing the original named entity recognition model, packaging and compiling;
specifically, according to the labeled entity types, the Stanford natural language processing tool DeepDive is used to train a named entity recognition model suitable for traditional Chinese medicine documents; the model is integrated into DeepDive, replacing the original named entity recognition model in DeepDive, and DeepDive is repackaged and recompiled;
specifically, the recognition module performs data preparation, preparing three parts of data: the candidate entity pairs, the feature table, and labels indicating whether a relationship exists between the two entities of a subset of the candidate entity pairs; this specifically includes:
performing named entity recognition on the traditional Chinese medicine document to be processed using DeepDive with the new named entity recognition model integrated, and performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
each entity pair consists of two entities; for example, entity A and entity B form an entity pair;
extracting text features of entities in the candidate entity pair to obtain a named entity recognition result of the context of the candidate entities to form a feature table;
labeling 20% of the candidate entity pairs, marking pairs that have a relationship as true and pairs that do not as false; meanwhile, rules are specified to assist the annotation, including: if there is a relationship between A and B, then there is also a relationship between B and A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910293263.9A CN110032649B (en) 2019-04-12 2019-04-12 Method and device for extracting relationships between entities in traditional Chinese medicine literature

Publications (2)

Publication Number Publication Date
CN110032649A CN110032649A (en) 2019-07-19
CN110032649B true CN110032649B (en) 2021-10-01

Family

ID=67238140

Country Status (1)

Country Link
CN (1) CN110032649B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543571A (en) * 2019-08-07 2019-12-06 北京市天元网络技术股份有限公司 knowledge graph construction method and device for water conservancy informatization
CN112989032A (en) * 2019-12-17 2021-06-18 医渡云(北京)技术有限公司 Entity relationship classification method, apparatus, medium and electronic device
CN112329440B (en) * 2020-09-01 2023-07-25 浪潮云信息技术股份公司 Relation extraction method and device based on two-stage screening and classification
WO2021169354A1 (en) * 2020-09-04 2021-09-02 平安科技(深圳)有限公司 Method and system for extracting specific medical references and relationship thereof, and apparatus
CN112036151B (en) * 2020-09-09 2024-04-05 平安科技(深圳)有限公司 Gene disease relation knowledge base construction method, device and computer equipment
CN112599211B (en) * 2020-12-25 2023-03-21 中电云脑(天津)科技有限公司 Medical entity relationship extraction method and device
CN112766485B (en) * 2020-12-31 2023-10-24 平安科技(深圳)有限公司 Named entity model training method, device, equipment and medium
CN114139610B (en) * 2021-11-15 2024-04-26 中国中医科学院中医药信息研究所 Deep learning-based traditional Chinese medicine clinical literature data structuring method and device

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107169079A (en) * 2017-05-10 2017-09-15 浙江大学 A kind of field text knowledge abstracting method based on Deepdive
CN107247739A (en) * 2017-05-10 2017-10-13 浙江大学 A kind of financial publication text knowledge extracting method based on factor graph
CN108875051A (en) * 2018-06-28 2018-11-23 中译语通科技股份有限公司 Knowledge mapping method for auto constructing and system towards magnanimity non-structured text
CN109190113A (en) * 2018-08-10 2019-01-11 北京科技大学 A kind of knowledge mapping construction method of theory of traditional Chinese medical science ancient books and records
CN109241538A (en) * 2018-09-26 2019-01-18 上海德拓信息技术股份有限公司 Based on the interdependent Chinese entity relation extraction method of keyword and verb

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US10740560B2 (en) * 2017-06-30 2020-08-11 Elsevier, Inc. Systems and methods for extracting funder information from text
CN108280062A (en) * 2018-01-19 2018-07-13 北京邮电大学 Entity based on deep learning and entity-relationship recognition method and device
CN108874878B (en) * 2018-05-03 2021-02-26 众安信息技术服务有限公司 Knowledge graph construction system and method
CN109062894A (en) * 2018-07-19 2018-12-21 南京源成语义软件科技有限公司 The automatic identification algorithm of Chinese natural language Entity Semantics relationship

Also Published As

Publication number Publication date
CN110032649A (en) 2019-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant