CN110032649A - Method and device for extracting relations between entities in traditional Chinese medicine (TCM) literature - Google Patents

Publication number
CN110032649A
Authority
CN
China
Prior art keywords
entity
relationship
type
named
marked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910293263.9A
Other languages
Chinese (zh)
Other versions
CN110032649B (en)
Inventor
张德政
付雅慧
谢永红
阿孜古丽
刘宏岚
栗辉
田款阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201910293263.9A
Publication of CN110032649A
Application granted
Publication of CN110032649B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a method and device for extracting relations between entities in traditional Chinese medicine (TCM) literature, which can improve the accuracy of extracting relation types between entities. The method includes: for TCM literature to be processed, obtaining the entity types and inter-entity relation types annotated on part of its content; training a named entity recognition model according to the annotated entity types; performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, from the recognition results, obtaining candidate entity pairs for which a relation may exist and a feature table; based on the obtained candidate entity pairs and feature table, performing statistical inference over a probabilistic graph with a factor graph model, globally learning the relation features of the entities, to obtain the probability that a relation exists between entities; and, according to the obtained probability, determining the type of the relation between entities by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types. The present invention relates to the field of knowledge engineering.

Description

Method and device for extracting relations between entities in TCM literature
Technical field
The present invention relates to the field of knowledge engineering, and in particular to a method and device for extracting relations between entities in TCM literature.
Background art
China has handed down many ancient books on traditional Chinese medicine, which are the basic foundation for studying Chinese medicine. However, most of these documents are written in classical Chinese and are largely unstructured text, so using them is very time-consuming. If the entities in TCM literature and the relations between them can be extracted, the extracted relations can be used to carry out information retrieval, knowledge mining, and similar tasks effectively.
With entity relation extraction methods in the prior art, it is difficult to accurately extract the relations between entities from unstructured text.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for extracting relations between entities in TCM literature, so as to solve the prior-art problem that it is difficult to accurately extract the relations between entities from unstructured text.
To solve the above technical problem, an embodiment of the present invention provides a method for extracting relations between entities in TCM literature, including:
For TCM literature to be processed, obtaining the entity types and inter-entity relation types annotated on part of its content;
Training a named entity recognition model according to the annotated entity types;
Performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, from the recognition results, obtaining candidate entity pairs for which a relation may exist and a feature table;
Based on the obtained candidate entity pairs and feature table, performing statistical inference over a probabilistic graph with a factor graph model, globally learning the relation features of the entities, to obtain the probability that a relation exists between entities;
According to the obtained probability that a relation exists between entities, determining the type of the relation between entities by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types.
Further, training the named entity recognition model according to the annotated entity types includes:
According to the annotated entity types, training a named entity recognition model using a natural language processing tool, to obtain a named entity recognition model suited to TCM literature;
Integrating the obtained named entity recognition model suited to TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and packaging and compiling.
Further, performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, from the recognition results, obtaining candidate entity pairs for which a relation may exist and a feature table includes:
Performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
Taking the Cartesian product of the recognized entities to obtain candidate entity pairs;
Extracting text features for the entities in the candidate entity pairs, obtaining the named entity recognition results of the context of each candidate entity, and forming the feature table;
For part of the candidate entity pairs, determining whether a relation exists between the two entities of each pair.
Further, determining the type of the relation between entities according to the obtained probability that a relation exists between entities, by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types, includes:
Obtaining the entity pairs whose probability of having a relation is greater than a preset threshold, and using dependency parsing to analyze the sentences in which those entity pairs occur, so as to extract fact triples centered on a verb;
Constructing the fact triples centered on the predicate verb by parsing the grammatical relations of the sentence;
Determining the type of the relation between entities according to the predicate verb between the entities of a pair, in combination with the annotated inter-entity relation types.
An embodiment of the present invention also provides a device for extracting relations between entities in TCM literature, including:
An acquisition module, configured to obtain, for TCM literature to be processed, the entity types and inter-entity relation types annotated on part of its content;
A training module, configured to train a named entity recognition model according to the annotated entity types;
A recognition module, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, from the recognition results, obtain candidate entity pairs for which a relation may exist and a feature table;
A determining module, configured to perform, based on the obtained candidate entity pairs and feature table, statistical inference over a probabilistic graph with a factor graph model, globally learning the relation features of the entities, to obtain the probability that a relation exists between entities;
An extraction module, configured to determine, according to the obtained probability that a relation exists between entities, the type of the relation between entities by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types.
Further, the training module includes:
A training unit, configured to train a named entity recognition model using a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suited to TCM literature;
A replacement unit, configured to integrate the obtained named entity recognition model suited to TCM literature into the natural language processing tool, replace the tool's original named entity recognition model, and package and compile.
Further, the recognition module includes:
A recognition unit, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
An operation unit, configured to take the Cartesian product of the recognized entities to obtain candidate entity pairs;
A composition unit, configured to extract text features for the entities in the candidate entity pairs, obtain the named entity recognition results of the context of each candidate entity, and form the feature table;
A first determination unit, configured to determine, for part of the candidate entity pairs, whether a relation exists between the two entities of each pair.
Further, the extraction module includes:
An analysis unit, configured to obtain the entity pairs whose probability of having a relation is greater than a preset threshold, and to use dependency parsing to analyze the sentences in which those entity pairs occur, so as to extract fact triples centered on a verb;
A construction unit, configured to construct the fact triples centered on the predicate verb by parsing the grammatical relations of the sentence;
A second determination unit, configured to determine the type of the relation between entities according to the predicate verb between the entities of a pair, in combination with the annotated inter-entity relation types.
The beneficial effects of the above technical solutions of the present invention are as follows:
In the above solution, for TCM literature to be processed, the entity types and inter-entity relation types annotated on part of its content are obtained; a named entity recognition model is trained according to the annotated entity types; named entity recognition is performed on the TCM literature to be processed using the trained model, and candidate entity pairs for which a relation may exist, together with a feature table, are obtained from the recognition results; based on the obtained candidate entity pairs and feature table, statistical inference over a probabilistic graph is performed with a factor graph model, which globally learns the relation features of the entities and yields the probability that a relation exists between entities; and, according to that probability, the type of the relation between entities is determined by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types. In this way, the probability that a relation exists between entities is combined with the dependency parsing method of natural language processing, and the relation type between entities is determined from the extracted fact triples and the annotated inter-entity relation types. This improves the accuracy of relation type extraction between entities and allows the content of TCM literature to be expressed clearly and in structured form.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for extracting relations between entities in TCM literature provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of entity recognition results provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of candidate entity pair results provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the feature table provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the labels indicating whether a relation exists between candidate entity pairs provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the resulting probabilities that a relation exists between entities provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the final inter-entity relation results provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of the device for extracting relations between entities in TCM literature provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
To address the existing difficulty of accurately extracting the relations between entities from unstructured text, the present invention provides a method and device for extracting relations between entities in TCM literature.
Embodiment one
As shown in Fig. 1, the method for extracting relations between entities in TCM literature provided by an embodiment of the present invention includes:
S101, for TCM literature to be processed, obtaining the entity types and inter-entity relation types annotated on part of its content;
S102, training a named entity recognition model according to the annotated entity types;
S103, performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, from the recognition results, obtaining candidate entity pairs for which a relation may exist and a feature table;
S104, based on the obtained candidate entity pairs and feature table, performing statistical inference over a probabilistic graph with a factor graph model, globally learning the relation features of the entities, to obtain the probability that a relation exists between entities;
S105, according to the obtained probability that a relation exists between entities, determining the type of the relation between entities by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types.
In the method for extracting relations between entities in TCM literature described in this embodiment of the present invention, for TCM literature to be processed, the entity types and inter-entity relation types annotated on part of its content are obtained; a named entity recognition model is trained according to the annotated entity types; named entity recognition is performed on the TCM literature to be processed using the trained model, and candidate entity pairs for which a relation may exist, together with a feature table, are obtained from the recognition results; based on the obtained candidate entity pairs and feature table, statistical inference over a probabilistic graph is performed with a factor graph model, which globally learns the relation features of the entities and yields the probability that a relation exists between entities; and, according to that probability, the type of the relation between entities is determined by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types. In this way, the probability that a relation exists between entities is combined with the dependency parsing method of natural language processing, and the relation type between entities is determined from the extracted fact triples and the annotated inter-entity relation types. This improves the accuracy of relation type extraction between entities and allows the content of TCM literature to be expressed clearly and in structured form.
In this embodiment, the extraction of relations between entities also lays the foundation for building knowledge graphs and intelligent diagnosis-and-treatment assistance systems in the TCM field, and is an indispensable link.
In this embodiment, before S101, the main TCM entity types and inter-entity relation types may first be determined according to the specific content of the TCM literature to be processed, and the entity types and inter-entity relation types are annotated on 20% of its content.
In the specific implementation of the foregoing method for extracting relations between entities in TCM literature, further, training the named entity recognition model according to the annotated entity types includes:
According to the annotated entity types, training a named entity recognition model using a natural language processing tool, to obtain a named entity recognition model suited to TCM literature;
Integrating the obtained named entity recognition model suited to TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and packaging and compiling.
In this embodiment, according to the annotated entity types, the Stanford natural language processing tool (deepdive) can be used to train a named entity recognition model, obtaining a named entity recognition model suited to TCM literature; this model is integrated into deepdive, replacing deepdive's original named entity recognition model, and is packaged and compiled.
In this embodiment, deepdive is an information extraction framework tool from Stanford natural language processing, mainly used for information extraction from modern text, extracting relations between people, organizations, and places.
In the specific implementation of the foregoing method for extracting relations between entities in TCM literature, further, performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, from the recognition results, obtaining candidate entity pairs for which a relation may exist and a feature table includes:
Performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
Taking the Cartesian product of the recognized entities to obtain candidate entity pairs;
Extracting text features for the entities in the candidate entity pairs, obtaining the named entity recognition results of the context of each candidate entity, and forming the feature table;
For part of the candidate entity pairs, determining whether a relation exists between the two entities of each pair.
In this embodiment, S103 mainly prepares data. Three kinds of data are prepared: the candidate entity pairs, the feature table, and labels indicating whether a relation exists between the two entities in part of the candidate entity pairs. Specifically:
S1031, performing named entity recognition on the TCM literature to be processed using the above deepdive with the new named entity recognition model integrated, and taking the Cartesian product of the recognized entities to obtain candidate entity pairs;
In this embodiment, an entity pair is simply two entities paired together; for example, entity A and entity B form the entity pair (A, B).
S1032, extracting text features for the entities in the candidate entity pairs, obtaining the named entity recognition results of the context of each candidate entity, and forming the feature table;
S1033, labeling part (for example, 20%) of the candidate entity pairs: a candidate entity pair that has a relation is labeled true, and one that has no relation is labeled false. Some rules can also be specified to assist labeling, for example: if there is a relation between A and B, then there is also a relation between B and A. Such rules reduce the manual labeling workload. The labeled data serve as prior knowledge for learning the probabilistic model. At this point the required data preparation is complete, and these data provide the basis for building the probabilistic model afterwards.
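The data preparation in S1031 and S1033 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the entity list and manual labels are made-up sample data, the function names are hypothetical, and dropping self-pairs such as (A, A) is an assumption the text does not state.

```python
from itertools import product

def candidate_pairs(entities):
    """Cartesian product of the recognized entities with themselves.

    Self-pairs (A, A) are dropped; this is an assumption of the sketch.
    """
    return [(a, b) for a, b in product(entities, repeat=2) if a != b]

def propagate_symmetric(labels):
    """Apply the rule 'if A relates to B, then B relates to A' to positive labels."""
    full = dict(labels)
    for (a, b), has_relation in labels.items():
        if has_relation:
            full.setdefault((b, a), True)
    return full

# Hypothetical NER output and hypothetical manual labels for part of the pairs.
entities = ["wind", "lung", "phlegm"]
pairs = candidate_pairs(entities)
manual = {("wind", "lung"): True, ("lung", "phlegm"): False}
labels = propagate_symmetric(manual)
```

Only positive labels are propagated here, because the rule quoted in S1033 speaks only about pairs that have a relation; whether negative labels should also be mirrored is left open.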
In this embodiment, a factor graph model is used to learn the probability that a relation exists between entities, thereby building a probabilistic model. Specifically: based on the obtained candidate entity pairs and feature table, statistical inference over a probabilistic graph is performed with the factor graph model, globally learning the relation features of the entities and forming a probabilistic model of the relations between entities; the probabilistic model is used to determine the probability that a relation exists between entities.
In this embodiment, a factor graph is obtained by factorizing a global function of several variables into a product of several local functions; the bipartite graph obtained on this basis is called a factor graph.
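To make the factorization concrete, the following sketch computes marginal probabilities on a tiny factor graph by enumerating every assignment. It is only an illustration under stated assumptions: the two binary variables (whether candidate pair 0 and pair 1 have a relation), the factor weights, and the `marginal` helper are all hypothetical, and exhaustive enumeration stands in for whatever inference procedure the tool actually uses.

```python
from itertools import product

def marginal(factors, n_vars, var, value):
    """P(x_var = value) for binary variables, by brute-force enumeration.

    `factors` is a list of (scope, function) pairs; the global function is the
    product of all local functions. Only feasible for very small graphs.
    """
    total = 0.0
    matching = 0.0
    for assignment in product([0, 1], repeat=n_vars):
        weight = 1.0
        for scope, fn in factors:
            weight *= fn(*(assignment[i] for i in scope))
        total += weight
        if assignment[var] == value:
            matching += weight
    return matching / total

# Hypothetical factors over two candidate pairs.
factors = [
    ((0,), lambda r: 3.0 if r else 1.0),             # features favor a relation for pair 0
    ((1,), lambda r: 1.0),                           # no evidence either way for pair 1
    ((0, 1), lambda a, b: 2.0 if a == b else 1.0),   # mirrored pairs tend to agree
]
p0 = marginal(factors, 2, 0, 1)
p1 = marginal(factors, 2, 1, 1)
```

With these weights, the four assignments have unnormalized weights 2, 1, 3, and 6, so `p0` is 9/12 and `p1` is 7/12: the pairwise factor lets evidence for one pair raise the probability of the other, which is the sense in which features are learned globally.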
In the specific implementation of the foregoing method for extracting relations between entities in TCM literature, further, determining the type of the relation between entities according to the obtained probability that a relation exists between entities, by combining a dependency-parsing method for extracting fact triples with the annotated inter-entity relation types, includes:
Obtaining the entity pairs whose probability of having a relation is greater than a preset threshold, and using dependency parsing to analyze the sentences in which those entity pairs occur, so as to extract fact triples centered on a verb;
Constructing the fact triples centered on the predicate verb by parsing the grammatical relations of the sentence;
Determining the type of the relation between entities according to the predicate verb between the entities of a pair, in combination with the annotated inter-entity relation types.
In this embodiment, after the probability that a relation exists between entities has been obtained, determining the type of the relation by combining the dependency-parsing method for extracting fact triples with the annotated inter-entity relation types may specifically include the following steps: for entity pairs whose probability of having a relation is higher than a preset threshold (for example, 0.8), the sentences in which these entity pairs occur are analyzed by dependency parsing to extract fact triples centered on a verb; the fact triples centered on the predicate verb are constructed by parsing grammatical relations of the sentence such as subject-predicate-object, or subject-predicate-verb-complement structures containing a preposition-object relation; and, according to the predicate verb between the entities of a pair, in combination with the inter-entity relation types annotated in S101, the type of the relation between entities is determined as the final inter-entity relation result.
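The thresholding and triple-construction steps can be sketched as below. This is a simplified illustration, not the patent's procedure: the dependency arcs are written by hand rather than produced by a parser, the arc labels `nsubj` and `obj` are borrowed from Universal Dependencies as an assumption, and only the plain subject-predicate-object pattern is handled.

```python
def high_confidence(pair_probs, threshold=0.8):
    """Keep the entity pairs whose relation probability exceeds the threshold."""
    return [pair for pair, p in pair_probs.items() if p > threshold]

def fact_triple(arcs, pair):
    """Build a (subject, predicate, object) triple for an entity pair.

    `arcs` is a list of (head, label, dependent) dependency arcs. A triple is
    returned only if one verb heads both a subject and an object arc and those
    two dependents are exactly the entities of the pair.
    """
    subjects = {head: dep for head, label, dep in arcs if label == "nsubj"}
    objects = {head: dep for head, label, dep in arcs if label == "obj"}
    for verb, subj in subjects.items():
        obj = objects.get(verb)
        if obj is not None and {subj, obj} == set(pair):
            return (subj, verb, obj)
    return None

# Hypothetical probabilities and a hand-written parse of "wind invades lung".
pair_probs = {("wind", "lung"): 0.93, ("wind", "stomach"): 0.41}
arcs = [("invades", "nsubj", "wind"), ("invades", "obj", "lung")]
triples = [fact_triple(arcs, pair) for pair in high_confidence(pair_probs)]
```

The predicate in each triple is what the final step compares against the annotated relation types.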
In this embodiment, sentences are decomposed into triples by dependency parsing; that is, a sentence is expressed in terms of entities and the relations between them. This not only gives the meaning of the sentence a structured representation but also lays the foundation for building a knowledge graph in the future.
In summary, this embodiment adapts the Stanford natural language processing tool into an information extraction method suited to TCM literature and combines it with dependency parsing, proposing a method for extracting relations between entities in TCM literature. It can analyze unstructured TCM literature, give the literature a structured form, and improve the accuracy of relation type extraction between entities.
For a better understanding of the method for extracting relations between entities in TCM literature described in this embodiment of the present invention, the method is described in detail below, taking the book "TCM Pathogenesis Syndrome Differentiation" as an example. It may specifically include the following steps:
First, entity types and inter-entity relation types are annotated on part of the content of "TCM Pathogenesis Syndrome Differentiation", for example 20% of the content, and the annotated entity types and inter-entity relation types are obtained.
In this embodiment, the entity types include: cause of disease (by), disease location (bw), and manifestation (bx). Causes of disease include entities such as wind, cold, pry- and yin; disease locations include entities such as lung, collaterals, stomach, spleen, intestine, and small intestine; manifestations include entities such as lung-qi obstruction, unclear lung qi, the lung losing its clearing function, and phlegm-heat accumulating in the interior.
In this embodiment, the relations between entities in the evolution of a disease condition can be divided into six classes: the combination relation (between causes of disease), the invasion relation (a cause of disease acting on a disease location), the being-invaded relation, the change relation (of a disease location or of a cause of disease), the emergence relation, and the causal relation; where:
The combination relation (between causes of disease) is mainly governed by verbs such as be harmonious, and, and press from both sides, press from both sides, meet, and fight knot;
The invasion relation (a cause of disease acting on a disease location) is mainly governed by verbs such as infringement, invasion, criminal, consume, diffuse, burn, decoct, enter, hurt, in, disturb, rush, hit, block, flow, and damage;
The being-invaded relation is mainly governed by passive-marker verbs such as by;
The change relation (of a disease location) is mainly governed by verbs such as strongly fragrant, lose, stagnant, solidify, clear, inverse, numbness, become silted up, inverse disorder, move, and close; the change relation (of a cause of disease) is mainly governed by verbs such as absurd row, flourishing, stop up Sheng, condensation, Sheng, pent-up, and rise;
The emergence relation is mainly governed by verbs such as become, give birth to, change, show, form, see, transfer, accumulate, and make;
The causal relation is mainly governed by verbs such as cause, then, at, be, have, even, and occur.
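One simple way to use verb lists like these is a lookup from predicate verb to relation class, sketched below. This is an assumption-laden illustration: the English verbs are hypothetical stand-ins chosen for readability rather than the source's Chinese verbs, and the class names and the `relation_type` helper are invented for this sketch.

```python
# Hypothetical English stand-ins for the verb lists of the six relation classes.
RELATION_VERBS = {
    "combination": {"combine", "meet"},
    "invasion": {"invade", "damage", "block"},
    "being_invaded": {"be invaded by"},
    "change": {"stagnate", "condense"},
    "emergence": {"form", "become"},
    "causation": {"cause", "lead to"},
}

def relation_type(predicate):
    """Return the relation class whose verb list contains the predicate, else None."""
    for relation, verbs in RELATION_VERBS.items():
        if predicate in verbs:
            return relation
    return None

kind = relation_type("invade")
unknown = relation_type("walk")
```

A predicate that matches no list returns `None`, leaving the pair without a typed relation; how such cases are handled is not specified in the text.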
Second, a named entity recognition model is trained according to the annotated entity types.
Third, "TCM Pathogenesis Syndrome Differentiation" is processed with the newly trained named entity recognition model. For example, entities such as heart, lung, and stomach can be recognized as disease locations, entities such as wind and cold as causes of disease, and entities such as resolving phlegm as manifestations; part of the recognition results is shown in Fig. 2. The Cartesian product of the recognized entities is taken to obtain candidate entity pairs; for example, the candidate entity pair (saliva, phlegm) can be obtained, and part of the results is shown in Fig. 3. Text features are extracted from the candidate entity pair results. For example, in an original sentence meaning "if the chill stagnating in the lung is not resolved", chill is recognized as a cause of disease; the words immediately to its left and right in the original text are "if" and "stagnate", and their named entity recognition results are o and o, which form the feature table shown in Fig. 4, where o indicates that the entity type is other. Whether a relation exists between the two entities is then determined for part of the candidate entity pairs; for example, it can be determined for 20% of the candidate entity pairs according to preset rules, where true indicates that a relation exists and false indicates that no relation exists. A preset rule may be, for example: if there is a relation between A and B, then there is also a relation between B and A. Part of the relation labeling results is shown in Fig. 5.
Fourth, based on the obtained candidate entity pairs and feature table, statistical inference over a probabilistic graph is performed with the factor graph model, globally learning the relation features of the entities and forming a probabilistic model of the relations between entities; the probabilistic model is used to determine the probability that a relation exists between entities, and the results are shown in Fig. 6;
Fifth, the entity pairs with a higher probability of having a relation are obtained, and the specific relation between the entities is determined by combining the dependency-parsing method for extracting fact triples with the inter-entity relation types annotated in the first step. For example, the sentence "wind invades the lung" is found to express an invasion relation of a cause of disease on a disease location; part of the results is shown in Fig. 7.
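The feature-table row described in the third step of this worked example (the entity plus the named entity tags of its left and right neighbors) can be sketched as follows. The tokenization, the English tokens, the window size of one, and the `context_features` helper are assumptions made for illustration; the tags `by`, `bw`, and `o` follow the entity-type codes defined above.

```python
def context_features(tokens, tags, index, window=1):
    """Collect the NER tags of the tokens around tokens[index].

    Returns one feature-table row: the entity, its own tag, and the tags of up
    to `window` tokens on each side.
    """
    left = tags[max(0, index - window):index]
    right = tags[index + 1:index + 1 + window]
    return {"entity": tokens[index], "type": tags[index], "left": left, "right": right}

# Hypothetical tokenization of the example sentence, in English glosses.
tokens = ["if", "chill", "stagnates", "lung"]
tags = ["o", "by", "o", "bw"]
row = context_features(tokens, tags, 1)
```

For the entity "chill", both neighbors carry the tag `o`, matching the o and o results described in the example.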
Embodiment two
The present invention also provides a specific implementation of a device for extracting relations between entities in TCM literature. Because the device provided by the present invention corresponds to the specific implementation of the foregoing method, the device can achieve the purpose of the present invention by executing the process steps of the method implementation described above. The explanations given for the method implementation therefore also apply to the device implementation provided by the present invention and are not repeated below.
As shown in Figure 8, an embodiment of the present invention further provides a device for extracting relationships between entities in TCM literature, comprising:
An obtaining module 11, configured to obtain, for TCM literature to be processed, the entity types and inter-entity relationship types that have been labeled for part of its content;
A training module 12, configured to train a named entity recognition model according to the labeled entity types;
An identification module 13, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtain candidate entity pairs having relationships and a feature table;
A determining module 14, configured to perform statistical inference of graph probabilities with a factor graph model according to the obtained candidate entity pairs having relationships and the feature table, globally learn entity relationship features, and obtain the probability that a relationship exists between entities;
An extraction module 15, configured to determine the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples through dependency analysis and the labeled inter-entity relationship types.
The device for extracting relationships between entities in TCM literature according to the embodiment of the present invention obtains, for TCM literature to be processed, the entity types and inter-entity relationship types labeled for part of its content; trains a named entity recognition model according to the labeled entity types; performs named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtains candidate entity pairs having relationships and a feature table; performs statistical inference of graph probabilities with a factor graph model according to the obtained candidate entity pairs and the feature table, globally learns entity relationship features, and obtains the probability that a relationship exists between entities; and determines the type of relationship between entities according to that probability, in combination with the method of extracting fact triples through dependency analysis and the labeled inter-entity relationship types. In this way, the probability that a relationship exists between entities is combined with the dependency analysis method of natural language processing, and the inter-entity relationship type is determined from the extracted fact triples and the labeled inter-entity relationship types, which improves the accuracy of inter-entity relationship type extraction and allows the content of TCM literature to be expressed in a clear, structured form.
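The data flow between the five modules can be sketched as a simple composition of callables. This is a hypothetical wiring only: the class name, the field names, and the stub functions below are assumptions made for illustration, and each stub stands in for the real behavior of modules 11 to 15.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RelationExtractionDevice:
    obtain: Callable[[str], Any]          # obtaining module 11
    train: Callable[[Any], Any]           # training module 12
    identify: Callable[[Any, str], Any]   # identification module 13
    determine: Callable[[Any], Any]       # determining module 14
    extract: Callable[[Any, Any], Any]    # extraction module 15

    def run(self, document: str) -> Any:
        annotations = self.obtain(document)           # labeled types
        model = self.train(annotations)               # NER model
        candidates = self.identify(model, document)   # pairs + feature table
        probabilities = self.determine(candidates)    # factor graph inference
        return self.extract(probabilities, annotations)

# Stub modules that only record the order in which they are called.
calls = []

def stub(name, result):
    def fn(*args):
        calls.append(name)
        return result
    return fn

device = RelationExtractionDevice(
    obtain=stub("obtain", "annotations"),
    train=stub("train", "model"),
    identify=stub("identify", "candidates"),
    determine=stub("determine", "probabilities"),
    extract=stub("extract", "relations"),
)
result = device.run("document text")
```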
In the specific implementation of the foregoing device for extracting relationships between entities in TCM literature, further, the training module includes:
A training unit, configured to train a named entity recognition model using a natural language processing tool according to the labeled entity types, obtaining a named entity recognition model suitable for TCM literature;
A replacement unit, configured to integrate the obtained named entity recognition model suitable for TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and to package and compile it.
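As an illustration of how labeled entity types might be turned into training data for a named entity recognition model, the sketch below converts span annotations into per-token tags. The BIO tagging scheme, the span format, and the function name are assumptions of this example; the patent does not specify the training data format or the natural language processing tool.

```python
def to_bio(tokens, spans):
    """Convert labeled spans into per-token BIO tags.

    spans: list of (start, end_exclusive, entity_type) over token indices.
    Tokens outside every span receive the tag "O" (other).
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        tags[start] = "B-" + entity_type
        for i in range(start + 1, end):
            tags[i] = "I-" + entity_type
    return list(zip(tokens, tags))

# Hypothetical labeled sentence.
tokens = ["wind", "invades", "the", "lung"]
spans = [(0, 1, "CAUSE"), (3, 4, "LOCATION")]
rows = to_bio(tokens, spans)
```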
In the specific implementation of the foregoing device for extracting relationships between entities in TCM literature, further, the identification module includes:
A recognition unit, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
An operation unit, configured to perform a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
A composition unit, configured to extract text features for the entities in the candidate entity pairs and obtain the named entity recognition results of the context of the candidate entities, which make up the feature table;
A first determination unit, configured to determine whether a relationship exists between the two entities in some of the candidate entity pairs.
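The feature table built by the composition unit can be sketched as one row per entity, holding the neighboring words and their named entity recognition tags. The window of one word on each side follows the example in the description; the tokenized sentence, the tag names, and the row layout are hypothetical.

```python
def context_features(tokens, tags, index):
    """Feature row for the entity at `index`: left/right word and NER tag."""
    has_left = index > 0
    has_right = index < len(tokens) - 1
    return {
        "entity": tokens[index],
        "type": tags[index],
        "left_word": tokens[index - 1] if has_left else None,
        "left_tag": tags[index - 1] if has_left else None,
        "right_word": tokens[index + 1] if has_right else None,
        "right_tag": tags[index + 1] if has_right else None,
    }

# Hypothetical tokenized sentence with NER tags; "o" means other.
tokens = ["if", "chill", "stagnates", "in", "lung"]
tags = ["o", "cause", "o", "o", "location"]

feature_table = [
    context_features(tokens, tags, i)
    for i, tag in enumerate(tags)
    if tag != "o"
]
```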
In the specific implementation of the foregoing device for extracting relationships between entities in TCM literature, further, the extraction module includes:
An analysis unit, configured to obtain the entity pairs whose probability of having a relationship is greater than a preset threshold, and to analyze, using the method of dependency analysis, the sentences in which those entity pairs occur, extracting fact triples with a verb as the core;
A construction unit, configured to construct fact triples with a predicate verb as the core by parsing the grammatical relations of the sentence;
A second determination unit, configured to determine the type of relationship between entities according to the predicate verb between the entities of a pair, in combination with the labeled inter-entity relationship types.
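Extracting a verb-centered fact triple from a dependency parse can be sketched as follows. The parse is supplied by hand rather than produced by a parser, and the relation labels `nsubj` and `obj` and the part-of-speech tag `VERB` follow the Universal Dependencies naming as an assumption; the patent does not name a specific parser or label set.

```python
def fact_triples(tokens, pos_tags, arcs):
    """Return (subject, predicate verb, object) triples.

    arcs: list of (head_index, dependent_index, relation_label).
    A triple is emitted for each verb that has both a subject and an object.
    """
    triples = []
    for v, pos in enumerate(pos_tags):
        if pos != "VERB":
            continue
        subjects = [d for h, d, rel in arcs if h == v and rel == "nsubj"]
        objects = [d for h, d, rel in arcs if h == v and rel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((tokens[s], tokens[v], tokens[o]))
    return triples

# Hand-written parse of a hypothetical three-token sentence.
tokens = ["wind", "invades", "lung"]
pos_tags = ["NOUN", "VERB", "NOUN"]
arcs = [(1, 0, "nsubj"), (1, 2, "obj")]

triples = fact_triples(tokens, pos_tags, arcs)
```

The predicate verb of each triple is what the second determination unit would compare against the labeled inter-entity relationship types.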
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A method for extracting relationships between entities in traditional Chinese medicine (TCM) literature, characterized by comprising:
For TCM literature to be processed, obtaining the entity types and inter-entity relationship types that have been labeled for part of its content;
Training a named entity recognition model according to the labeled entity types;
Performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtaining candidate entity pairs having relationships and a feature table;
According to the obtained candidate entity pairs having relationships and the feature table, performing statistical inference of graph probabilities with a factor graph model, globally learning entity relationship features, and obtaining the probability that a relationship exists between entities;
According to the obtained probability that a relationship exists between entities, determining the type of relationship between entities in combination with the method of extracting fact triples through dependency analysis and the labeled inter-entity relationship types.
2. The method for extracting relationships between entities in TCM literature according to claim 1, characterized in that training a named entity recognition model according to the labeled entity types comprises:
According to the labeled entity types, training a named entity recognition model using a natural language processing tool to obtain a named entity recognition model suitable for TCM literature;
Integrating the obtained named entity recognition model suitable for TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and packaging and compiling it.
3. The method for extracting relationships between entities in TCM literature according to claim 1, characterized in that performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtaining candidate entity pairs having relationships and a feature table comprises:
Performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
Performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
Extracting text features for the entities in the candidate entity pairs and obtaining the named entity recognition results of the context of the candidate entities, which make up the feature table;
Determining whether a relationship exists between the two entities in some of the candidate entity pairs.
4. The method for extracting relationships between entities in TCM literature according to claim 1, characterized in that determining the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples through dependency analysis and the labeled inter-entity relationship types, comprises:
Obtaining the entity pairs whose probability of having a relationship is greater than a preset threshold, and analyzing, using the method of dependency analysis, the sentences in which those entity pairs occur, extracting fact triples with a verb as the core;
Constructing fact triples with a predicate verb as the core by parsing the grammatical relations of the sentence;
Determining the type of relationship between entities according to the predicate verb between the entities of a pair, in combination with the labeled inter-entity relationship types.
5. A device for extracting relationships between entities in TCM literature, characterized by comprising:
An obtaining module, configured to obtain, for TCM literature to be processed, the entity types and inter-entity relationship types that have been labeled for part of its content;
A training module, configured to train a named entity recognition model according to the labeled entity types;
An identification module, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtain candidate entity pairs having relationships and a feature table;
A determining module, configured to perform statistical inference of graph probabilities with a factor graph model according to the obtained candidate entity pairs having relationships and the feature table, globally learn entity relationship features, and obtain the probability that a relationship exists between entities;
An extraction module, configured to determine the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples through dependency analysis and the labeled inter-entity relationship types.
6. The device for extracting relationships between entities in TCM literature according to claim 5, characterized in that the training module comprises:
A training unit, configured to train a named entity recognition model using a natural language processing tool according to the labeled entity types, obtaining a named entity recognition model suitable for TCM literature;
A replacement unit, configured to integrate the obtained named entity recognition model suitable for TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and to package and compile it.
7. The device for extracting relationships between entities in TCM literature according to claim 5, characterized in that the identification module comprises:
A recognition unit, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
An operation unit, configured to perform a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
A composition unit, configured to extract text features for the entities in the candidate entity pairs and obtain the named entity recognition results of the context of the candidate entities, which make up the feature table;
A first determination unit, configured to determine whether a relationship exists between the two entities in some of the candidate entity pairs.
8. The device for extracting relationships between entities in TCM literature according to claim 5, characterized in that the extraction module comprises:
An analysis unit, configured to obtain the entity pairs whose probability of having a relationship is greater than a preset threshold, and to analyze, using the method of dependency analysis, the sentences in which those entity pairs occur, extracting fact triples with a verb as the core;
A construction unit, configured to construct fact triples with a predicate verb as the core by parsing the grammatical relations of the sentence;
A second determination unit, configured to determine the type of relationship between entities according to the predicate verb between the entities of a pair, in combination with the labeled inter-entity relationship types.
CN201910293263.9A 2019-04-12 2019-04-12 Method and device for extracting relationships between entities in traditional Chinese medicine literature Active CN110032649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910293263.9A CN110032649B (en) 2019-04-12 2019-04-12 Method and device for extracting relationships between entities in traditional Chinese medicine literature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910293263.9A CN110032649B (en) 2019-04-12 2019-04-12 Method and device for extracting relationships between entities in traditional Chinese medicine literature

Publications (2)

Publication Number Publication Date
CN110032649A true CN110032649A (en) 2019-07-19
CN110032649B CN110032649B (en) 2021-10-01

Family

ID=67238140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910293263.9A Active CN110032649B (en) 2019-04-12 2019-04-12 Method and device for extracting relationships between entities in traditional Chinese medicine literature

Country Status (1)

Country Link
CN (1) CN110032649B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant