CN110032649A - Method and device for extracting relations between entities in traditional Chinese medicine (TCM) literature - Google Patents
- Publication number
- CN110032649A (application CN201910293263.9A)
- Authority
- CN
- China
- Prior art keywords
- entity
- relationship
- type
- named
- marked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a method and device for extracting relations between entities in traditional Chinese medicine (TCM) literature, which can improve the accuracy of extracting the types of relations between entities. The method comprises: for the TCM literature to be processed, obtaining the entity types and inter-entity relation types annotated on part of its content; training a named entity recognition model according to the annotated entity types; performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the recognition results, obtaining candidate entity pairs that may be related and a feature table; performing statistical inference of graph probabilities with a factor graph model according to the candidate entity pairs and the feature table, globally learning the features of inter-entity relations to obtain the probability that a relation exists between entities; and determining the type of relation between entities according to that probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types. The present invention relates to the field of knowledge engineering.
Description
Technical field
The present invention relates to the field of knowledge engineering, and in particular to a method and device for extracting relations between entities in traditional Chinese medicine (TCM) literature.
Background
China has inherited a large body of ancient literature in the field of traditional Chinese medicine, which forms the basic foundation for studying it. However, most of these documents are written in classical Chinese and are largely unstructured text, which makes them very time-consuming to use. If the entities in TCM literature and the relations between them could be extracted, the extracted relations could be used to carry out information retrieval, knowledge mining and similar tasks effectively.
Entity relation extraction methods in the prior art have difficulty accurately extracting the relations between entities from unstructured text.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for extracting relations between entities in TCM literature, so as to solve the problem in the prior art that relations between entities are difficult to extract accurately from unstructured text.
To solve the above technical problem, an embodiment of the present invention provides a method for extracting relations between entities in TCM literature, comprising:
for the TCM literature to be processed, obtaining the entity types and inter-entity relation types annotated on part of its content;
training a named entity recognition model according to the annotated entity types;
performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the recognition results, obtaining candidate entity pairs that may be related and a feature table;
performing statistical inference of graph probabilities with a factor graph model according to the candidate entity pairs and the feature table, globally learning the features of inter-entity relations to obtain the probability that a relation exists between entities;
determining the type of relation between entities according to the obtained probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types.
Further, training the named entity recognition model according to the annotated entity types comprises:
training a named entity recognition model with a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suited to TCM literature;
integrating the obtained named entity recognition model into the natural language processing tool, replacing the tool's original named entity recognition model, and packaging and compiling the tool.
Further, performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the recognition results, obtaining the candidate entity pairs and the feature table comprises:
performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
taking the Cartesian product of the recognized entities to obtain candidate entity pairs;
extracting text features for the entities in the candidate entity pairs, obtaining the named entity recognition results of the context of each candidate entity, and forming the feature table;
determining, for a portion of the candidate entity pairs, whether a relation exists between the two entities of each pair.
Further, determining the type of relation between entities according to the obtained probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types, comprises:
obtaining the entity pairs whose probability of a relation is greater than a preset threshold, and using dependency parsing to analyze the sentences in which those entity pairs occur, so as to extract fact triples with a verb as the core;
constructing, by parsing the grammatical relations of the sentence, the fact triples with the predicate verb as the core;
determining the type of relation between entities according to the predicate verb between the entity pair, in combination with the annotated inter-entity relation types.
An embodiment of the present invention also provides a device for extracting relations between entities in TCM literature, comprising:
an obtaining module, configured to obtain, for the TCM literature to be processed, the entity types and inter-entity relation types annotated on part of its content;
a training module, configured to train a named entity recognition model according to the annotated entity types;
an identification module, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the recognition results, obtain candidate entity pairs that may be related and a feature table;
a determining module, configured to perform statistical inference of graph probabilities with a factor graph model according to the candidate entity pairs and the feature table, globally learning the features of inter-entity relations to obtain the probability that a relation exists between entities;
an extraction module, configured to determine the type of relation between entities according to the obtained probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types.
Further, the training module comprises:
a training unit, configured to train a named entity recognition model with a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suited to TCM literature;
a replacement unit, configured to integrate the obtained named entity recognition model into the natural language processing tool, replace the tool's original named entity recognition model, and package and compile the tool.
Further, the identification module comprises:
a recognition unit, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
an operation unit, configured to take the Cartesian product of the recognized entities to obtain candidate entity pairs;
a composition unit, configured to extract text features for the entities in the candidate entity pairs, obtain the named entity recognition results of the context of each candidate entity, and form the feature table;
a first determination unit, configured to determine, for a portion of the candidate entity pairs, whether a relation exists between the two entities of each pair.
Further, the extraction module comprises:
an analysis unit, configured to obtain the entity pairs whose probability of a relation is greater than a preset threshold, and to use dependency parsing to analyze the sentences in which those entity pairs occur, so as to extract fact triples with a verb as the core;
a construction unit, configured to construct, by parsing the grammatical relations of the sentence, the fact triples with the predicate verb as the core;
a second determination unit, configured to determine the type of relation between entities according to the predicate verb between the entity pair, in combination with the annotated inter-entity relation types.
The above technical solution of the present invention has the following beneficial effects:
In the above solution, for the TCM literature to be processed, the entity types and inter-entity relation types annotated on part of its content are obtained; a named entity recognition model is trained according to the annotated entity types; named entity recognition is performed on the TCM literature to be processed using the trained model and, according to the recognition results, candidate entity pairs that may be related and a feature table are obtained; statistical inference of graph probabilities is performed with a factor graph model according to the candidate entity pairs and the feature table, globally learning the features of inter-entity relations to obtain the probability that a relation exists between entities; and the type of relation between entities is determined according to that probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types. In this way, the probability that a relation exists between entities is combined with the dependency parsing method of natural language processing, and the relation type is determined from the extracted fact triples and the annotated inter-entity relation types. This improves the accuracy of relation type extraction and allows the content of TCM literature to be expressed clearly and in structured form.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for extracting relations between entities in TCM literature provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of entity recognition results provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of candidate entity pair results provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the feature representation provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the labels indicating whether a relation exists between candidate entity pairs, provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the resulting probabilities that a relation exists between entities, provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the final inter-entity relation results provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of the device for extracting relations between entities in TCM literature provided by an embodiment of the present invention.
Detailed description
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
To address the existing difficulty of accurately extracting relations between entities from unstructured text, the present invention provides a method and device for extracting relations between entities in TCM literature.
Embodiment one
As shown in Fig. 1, the method for extracting relations between entities in TCM literature provided by an embodiment of the present invention comprises:
S101: for the TCM literature to be processed, obtain the entity types and inter-entity relation types annotated on part of its content;
S102: train a named entity recognition model according to the annotated entity types;
S103: perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the recognition results, obtain candidate entity pairs that may be related and a feature table;
S104: perform statistical inference of graph probabilities with a factor graph model according to the candidate entity pairs and the feature table, globally learning the features of inter-entity relations to obtain the probability that a relation exists between entities;
S105: determine the type of relation between entities according to the obtained probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types.
In the method described in this embodiment, for the TCM literature to be processed, the entity types and inter-entity relation types annotated on part of its content are obtained; a named entity recognition model is trained according to the annotated entity types; named entity recognition is performed on the TCM literature to be processed using the trained model and, according to the recognition results, candidate entity pairs that may be related and a feature table are obtained; statistical inference of graph probabilities is performed with a factor graph model according to the candidate entity pairs and the feature table, globally learning the features of inter-entity relations to obtain the probability that a relation exists between entities; and the type of relation between entities is determined according to that probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types. In this way, the probability that a relation exists between entities is combined with the dependency parsing method of natural language processing, and the relation type is determined from the extracted fact triples and the annotated inter-entity relation types. This improves the accuracy of relation type extraction and allows the content of TCM literature to be expressed clearly and in structured form.
In this embodiment, the extraction of relations between entities also lays the foundation for building knowledge graphs in the TCM field and for intelligent diagnosis and treatment assistance systems, and is an indispensable link.
In this embodiment, before S101, the main TCM entity types and inter-entity relation types can first be determined according to the specific content of the TCM literature to be processed, and 20% of the content can then be annotated with entity types and inter-entity relation types.
In a specific implementation of the aforementioned method for extracting relations between entities in TCM literature, further, training the named entity recognition model according to the annotated entity types comprises:
training a named entity recognition model with a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suited to TCM literature;
integrating the obtained named entity recognition model into the natural language processing tool, replacing the tool's original named entity recognition model, and packaging and compiling the tool.
In this embodiment, according to the annotated entity types, the Stanford natural language processing tool (DeepDive) can be used to train the named entity recognition model, yielding a model suited to TCM literature. This model is integrated into DeepDive, replacing DeepDive's original named entity recognition model, and DeepDive is then packaged and compiled.
In this embodiment, DeepDive is an information extraction framework from Stanford natural language processing, mainly used for information extraction from modern text, for example extracting the relations among people, organizations and places.
In a specific implementation of the aforementioned method for extracting relations between entities in TCM literature, further, performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the recognition results, obtaining the candidate entity pairs and the feature table comprises:
performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
taking the Cartesian product of the recognized entities to obtain candidate entity pairs;
extracting text features for the entities in the candidate entity pairs, obtaining the named entity recognition results of the context of each candidate entity, and forming the feature table;
determining, for a portion of the candidate entity pairs, whether a relation exists between the two entities of each pair.
In this embodiment, S103 mainly performs data preparation. It prepares three parts of data: the candidate entity pairs, the feature table, and the labels indicating whether a relation exists between the two entities in a portion of the candidate entity pairs. Specifically:
S1031: perform named entity recognition on the TCM literature to be processed using the above DeepDive with the new named entity recognition model integrated, and take the Cartesian product of the recognized entities to obtain candidate entity pairs;
In this embodiment, an entity pair is simply two entities taken together; for example, entity A and entity B form the entity pair (A, B).
S1032: extract text features for the entities in the candidate entity pairs, obtain the named entity recognition results of the context of each candidate entity, and form the feature table;
S1033: label a portion (for example, 20%) of the candidate entity pairs: candidate pairs in which a relation exists are labelled true, and those in which no relation exists are labelled false. Some rules can also be specified to assist labelling, for example, if A is related to B then B is also related to A; such rules reduce the manual labelling workload. The labelled data serves as prior knowledge for learning the probabilistic model. At this point the required data preparation is complete, and this data provides the basis for building the probabilistic model afterwards.
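The three data-preparation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the DeepDive pipeline the embodiment uses: the function names, the one-token context window, the exclusion of self-pairs and the English stand-in tokens are all assumptions made for the example.

```python
from itertools import product


def candidate_pairs(entities):
    """Cartesian product of the recognized entities.

    Assumption: a pair of an entity with itself is dropped; the source only
    says that a Cartesian product is taken.
    """
    return [(a, b) for a, b in product(entities, repeat=2) if a != b]


def context_features(tokens, tags, index, window=1):
    """NER tags of the tokens up to `window` positions left and right of tokens[index]."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        # None marks a position that falls outside the sentence.
        feats[offset] = tags[pos] if 0 <= pos < len(tokens) else None
    return feats


def propagate_symmetric(labels):
    """Apply the helper rule: if (A, B) is labelled related, so is (B, A)."""
    out = dict(labels)
    for (a, b), related in labels.items():
        if related:
            out.setdefault((b, a), True)
    return out


# Hypothetical toy data: English stand-ins for the Chinese tokens.
pairs = candidate_pairs(["wind", "lung"])
tokens = ["if", "cold", "constrains", "lung", "unclear"]
tags = ["o", "by", "o", "bw", "o"]  # o = other, by = cause, bw = disease location
features = context_features(tokens, tags, index=1)
labels = propagate_symmetric({("wind", "cold"): True})
```

Here `features` holds the tags of the tokens immediately to the left and right of "cold", which is the kind of row the feature table would contain, and `labels` gains the mirrored pair without any extra manual labelling.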
In this embodiment, a factor graph model is used to learn the probability that a relation exists between entities, thereby building a probabilistic model. Specifically: according to the obtained candidate entity pairs and the feature table, statistical inference of graph probabilities is performed with the factor graph model, globally learning the features of inter-entity relations to form a probabilistic model of the relations between entities; this probabilistic model is used to determine the probability that a relation exists between entities.
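To make the idea of inferring a relation probability from a factor graph concrete, the following minimal sketch computes exact marginals on a tiny graph by enumerating every assignment. It is an assumption-laden illustration: the variable names, the factor weights and the brute-force enumeration are invented for the example, the weights are fixed by hand rather than learned, and a real system would need approximate inference because enumeration grows exponentially with the number of candidate pairs.

```python
import itertools
import math


def marginals(variables, factors):
    """Exact P(v = True) for binary variables, by enumerating all assignments.

    variables: list of variable names.
    factors:   list of (scope, fn); fn maps a tuple of bools (one per variable
               in scope) to a log-potential.
    """
    totals = {v: 0.0 for v in variables}
    z = 0.0  # partition function
    for values in itertools.product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        log_score = sum(
            fn(tuple(assignment[v] for v in scope)) for scope, fn in factors
        )
        weight = math.exp(log_score)
        z += weight
        for v in variables:
            if assignment[v]:
                totals[v] += weight
    return {v: totals[v] / z for v in variables}


# Hypothetical graph: one indicator per ordered candidate pair.
variables = ["rel(wind,lung)", "rel(lung,wind)"]
factors = [
    # Feature evidence favouring a relation for (wind, lung); weight is made up.
    (("rel(wind,lung)",), lambda vals: 2.0 if vals[0] else 0.0),
    # Soft symmetry rule: the two directions tend to agree; weight is made up.
    (("rel(wind,lung)", "rel(lung,wind)"),
     lambda vals: 1.5 if vals[0] == vals[1] else 0.0),
]
probs = marginals(variables, factors)
```

The symmetry factor is what makes the inference global: evidence attached to one pair raises the probability of the mirrored pair even though that pair has no evidence of its own.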
In this embodiment, a factor graph is obtained by factorizing a global function of several variables into a product of several local functions; the bipartite graph derived from this factorization is called a factor graph.
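In standard notation, the factorization just described can be written as follows. The symbols $g$, $f_j$, $X_j$ and $Z$ are conventional choices introduced here and do not appear in the source.

```latex
g(x_1, \dots, x_n) = \prod_{j=1}^{m} f_j(X_j), \qquad X_j \subseteq \{x_1, \dots, x_n\}
```

The graph has one variable node per $x_i$, one factor node per $f_j$, and an edge between $x_i$ and $f_j$ exactly when $x_i \in X_j$. When the local functions are non-negative, normalizing the product gives a probability distribution:

```latex
p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{j=1}^{m} f_j(X_j), \qquad Z = \sum_{x_1, \dots, x_n} \prod_{j=1}^{m} f_j(X_j)
```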
In a specific implementation of the aforementioned method for extracting relations between entities in TCM literature, further, determining the type of relation between entities according to the obtained probability, in combination with the fact triples extracted by dependency parsing and the annotated inter-entity relation types, comprises:
obtaining the entity pairs whose probability of a relation is greater than a preset threshold, and using dependency parsing to analyze the sentences in which those entity pairs occur, so as to extract fact triples with a verb as the core;
constructing, by parsing the grammatical relations of the sentence, the fact triples with the predicate verb as the core;
determining the type of relation between entities according to the predicate verb between the entity pair, in combination with the annotated inter-entity relation types.
In this embodiment, after the probability that a relation exists between entities has been obtained, determining the relation type in combination with dependency-based fact triple extraction and the annotated inter-entity relation types can specifically comprise the following steps: for the entity pairs whose probability of a relation is higher than a preset threshold (for example, 0.8), analyze the sentences in which these entity pairs occur using dependency parsing, and extract fact triples with a verb as the core; by parsing grammatical relations in the sentence, such as the subject-predicate-object structure or a subject-predicate verb-complement structure containing a preposition-object relation, construct the fact triples with the predicate verb as the core; then, according to the predicate verb between the entity pair and the inter-entity relation types annotated in S101, determine the type of relation between the entities, which is the final inter-entity relation result.
In this embodiment, dependency parsing decomposes a sentence into triples, that is, the sentence is expressed as entities and the relation between them. This not only gives the meaning of the sentence a structured representation but also lays the foundation for building a knowledge graph in the future.
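The threshold filter and the verb-centred triple extraction can be sketched as below. The source does not say which dependency parser or label set is used, so this sketch assumes the parse is already available as (head index, dependent index, label) arcs with Universal Dependencies style labels `nsubj` and `obj`. The function names, the probabilities and the English stand-in sentence are hypothetical.

```python
def confident_pairs(probabilities, threshold=0.8):
    """Keep the entity pairs whose relation probability exceeds the threshold."""
    return [pair for pair, p in probabilities.items() if p > threshold]


def extract_triples(tokens, arcs):
    """Build (subject, verb, object) triples from dependency arcs.

    arcs: iterable of (head_index, dependent_index, label).
    Assumption: subjects are labelled "nsubj" and objects "obj"; only verbs
    that have both a subject and an object yield a triple.
    """
    subjects, objects = {}, {}
    for head, dep, label in arcs:
        if label == "nsubj":
            subjects[head] = dep
        elif label == "obj":
            objects[head] = dep
    return [
        (tokens[subjects[v]], tokens[v], tokens[objects[v]])
        for v in sorted(subjects)
        if v in objects
    ]


# Made-up probabilities and a toy parse, for illustration only.
probabilities = {("wind", "lung"): 0.93, ("cold", "stomach"): 0.41}
kept = confident_pairs(probabilities)

tokens = ["wind", "attacks", "lung"]
arcs = [(1, 0, "nsubj"), (1, 2, "obj")]
triples = extract_triples(tokens, arcs)
```

Only sentences containing a pair in `kept` would be parsed, and the verb in the middle of each resulting triple is what the final step compares against the annotated relation types.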
In summary, this embodiment adapts the Stanford natural language processing tool into an information extraction method suited to TCM literature and combines it with dependency parsing, proposing a method for extracting relations between entities in TCM literature. The method can analyze unstructured TCM literature, give it structure, and improve the accuracy of relation type extraction.
For a better understanding of the method for extracting relations between entities in TCM literature described in the embodiments of the present invention, the method is described in detail below, taking the book "Syndrome Differentiation of Pathogenesis in Traditional Chinese Medicine" as an example. It can specifically comprise the following steps:
First, part of the content of "Syndrome Differentiation of Pathogenesis in Traditional Chinese Medicine", for example 20% of the content, is annotated with entity types and inter-entity relation types, and the annotated entity types and inter-entity relation types are obtained.
In this embodiment, the entity types include: cause of disease (by), disease location (bw) and manifestation (bx). The causes of disease include entities such as wind, cold, pry- and yin; the disease locations include entities such as lung, collaterals, stomach, spleen, intestine and small intestine; the manifestations include entities such as lung qi obstruction, lung qi not clearing, the lung losing its clearing function, and internal accumulation of phlegm-heat.
In this embodiment, the relations between entities in the evolution of a disease can be classified into six classes: the combination relation (between causes), the infringement relation (of a cause on a disease location), the infringed relation, the change relation (of a disease location or a cause), the occurrence relation and the causal relation. Specifically:
the combination relation (between causes) is mainly governed by verbs such as combine, accompany, be mixed with, meet, and bind together;
the infringement relation (of a cause on a disease location) is mainly governed by verbs such as infringe, invade, attack, consume, diffuse, burn, decoct, enter, injure, strike, disturb, assault, block, flow, and damage;
the infringed relation is mainly governed by passive markers such as "by";
the change relation (of a disease location) is mainly governed by verbs such as become constrained, lose, obstruct, stagnate, congeal, clear, reverse, become numb, form stasis, become disordered, move, and close; the change relation (of a cause) is mainly governed by verbs such as move recklessly, flourish, congest, condense, become exuberant, become pent up, and rise;
the occurrence relation is mainly governed by verbs such as become, engender, transform, show, form, appear, turn into, accumulate, and produce;
the causal relation is mainly governed by verbs such as cause, then, become, be, have, lead to, even, and occur.
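The verb lists above amount to a lookup from predicate verb to relation class. A minimal sketch of such a lookup follows; the table is hypothetical and deliberately tiny, the keys are English stand-ins rather than the Chinese verbs actually annotated, and the class names are labels chosen for this example.

```python
# Hypothetical excerpt of a verb lexicon; a real one would hold the annotated
# Chinese verbs for all six relation classes.
RELATION_VERBS = {
    "combination": {"combine", "accompany"},
    "infringement": {"infringe", "invade", "attack"},
    "occurrence": {"become", "form"},
    "causal": {"cause", "lead to"},
}

# Invert the lexicon once so each lookup is a single dictionary access.
VERB_TO_RELATION = {
    verb: relation
    for relation, verbs in RELATION_VERBS.items()
    for verb in verbs
}


def relation_type(triple):
    """Return the relation class for a (subject, verb, object) fact triple.

    Returns None when the verb is not in the lexicon, leaving the pair untyped.
    """
    _, verb, _ = triple
    return VERB_TO_RELATION.get(verb)


typed = relation_type(("wind", "attack", "lung"))
untyped = relation_type(("wind", "sing", "lung"))
```

Returning `None` for an unknown verb is a design choice of this sketch; the source does not say how verbs outside the annotated lists are handled.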
Second, a named entity recognition model is trained according to the annotated entity types.
Third, "Syndrome Differentiation of Pathogenesis in Traditional Chinese Medicine" is processed with the newly trained named entity recognition model. For example, entities such as heart, lung and stomach can be recognized as disease locations, entities such as wind and cold as causes of disease, and entities such as resolving phlegm as manifestations; part of the recognition results are shown in Fig. 2. The Cartesian product of the recognized entities is taken to obtain candidate entity pairs; for example, saliva and phlegm can form a candidate entity pair; partial results are shown in Fig. 3. Text features are then extracted from the candidate entity pair results. For example, for the original sentence "if cold constrains the lung and it does not clear", cold is recognized as a cause of disease; the words one position to its left and right in the original text are "if" and "constrains", whose named entity recognition results are o and o, and these form the feature table shown in Fig. 4, where o indicates that the entity type is "other". Finally, it is determined whether a relation exists between the two entities in a portion of the candidate entity pairs; for example, this can be determined for 20% of the candidate entity pairs according to preset rules, where true indicates that a relation exists and false indicates that it does not. A preset rule can be, for example, that if A is related to B then B is also related to A. Partial results are shown in Fig. 5.
Fourth, according to the obtained candidate entity pairs and the feature table, statistical inference of graph probabilities is performed with a factor graph model, globally learning the features of inter-entity relations to form a probabilistic model of the relations between entities. This probabilistic model is used to determine the probability that a relation exists between entities; the results are shown in Fig. 6.
Fifth, the entity pairs with a high probability of a relation are obtained, and the specific relation between the entities is determined by combining dependency-based fact triple extraction with the inter-entity relation types annotated in the first step. For example, the sentence "wind attacks the lung" is found to express an infringement relation of a cause of disease on a disease location; partial results are shown in Fig. 7.
Embodiment two
The present invention also provides the specific embodiments of Relation extraction device between a kind of entity of TCM Document, due to the present invention
Between the entity of the TCM Document of offer between Relation extraction device and the entity of aforementioned TCM Document Relation extraction method specific reality
Apply that mode is corresponding, Relation extraction device can be by executing in above method specific embodiment between the entity of the TCM Document
Process step achieve the object of the present invention, therefore Relation extraction method specific embodiment between the entity of above-mentioned TCM Document
In explanation, be also applied for the specific embodiment of Relation extraction device between the entity of TCM Document provided by the invention,
It will not be described in great detail in present invention specific embodiment below.
As shown in Figure 8, an embodiment of the present invention also provides a device for extracting relationships between entities in TCM literature, comprising:
An obtaining module 11, configured to obtain, for the TCM literature to be processed, the entity types and inter-entity relationship types that have been annotated on part of its content;
A training module 12, configured to train a named entity recognition model according to the annotated entity types;
An identification module 13, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtain candidate entity pairs that have a relationship and a feature table;
A determining module 14, configured to perform statistical inference of graph probabilities with a factor graph model according to the obtained candidate entity pairs and feature table, learn the relationship features of the objects globally, and obtain the probability that a relationship exists between entities;
An extraction module 15, configured to determine the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples by dependency parsing and the annotated inter-entity relationship types.
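The five modules can be read as stages of a pipeline in which each stage consumes the output of the previous one. The following is a minimal Python sketch of that data flow only. The class name, the field names, the signatures, and the toy stand-in stages are illustrative assumptions and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


@dataclass
class RelationExtractionDevice:
    """Hypothetical wiring of modules 11-15; every field is a pluggable stage."""
    acquire: Callable[[str], Tuple[dict, dict]]                    # module 11
    train: Callable[[dict], Callable[[str], List[str]]]            # module 12
    identify: Callable[[Callable, str], Tuple[List[Pair], dict]]   # module 13
    infer: Callable[[List[Pair], dict], Dict[Pair, float]]         # module 14
    extract: Callable[[Dict[Pair, float], dict], Dict[Pair, str]]  # module 15

    def run(self, document: str) -> Dict[Pair, str]:
        # Annotated entity types and relationship types for part of the text.
        entity_labels, relation_labels = self.acquire(document)
        # NER model trained from the annotated entity types.
        ner_model = self.train(entity_labels)
        # Candidate entity pairs plus the feature table.
        pairs, features = self.identify(ner_model, document)
        # Probability that a relationship exists for each pair.
        probabilities = self.infer(pairs, features)
        # Final relationship type per pair.
        return self.extract(probabilities, relation_labels)


# Toy stand-ins (assumptions) that only demonstrate the data flow.
device = RelationExtractionDevice(
    acquire=lambda doc: ({"cold": "cause", "lung": "location"}, {"invades": "damage"}),
    train=lambda labels: (lambda doc: [w for w in doc.split() if w in labels]),
    identify=lambda ner, doc: ([tuple(ner(doc)[:2])], {}),
    infer=lambda pairs, feats: {p: 0.9 for p in pairs},
    extract=lambda probs, rels: {p: "damage" for p, v in probs.items() if v > 0.5},
)
result = device.run("cold invades lung")
```

Keeping each stage behind a plain callable is one possible design choice; it lets any single module be swapped without touching the others.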
With the device for extracting relationships between entities in TCM literature according to the embodiment of the present invention, the entity types and inter-entity relationship types annotated on part of the content of the TCM literature to be processed are obtained; a named entity recognition model is trained according to the annotated entity types; named entity recognition is performed on the TCM literature to be processed using the trained model, and candidate entity pairs that have a relationship and a feature table are obtained from the recognition results; statistical inference of graph probabilities is performed with a factor graph model on the obtained candidate entity pairs and feature table, the relationship features of the objects are learned globally, and the probability that a relationship exists between entities is obtained; and the type of relationship between entities is determined from that probability, in combination with the method of extracting fact triples by dependency parsing and the annotated inter-entity relationship types. In this way, the probability that a relationship exists between entities is combined with the dependency parsing method of natural language processing, and the relationship type between entities is determined from the extracted fact triples and the annotated inter-entity relationship types. This improves the accuracy of relationship type extraction and allows the content of TCM literature to be expressed in a clear, structured form.
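As a rough illustration of how a factor graph can turn a feature table into a relationship probability, the sketch below treats each candidate pair as one Boolean variable attached to one weighted factor per active feature. The patent does not specify the factor functions, so this form is an assumption; the feature names and weights are hypothetical. With only such single-variable factors the marginal has a closed form (a logistic function of the summed weights). A fuller factor graph would learn the weights from the annotated pairs and could add factors linking several variables, in which case approximate inference such as Gibbs sampling would be needed.

```python
import math
from typing import Dict, List


def relation_probability(features: List[str], weights: Dict[str, float]) -> float:
    """Marginal P(has_relation = True) for one candidate entity pair.

    Each active feature contributes a factor exp(w) when the variable is True
    and exp(0) when it is False, so normalising over the two states gives a
    logistic function of the summed weights.
    """
    score = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights; in practice they would be learned, not hand-set.
weights = {
    "verb_between=invades": 2.0,
    "types=cause|location": 1.0,
    "distance>10": -1.5,
}
p = relation_probability(["verb_between=invades", "types=cause|location"], weights)
```

A pair with no active features gets probability 0.5 under this sketch, which reflects the absence of evidence in either direction.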
In the specific embodiment of the foregoing device for extracting relationships between entities in TCM literature, further, the training module includes:
A training unit, configured to train a named entity recognition model with a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suitable for TCM literature;
A replacement unit, configured to integrate the obtained named entity recognition model suitable for TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and to package and compile the tool.
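Training a named entity recognition model from annotated entity types usually starts by converting the annotations into token-level tags. The sketch below shows one common encoding, character-level BIO tags written one character per line. This section does not name the natural language processing tool or its training format, so the two-column layout, the type names `CAUSE` and `LOCATION`, and the sample phrase are assumptions for illustration.

```python
from typing import List, Tuple


def to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Turn (start, end, type) annotations into per-character BIO tags.

    `end` is exclusive. Characters outside every span are tagged "O".
    """
    tags = ["O"] * len(text)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters
    return list(zip(text, tags))


# Hypothetical annotated phrase: a pathogenic cause and a disease location.
rows = to_bio("寒邪犯肺", [(0, 2, "CAUSE"), (3, 4, "LOCATION")])
training_lines = [f"{ch}\t{tag}" for ch, tag in rows]
```

The resulting lines could be fed to any sequence-labelling trainer that accepts a character-and-tag column format.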
In the specific embodiment of the foregoing device for extracting relationships between entities in TCM literature, further, the identification module includes:
A recognition unit, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
An operation unit, configured to perform a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
A composition unit, configured to extract text features for the entities in the candidate entity pairs and obtain the named entity recognition results of the context of the candidate entities, which constitute the feature table;
A first determining unit, configured to determine whether a relationship exists between the two entities in some of the candidate entity pairs.
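The Cartesian product and feature table steps can be sketched as follows. The entity tuple layout, the exclusion of self-pairs, the context window size, and the feature names are assumptions made for this example; the patent only states that a Cartesian product yields the candidate pairs and that the named entity recognition results of the context form the feature table.

```python
from itertools import product
from typing import Dict, List, Tuple

Entity = Tuple[str, str, int]  # (text, entity type, token index) - assumed layout


def candidate_pairs(entities: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Cartesian product of the recognised entities, minus self-pairs (assumption)."""
    return [(a, b) for a, b in product(entities, repeat=2) if a != b]


def feature_row(pair: Tuple[Entity, Entity], ner_tags: List[str],
                window: int = 1) -> Dict[str, object]:
    """One feature-table row: entity types, NER tags of nearby tokens, distance."""
    (_, type_a, i), (_, type_b, j) = pair

    def context(k: int) -> List[str]:
        # NER tags of up to `window` tokens on each side of position k.
        return ner_tags[max(0, k - window):k] + ner_tags[k + 1:k + 1 + window]

    return {
        "types": (type_a, type_b),
        "ctx_a": context(i),
        "ctx_b": context(j),
        "distance": abs(i - j),
    }


# Hypothetical recognition result for a three-token sentence.
ner_tags = ["CAUSE", "O", "LOCATION"]
entities = [("寒邪", "CAUSE", 0), ("肺", "LOCATION", 2)]
pairs = candidate_pairs(entities)
table = [feature_row(p, ner_tags) for p in pairs]
```

Because the product is ordered, each unordered pair appears twice, once per direction, which matters when the relationship is directional (for example, a cause damaging a location).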
In the specific embodiment of the foregoing device for extracting relationships between entities in TCM literature, further, the extraction module includes:
An analysis unit, configured to obtain the entity pairs whose probability of having a relationship is greater than a preset threshold, and to analyze, by dependency parsing, the sentences in which those entity pairs occur, extracting fact triples with a verb as the core;
A construction unit, configured to construct fact triples with the predicate verb as the core by parsing the grammatical relations of the sentence;
A second determining unit, configured to determine the type of relationship between entities according to the predicate verb between the entity pair, in combination with the annotated inter-entity relationship types.
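The three units above can be sketched as a threshold filter followed by a lookup on the predicate verb. The sketch assumes the dependency parse is already available as (head, label, dependent) arcs with `SBV` (subject-verb) and `VOB` (verb-object) labels, as some Chinese dependency parsers emit; this section does not name the parser, and the threshold value and the verb-to-type table are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str, str]  # (head word, dependency label, dependent word)


def fact_triple(arcs: List[Arc]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate verb, object) if one verb heads both arcs."""
    subjects = {head: dep for head, label, dep in arcs if label == "SBV"}
    objects = {head: dep for head, label, dep in arcs if label == "VOB"}
    for verb in subjects:
        if verb in objects:
            return subjects[verb], verb, objects[verb]
    return None


def relation_type(pair: Tuple[str, str], prob: float, arcs: List[Arc],
                  verb_to_type: Dict[str, str],
                  threshold: float = 0.9) -> Optional[str]:
    """Type the relationship only for pairs above the preset threshold."""
    if prob <= threshold:
        return None
    triple = fact_triple(arcs)
    # The triple must connect exactly the two entities of the pair.
    if triple is None or (triple[0], triple[2]) != pair:
        return None
    # Map the predicate verb to an annotated relationship type, if known.
    return verb_to_type.get(triple[1])


# Hypothetical parse of a sentence with one predicate verb.
arcs = [("犯", "SBV", "寒邪"), ("犯", "VOB", "肺")]
verb_to_type = {"犯": "damage"}
accepted = relation_type(("寒邪", "肺"), 0.95, arcs, verb_to_type)
rejected = relation_type(("寒邪", "肺"), 0.50, arcs, verb_to_type)
```

Here `verb_to_type` stands in for the annotated inter-entity relationship types; how predicate verbs are matched against those annotations in the actual method is not detailed in this section.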
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A method for extracting relationships between entities in TCM literature, characterized by comprising:
for the TCM literature to be processed, obtaining the entity types and inter-entity relationship types that have been annotated on part of its content;
training a named entity recognition model according to the annotated entity types;
performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtaining candidate entity pairs that have a relationship and a feature table;
performing statistical inference of graph probabilities with a factor graph model according to the obtained candidate entity pairs and feature table, learning the relationship features of the objects globally, and obtaining the probability that a relationship exists between entities;
determining the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples by dependency parsing and the annotated inter-entity relationship types.
2. The method for extracting relationships between entities in TCM literature according to claim 1, characterized in that training the named entity recognition model according to the annotated entity types comprises:
training a named entity recognition model with a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suitable for TCM literature;
integrating the obtained named entity recognition model suitable for TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and packaging and compiling the tool.
3. The method for extracting relationships between entities in TCM literature according to claim 1, characterized in that performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtaining candidate entity pairs that have a relationship and a feature table comprises:
performing named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
performing a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
extracting text features for the entities in the candidate entity pairs and obtaining the named entity recognition results of the context of the candidate entities, which constitute the feature table;
determining whether a relationship exists between the two entities in some of the candidate entity pairs.
4. The method for extracting relationships between entities in TCM literature according to claim 1, characterized in that determining the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples by dependency parsing and the annotated inter-entity relationship types, comprises:
obtaining the entity pairs whose probability of having a relationship is greater than a preset threshold, and analyzing, by dependency parsing, the sentences in which those entity pairs occur, extracting fact triples with a verb as the core;
constructing fact triples with the predicate verb as the core by parsing the grammatical relations of the sentence;
determining the type of relationship between entities according to the predicate verb between the entity pair, in combination with the annotated inter-entity relationship types.
5. A device for extracting relationships between entities in TCM literature, characterized by comprising:
an obtaining module, configured to obtain, for the TCM literature to be processed, the entity types and inter-entity relationship types that have been annotated on part of its content;
a training module, configured to train a named entity recognition model according to the annotated entity types;
an identification module, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model and, according to the named entity recognition results, obtain candidate entity pairs that have a relationship and a feature table;
a determining module, configured to perform statistical inference of graph probabilities with a factor graph model according to the obtained candidate entity pairs and feature table, learn the relationship features of the objects globally, and obtain the probability that a relationship exists between entities;
an extraction module, configured to determine the type of relationship between entities according to the obtained probability that a relationship exists between entities, in combination with the method of extracting fact triples by dependency parsing and the annotated inter-entity relationship types.
6. The device for extracting relationships between entities in TCM literature according to claim 5, characterized in that the training module comprises:
a training unit, configured to train a named entity recognition model with a natural language processing tool according to the annotated entity types, to obtain a named entity recognition model suitable for TCM literature;
a replacement unit, configured to integrate the obtained named entity recognition model suitable for TCM literature into the natural language processing tool, replacing the tool's original named entity recognition model, and to package and compile the tool.
7. The device for extracting relationships between entities in TCM literature according to claim 5, characterized in that the identification module comprises:
a recognition unit, configured to perform named entity recognition on the TCM literature to be processed using the trained named entity recognition model;
an operation unit, configured to perform a Cartesian product operation on the recognized entities to obtain candidate entity pairs;
a composition unit, configured to extract text features for the entities in the candidate entity pairs and obtain the named entity recognition results of the context of the candidate entities, which constitute the feature table;
a first determining unit, configured to determine whether a relationship exists between the two entities in some of the candidate entity pairs.
8. The device for extracting relationships between entities in TCM literature according to claim 5, characterized in that the extraction module comprises:
an analysis unit, configured to obtain the entity pairs whose probability of having a relationship is greater than a preset threshold, and to analyze, by dependency parsing, the sentences in which those entity pairs occur, extracting fact triples with a verb as the core;
a construction unit, configured to construct fact triples with the predicate verb as the core by parsing the grammatical relations of the sentence;
a second determining unit, configured to determine the type of relationship between entities according to the predicate verb between the entity pair, in combination with the annotated inter-entity relationship types.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910293263.9A CN110032649B (en) | 2019-04-12 | 2019-04-12 | Method and device for extracting relationships between entities in traditional Chinese medicine literature |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110032649A (en) | 2019-07-19 |
CN110032649B (en) | 2021-10-01 |
Family ID=67238140
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910293263.9A Active CN110032649B (en) | 2019-04-12 | 2019-04-12 | Method and device for extracting relationships between entities in traditional Chinese medicine literature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032649B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169079A (en) * | 2017-05-10 | 2017-09-15 | 浙江大学 | A kind of field text knowledge abstracting method based on Deepdive |
CN107247739A (en) * | 2017-05-10 | 2017-10-13 | 浙江大学 | A kind of financial publication text knowledge extracting method based on factor graph |
US20190005020A1 (en) * | 2017-06-30 | 2019-01-03 | Elsevier, Inc. | Systems and methods for extracting funder information from text |
CN108280062A (en) * | 2018-01-19 | 2018-07-13 | 北京邮电大学 | Entity based on deep learning and entity-relationship recognition method and device |
CN108874878A (en) * | 2018-05-03 | 2018-11-23 | 众安信息技术服务有限公司 | A kind of building system and method for knowledge mapping |
CN108875051A (en) * | 2018-06-28 | 2018-11-23 | 中译语通科技股份有限公司 | Knowledge mapping method for auto constructing and system towards magnanimity non-structured text |
CN109062894A (en) * | 2018-07-19 | 2018-12-21 | 南京源成语义软件科技有限公司 | The automatic identification algorithm of Chinese natural language Entity Semantics relationship |
CN109190113A (en) * | 2018-08-10 | 2019-01-11 | 北京科技大学 | A kind of knowledge mapping construction method of theory of traditional Chinese medical science ancient books and records |
CN109241538A (en) * | 2018-09-26 | 2019-01-18 | 上海德拓信息技术股份有限公司 | Based on the interdependent Chinese entity relation extraction method of keyword and verb |
Non-Patent Citations (3)
Title |
---|
HUAIYU WAN et al.: "Extracting relations from traditional Chinese medicine literature via heterogeneous entity networks", 《JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION》 * |
朱玲 et al.: "基于关键动词的中医古籍概念实体间予以关系发现研究", 《中国数字医学》 * |
林伟贇: "基于海量网页的同类命名实体共现统计规律的研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110543571A (en) * | 2019-08-07 | 2019-12-06 | 北京市天元网络技术股份有限公司 | knowledge graph construction method and device for water conservancy informatization |
CN112989032A (en) * | 2019-12-17 | 2021-06-18 | 医渡云(北京)技术有限公司 | Entity relationship classification method, apparatus, medium and electronic device |
CN112329440A (en) * | 2020-09-01 | 2021-02-05 | 浪潮云信息技术股份公司 | Relation extraction method and device based on two-stage screening and classification |
CN112036171A (en) * | 2020-09-04 | 2020-12-04 | 平安科技(深圳)有限公司 | Method, system and device for extracting specific medical names and relationships thereof |
WO2021169354A1 (en) * | 2020-09-04 | 2021-09-02 | 平安科技(深圳)有限公司 | Method and system for extracting specific medical references and relationship thereof, and apparatus |
CN112036171B (en) * | 2020-09-04 | 2024-06-25 | 平安科技(深圳)有限公司 | Extraction method, system and device for medical specific references and relation thereof |
CN112036151A (en) * | 2020-09-09 | 2020-12-04 | 平安科技(深圳)有限公司 | Method and device for constructing gene disease relation knowledge base and computer equipment |
CN112036151B (en) * | 2020-09-09 | 2024-04-05 | 平安科技(深圳)有限公司 | Gene disease relation knowledge base construction method, device and computer equipment |
CN112599211A (en) * | 2020-12-25 | 2021-04-02 | 中电云脑(天津)科技有限公司 | Medical entity relationship extraction method and device |
CN112599211B (en) * | 2020-12-25 | 2023-03-21 | 中电云脑(天津)科技有限公司 | Medical entity relationship extraction method and device |
CN112766485A (en) * | 2020-12-31 | 2021-05-07 | 平安科技(深圳)有限公司 | Training method, device, equipment and medium for named entity model |
WO2022142123A1 (en) * | 2020-12-31 | 2022-07-07 | 平安科技(深圳)有限公司 | Training method and apparatus for named entity model, device, and medium |
CN112766485B (en) * | 2020-12-31 | 2023-10-24 | 平安科技(深圳)有限公司 | Named entity model training method, device, equipment and medium |
CN114139610A (en) * | 2021-11-15 | 2022-03-04 | 中国中医科学院中医药信息研究所 | Traditional Chinese medicine clinical literature data structuring method and device based on deep learning |
CN114139610B (en) * | 2021-11-15 | 2024-04-26 | 中国中医科学院中医药信息研究所 | Deep learning-based traditional Chinese medicine clinical literature data structuring method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110032649B (en) | 2021-10-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||