CN110457677A - Entity-relationship recognition method and device, storage medium, computer equipment - Google Patents
Entity-relationship recognition method and device, storage medium, computer equipment
- Publication number
- CN110457677A CN201910559111.9A
- Authority
- CN
- China
- Prior art keywords
- training sample
- text
- entity relationship
- entity
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
This application discloses an entity relationship recognition method and device, a storage medium, and computer equipment. It relates to the technical field of information processing and can effectively improve the accuracy of entity relationship recognition. The method includes: using a preset first entity relationship recognition model, obtaining a text vector of the text to be recognized from the acquired text to be recognized; obtaining a convolution result of the text vector from that text vector; and determining the entity relationships contained in the text to be recognized from the text vector and the convolution result. The preset first entity relationship recognition model is trained on a credible training sample set. The application is suitable for recognizing entity relationships in text.
Description
Technical field
This application relates to the technical field of information processing, and in particular to an entity relationship recognition method and device, a storage medium, and computer equipment.
Background
With the development of science and technology, methods for recognizing the relationships between words have multiplied and are applied in an increasingly wide range of scenarios, for example the superordinate and subordinate relationships between place names, the hierarchy of state institutions, and the inclusion relationships between categories of goods. These methods require training a neural network on a large amount of sample data and then building a corresponding recognition model that extracts the relationships between words in text (that is, entity relationships).
The shortcoming of the prior art is that, although a training sample set for training the recognition model can be built efficiently by distant supervision, erroneous training samples are still easily mixed in while the set is being built. This degrades the recognition precision of the model trained later, so the trained recognition model extracts entity relationships from text with low accuracy, which harms the user experience.
Summary of the invention
In view of this, the application provides an entity relationship recognition method and device, a storage medium, and computer equipment. The main purpose is to solve the technical problem that erroneous training samples are easily mixed in when training samples are currently built by distant supervision, so that the trained recognition model extracts entity relationships from text with low accuracy.
According to one aspect of the application, an entity relationship recognition method is provided, the method comprising:
using a preset first entity relationship recognition model, obtaining a text vector of the text to be recognized from the acquired text to be recognized;
obtaining a convolution result of the text vector from the text vector of the text to be recognized;
determining the entity relationships contained in the text to be recognized from the text vector and the convolution result obtained;
wherein the preset first entity relationship recognition model is trained on a credible training sample set.
According to another aspect of the application, an entity relationship recognition device is provided, the device comprising:
an acquisition module, configured to use a preset first entity relationship recognition model to obtain a text vector of the text to be recognized from the acquired text to be recognized;
a convolution module, configured to obtain a convolution result of the text vector from the text vector of the text to be recognized;
an entity relationship module, configured to determine the entity relationships contained in the text to be recognized from the text vector and the convolution result obtained;
wherein the preset first entity relationship recognition model is trained on a credible training sample set.
According to a further aspect of the application, a storage medium is provided on which a computer program is stored, the program implementing the above entity relationship recognition method when executed by a processor.
According to yet another aspect of the application, computer equipment is provided, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above entity relationship recognition method when executing the program.
With the above technical solution, the entity relationship recognition method and device, storage medium, and computer equipment provided by the application can be compared with the current approach, in which a training sample set built by distant supervision is easily contaminated with erroneous training samples and the recognition model trained on it extracts entity relationships from text with low accuracy. The application uses a preset first entity relationship recognition model to obtain a text vector of the text to be recognized from the acquired text, obtains a convolution result of the text vector from that text vector, and determines the entity relationships contained in the text to be recognized from the text vector and the convolution result. Because the preset first entity relationship recognition model is trained on a high-quality credible training sample set, it can effectively improve the accuracy of entity relationship recognition.
The above is only an overview of the technical solution of the application. So that the technical means of the application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the application are easier to understand, specific embodiments of the application are set out below.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a flow diagram of an entity relationship recognition method provided by an embodiment of the application;
Fig. 2 shows a flow diagram of another entity relationship recognition method provided by an embodiment of the application;
Fig. 3 shows a structural diagram of an entity relationship recognition device provided by an embodiment of the application.
Detailed description of the embodiments
The application is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided they do not conflict, the embodiments of the application and the features in the embodiments can be combined with one another.
This embodiment addresses the technical problem that erroneous training samples are easily mixed in when training samples are built by distant supervision, so that the trained recognition model extracts entity relationships from text with low accuracy. It provides an entity relationship recognition method that builds an entity relationship recognition model with higher extraction accuracy and thereby improves the accuracy of recognizing entity relationships in text. As shown in Fig. 1, the method comprises:
101. Using a preset first entity relationship recognition model, obtain a text vector of the text to be recognized from the acquired text to be recognized. The preset first entity relationship recognition model is trained on a credible training sample set, and the credible training sample set is built from credible training samples that carry entity relationship labels.
The text to be recognized is acquired and preprocessed to obtain an initialized text vector. The initialized text vector is then input to the embedding layer of the preset first entity relationship recognition model, which generates the text vector that characterizes the text to be recognized.
The preprocessing can be set according to the actual application scenario. For example, it can be word segmentation, in which the text to be recognized is segmented and marked in units of words. It can also be word screening, in which, after the text has been segmented in units of words, unimportant words are removed, for example auxiliary verbs such as "can" and "should" and interjections such as "oh", so as to improve the efficiency of recognizing the entity relationships in the text to be recognized. The preprocessing is not specifically limited here.
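The two preprocessing options can be sketched as follows. This is a minimal illustration only: the stop-word list and the whitespace-based `segment` function are assumptions standing in for a real word segmenter, and neither name comes from the application.

```python
from typing import List, Set

# Hypothetical stop-word list; the text only names auxiliary verbs and
# interjections as examples of unimportant words.
STOP_WORDS: Set[str] = {"can", "should", "oh"}


def segment(text: str) -> List[str]:
    """Split text into word tokens (whitespace split stands in for a real segmenter)."""
    return text.lower().split()


def filter_words(tokens: List[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop tokens that, once surrounding punctuation is stripped, are stop words."""
    return [t for t in tokens if t.strip(",.!?") not in stop_words]


tokens = segment("Oh, Shanghai should belong to China")
kept = filter_words(tokens)
print(kept)
```

Segmentation alone corresponds to the first option; segmentation followed by `filter_words` corresponds to the word-screening option.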
102. Obtain a convolution result of the text vector from the text vector of the text to be recognized.
After the text vector of the text to be recognized has passed through a series of operations in the convolutional layer, pooling layer, and fully connected layer, a multidimensional feature vector containing the entity relationships in the initialized text vector is output, so that the relationship information between words in the text vector of the text to be recognized is captured and extracted.
103. Determine the entity relationships contained in the text to be recognized from the text vector and the convolution result obtained.
The convolution result obtained by the convolutional layer and the pooled convolution result obtained by the pooling layer are input to the fully connected layer of the preset first entity relationship recognition model. The fully connected layer uses the softmax activation function to associate the convolution results output by the individual convolution kernels, obtaining associated convolution results, and combines these with the pooled convolution results output by the pooling layer to output the latent features of the text to be recognized. The latent features characterize the entity relationships between words in the text to be recognized.
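One plausible reading of this step is a dense layer with a softmax output applied to the combined convolution and pooling features. The sketch below assumes that reading; the feature values, weights, and the choice of two relationship classes are made up for illustration and are not taken from the application.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax(features: List[float], weights: List[List[float]],
                  bias: List[float]) -> List[float]:
    """Fully connected layer (one weight row per class) followed by softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


conv_features = [0.2, -0.1]                 # hypothetical convolution outputs
pooled_features = [0.7]                     # hypothetical pooled outputs
features = conv_features + pooled_features  # combine the two sets of features

weights = [[1.0, 0.0, 2.0],                 # hypothetical weights, class 0
           [0.0, 1.0, -1.0]]                # hypothetical weights, class 1
bias = [0.0, 0.0]

probs = dense_softmax(features, weights, bias)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, probs)
```

Here `predicted` is the index of the relationship class with the highest probability.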
With the above scheme, this embodiment uses a preset first entity relationship recognition model to obtain a text vector of the text to be recognized from the acquired text, obtains a convolution result of the text vector from that text vector, and determines the entity relationships contained in the text to be recognized from the text vector and the convolution result. The preset first entity relationship recognition model is trained on a credible training sample set. Compared with the current approach, in which a training sample set built by distant supervision is easily contaminated with erroneous training samples and the recognition model trained on it extracts entity relationships from text with low accuracy, the preset first entity relationship recognition model of this embodiment, trained on high-quality credible training samples, can effectively improve the accuracy of entity relationship recognition.
Further, as a refinement and extension of the specific implementation of the above embodiment, and to describe the implementation process of this embodiment in full, another entity relationship recognition method is provided. As shown in Fig. 2, the method comprises:
201. Train an initialized second entity relationship recognition model to obtain a preset second entity relationship recognition model.
The preset second entity relationship recognition model is used to build the credible training sample set. Before the initialized second entity relationship recognition model is trained, an initialization training sample set for training it is obtained. Specifically, a large amount of text data is acquired as the initialization training sample set, yielding its training samples. Each training sample contains a triple, that is, three features of the text that express the named entities among the words of the training sample and the entity relationship between them: two named entities E1 and E2 and the relationship R between them, written (E1, R, E2). Each training sample is labeled to obtain a labeled training sample carrying an entity relationship label. The label is the entity relationship category that the training sample contains. For example, the category of the entity relationship between words in a training sample may be internet business, finance, or geographic attribute, or it may be a more refined, specific category: "China" and "Shanghai" are in a geographic inclusion relationship, and in a knowledge graph for intelligent customer service for insurance products, "financial service" and "insurance product" are in a clause inclusion relationship. The initialized second entity relationship recognition model then predicts a label for each labeled training sample from the triple in that sample, and the predicted label is compared with the true label with which the sample was actually marked, so that iterative training yields a preset second entity relationship recognition model with higher recognition accuracy.
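A labeled training sample of this kind can be sketched as a small data structure. The class name `LabeledSample`, the field names, the example sentences, and the helper `label_accuracy` are all hypothetical; only the (E1, R, E2) triple form and the two example relationships come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LabeledSample:
    text: str
    triple: Tuple[str, str, str]  # (E1, R, E2)
    label: str                    # entity relationship category of the sample


samples = [
    LabeledSample("Shanghai is a city in China",
                  ("China", "geographic inclusion", "Shanghai"),
                  "geographic inclusion"),
    LabeledSample("Financial services cover insurance products",
                  ("financial service", "clause inclusion", "insurance product"),
                  "clause inclusion"),
]


def label_accuracy(predicted: List[str], labeled: List[LabeledSample]) -> float:
    """Fraction of predicted labels that match the true labels of the samples."""
    hits = sum(p == s.label for p, s in zip(predicted, labeled))
    return hits / len(labeled)


# A made-up prediction that gets the first sample right and the second wrong.
accuracy = label_accuracy(["geographic inclusion", "geographic inclusion"], samples)
print(accuracy)
```

Comparing predicted labels with true labels in this way is the signal that iterative training would drive toward higher accuracy.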
To illustrate a specific implementation of step 201, as a preferred embodiment, step 201 may include: using the initialized second entity relationship recognition model to predict the entity relationships of the labeled training samples that carry entity relationship labels; and training the network parameters of the initialized second entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the labeled training samples, to obtain the preset second entity relationship recognition model.
A labeled training sample carrying an entity relationship label is an initialized text vector with an entity relationship label. From this initialized text vector, the initialized second entity relationship recognition model completes a series of operations through the embedding layer, convolutional layer, pooling layer, and fully connected layer, capturing and extracting the relationship information between words in the initialized text vector. It outputs a multidimensional feature vector containing the entity relationships in the initialized text vector, and from the output multidimensional feature vectors the preset second entity relationship recognition model for recognizing entity relationships in text is obtained by training.
In a practical application scenario, the initialized text vector with an entity relationship label passes through the embedding layer of the initialized second entity relationship recognition model, which outputs the word vectors corresponding to that initialized text vector. These word vectors are input to the convolutional layer of the initialized second entity relationship recognition model. Specifically, the convolutional layer performs a convolution over every n adjacent word vectors output by the embedding layer. For example, with the convolution kernel length set to 3, a kernel of dimension 3 is convolved over every 3 adjacent word vectors, giving the convolution result output by each convolution kernel. The convolution result output by each kernel is input to the pooling layer of the initialized second entity relationship recognition model. Specifically, the pooling layer performs a pooling operation on the convolution result output by each kernel and extracts the pooled convolution result within a given stride; the pooling operation can be max pooling, average pooling, and so on. The convolutional and pooling layers capture the relationship information between adjacent word vectors, and thereby capture the local information of the initialized text vector.
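The convolution over adjacent word vectors and the subsequent pooling can be sketched in plain Python. The word vectors, the kernel weights, and the pooling window and stride below are made-up values chosen for illustration; only the kernel length of 3 and the use of max pooling follow the text.

```python
from typing import List

Vector = List[float]


def conv1d(word_vectors: List[Vector], kernel: List[Vector],
           bias: float = 0.0) -> List[float]:
    """Slide a kernel spanning len(kernel) adjacent word vectors over the sequence."""
    n = len(kernel)
    out = []
    for start in range(len(word_vectors) - n + 1):
        window = word_vectors[start:start + n]
        total = bias
        for vec, weights in zip(window, kernel):
            total += sum(v * w for v, w in zip(vec, weights))
        out.append(total)
    return out


def max_pool(values: List[float], size: int, stride: int) -> List[float]:
    """Max pooling over windows of the given size, moved by the given stride."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


# Five hypothetical 2-dimensional word vectors and one kernel of length 3.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

conv_out = conv1d(word_vectors, kernel)
pooled = max_pool(conv_out, size=2, stride=1)
print(conv_out, pooled)
```

Replacing `max` with an average in `max_pool` would give the average pooling variant the text also mentions.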
The convolution result obtained by the convolutional layer and the pooled convolution result from the pooling layer are input to the fully connected layer of the initialized second entity relationship recognition model. The fully connected layer associates the convolution results output by the individual convolution kernels to obtain associated convolution results, and combines these with the pooled convolution results output by the pooling layer to obtain the latent features of the initialized text vector, thereby capturing the global information contained in it. The latent features characterize the entity relationships between words in the initialized text vector.
From the latent features of the initialized text vector, and after multiple training iterations, the preset second entity relationship recognition model for recognizing entity relationships in text is obtained.
202. Build the credible training sample set from credible training samples that carry entity relationship labels.
In a practical application scenario, credible training samples carrying entity relationship labels are obtained by screening distantly supervised training samples. Specifically, distantly supervised training samples, which are initialized text vectors, are acquired and input to the preset second entity relationship recognition model, and the constructed Gaussian mixture model is used to obtain output results that characterize the entity relationships contained in the distantly supervised training samples. The credible training samples used to build the credible training sample set are then obtained from the output results and the labeled training samples carrying entity relationship labels.
To illustrate a specific implementation of step 202, as a preferred embodiment, building the credible training sample set from credible training samples that carry entity relationship labels may include: using the preset second entity relationship recognition model to predict the entity relationships of the distantly supervised training samples; and obtaining the credible training samples carrying entity relationship labels from the entity relationship prediction results and the labeled training samples carrying entity relationship labels.
To further illustrate a specific implementation of step 202, as a preferred embodiment, predicting the entity relationships of the distantly supervised training samples with the preset second entity relationship recognition model may include: using the preset second entity relationship recognition model to perform a convolution on the labeled training samples carrying entity relationship labels to obtain convolution results; training an initialized Gaussian mixture model on the convolution results and the entity relationship labels of the labeled training samples to obtain a trained Gaussian mixture model; and using the trained Gaussian mixture model to predict the entity relationships of the distantly supervised training samples.
In a practical application scenario, a large number of distantly supervised training samples are input to the preset second entity relationship recognition model, and its fully connected layer outputs, in sequence, output results that characterize the entity relationships contained in the distantly supervised training samples. Using the constructed Gaussian mixture model (GMM), each output result is associated, in output order, with the label of the distantly supervised training sample to which it corresponds. Taking the first group of output results as an example, associating the first group of output results with the label of the corresponding distantly supervised training sample is implemented as follows:
It should be noted that the GMM is trained as follows. Suppose the output results that the fully connected layer of the preset second entity relationship recognition model outputs in sequence to characterize the entity relationships contained in the distantly supervised training samples (that is, the training sample set used to train the initialized GMM) include L labeled training samples (x_i, y_i) carrying entity relationship labels and u distantly supervised training samples x_{L+j} extracted by distant supervision, where 1 ≤ i ≤ L and 1 ≤ j ≤ u. The training sample set is then D = {(x_1, y_1), (x_2, y_2), ..., (x_L, y_L), x_{L+1}, x_{L+2}, ..., x_{L+u}}. An initialized GMM is built from the labeled training samples in the training sample set, and the network parameters of the GMM are obtained by training on those labeled training samples, giving the trained GMM.
Suppose the labeled training samples carrying entity relationship labels fall into m classes. Taking the L labeled training samples as an example, let γ_ij denote the probability that labeled training sample x_j belongs to the i-th class; then γ_ij is 1 for the class indicated by the sample's label and 0 for every other class. For example, depending on the needs of the application scenario, the i-th Gaussian component corresponds to the i-th class of labeled training samples; if the i-th class is the clause inclusion relationship, γ_ij is the probability that labeled training sample x_j belongs to the clause inclusion relationship.
The probability distribution of the GMM is computed as follows:
$$p(x) = \sum_{i=1}^{m} \pi_i \, N(x \mid \mu_i, \Sigma_i)$$
where N(x | μ_i, Σ_i) is the i-th Gaussian component of the GMM, π_i is its mixing coefficient (equivalent to the weight of that component), x is the feature vector (that is, a training sample), μ_i is the mean vector, and Σ_i is the covariance matrix.
Using the expectation-maximization (EM) algorithm, the initial network parameters π_i, μ_i, Σ_i of the GMM are determined from the L labeled training samples, and are computed as follows:
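The expressions themselves are not reproduced in the text. A standard initialization that is consistent with the symbols defined above, offered as an assumption rather than as the application's own formulas, treats γ_ij as the 0/1 label indicator of the L labeled training samples:

```latex
\pi_i = \frac{1}{L}\sum_{j=1}^{L}\gamma_{ij},
\qquad
\mu_i = \frac{\sum_{j=1}^{L}\gamma_{ij}\,x_j}{\sum_{j=1}^{L}\gamma_{ij}},
\qquad
\Sigma_i = \frac{\sum_{j=1}^{L}\gamma_{ij}\,(x_j-\mu_i)(x_j-\mu_i)^{\top}}{\sum_{j=1}^{L}\gamma_{ij}}
```

Under this reading, π_i is the fraction of labeled samples in class i, and μ_i and Σ_i are the sample mean and covariance of that class.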
During parameter estimation for the GMM, the expectation step (E step) predicts the label class to which each labeled training sample belongs from the initial network parameters π_i, μ_i, Σ_i, and the maximization step (M step) updates the parameters π_i, μ_i, Σ_i from the predicted label classes.
Wherein, the calculation formula of E step is as follows:
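The E-step expression is not reproduced in the text. The standard responsibility update for a Gaussian mixture, written with the symbols defined above and offered as an assumption rather than as the application's own formula, is:

```latex
\gamma_{ij} = \frac{\pi_i \, N(x_j \mid \mu_i, \Sigma_i)}
                   {\sum_{k=1}^{m} \pi_k \, N(x_j \mid \mu_k, \Sigma_k)}
```

Here γ_ij is the posterior probability that sample x_j belongs to the i-th class under the current parameters.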
The calculation formula of M step is as follows:
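The M-step expression is likewise not reproduced in the text. A standard update over all L + u samples, written with the symbols defined above and offered as an assumption rather than as the application's own formula, is:

```latex
\pi_i = \frac{1}{L+u}\sum_{j=1}^{L+u}\gamma_{ij},
\qquad
\mu_i = \frac{\sum_{j=1}^{L+u}\gamma_{ij}\,x_j}{\sum_{j=1}^{L+u}\gamma_{ij}},
\qquad
\Sigma_i = \frac{\sum_{j=1}^{L+u}\gamma_{ij}\,(x_j-\mu_i)(x_j-\mu_i)^{\top}}{\sum_{j=1}^{L+u}\gamma_{ij}}
```

Each parameter is a γ-weighted average, so samples that the E step assigns confidently to class i dominate the update for that class.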
The E step and M step are repeated in turn, following a semi-supervised learning approach, until convergence, giving the trained parameters π_i, μ_i, Σ_i and hence the trained GMM. The trained GMM is used to predict the entity relationships of the distantly supervised training samples, and the credible training samples carrying entity relationship labels are obtained from the entity relationship prediction results and the labeled training samples carrying entity relationship labels.
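The semi-supervised EM procedure can be sketched in plain Python. This is a simplified illustration under several assumptions that are not in the text: covariances are diagonal, the labeled samples keep their 0/1 label weights throughout, a fixed number of iterations stands in for a convergence test, and every class has at least one labeled sample. All function names and data values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Params = Tuple[List[float], List[Vector], List[Vector]]  # (pi, mu, diagonal variances)


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def m_step(xs: List[Vector], gammas: List[List[float]], n_classes: int,
           min_var: float = 1e-3) -> Params:
    """Re-estimate pi, mu and variances; gammas[j][i] weights sample j for class i."""
    dim = len(xs[0])
    pis, means, variances = [], [], []
    for i in range(n_classes):
        weight = sum(g[i] for g in gammas)  # assumed > 0 (each class has a labeled sample)
        mean = [sum(g[i] * x[d] for g, x in zip(gammas, xs)) / weight
                for d in range(dim)]
        var = [max(sum(g[i] * (x[d] - mean[d]) ** 2
                       for g, x in zip(gammas, xs)) / weight, min_var)
               for d in range(dim)]
        pis.append(weight / len(xs))
        means.append(mean)
        variances.append(var)
    return pis, means, variances


def e_step(xs: List[Vector], params: Params) -> List[List[float]]:
    """Posterior class probabilities of each sample under the current parameters."""
    pis, means, variances = params
    gammas = []
    for x in xs:
        logs = [math.log(p) + log_gauss_diag(x, m, v)
                for p, m, v in zip(pis, means, variances)]
        top = max(logs)
        exps = [math.exp(value - top) for value in logs]
        total = sum(exps)
        gammas.append([e / total for e in exps])
    return gammas


def fit(labeled: List[Tuple[Vector, int]], unlabeled: List[Vector],
        n_classes: int, n_iter: int = 20) -> Params:
    """Initialize from labeled samples, then alternate E and M steps."""
    xs_l = [x for x, _ in labeled]
    one_hot = [[1.0 if y == i else 0.0 for i in range(n_classes)] for _, y in labeled]
    params = m_step(xs_l, one_hot, n_classes)       # initial parameters
    for _ in range(n_iter):                         # fixed count instead of a convergence test
        gammas_u = e_step(unlabeled, params)        # E step on distantly supervised samples
        params = m_step(xs_l + unlabeled, one_hot + gammas_u, n_classes)  # M step
    return params


def predict(x: Vector, params: Params) -> int:
    """Index of the most probable class for x."""
    gamma = e_step([x], params)[0]
    return max(range(len(gamma)), key=gamma.__getitem__)


labeled = [([0.0, 0.0], 0), ([0.2, -0.1], 0), ([5.0, 5.0], 1), ([5.1, 4.8], 1)]
unlabeled = [[0.1, 0.1], [4.9, 5.2], [-0.2, 0.0], [5.3, 5.0]]
params = fit(labeled, unlabeled, n_classes=2)
print(predict([0.05, 0.0], params), predict([5.0, 5.1], params))
```

In the method described here, the feature vectors would be the outputs of the preset second entity relationship recognition model rather than the toy 2-dimensional points used above.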
To further illustrate a specific implementation of step 202, as a preferred embodiment, obtaining the credible training samples carrying entity relationship labels from the entity relationship prediction results and the labeled training samples carrying entity relationship labels may include: if the predicted entity relationship of a distantly supervised training sample is consistent with the initial entity relationship label in that sample, taking the distantly supervised training sample and the labeled training samples as credible training samples carrying entity relationship labels; and if the predicted entity relationship of a distantly supervised training sample is inconsistent with the initial entity relationship label in that sample, deleting the distantly supervised training sample.
If the predicted entity relationship is consistent with the initial label of the distantly supervised training sample, the sample is determined to be a high-confidence distantly supervised training sample. If the predicted entity relationship is inconsistent with the initial label of the distantly supervised training sample, the sample is determined to be a low-confidence distantly supervised training sample and is discarded directly.
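The screening rule can be sketched as a simple filter. The function name `build_credible_set`, the example sentences, and the lookup-table predictor are hypothetical; in the method described here the predictions would come from the trained GMM.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, entity relationship label)


def build_credible_set(labeled: List[Sample], remote: List[Sample],
                       predict: Callable[[str], str]) -> List[Sample]:
    """Keep labeled samples plus remote samples whose prediction matches their initial label."""
    kept = [(text, label) for text, label in remote if predict(text) == label]
    return labeled + kept


labeled = [("Shanghai is in China", "geographic inclusion")]
remote = [
    ("Beijing is in China", "geographic inclusion"),             # prediction agrees
    ("Insurance products are in China", "geographic inclusion"),  # prediction disagrees
]

# Stand-in predictor: a lookup table of made-up predictions.
fake_predictions = {
    "Beijing is in China": "geographic inclusion",
    "Insurance products are in China": "clause inclusion",
}

credible = build_credible_set(labeled, remote, fake_predictions.__getitem__)
print(credible)
```

The second remote sample is dropped because its predicted relationship differs from its initial label, which is the low-confidence case described above.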
The second group of output results is then associated with the label of the corresponding distantly supervised training sample, and the above steps are repeated in turn until all output results have been processed, giving the screened training text set, that is, the credible training sample set. Because the credible training sample set obtained by screening has higher confidence, the preset first entity relationship recognition model trained on it also has higher recognition accuracy.
203. Train on the credible training sample set to obtain the preset first entity relationship recognition model.
In order to illustrate the specific embodiment of step 203, as a kind of preferred embodiment, step 203 be can specifically include:
Using the first instance relation recognition model of initialization, to the credible training sample concentrate with entity relationship label can
Believe that training sample carries out entity relationship prediction;According to entity relationship prediction result and the entity relationship of credible training sample label pair
Network parameter in the first instance relation recognition model of initialization is trained, and obtains preset first instance relation recognition mould
Type.
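The predict-then-update loop of step 203 can be illustrated with a much smaller model than the one described here. The sketch below assumes a plain softmax classifier over fixed feature vectors as a stand-in for the convolutional network; the learning rate, epoch count, and all function names are illustrative assumptions.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scores(weights: List[List[float]], x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train(samples: List[Tuple[List[float], int]], num_classes: int, dim: int,
          lr: float = 0.5, epochs: int = 100) -> List[List[float]]:
    """Predict with the current parameters, compare with the label, update the parameters."""
    weights = [[0.0] * dim for _ in range(num_classes)]  # the "initialized" model
    for _ in range(epochs):
        for x, label in samples:
            probs = softmax(scores(weights, x))           # entity relationship prediction
            for c in range(num_classes):
                error = probs[c] - (1.0 if c == label else 0.0)
                for d in range(dim):
                    weights[c][d] -= lr * error * x[d]    # cross-entropy gradient step
    return weights

def predict(weights: List[List[float]], x: List[float]) -> int:
    s = scores(weights, x)
    return max(range(len(s)), key=s.__getitem__)

# Toy credible training set: (feature vector, relation label index).
credible_samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
trained = train(credible_samples, num_classes=2, dim=2)
```

In the described method the parameters being updated would be those of the embedding, convolutional, pooling, and fully connected layers rather than a single weight matrix.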
Low-confidence remote supervision training samples are filtered out by means of the preset second entity relationship recognition model and the constructed GMM to obtain the credible training sample set, which improves on existing remote supervision training methods. The preset first entity relationship recognition model is then obtained by training the initialized first entity relationship recognition model on the credible training sample set. Improving the quality of the training sample set improves the recognition accuracy of the preset first entity relationship recognition model, so that the model can identify the entity relationships between the words in the text to be recognized more quickly and accurately, and the semantics of the text to be recognized can be determined from the identified entity relationships.
204. Using the preset first entity relationship recognition model, the character vectors and word vectors of the text to be recognized are obtained from a character-word vector dictionary.
The acquired text to be recognized is segmented into words and converted into an initialized text vector, and the initialized text vector is input into the embedding layer of the preset first entity relationship recognition model. The embedding layer uses the preset character-word vector dictionary and matches the initialized text vector based on the Word2Vec model to obtain the character vectors and word vectors that characterize the text to be recognized. The preset character-word vector dictionary contains the character vector corresponding to each character, and the word vector corresponding to each word, in the initialized text vector.
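The dictionary lookup in step 204 can be sketched as below. The dictionary contents, the vector dimension, the zero-vector fallback for unknown entries, and the hand-written word segmentation are all assumptions for illustration; the source only states that a preset dictionary maps characters and words to vectors.

```python
from typing import Dict, List

Vector = List[float]
DIM = 3  # illustrative vector dimension

# Hypothetical character-word vector dictionary; a real one would be trained, e.g. with Word2Vec.
vector_dictionary: Dict[str, Vector] = {
    "上": [0.1, 0.0, 0.2],
    "海": [0.0, 0.3, 0.1],
    "上海": [0.5, 0.1, 0.4],
    "中国": [0.6, 0.2, 0.3],
}

def lookup(tokens: List[str], table: Dict[str, Vector], dim: int = DIM) -> List[Vector]:
    """Return one vector per token, using a zero vector for tokens missing from the table."""
    return [table.get(token, [0.0] * dim) for token in tokens]

text = "上海位于中国"            # "Shanghai is located in China"
words = ["上海", "位于", "中国"]  # segmentation result, written by hand here
characters = list(text)

char_vectors = lookup(characters, vector_dictionary)
word_vectors = lookup(words, vector_dictionary)
```

Each character yields one character vector and each segmented word yields one word vector; both sequences feed the convolution in the next step.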
205. A convolution operation is performed on multiple adjacent character vectors and word vectors to obtain the text vector of the text to be recognized.
The embedding layer of the preset first entity relationship recognition model further includes a two-layer one-dimensional fully convolutional structure. The character vectors and word vectors of the text to be recognized pass through this structure, which outputs the text vector of the text to be recognized. Specifically, a convolution kernel is used to perform a convolution operation (i.e. a dot-product operation) on the character vectors and on the word vectors of the text to be recognized respectively, and all of the resulting convolution results are taken as the text vector of the text to be recognized.
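A single one-dimensional convolution over adjacent vectors, as described above, might look like the following sketch. It assumes a kernel that spans `width` adjacent vectors and produces one scalar per window by dot product; the source specifies neither the kernel shape nor how the two layers are stacked, so only one layer is shown and all values are toy data.

```python
from typing import List

Vector = List[float]

def conv1d(sequence: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide the kernel over adjacent vectors and take the dot product in each window."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        outputs.append(sum(
            w * x
            for kernel_row, vector in zip(kernel, window)
            for w, x in zip(kernel_row, vector)
        ))
    return outputs

# Toy inputs: three 2-dimensional character vectors and two word vectors.
char_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_vectors = [[2.0, 0.0], [0.0, 2.0]]
kernel = [[1.0, 1.0], [1.0, 1.0]]  # covers two adjacent vectors

char_features = conv1d(char_vectors, kernel)
word_features = conv1d(word_vectors, kernel)
text_vector = char_features + word_features  # all convolution results form the text vector
```

Here the character windows give 2.0 and 3.0 and the single word window gives 4.0, so `text_vector` is `[2.0, 3.0, 4.0]`.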
206. The convolution operation result of the text vector is obtained from the text vector of the text to be recognized.
The text vector of the text to be recognized output by the embedding layer is input into the convolutional layer of the preset first entity relationship recognition model. The convolution kernel in the convolutional layer performs a convolution operation on the text vector to obtain a convolution result, and the pooling layer of the preset first entity relationship recognition model performs a pooling operation on that convolution result to obtain a pooled convolution result. That is, the convolution operation result of the text vector consists of the convolution result obtained by the convolutional layer and the pooled convolution result obtained by the pooling layer.
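The convolution-then-pooling step can be illustrated as follows. The source does not say which pooling operation is used; max pooling with a non-overlapping window is assumed here, and the kernel values and window size are arbitrary.

```python
from typing import Dict, List

def convolve(values: List[float], kernel: List[float]) -> List[float]:
    """1-D convolution (dot product of the kernel with each window of the input)."""
    width = len(kernel)
    return [
        sum(k * v for k, v in zip(kernel, values[i:i + width]))
        for i in range(len(values) - width + 1)
    ]

def max_pool(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling (an assumed choice of pooling operation)."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

text_vector = [1.0, 3.0, 2.0, 5.0, 4.0]
conv_result = convolve(text_vector, kernel=[1.0, -1.0])
pooled_result = max_pool(conv_result, size=2)

# Both parts together make up the "convolution operation result" of the text vector.
convolution_operation_result: Dict[str, List[float]] = {
    "conv": conv_result,
    "pooled": pooled_result,
}
```

With these toy values the convolution result is `[-2.0, 1.0, -3.0, 1.0]` and the pooled result is `[1.0, 1.0]`.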
207. The entity relationship contained in the text to be recognized is determined according to the text vector and the obtained convolution operation result of the text vector.
The fully connected layer outputs the latent features of the text to be recognized according to the text vector, the convolution result obtained by the convolutional layer, and the pooled convolution result obtained by the pooling layer. The latent features characterize the entity relationships between the words in the text to be recognized. For example, if the text to be recognized is "Shanghai is located in China", the entity relationship contained in the text to be recognized is determined to be a geographic containment relationship.
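One common way to turn the output of a fully connected layer into a relation label is a softmax followed by an argmax. The source does not state this explicitly, so the sketch below is an assumption, and the weights, bias, feature values, and label names are toy values chosen for the example.

```python
import math
from typing import List, Tuple

def fully_connected(features: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """One output score per relation label: weights times features plus bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features: List[float], weights: List[List[float]],
             bias: List[float], labels: List[str]) -> Tuple[str, float]:
    probs = softmax(fully_connected(features, weights, bias))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]

labels = ["geographic_containment", "other"]   # hypothetical label set
features = [1.0, 2.0]                          # text vector, conv result and pooled result, concatenated
weights = [[1.0, 1.0], [0.0, 0.0]]
bias = [0.0, 0.0]

relation, confidence = classify(features, weights, bias, labels)
```

For these values the scores are 3.0 and 0.0, so the first label is selected.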
By applying the technical solution of this embodiment, the preset first entity relationship recognition model is used to obtain the text vector of the text to be recognized from the acquired text to be recognized, the convolution operation result of the text vector is obtained from the text vector, and the entity relationship contained in the text to be recognized is determined according to the text vector and the obtained convolution operation result, wherein the preset first entity relationship recognition model is obtained by training on the credible training sample set. Training sample sets currently built by remote supervision are easily contaminated with erroneous training samples, so recognition models trained on them to extract entity relationships from text have low accuracy. By comparison, the preset first entity relationship recognition model of this embodiment, trained on a high-quality credible training sample set, can effectively improve the accuracy of entity relationship recognition.
Further, as a specific implementation of the method of Fig. 1, an embodiment of the present application provides an entity relationship recognition device. As shown in Fig. 3, the device includes an acquisition module 31, a convolution operation module 32, and an entity relationship module 33.
The acquisition module 31 may be used to obtain, using the preset first entity relationship recognition model, the text vector of the text to be recognized from the acquired text to be recognized, wherein the preset first entity relationship recognition model is obtained by training on the credible training sample set. The acquisition module 31 is a main functional module by which the device recognizes entity relationships.
The convolution operation module 32 may be used to obtain the convolution operation result of the text vector from the text vector of the text to be recognized obtained by the acquisition module 31. The convolution operation module 32 is a main functional module by which the device recognizes entity relationships.
The entity relationship module 33 may be used to determine the entity relationship contained in the text to be recognized according to the text vector of the text to be recognized obtained by the acquisition module 31 and the convolution operation result of the text vector obtained by the convolution operation module 32. The entity relationship module 33 is a main functional module by which the device recognizes entity relationships and is the core module of the device.
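The way the three modules hand data to one another can be sketched as a simple composition. The class name `EntityRelationDevice`, its field names, and the toy stand-in functions are hypothetical; they only illustrate the data flow between modules 31, 32, and 33, not the actual model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EntityRelationDevice:
    acquire: Callable[[str], List[float]]              # acquisition module 31
    convolve: Callable[[List[float]], List[float]]     # convolution operation module 32
    decide: Callable[[List[float], List[float]], str]  # entity relationship module 33

    def recognize(self, text: str) -> str:
        text_vector = self.acquire(text)                # module 31: text -> text vector
        conv_result = self.convolve(text_vector)        # module 32: text vector -> convolution result
        return self.decide(text_vector, conv_result)    # module 33: both -> entity relationship

# Toy stand-ins so the pipeline can be run end to end.
device = EntityRelationDevice(
    acquire=lambda text: [float(len(word)) for word in text.split()],
    convolve=lambda vec: [a + b for a, b in zip(vec, vec[1:])],
    decide=lambda vec, conv: "geographic_containment" if sum(conv) > 0 else "none",
)
result = device.recognize("Shanghai is located in China")
```

Note that module 33 receives both the text vector and the convolution result, matching the description above.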
In a specific application scenario, the acquisition module 31 may specifically be used to obtain the character vectors and word vectors of the text to be recognized from the character-word vector dictionary, and to perform a convolution operation on multiple adjacent character vectors and word vectors to obtain the text vector of the text to be recognized.
In a specific application scenario, the acquisition module 31 may also specifically be used to perform, using the initialized first entity relationship recognition model, entity relationship prediction on the credible training samples with entity relationship labels in the credible training sample set, and to train the network parameters of the initialized first entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the credible training samples, to obtain the preset first entity relationship recognition model.
The device further includes a sample module 34 and a second entity relationship recognition model 35.
The sample module 34 may be used to construct the credible training sample set, which is built from credible training samples with entity relationship labels.
In a specific application scenario, the sample module 34 may specifically be used to perform entity relationship prediction on remote supervision training samples using the preset second entity relationship recognition model, and to determine the credible training samples with entity relationship labels according to the entity relationship prediction results and the labeled training samples with entity relationship labels.
In a specific application scenario, the sample module 34 may also specifically be used to: perform, using the preset second entity relationship recognition model, a convolution operation on the labeled training samples with entity relationship labels to obtain convolution operation results; train an initialized Gaussian mixture model according to the convolution operation results and the entity relationship labels of the labeled training samples to obtain a trained Gaussian mixture model; and perform, using the trained Gaussian mixture model, entity relationship prediction on the remote supervision training samples according to the labeled training samples.
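The idea of fitting Gaussians to labeled convolution features and then using them to predict a relation for a remote supervision sample can be sketched as below. This is a deliberate simplification and an assumption: one diagonal-covariance Gaussian is fitted per relation label in closed form, rather than running EM over a full mixture as a library implementation such as scikit-learn's `GaussianMixture` would. The feature values and label names are toy data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Gaussian = Tuple[float, List[float], List[float]]  # (prior weight, mean, variance)

def fit_gaussians(features: List[List[float]], labels: List[str],
                  eps: float = 1e-3) -> Dict[str, Gaussian]:
    """Fit one diagonal Gaussian per relation label from labeled convolution features."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n, dim = len(rows), len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dim)]
        var = [sum((r[d] - mean[d]) ** 2 for r in rows) / n + eps for d in range(dim)]
        model[y] = (n / len(features), mean, var)
    return model

def log_density(x: List[float], mean: List[float], var: List[float]) -> float:
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def predict(model: Dict[str, Gaussian], x: List[float]) -> str:
    """Pick the label whose weighted Gaussian gives the feature vector the highest log-likelihood."""
    return max(model, key=lambda y: math.log(model[y][0]) + log_density(x, model[y][1], model[y][2]))

# Toy convolution features of labeled training samples.
labeled_features = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labeled_relations = ["located_in", "located_in", "founder_of", "founder_of"]
gmm = fit_gaussians(labeled_features, labeled_relations)

predicted = predict(gmm, [0.1, 0.0])  # features of a remote supervision sample
```

The predicted relation is then compared with the sample's initial label to decide whether the sample is kept.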
In a specific application scenario, the sample module 34 may also specifically be used to: if the predicted entity relationship of a remote supervision training sample is consistent with the initial entity relationship label of that remote supervision training sample, label the remote supervision training sample with the entity relationship, and take the labeled remote supervision training sample together with the labeled training samples as the credible training samples with entity relationship labels; and if the predicted entity relationship of a remote supervision training sample is inconsistent with the initial entity relationship label of that remote supervision training sample, delete the remote supervision training sample.
The second entity relationship recognition model 35 may be used to train the initialized second entity relationship recognition model to obtain the preset second entity relationship recognition model.
In a specific application scenario, the second entity relationship recognition model 35 may specifically be used to perform, using the initialized second entity relationship recognition model, entity relationship prediction on the labeled training samples with entity relationship labels, and to train the network parameters of the initialized second entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the labeled training samples, to obtain the preset second entity relationship recognition model.
It should be noted that, for other descriptions of the functional units of the entity relationship recognition device provided by the embodiments of the present application, reference may be made to the corresponding descriptions of Fig. 1 and Fig. 2, which are not repeated here.
Based on the methods shown in Fig. 1 and Fig. 2, an embodiment of the present application correspondingly also provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the entity relationship recognition method shown in Fig. 1 and Fig. 2.
Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the methods shown in Fig. 1 and Fig. 2 and the virtual device embodiment shown in Fig. 3, and in order to achieve the above purpose, an embodiment of the present application also provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the entity relationship recognition method shown in Fig. 1 and Fig. 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a WI-FI module, and so on. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface and a wireless interface (such as a Bluetooth interface or a WI-FI interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used to implement communication between the components inside the storage medium, as well as communication with other hardware and software in the physical device.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application may be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware. Training sample sets currently built by remote supervision are easily contaminated with erroneous training samples, so recognition models trained on them to extract entity relationships from text have low accuracy. By comparison, applying the technical solution of the present application, the preset first entity relationship recognition model trained on a high-quality credible training sample set can effectively improve the accuracy of entity relationship recognition.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required for implementing the present application. Those skilled in the art will also understand that the modules of the device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that of the scenario. The modules of the above implementation scenario may be merged into one module or further split into multiple sub-modules.
The serial numbers used above are for description only and do not indicate the relative merits of the implementation scenarios. What is disclosed above is only several specific implementation scenarios of the present application; however, the present application is not limited thereto, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present application.
Claims (10)
1. An entity relationship recognition method, characterized by comprising:
using a preset first entity relationship recognition model to obtain a text vector of a text to be recognized according to the acquired text to be recognized;
obtaining a convolution operation result of the text vector according to the text vector of the text to be recognized;
determining the entity relationship contained in the text to be recognized according to the text vector and the obtained convolution operation result;
wherein the preset first entity relationship recognition model is obtained by training on a credible training sample set, and the credible training sample set is constructed from credible training samples with entity relationship labels.
2. The method according to claim 1, characterized in that obtaining the text vector of the text to be recognized according to the acquired text to be recognized specifically comprises:
obtaining character vectors and word vectors of the text to be recognized from a character-word vector dictionary;
performing a convolution operation on multiple adjacent character vectors and word vectors to obtain the text vector of the text to be recognized.
3. The method according to claim 1, characterized in that constructing the credible training sample set from credible training samples with entity relationship labels specifically comprises:
performing entity relationship prediction on remote supervision training samples using a preset second entity relationship recognition model;
obtaining the credible training samples with entity relationship labels according to the entity relationship prediction results and labeled training samples with entity relationship labels.
4. The method according to claim 3, characterized in that performing entity relationship prediction on the remote supervision training samples using the preset second entity relationship recognition model specifically comprises:
performing, using the preset second entity relationship recognition model, a convolution operation on the labeled training samples with entity relationship labels to obtain convolution operation results;
training an initialized Gaussian mixture model according to the convolution operation results and the entity relationship labels of the labeled training samples to obtain a trained Gaussian mixture model;
performing entity relationship prediction on the remote supervision training samples using the trained Gaussian mixture model.
5. The method according to claim 3 or 4, characterized in that obtaining the credible training samples with entity relationship labels according to the entity relationship prediction results and the labeled training samples with entity relationship labels specifically comprises:
if the predicted entity relationship of a remote supervision training sample is consistent with the initial entity relationship label of the remote supervision training sample, taking the remote supervision training sample and the labeled training samples as the credible training samples with entity relationship labels;
if the predicted entity relationship of a remote supervision training sample is inconsistent with the initial entity relationship label of the remote supervision training sample, deleting the remote supervision training sample.
6. The method according to claim 3, characterized in that the preset second entity relationship recognition model is obtained by training an initialized second entity relationship recognition model;
obtaining the preset second entity relationship recognition model by training the initialized second entity relationship recognition model specifically comprises:
performing, using the initialized second entity relationship recognition model, entity relationship prediction on the labeled training samples with entity relationship labels;
training the network parameters of the initialized second entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the labeled training samples, to obtain the preset second entity relationship recognition model.
7. The method according to claim 1 or 3, characterized in that obtaining the preset first entity relationship recognition model by training on the credible training sample set specifically comprises:
performing, using an initialized first entity relationship recognition model, entity relationship prediction on the credible training samples with entity relationship labels in the credible training sample set;
training the network parameters of the initialized first entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the credible training samples, to obtain the preset first entity relationship recognition model.
8. An entity relationship recognition device, characterized by comprising:
an acquisition module, configured to obtain, using a preset first entity relationship recognition model, a text vector of a text to be recognized according to the acquired text to be recognized;
a convolution operation module, configured to obtain a convolution operation result of the text vector according to the text vector of the text to be recognized;
an entity relationship module, configured to determine the entity relationship contained in the text to be recognized according to the text vector and the obtained convolution operation result;
wherein the preset first entity relationship recognition model is obtained by training on a credible training sample set.
9. A storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the entity relationship recognition method according to any one of claims 1 to 7.
10. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the entity relationship recognition method according to any one of claims 1 to 7 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910559111.9A CN110457677B (en) | 2019-06-26 | 2019-06-26 | Entity relationship identification method and device, storage medium and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110457677A true CN110457677A (en) | 2019-11-15 |
CN110457677B CN110457677B (en) | 2023-11-17 |
Family
ID=68481090
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910559111.9A Active CN110457677B (en) | 2019-06-26 | 2019-06-26 | Entity relationship identification method and device, storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110457677B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239446A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | A kind of intelligence relationship extracting method based on neutral net Yu notice mechanism |
CN107729497A (en) * | 2017-10-20 | 2018-02-23 | 同济大学 | A kind of word insert depth learning method of knowledge based collection of illustrative plates |
CN107943784A (en) * | 2017-11-02 | 2018-04-20 | 南华大学 | Relation extraction method based on generation confrontation network |
CN109299457A (en) * | 2018-09-06 | 2019-02-01 | 北京奇艺世纪科技有限公司 | A kind of opining mining method, device and equipment |
WO2019094895A1 (en) * | 2017-11-13 | 2019-05-16 | Promptu Systems Corporation | Systems and methods for adaptive proper name entity recognition and understanding |
CN109815339A (en) * | 2019-01-02 | 2019-05-28 | 平安科技(深圳)有限公司 | Based on TextCNN Knowledge Extraction Method, device, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
ANDRES VIGNAGA: "Typing Textual Entities and M2T/T2M Transformations in a Model Management Environment", 《2009 INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY》, pages 115 - 122 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111192692A (en) * | 2020-01-02 | 2020-05-22 | 上海联影智能医疗科技有限公司 | Entity relationship determination method and device, electronic equipment and storage medium |
CN111192692B (en) * | 2020-01-02 | 2023-12-08 | 上海联影智能医疗科技有限公司 | Entity relationship determination method and device, electronic equipment and storage medium |
CN111274412A (en) * | 2020-01-22 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Information extraction method, information extraction model training device and storage medium |
CN111338338A (en) * | 2020-02-20 | 2020-06-26 | 山东科技大学 | Robot speed self-adaptive control method based on road surface characteristic cluster analysis |
CN111338338B (en) * | 2020-02-20 | 2024-01-16 | 山东科技大学 | Robot speed self-adaptive control method based on road surface feature cluster analysis |
CN111552812A (en) * | 2020-04-29 | 2020-08-18 | 深圳数联天下智能科技有限公司 | Method and device for determining relation category between entities and computer equipment |
CN111552812B (en) * | 2020-04-29 | 2023-05-12 | 深圳数联天下智能科技有限公司 | Method, device and computer equipment for determining relationship category between entities |
CN111651575A (en) * | 2020-05-29 | 2020-09-11 | 泰康保险集团股份有限公司 | Session text processing method, device, medium and electronic equipment |
CN111651575B (en) * | 2020-05-29 | 2023-09-12 | 泰康保险集团股份有限公司 | Session text processing method, device, medium and electronic equipment |
CN112069329A (en) * | 2020-09-11 | 2020-12-11 | 腾讯科技(深圳)有限公司 | Text corpus processing method, device, equipment and storage medium |
CN112069329B (en) * | 2020-09-11 | 2024-03-15 | 腾讯科技(深圳)有限公司 | Text corpus processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110457677B (en) | 2023-11-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||