CN110457677A - Entity-relationship recognition method and device, storage medium, computer equipment - Google Patents


Publication number
CN110457677A
Authority
CN
China
Prior art keywords
training sample
text
entity relationship
entity
label
Prior art date
Legal status
Granted
Application number
CN201910559111.9A
Other languages
Chinese (zh)
Other versions
CN110457677B (en)
Inventor
肖京
徐亮
金戈
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910559111.9A
Publication of CN110457677A
Application granted
Publication of CN110457677B
Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses an entity relationship recognition method and device, a storage medium, and computer equipment. It relates to the technical field of information processing and can effectively improve the accuracy of entity relationship recognition. The method includes: using a preset first entity relationship recognition model, obtaining a text vector of the text to be recognized from the acquired text; obtaining a convolution result of the text vector from that text vector; and determining, from the text vector and the obtained convolution result, the entity relationships contained in the text to be recognized. The preset first entity relationship recognition model is trained on a credible training sample set. The application is suitable for recognizing entity relationships in text.

Description

Entity-relationship recognition method and device, storage medium, computer equipment
Technical field
This application relates to the technical field of information processing, and in particular to an entity relationship recognition method and device, a storage medium, and computer equipment.
Background
With the development of science and technology, methods for recognizing relationships between words have multiplied, and they apply to an increasingly wide range of scenarios, such as the hierarchical relationship between place names, the hierarchy of national institutions, and the inclusion relationship between categories of goods. These methods train a neural network on a large amount of sample data and then build a corresponding recognition model to extract the relationships between words in a text (that is, entity relationships).
The shortcoming of the prior art is that, although a training sample set built by distant supervision can be used to train a recognition model, erroneous training samples are still easily mixed into the set while it is being built. This degrades the recognition precision of the model trained on it, so the trained model extracts entity relationships from text with low accuracy, which harms the user experience.
Summary of the invention
In view of this, this application provides entity-relationship recognition method and device, storage medium, computer equipments, mainly Purpose is the training sample for solving to be easy to be mixed into mistake when constructing training sample currently based on remote supervisory, so as to cause training The lower technical problem of the accuracy rate that identification model afterwards extracts entity relationship to text.
According to the one aspect of the application, a kind of entity-relationship recognition method is provided, this method comprises:
Using preset first instance relation recognition model, text to be identified is obtained according to the text to be identified got Text vector;
The convolution algorithm result of the text vector is obtained according to the text vector of text to be identified;
According to the text vector and obtained convolution algorithm as a result, determining the entity relationship for including in text to be identified;
Wherein, the preset first instance relation recognition model is obtained based on the training of credible training sample set.
According to another aspect of the application, an entity relationship recognition device is provided. The device comprises:
An obtaining module, for using a preset first entity relationship recognition model to obtain a text vector of the text to be recognized from the acquired text to be recognized;
A convolution module, for obtaining a convolution result of the text vector from the text vector of the text to be recognized;
An entity relationship module, for determining, from the text vector and the obtained convolution result, the entity relationships contained in the text to be recognized;
Wherein the preset first entity relationship recognition model is trained on a credible training sample set.
According to yet another aspect of the application, a storage medium is provided on which a computer program is stored. The program implements the above entity relationship recognition method when executed by a processor.
According to a further aspect of the application, computer equipment is provided, comprising a storage medium, a processor, and a computer program stored on the storage medium and runnable on the processor. The processor implements the above entity relationship recognition method when executing the program.
With the above technical solution, the application improves on the current approach, in which a training sample set built by distant supervision easily contains erroneous training samples and therefore yields a recognition model that extracts entity relationships from text with low accuracy. The application uses a preset first entity relationship recognition model to obtain the text vector of the text to be recognized from the acquired text, obtains the convolution result of the text vector, and determines the entity relationships contained in the text from the text vector and the convolution result. Because the preset first entity relationship recognition model is trained on a high-quality credible training sample set, it can effectively improve the accuracy of entity relationship recognition.
The above is only an overview of the technical solution of the application. So that the technical means of the application can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the application are more apparent, specific embodiments of the application are set out below.
Brief description of the drawings
The drawings described here are provided for further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 shows a flow diagram of an entity relationship recognition method provided by an embodiment of the application;
Fig. 2 shows a flow diagram of another entity relationship recognition method provided by an embodiment of the application;
Fig. 3 shows a structural diagram of an entity relationship recognition device provided by an embodiment of the application.
Detailed description
The application is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where they do not conflict, the embodiments of the application and the features in those embodiments may be combined with each other.
The problem addressed is that erroneous training samples are easily mixed in when training samples are built by distant supervision, so that the trained recognition model extracts entity relationships from text with low accuracy. This embodiment provides an entity relationship recognition method that builds a recognition model able to extract entity relationships from text more accurately, and thereby improves the accuracy of recognizing the entity relationships in a text. As shown in Fig. 1, the method comprises:
101. Using a preset first entity relationship recognition model, obtain the text vector of the text to be recognized from the acquired text to be recognized. The preset first entity relationship recognition model is trained on a credible training sample set, and the credible training sample set is built from credible training samples that carry entity relationship labels.
The text to be recognized is acquired and preprocessed to give an initialized text vector. The initialized text vector is input to the embedding layer of the preset first entity relationship recognition model, which generates the text vector used to characterize the text to be recognized.
The preprocessing can be set according to the actual application scenario. For example, it can be word segmentation, in which the text to be recognized is segmented and marked word by word. It can also be word screening, in which, after the text has been segmented word by word, unimportant words are removed, such as auxiliary verbs ("can", "should") and interjections ("oh", "ah"), to improve the efficiency of recognizing the entity relationships in the text. The preprocessing is not specifically limited here.
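The two preprocessing options described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: whitespace splitting stands in for a real word segmenter, and the names `STOP_WORDS`, `segment` and `screen` are hypothetical.

```python
from typing import List

# Hypothetical screening list; the source only names auxiliary verbs and
# interjections as examples of unimportant words.
STOP_WORDS = {"can", "should", "oh", "ah"}


def segment(text: str) -> List[str]:
    """Split text into lowercase tokens (whitespace stand-in for a segmenter)."""
    tokens = (tok.strip(".,!?;:").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def screen(tokens: List[str]) -> List[str]:
    """Remove tokens that appear in the screening list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = screen(segment("Oh, Shanghai should be in China."))
print(tokens)
```

For Chinese text a dedicated segmenter would replace `segment`; the screening step stays the same.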
102. Obtain the convolution result of the text vector from the text vector of the text to be recognized.
After the text vector of the text to be recognized passes through a series of operations in the convolutional layer, pooling layer, and fully connected layer, a multidimensional feature vector containing the entity relationships in the initialized text vector is output. This captures and extracts the relationship information between words in the text vector of the text to be recognized.
103. Determine, from the text vector and the obtained convolution result, the entity relationships contained in the text to be recognized.
The convolution result obtained by the convolutional layer and the pooled convolution result obtained by the pooling layer are input to the fully connected layer of the preset first entity relationship recognition model. The fully connected layer uses the softmax activation function to associate the convolution results output by each convolution kernel, giving an associated convolution result. It then combines the associated convolution result with the pooled convolution result output by the pooling layer and outputs the latent features of the text to be recognized. The latent features characterize the entity relationships between words in the text to be recognized.
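As a rough illustration of the softmax activation named above, the sketch below turns a list of raw class scores into probabilities and picks the most likely relationship. The relationship names in `RELATIONS`, the score values, and the function `predict_relation` are assumptions made for the example; the source does not specify them.

```python
import math
from typing import List, Tuple

# Hypothetical relationship classes, borrowed from examples in the text.
RELATIONS = ["geographic inclusion", "clause inclusion", "no relation"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(scores: List[float]) -> Tuple[str, float]:
    """Return the most probable relationship and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return RELATIONS[best], probs[best]


label, prob = predict_relation([2.0, 0.5, 0.1])
print(label, round(prob, 3))
```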
According to the above scheme, this embodiment uses a preset first entity relationship recognition model to obtain the text vector of the text to be recognized from the acquired text, obtains the convolution result of the text vector, and determines the entity relationships contained in the text from the text vector and the convolution result. The preset first entity relationship recognition model is trained on a credible training sample set. Compared with the current approach, in which a training sample set built by distant supervision easily contains erroneous training samples and therefore yields a recognition model that extracts entity relationships from text with low accuracy, the model in this embodiment is trained on a high-quality credible training sample set and can effectively improve the accuracy of entity relationship recognition.
Further, as a refinement and extension of the above embodiment, and to describe its implementation process completely, another entity relationship recognition method is provided. As shown in Fig. 2, the method comprises:
201. Train an initialized second entity relationship recognition model to obtain a preset second entity relationship recognition model.
The preset second entity relationship recognition model is used to build the credible training sample set. Before the initialized second entity relationship recognition model is trained, an initialization training sample set for training it is obtained. Specifically, a large amount of text data is acquired as the initialization training sample set, and the training samples in that set are obtained. Each training sample contains a triple, that is, three features of the text that represent the named entities among the words in the training sample and the entity relationship between them: two named entities E1 and E2 and the relationship R between them, expressed as (E1, R, E2). Each training sample is then labeled, giving labeled training samples that carry entity relationship labels. The label is the entity relationship class that the training sample contains. For example, the class of the entity relationship between words in a training sample may be an internet business class, a financial class, or a geographic attribute class, or it may be a more specific class: "China" and "Shanghai" stand in a geographic inclusion relationship, and in a knowledge graph for intelligent customer service of insurance products, "financial service" and "insurance product" stand in a clause inclusion relationship. The initialized second entity relationship recognition model predicts a label for each labeled training sample from the triple it contains, and the predicted label is compared with the true label with which the sample was actually marked. Iterative training on this comparison yields a preset second entity relationship recognition model with higher recognition accuracy.
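One way to picture a labeled training sample and its (E1, R, E2) triple is the small record type below. It is only a sketch: the class name `LabeledSample`, its field names, and the example sentence are hypothetical, and the relationship name follows the "China"/"Shanghai" example in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LabeledSample:
    """A training sample with its entity pair and entity relationship label."""
    text: str
    head: str      # E1, the first named entity
    relation: str  # R, the entity relationship label
    tail: str      # E2, the second named entity

    def triple(self) -> Tuple[str, str, str]:
        """Return the sample as an (E1, R, E2) triple."""
        return (self.head, self.relation, self.tail)


sample = LabeledSample(
    text="Shanghai is a city in China.",
    head="China",
    relation="geographic inclusion",
    tail="Shanghai",
)
print(sample.triple())
```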
To illustrate the implementation of step 201, as a preferred embodiment, step 201 may specifically include: using the initialized second entity relationship recognition model to predict entity relationships for the labeled training samples that carry entity relationship labels; and training the network parameters of the initialized second entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the labeled training samples, to obtain the preset second entity relationship recognition model.
A labeled training sample with an entity relationship label is an initialized text vector that carries an entity relationship label. The initialized second entity relationship recognition model passes this vector through a series of operations in the embedding layer, convolutional layer, pooling layer, and fully connected layer, which captures and extracts the relationship information between words in the initialized text vector. The model outputs a multidimensional feature vector containing the entity relationships in the initialized text vector, and the preset second entity relationship recognition model for recognizing entity relationships in text is trained from this output.
In a practical application scenario, the initialized text vector with an entity relationship label passes through the embedding layer of the initialized second entity relationship recognition model, which outputs the word vectors corresponding to the initialized text vector. These word vectors are input to the convolutional layer of the model, which performs a convolution over every n adjacent word vectors output by the embedding layer. For example, if the convolution kernel length is set to 3, a kernel of dimension 3 is applied to every 3 adjacent word vectors, giving the convolution result output by each convolution kernel. The convolution results output by each kernel are input to the pooling layer of the model, which performs a pooling operation on them and extracts the pooled convolution results within a given stride. The pooling operation can be max pooling, average pooling, and so on. The convolutional layer and pooling layer capture the relationship information between adjacent word vectors and thereby capture the local information of the initialized text vector.
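The convolution over adjacent word vectors and the pooling that follows can be sketched in plain Python as below. The kernel of length 3 matches the example in the text; the toy 2-dimensional word vectors, the kernel weights, and the pooling size and stride are assumptions chosen only to keep the arithmetic easy to follow.

```python
from typing import List

Vector = List[float]


def conv1d(word_vectors: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide the kernel over adjacent word vectors, one dot product per window."""
    n = len(kernel)
    results = []
    for start in range(len(word_vectors) - n + 1):
        window = word_vectors[start:start + n]
        results.append(sum(
            w * k
            for word_vec, kernel_row in zip(window, kernel)
            for w, k in zip(word_vec, kernel_row)
        ))
    return results


def max_pool(values: List[float], size: int, stride: int) -> List[float]:
    """Keep the maximum of each window of `size` values, moving by `stride`."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # kernel length 3
features = conv1d(words, kernel)                # [4.0, 3.0, 3.0]
pooled = max_pool(features, size=2, stride=1)   # [4.0, 3.0]
print(features, pooled)
```

Average pooling would replace `max` with the mean of each window; a real model would also learn many kernels rather than use one fixed kernel.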
The convolution result obtained by the convolutional layer and the pooled convolution result from the pooling layer are input to the fully connected layer of the initialized second entity relationship recognition model. The fully connected layer associates the convolution results output by each convolution kernel to give an associated convolution result, and combines it with the pooled convolution result output by the pooling layer to obtain the latent features of the initialized text vector. This captures the global information contained in the initialized text vector. The latent features characterize the entity relationships between words in the initialized text vector.
From the latent features of the initialized text vector, after multiple training iterations, the preset second entity relationship recognition model for recognizing entity relationships in text is obtained.
202. Build the credible training sample set from credible training samples that carry entity relationship labels.
In a practical application scenario, the credible training samples with entity relationship labels are obtained by screening distant supervision training samples. Specifically, distant supervision training samples are acquired; these samples are initialized text vectors. They are input to the preset second entity relationship recognition model, and a constructed Gaussian mixture model is used to obtain output results that characterize the entity relationships contained in the distant supervision training samples. The credible training samples used to build the credible training sample set are then obtained from the output results and the labeled training samples that carry entity relationship labels.
To illustrate the implementation of step 202, as a preferred embodiment, building the credible training sample set from credible training samples with entity relationship labels may specifically include: using the preset second entity relationship recognition model to predict entity relationships for the distant supervision training samples; and obtaining the credible training samples with entity relationship labels from the entity relationship prediction results and the labeled training samples that carry entity relationship labels.
To further illustrate the implementation of step 202, as a preferred embodiment, using the preset second entity relationship recognition model to predict entity relationships for the distant supervision training samples may specifically include: using the preset second entity relationship recognition model to perform convolution on the labeled training samples with entity relationship labels, giving convolution results; training an initialized Gaussian mixture model on the convolution results and the entity relationship labels of the labeled training samples, giving a trained Gaussian mixture model; and using the trained Gaussian mixture model to predict entity relationships for the distant supervision training samples.
In a practical application scenario, a large number of distant supervision training samples are input to the preset second entity relationship recognition model, and its fully connected layer outputs, in sequence, results that characterize the entity relationships contained in the distant supervision training samples. Using the constructed Gaussian mixture model (GMM), each output result is associated, in output order, with the label of the distant supervision training sample that corresponds to it. Taking the first group of output results as an example, the process of associating the first group of output results with the labels of the corresponding distant supervision training samples is as follows:
It should be noted that the trained GMM is obtained as follows. Assume that the output results produced in sequence by the fully connected layer of the preset second entity relationship recognition model to characterize the entity relationships contained in the distant supervision training samples (that is, the training sample set used to train the initialized GMM) contain L groups of labeled training samples with entity relationship labels (x_i, y_i) and u groups of distant supervision training samples x_{L+j} extracted by distant supervision, where 1 ≤ i ≤ L and 1 ≤ j ≤ u. The training sample set is then D = {(x_1, y_1), (x_2, y_2), ..., (x_L, y_L), x_{L+1}, x_{L+2}, ..., x_{L+u}}. An initialized GMM is built from the labeled training samples with entity relationship labels in the training sample set, and the network parameters of the GMM are trained on those labeled training samples to obtain the trained GMM.
Assume that the labeled training samples with entity relationship labels fall into m classes. Taking the L groups of labeled training samples as an example, let γ_ij denote the probability that labeled training sample x_j belongs to the i-th class; γ_ij is then 1 for the class indicated by the sample's label and 0 for the remaining classes. For example, according to the needs of the practical application scenario, the i-th Gaussian component corresponds to the i-th class of labeled training samples: if the i-th class is the clause inclusion relationship, γ_ij is the probability that labeled training sample x_j belongs to the clause inclusion relationship.
The calculation formula of the probability distribution of GMM is as follows:
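The formula itself is not reproduced in this text. A mixture density in its standard form, consistent with the symbols defined immediately after it, would read as follows; this is a reconstruction under that assumption, not the patent's original rendering, and the count m of components follows the m classes assumed above.

```latex
p(x) = \sum_{i=1}^{m} \pi_i \, N(x \mid \mu_i, \Sigma_i),
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{m} \pi_i = 1
```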
Here N(x | μ_i, Σ_i) denotes the i-th Gaussian component of the GMM; π is the mixing coefficient, equivalent to the weight of each component; x is the feature vector (that is, the training sample); μ is the mean vector of x; and Σ is the covariance matrix.
Using the expectation-maximization (EM) algorithm, the initial parameters π_i, μ_i, Σ_i of the GMM are determined from the L groups of labeled training samples. The calculation formulas of the initial network parameters π_i, μ_i, Σ_i of the GMM are as follows:
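The formulas are not reproduced in this text. One common way to initialize a GMM from labeled samples, offered here only as an assumption consistent with the surrounding definitions, is to use per-class sample statistics, where l_i (a symbol introduced for this sketch) is the number of labeled samples whose label y_j equals class i:

```latex
\pi_i = \frac{l_i}{L}, \qquad
\mu_i = \frac{1}{l_i} \sum_{j \le L,\; y_j = i} x_j, \qquad
\Sigma_i = \frac{1}{l_i} \sum_{j \le L,\; y_j = i} (x_j - \mu_i)(x_j - \mu_i)^{\top}
```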
During parameter estimation of the GMM, the expectation step (E step) predicts the label class to which the labeled training samples belong from the initial network parameters π_i, μ_i, Σ_i, and the maximization step (M step) updates the parameters π_i, μ_i, Σ_i according to the predicted label classes.
Wherein, the calculation formula of E step is as follows:
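The E-step formula is not reproduced in this text. In the usual semi-supervised form, assumed here, the responsibilities of labeled samples stay fixed by their labels (as described for γ_ij above), and those of the u distant supervision samples are the posterior class probabilities under the current parameters:

```latex
\gamma_{ij} =
\begin{cases}
1 \text{ if } y_j = i, \text{ else } 0, & 1 \le j \le L \\[6pt]
\dfrac{\pi_i \, N(x_j \mid \mu_i, \Sigma_i)}
      {\sum_{k=1}^{m} \pi_k \, N(x_j \mid \mu_k, \Sigma_k)}, & L < j \le L + u
\end{cases}
```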
The calculation formula of M step is as follows:
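The M-step formula is not reproduced in this text. The standard responsibility-weighted updates, assumed here and written over all L + u samples with N_i as a helper symbol introduced for this sketch, are:

```latex
N_i = \sum_{j=1}^{L+u} \gamma_{ij}, \qquad
\mu_i = \frac{1}{N_i} \sum_{j=1}^{L+u} \gamma_{ij} \, x_j, \qquad
\Sigma_i = \frac{1}{N_i} \sum_{j=1}^{L+u} \gamma_{ij} \, (x_j - \mu_i)(x_j - \mu_i)^{\top}, \qquad
\pi_i = \frac{N_i}{L + u}
```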
Based on the semi-supervised learning method, the E step and M step are repeated in turn until convergence, giving the trained parameters π_i, μ_i, Σ_i and hence the trained GMM. The trained GMM is used to predict entity relationships for the distant supervision training samples, and the credible training samples with entity relationship labels are obtained from the entity relationship prediction results and the labeled training samples with entity relationship labels.
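The loop described above can be sketched for the simplest case of one-dimensional features. This is a hedged illustration under several assumptions that are not in the source: scalar features instead of the model's feature vectors, a fixed iteration count instead of a convergence test, a variance floor of 1e-3 for stability, and the hypothetical names `fit_semi_supervised_gmm` and `predict`.

```python
import math
from typing import List, Tuple


def gauss(x: float, mu: float, var: float) -> float:
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_semi_supervised_gmm(labeled: List[Tuple[float, int]],
                            unlabeled: List[float],
                            m: int, iters: int = 50):
    """Fit an m-component 1-D GMM; labeled responsibilities stay one-hot."""
    xs = [x for x, _ in labeled] + unlabeled
    pi, mu, var = [], [], []
    for i in range(m):  # initial parameters from the labeled samples only
        members = [x for x, y in labeled if y == i]
        mean = sum(members) / len(members)
        pi.append(len(members) / len(labeled))
        mu.append(mean)
        var.append(max(sum((x - mean) ** 2 for x in members) / len(members), 1e-3))
    for _ in range(iters):
        # E step: one-hot rows for labeled samples, posteriors for the rest.
        gamma = [[1.0 if y == i else 0.0 for i in range(m)] for _, y in labeled]
        for x in unlabeled:
            w = [pi[i] * gauss(x, mu[i], var[i]) for i in range(m)]
            total = sum(w)
            gamma.append([wi / total for wi in w] if total > 0 else [1.0 / m] * m)
        # M step: responsibility-weighted updates over all samples.
        for i in range(m):
            n_i = sum(g[i] for g in gamma)
            mu[i] = sum(g[i] * x for g, x in zip(gamma, xs)) / n_i
            var[i] = max(sum(g[i] * (x - mu[i]) ** 2
                             for g, x in zip(gamma, xs)) / n_i, 1e-3)
            pi[i] = n_i / len(xs)
    return pi, mu, var


def predict(x: float, pi, mu, var) -> int:
    """Return the index of the most probable component for x."""
    scores = [pi[i] * gauss(x, mu[i], var[i]) for i in range(len(pi))]
    return max(range(len(scores)), key=scores.__getitem__)


labeled = [(0.0, 0), (0.2, 0), (-0.1, 0), (5.0, 1), (5.3, 1), (4.8, 1)]
unlabeled = [0.1, -0.3, 5.1, 4.9, 0.4, 5.4]
pi, mu, var = fit_semi_supervised_gmm(labeled, unlabeled, m=2)
print(predict(0.05, pi, mu, var), predict(5.2, pi, mu, var))
```

In the setting the text describes, the inputs would be the feature vectors output by the second model, so full covariance matrices would replace the scalar variances.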
To further illustrate the implementation of step 202, as a preferred embodiment, obtaining the credible training samples with entity relationship labels from the entity relationship prediction results and the labeled training samples may specifically include: if the predicted entity relationship of a distant supervision training sample is consistent with the initial entity relationship label of that sample, taking the distant supervision training sample and the labeled training samples as credible training samples with entity relationship labels; and if the predicted entity relationship of a distant supervision training sample is inconsistent with the initial entity relationship label of that sample, deleting the distant supervision training sample.
If the predicted entity relationship is consistent with the initial label of the distant supervision training sample, the sample is determined to be a high-confidence distant supervision training sample. If the predicted entity relationship is inconsistent with the initial label, the sample is determined to be a low-confidence distant supervision training sample and is discarded directly.
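The consistency check above amounts to a simple filter, sketched below. The predictor `toy_predict` is a made-up stand-in for the trained second model plus GMM, and the sample texts and labels are invented for illustration.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, initial distant-supervision label)


def filter_trusted(samples: List[Sample],
                   predict: Callable[[str], str]) -> List[Sample]:
    """Keep samples whose predicted relationship matches the initial label."""
    return [(text, label) for text, label in samples if predict(text) == label]


def toy_predict(text: str) -> str:
    """Hypothetical predictor standing in for the trained model and GMM."""
    return "geographic inclusion" if "city in" in text else "clause inclusion"


samples = [
    ("Shanghai is a city in China.", "geographic inclusion"),  # consistent
    ("Beijing is a city in China.", "clause inclusion"),       # inconsistent
]
trusted = filter_trusted(samples, toy_predict)
print(trusted)
```

The kept samples would then be merged with the labeled training samples to form the credible training sample set.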
The second group of output results is then associated with the labels of the corresponding distant supervision training samples, and the above steps are repeated until all output results have been processed. This gives the screened training text set, that is, the credible training sample set. Because the credible training sample set obtained by screening has higher confidence, the preset first entity relationship recognition model trained on it also has higher recognition accuracy.
203. Train on the credible training sample set to obtain the preset first entity relationship recognition model.
To illustrate the implementation of step 203, as a preferred embodiment, step 203 may specifically include: using an initialized first entity relationship recognition model to predict entity relationships for the credible training samples with entity relationship labels in the credible training sample set; and training the network parameters of the initialized first entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the credible training samples, to obtain the preset first entity relationship recognition model.
The preset second entity relationship recognition model and the constructed GMM filter out low-confidence distant supervision training samples and yield the credible training sample set, which improves on the existing distant supervision training method. The preset first entity relationship recognition model is then obtained by training the initialized first entity relationship recognition model on the credible training sample set. Raising the quality of the training sample set improves the recognition precision of the preset first entity relationship recognition model, so that it can recognize the entity relationships between the words in the text to be recognized more quickly and accurately, and the semantics of the text can be determined from the recognized entity relationships.
204, using preset first instance relation recognition model, the word of text to be identified is obtained using words vector dictionary Vector sum term vector.
The acquired text to be recognized is segmented into words to obtain an initialized text vector, which is input to the embedding layer of the preset first entity relationship recognition model. Using the preset character/word vector dictionary and a Word2Vec model, the embedding layer matches the initialized text vector to obtain the character vectors and word vectors that characterize the text to be recognized. The preset character/word vector dictionary contains a character vector for each character in the initialized text vector and a word vector for each word.
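A minimal sketch of the dictionary lookup is given below. The two tables, their two-dimensional vectors, the `UNKNOWN` fallback and the assumption that the text is already segmented are all illustrative; in practice the vectors would come from a trained Word2Vec model.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy dictionaries with made-up values.
CHAR_VECTORS: Dict[str, Vector] = {
    "上": [0.1, 0.2], "海": [0.3, 0.4], "中": [0.5, 0.6], "国": [0.7, 0.8],
}
WORD_VECTORS: Dict[str, Vector] = {
    "上海": [1.0, 0.0], "位于": [0.0, 1.0], "中国": [1.0, 1.0],
}
UNKNOWN: Vector = [0.0, 0.0]  # assumed fallback for out-of-dictionary tokens


def lookup(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Map each token to its vector, falling back to UNKNOWN."""
    return [table.get(token, UNKNOWN) for token in tokens]


# "Shanghai is located in China", assumed to be segmented already.
words = ["上海", "位于", "中国"]
chars = [ch for word in words for ch in word]

word_vecs = lookup(words, WORD_VECTORS)
char_vecs = lookup(chars, CHAR_VECTORS)
```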
205. Perform convolution operations on multiple adjacent character vectors and word vectors to obtain the text vector of the text to be recognized.
The embedding layer of the preset first entity relationship recognition model also includes a two-layer one-dimensional fully convolutional structure. The character vectors and word vectors of the text to be recognized pass through this structure, which outputs the text vector of the text to be recognized. Specifically, convolution kernels are convolved (that is, a dot-product operation is performed) with the character vectors and with the word vectors of the text to be recognized, and all of the resulting convolution outputs together form the text vector of the text to be recognized.
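The dot-product view of convolution over adjacent vectors can be illustrated as follows. This sketch shows a single one-dimensional convolution with one kernel; the two-layer structure in the text would stack such operations. The function name `conv1d` and the toy values are assumptions.

```python
from typing import List

Vector = List[float]


def conv1d(seq: List[Vector], kernel: List[Vector]) -> List[float]:
    """Slide a kernel spanning len(kernel) adjacent vectors over the sequence.

    Each output is the dot product of the kernel with one window.
    """
    k = len(kernel)
    out = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        out.append(sum(w * x
                       for k_row, x_row in zip(kernel, window)
                       for w, x in zip(k_row, x_row)))
    return out


# Four adjacent 2-dimensional vectors and a kernel covering two of them.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
kernel = [[1.0, 1.0], [1.0, -1.0]]
features = conv1d(seq, kernel)  # [0.0, 1.0, 4.0]
```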
206. Obtain the convolution result of the text vector from the text vector of the text to be recognized.
The text vector output by the embedding layer is input to the convolutional layer of the preset first entity relationship recognition model, where the convolution kernels of the convolutional layer are applied to the text vector to obtain a convolution result. The pooling layer of the model then pools this convolution result to obtain a pooled convolution result. In other words, the convolution result of the text vector consists of the convolution result obtained by the convolutional layer and the pooled convolution result obtained by the pooling layer.
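The pooling step and the combination of both results can be sketched as below. The text does not say which pooling operation is used, so max pooling with a window of 2 is an assumption here, as are the names `max_pool`, `conv_result` and `combined`.

```python
from typing import List


def max_pool(feature_map: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; the last window may be shorter."""
    return [max(feature_map[i:i + size]) for i in range(0, len(feature_map), size)]


conv_result = [0.0, 1.0, 4.0, 2.0, -1.0]   # output of the convolutional layer
pooled = max_pool(conv_result, 2)           # [1.0, 4.0, -1.0]

# Both the raw and the pooled results are kept, as the text describes.
combined = conv_result + pooled
```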
207. Determine the entity relationship contained in the text to be recognized according to the text vector and the obtained convolution result of the text vector.
The fully connected layer outputs the implicit features of the text to be recognized based on the text vector, the convolution result obtained by the convolutional layer, and the pooled convolution result obtained by the pooling layer. The implicit features characterize the entity relationships between the words of the text to be recognized. For example, if the text to be recognized is "Shanghai is located in China", the entity relationship contained in the text is determined to be a geographic containment relationship.
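One way to picture the final step is a fully connected layer followed by a softmax over candidate relationships. The sketch below is illustrative only: the relation names, the weights and the use of softmax are assumptions, since the text states only that the fully connected layer combines the text vector with the convolution results.

```python
import math
from typing import List, Tuple

RELATIONS = ["geographic_containment", "no_relation"]  # hypothetical label set


def classify(features: List[float], weights: List[List[float]],
             bias: List[float]) -> Tuple[str, List[float]]:
    """Fully connected layer followed by softmax over relation classes."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return RELATIONS[best], probs


text_vector = [0.5, 1.0]       # toy text vector
conv_features = [2.0]          # toy convolution and pooling output
features = text_vector + conv_features

weights = [[1.0, 0.5, 1.0], [-1.0, 0.0, -0.5]]  # made-up parameters
bias = [0.0, 0.0]
relation, probs = classify(features, weights, bias)
```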
In the technical solution of this embodiment, the preset first entity relationship recognition model is used to obtain the text vector of the acquired text to be recognized, the convolution result of the text vector is obtained from the text vector, and the entity relationship contained in the text to be recognized is determined from the text vector and the obtained convolution result, where the preset first entity relationship recognition model is trained on the credible training sample set. Training sample sets currently built by distant supervision are easily contaminated with erroneous training samples, so the recognition models trained on them extract entity relationships from text with low accuracy. By comparison, the preset first entity relationship recognition model of this embodiment is trained on a high-quality credible training sample set and can effectively improve the accuracy of entity relationship recognition.
Further, as a specific implementation of the method of Fig. 1, an embodiment of the present application provides an entity relationship recognition device. As shown in Fig. 3, the device includes an acquisition module 31, a convolution operation module 32, and an entity relationship module 33.
The acquisition module 31 may be used to obtain, using the preset first entity relationship recognition model, the text vector of the text to be recognized from the acquired text to be recognized, where the preset first entity relationship recognition model is trained on the credible training sample set. The acquisition module 31 is a main functional module of the device for recognizing entity relationships.
The convolution operation module 32 may be used to obtain the convolution result of the text vector from the text vector of the text to be recognized obtained by the acquisition module 31. The convolution operation module 32 is a main functional module of the device for recognizing entity relationships.
The entity relationship module 33 may be used to determine the entity relationship contained in the text to be recognized according to the text vector obtained by the acquisition module 31 and the convolution result of the text vector obtained by the convolution operation module 32. The entity relationship module 33 is a main functional module of the device for recognizing entity relationships and is also the core module of the device.
In a specific application scenario, the acquisition module 31 may specifically be used to obtain the character vectors and word vectors of the text to be recognized from the character/word vector dictionary, and to perform convolution operations on multiple adjacent character vectors and word vectors to obtain the text vector of the text to be recognized.
In a specific application scenario, the acquisition module 31 may also be used to perform entity relationship prediction, using the initialized first entity relationship recognition model, on the credible training samples with entity relationship labels in the credible training sample set, and to train the network parameters of the initialized first entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the credible training samples, to obtain the preset first entity relationship recognition model.
The device further includes a sample module 34 and a second entity relationship recognition model 35.
The sample module 34 may be used to construct the credible training sample set, which is built from credible training samples with entity relationship labels.
In a specific application scenario, the sample module 34 may specifically be used to perform entity relationship prediction on distantly supervised training samples using the preset second entity relationship recognition model, and to determine the credible training samples with entity relationship labels according to the entity relationship prediction results and the labeled training samples with entity relationship labels.
In a specific application scenario, the sample module 34 may also be used to: perform convolution operations, using the preset second entity relationship recognition model, on the labeled training samples with entity relationship labels to obtain convolution results; train an initialized Gaussian mixture model according to the convolution results and the entity relationship labels of the labeled training samples to obtain a trained Gaussian mixture model; and perform entity relationship prediction on the distantly supervised training samples using the trained Gaussian mixture model, based on the labeled training samples.
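The use of a Gaussian mixture model on convolution features can be sketched under a strong simplification. The example below assumes one diagonal-covariance Gaussian component per relationship label and fits each component directly from the labeled features instead of running expectation-maximization. The function names, the variance floor `eps` and the feature values are all hypothetical, and the text does not specify the component structure.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Component = Tuple[float, List[float], List[float]]  # (weight, mean, variance)


def fit_components(features: List[List[float]], labels: List[str],
                   eps: float = 1e-3) -> Dict[str, Component]:
    """Fit one diagonal Gaussian per label from labeled convolution features."""
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    model = {}
    for y, xs in groups.items():
        dim = len(xs[0])
        mean = [sum(x[j] for x in xs) / len(xs) for j in range(dim)]
        # eps keeps the variance strictly positive for tiny groups.
        var = [sum((x[j] - mean[j]) ** 2 for x in xs) / len(xs) + eps
               for j in range(dim)]
        model[y] = (len(xs) / len(features), mean, var)
    return model


def log_density(x: List[float], weight: float,
                mean: List[float], var: List[float]) -> float:
    """Log of (mixture weight times diagonal Gaussian density)."""
    total = math.log(weight)
    for xj, mj, vj in zip(x, mean, var):
        total += -0.5 * (math.log(2 * math.pi * vj) + (xj - mj) ** 2 / vj)
    return total


def predict(model: Dict[str, Component], x: List[float]) -> str:
    """Return the label whose component explains x best."""
    return max(model, key=lambda y: log_density(x, *model[y]))


# Toy convolution features of labeled training samples.
feats = [[0.9, 0.1], [1.1, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = ["located_in", "located_in", "born_in", "born_in"]
gmm = fit_components(feats, labels)
```

A distantly supervised sample would then be passed through the same feature extractor and its predicted label compared with its initial label.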
In a specific application scenario, the sample module 34 may also be used as follows. If the predicted entity relationship of a distantly supervised training sample is consistent with the initial entity relationship label of that sample, the sample is given an entity relationship label, and the labeled distantly supervised training sample and the labeled training samples are used as credible training samples with entity relationship labels. If the predicted entity relationship of a distantly supervised training sample is inconsistent with the initial entity relationship label of that sample, the distantly supervised training sample is deleted.
The second entity relationship recognition model 35 may be used to train an initialized second entity relationship recognition model to obtain the preset second entity relationship recognition model.
In a specific application scenario, the second entity relationship recognition model 35 may specifically be used to perform entity relationship prediction, using the initialized second entity relationship recognition model, on the labeled training samples with entity relationship labels, and to train the network parameters of the initialized second entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the labeled training samples, to obtain the preset second entity relationship recognition model.
It should be noted that, for other descriptions of the functional units of the entity relationship recognition device provided by this embodiment, reference may be made to the corresponding descriptions of Fig. 1 and Fig. 2, which are not repeated here.
Based on the methods shown in Fig. 1 and Fig. 2, an embodiment of the present application correspondingly also provides a storage medium on which a computer program is stored. When executed by a processor, the program implements the entity relationship recognition method shown in Fig. 1 and Fig. 2.
Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the methods shown in Fig. 1 and Fig. 2 and the virtual device embodiment shown in Fig. 3, and in order to achieve the above purpose, an embodiment of the present application also provides a computer device, which may specifically be a personal computer, a server, a network device, or the like. This physical device includes a storage medium and a processor. The storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the entity relationship recognition method shown in Fig. 1 and Fig. 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display and an input unit such as a keyboard, and optionally may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface and a wireless interface (such as a Bluetooth interface or a Wi-Fi interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may also contain an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module is used for communication among the components inside the storage medium and for communication with other hardware and software in the physical device.
From the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software together with a necessary general-purpose hardware platform, or by hardware. Training sample sets currently built by distant supervision are easily contaminated with erroneous training samples, so the recognition models trained on them extract entity relationships from text with low accuracy. By comparison, with the technical solution of the present application, the preset first entity relationship recognition model is trained on a high-quality credible training sample set and can effectively improve the accuracy of entity relationship recognition.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the modules of a device in an implementation scenario may be distributed in that device as described, or may be changed accordingly and located in one or more devices different from the one in this implementation scenario. The modules of the above implementation scenario may be merged into one module or further split into multiple submodules.
The serial numbers used above are for description only and do not indicate the relative merits of the implementation scenarios. What is disclosed above is only several specific implementation scenarios of the present application. The present application is not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present application.

Claims (10)

1. An entity relationship recognition method, characterized by comprising:
using a preset first entity relationship recognition model, obtaining a text vector of a text to be recognized from the acquired text to be recognized;
obtaining a convolution result of the text vector from the text vector of the text to be recognized;
determining the entity relationship contained in the text to be recognized according to the text vector and the obtained convolution result;
wherein the preset first entity relationship recognition model is trained on a credible training sample set, and the credible training sample set is built from credible training samples with entity relationship labels.
2. The method according to claim 1, characterized in that obtaining the text vector of the text to be recognized from the acquired text to be recognized specifically comprises:
obtaining character vectors and word vectors of the text to be recognized from a character/word vector dictionary;
performing convolution operations on multiple adjacent character vectors and word vectors to obtain the text vector of the text to be recognized.
3. The method according to claim 1, characterized in that building the credible training sample set from credible training samples with entity relationship labels specifically comprises:
performing entity relationship prediction on distantly supervised training samples using a preset second entity relationship recognition model;
obtaining the credible training samples with entity relationship labels according to the entity relationship prediction results and labeled training samples with entity relationship labels.
4. The method according to claim 3, characterized in that performing entity relationship prediction on distantly supervised training samples using the preset second entity relationship recognition model specifically comprises:
performing convolution operations, using the preset second entity relationship recognition model, on the labeled training samples with entity relationship labels to obtain convolution results;
training an initialized Gaussian mixture model according to the convolution results and the entity relationship labels of the labeled training samples to obtain a trained Gaussian mixture model;
performing entity relationship prediction on the distantly supervised training samples using the trained Gaussian mixture model.
5. The method according to claim 3 or 4, characterized in that obtaining the credible training samples with entity relationship labels according to the entity relationship prediction results and the labeled training samples with entity relationship labels specifically comprises:
if the predicted entity relationship of a distantly supervised training sample is consistent with the initial entity relationship label of that distantly supervised training sample, using the distantly supervised training sample and the labeled training samples as credible training samples with entity relationship labels;
if the predicted entity relationship of a distantly supervised training sample is inconsistent with the initial entity relationship label of that distantly supervised training sample, deleting the distantly supervised training sample.
6. The method according to claim 3, characterized in that the preset second entity relationship recognition model is obtained by training an initialized second entity relationship recognition model;
obtaining the preset second entity relationship recognition model by training the initialized second entity relationship recognition model specifically comprises:
performing entity relationship prediction, using the initialized second entity relationship recognition model, on the labeled training samples with entity relationship labels;
training the network parameters of the initialized second entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the labeled training samples, to obtain the preset second entity relationship recognition model.
7. The method according to claim 1 or 3, characterized in that training the preset first entity relationship recognition model on the credible training sample set specifically comprises:
performing entity relationship prediction, using an initialized first entity relationship recognition model, on the credible training samples with entity relationship labels in the credible training sample set;
training the network parameters of the initialized first entity relationship recognition model according to the entity relationship prediction results and the entity relationship labels of the credible training samples, to obtain the preset first entity relationship recognition model.
8. An entity relationship recognition device, characterized by comprising:
an acquisition module, for obtaining, using a preset first entity relationship recognition model, a text vector of a text to be recognized from the acquired text to be recognized;
a convolution operation module, for obtaining a convolution result of the text vector from the text vector of the text to be recognized;
an entity relationship module, for determining the entity relationship contained in the text to be recognized according to the text vector and the obtained convolution result;
wherein the preset first entity relationship recognition model is trained on a credible training sample set.
9. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the entity relationship recognition method according to any one of claims 1 to 7.
10. A computer device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that the processor, when executing the program, implements the entity relationship recognition method according to any one of claims 1 to 7.
CN201910559111.9A 2019-06-26 2019-06-26 Entity relationship identification method and device, storage medium and computer equipment Active CN110457677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910559111.9A CN110457677B (en) 2019-06-26 2019-06-26 Entity relationship identification method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910559111.9A CN110457677B (en) 2019-06-26 2019-06-26 Entity relationship identification method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN110457677A true CN110457677A (en) 2019-11-15
CN110457677B CN110457677B (en) 2023-11-17

Family

ID=68481090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910559111.9A Active CN110457677B (en) 2019-06-26 2019-06-26 Entity relationship identification method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN110457677B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239446A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of intelligence relationship extracting method based on neutral net Yu notice mechanism
CN107729497A (en) * 2017-10-20 2018-02-23 同济大学 A kind of word insert depth learning method of knowledge based collection of illustrative plates
CN107943784A (en) * 2017-11-02 2018-04-20 南华大学 Relation extraction method based on generation confrontation network
WO2019094895A1 (en) * 2017-11-13 2019-05-16 Promptu Systems Corporation Systems and methods for adaptive proper name entity recognition and understanding
CN109299457A (en) * 2018-09-06 2019-02-01 北京奇艺世纪科技有限公司 A kind of opining mining method, device and equipment
CN109815339A (en) * 2019-01-02 2019-05-28 平安科技(深圳)有限公司 Based on TextCNN Knowledge Extraction Method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANDRES VIGNAGA: "Typing Textual Entities and M2T/T2M Transformations in a Model Management Environment", 《2009 INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY》, pages 115 - 122 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111192692A (en) * 2020-01-02 2020-05-22 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111192692B (en) * 2020-01-02 2023-12-08 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111274412A (en) * 2020-01-22 2020-06-12 腾讯科技(深圳)有限公司 Information extraction method, information extraction model training device and storage medium
CN111338338A (en) * 2020-02-20 2020-06-26 山东科技大学 Robot speed self-adaptive control method based on road surface characteristic cluster analysis
CN111338338B (en) * 2020-02-20 2024-01-16 山东科技大学 Robot speed self-adaptive control method based on road surface feature cluster analysis
CN111552812A (en) * 2020-04-29 2020-08-18 深圳数联天下智能科技有限公司 Method and device for determining relation category between entities and computer equipment
CN111552812B (en) * 2020-04-29 2023-05-12 深圳数联天下智能科技有限公司 Method, device and computer equipment for determining relationship category between entities
CN111651575A (en) * 2020-05-29 2020-09-11 泰康保险集团股份有限公司 Session text processing method, device, medium and electronic equipment
CN111651575B (en) * 2020-05-29 2023-09-12 泰康保险集团股份有限公司 Session text processing method, device, medium and electronic equipment
CN112069329A (en) * 2020-09-11 2020-12-11 腾讯科技(深圳)有限公司 Text corpus processing method, device, equipment and storage medium
CN112069329B (en) * 2020-09-11 2024-03-15 腾讯科技(深圳)有限公司 Text corpus processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110457677B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
CN111476284B (en) Image recognition model training and image recognition method and device and electronic equipment
CN110457677A (en) Entity-relationship recognition method and device, storage medium, computer equipment
CN113486981B (en) RGB image classification method based on multi-scale feature attention fusion network
CN109165645A (en) A kind of image processing method, device and relevant device
CN108304835A (en) character detecting method and device
CN107908789A (en) Method and apparatus for generating information
CN107688823A (en) A kind of characteristics of image acquisition methods and device, electronic equipment
CN109117781A (en) Method for building up, device and the more attribute recognition approaches of more attribute Recognition Models
CN108229341A (en) Sorting technique and device, electronic equipment, computer storage media, program
CN112989085B (en) Image processing method, device, computer equipment and storage medium
CN107545038B (en) Text classification method and equipment
CN111475613A (en) Case classification method and device, computer equipment and storage medium
CN110287311B (en) Text classification method and device, storage medium and computer equipment
CN108734212A (en) A kind of method and relevant apparatus of determining classification results
CN109902285A (en) Corpus classification method, device, computer equipment and storage medium
CN108959474A (en) Entity relationship extracting method
CN113657087B (en) Information matching method and device
CN113094533B (en) Image-text cross-modal retrieval method based on mixed granularity matching
CN105989336A (en) Scene recognition method based on deconvolution deep network learning with weight
CN109492093A (en) File classification method and electronic device based on gauss hybrid models and EM algorithm
CN108319888A (en) The recognition methods of video type and device, terminal
CN107239775A (en) Terrain classification method and device
US11893773B2 (en) Finger vein comparison method, computer equipment, and storage medium
CN109583367A (en) Image text row detection method and device, storage medium and electronic equipment
CN117036843A (en) Target detection model training method, target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant