CN110276066A - Analysis method and related apparatus for entity association relationships - Google Patents

Analysis method and related apparatus for entity association relationships

Info

Publication number
CN110276066A
CN110276066A (application CN201810217272.5A)
Authority
CN
China
Prior art keywords
text
vector
predicted
participle
speech sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810217272.5A
Other languages
Chinese (zh)
Other versions
CN110276066B (en)
Inventor
王天祎 (Wang Tianyi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201810217272.5A priority Critical patent/CN110276066B/en
Priority to PCT/CN2019/073664 priority patent/WO2019174422A1/en
Publication of CN110276066A publication Critical patent/CN110276066A/en
Application granted granted Critical
Publication of CN110276066B publication Critical patent/CN110276066B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an analysis method and related apparatus for entity association relationships. In the analysis method, a text to be predicted is segmented into words to obtain a part-of-speech sequence of the text, and a vector is then obtained for each word segment in the part-of-speech sequence. The vector of each word segment is predicted by a prediction model of entity association relationships, so that a prediction result of the association relationship between an entity and a corresponding attribute in the text to be predicted can be obtained. Because this process obtains the part-of-speech sequence and the vector of each word segment automatically from the text to be predicted, rather than by manually selecting words and extracting word features, it solves the problem that manual word selection and manually provided word features impair the accuracy of the detection result of the association relationship between entities and attributes.

Description

Analysis method and related apparatus for entity association relationships
Technical field
The present invention relates to the field of text analysis technology, and in particular to an analysis method and related apparatus for entity association relationships.
Background art
Text sentiment analysis mainly serves to reflect, on social media, users' sentiment tendencies toward certain events, people, enterprises, products, and the like. Entity-level sentiment analysis refers to analyzing the sentiment tendency toward specific entities in a text, rather than the tendency of the whole text; its benefit is that the granularity of the sentiment object becomes clearer. Within entity-level sentiment analysis, it is particularly important to know the association relationships between entities and attributes in the text, i.e., to determine which entity (e.g., BMW, Benz, Audi) each attribute in the text (e.g., interior trim, engine) is associated with.
Existing schemes generally rely on manually extracted features fed into traditional machine-learning classification algorithms. Specifically, words between an entity and an attribute in the text are selected manually, features of those words are extracted and input to a classifier, and the classifier analyzes the association relationship to obtain a detection result for the association relationship between the entity and the attribute in the text.
Because the words are selected and their features extracted manually, the feature-extraction process carries strong subjectivity, which impairs the accuracy of the detection result of the association relationship between entities and attributes in the text.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide an analysis method and related apparatus for entity association relationships that overcome, or at least partially solve, the above problems.
An analysis method of entity association relationships, comprising:
obtaining a text to be predicted;
performing word segmentation on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted;
obtaining a vector of each word segment in the part-of-speech sequence of the text to be predicted;
predicting, by using a prediction model of entity association relationships, the vector of each word segment in the part-of-speech sequence of the text to be predicted, to obtain a prediction result of the association relationship between an entity and a corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating parameters of a neural network algorithm until a prediction result obtained by predicting a feature vector of a training text with the neural network algorithm after parameter updating is equal to a manual annotation result; and the feature vector of the training text is obtained from the vector of each word segment in the part-of-speech sequence of the training text.
Optionally, the obtaining a vector of each word segment in the part-of-speech sequence of the text to be predicted comprises: obtaining a word vector of each word segment in the part-of-speech sequence of the text to be predicted.
Alternatively, the obtaining a vector of each word segment in the part-of-speech sequence of the text to be predicted comprises: obtaining a word vector of each word segment in the part-of-speech sequence of the text to be predicted, and a part-of-speech vector and/or a word-bag vector of each word segment in the part-of-speech sequence of the text to be predicted; and combining the word vector of each word segment with the part-of-speech vector and/or word-bag vector of that word segment, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted.
Optionally, the predicting, by using the prediction model of entity association relationships, the vector of each word segment in the part-of-speech sequence of the text to be predicted, to obtain a prediction result of the association relationship between a target entity and a corresponding attribute in the text to be predicted, comprises:
performing a network characterization of sequence relationships on a first matrix, to obtain a second matrix; wherein the first matrix comprises the vector of each word segment in the part-of-speech sequence of the text to be predicted;
performing weighted-average processing on the second matrix according to a weight corresponding to the value at each position of the second matrix, to obtain a feature vector;
processing the feature vector with a softmax function, to obtain a probability output vector; wherein the probability output vector comprises probability values, under preset categories, of the association relationship between the target entity and the corresponding attribute in the text to be predicted.
Optionally, the construction process of the prediction model of entity association relationships comprises:
performing word segmentation on a training text, to obtain a part-of-speech sequence of the training text;
obtaining a vector of each word segment in the part-of-speech sequence of the training text;
performing a network characterization of sequence relationships on a third matrix, to obtain a fourth matrix; wherein the third matrix comprises the vector of each word segment in the part-of-speech sequence of the training text;
performing weighted-average processing on the fourth matrix according to a weight corresponding to the value at each position of the fourth matrix, to obtain a feature vector;
processing the feature vector with a softmax function, to obtain a probability output vector; wherein the probability output vector comprises probability values, under preset categories, of the association relationship between a target entity and a corresponding attribute in the training text;
performing a cross-entropy operation on the probability output vector and a manually annotated category of the training text, to obtain a loss function;
optimizing the loss function, and updating first parameters according to the optimized loss function, until the probability output vector obtained by predicting the feature vector of the training text with the updated parameters is equivalent to the manually annotated category of the training text; wherein the first parameters comprise the softmax function and the vector of each word segment in the part-of-speech sequence of the training text; and
taking updated second parameters as parameters in the prediction model of entity association relationships; wherein the second parameters comprise the softmax function.
An analysis apparatus of entity association relationships, comprising:
an acquiring unit, configured to obtain a text to be predicted;
a word segmentation unit, configured to perform word segmentation on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted;
a generation unit, configured to obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted;
a prediction unit, configured to predict, by using a prediction model of entity association relationships, the vector of each word segment in the part-of-speech sequence of the text to be predicted, to obtain a prediction result of the association relationship between a target entity and a corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating parameters of a neural network algorithm so that the prediction result obtained by predicting a feature vector of a training text with the neural network algorithm after parameter updating is equal to a manual annotation result; and the feature vector of the training text is obtained from the vector of each word segment in the part-of-speech sequence of the training text.
Optionally, the generation unit comprises:
a first obtaining unit, configured to obtain a word vector of each word segment in the part-of-speech sequence of the text to be predicted;
or comprises: a second obtaining unit, configured to obtain a word vector of each word segment in the part-of-speech sequence of the text to be predicted, and a part-of-speech vector and/or a word-bag vector of each word segment in the part-of-speech sequence of the text to be predicted; and to combine the word vector of each word segment with the part-of-speech vector and/or word-bag vector of that word segment, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted.
Optionally, the prediction unit comprises:
a third obtaining unit, configured to perform a network characterization of sequence relationships on a first matrix, to obtain a second matrix; wherein the first matrix comprises the vector of each word segment in the part-of-speech sequence of the text to be predicted;
a fourth obtaining unit, configured to perform weighted-average processing on the second matrix according to a weight corresponding to the value at each position of the second matrix, to obtain a feature vector;
a prediction subunit, configured to process the feature vector with a softmax function, to obtain a probability output vector; wherein the probability output vector comprises probability values, under preset categories, of the association relationship between the target entity and the corresponding attribute in the text to be predicted.
Optionally, the word segmentation unit is further configured to perform word segmentation on a training text, to obtain a part-of-speech sequence of the training text;
the generation unit is further configured to obtain a vector of each word segment in the part-of-speech sequence of the training text;
the third obtaining unit is further configured to perform a network characterization of sequence relationships on a third matrix, to obtain a fourth matrix; wherein the third matrix comprises the vector of each word segment in the part-of-speech sequence of the training text;
the fourth obtaining unit is further configured to perform weighted-average processing on the fourth matrix according to a weight corresponding to the value at each position of the fourth matrix, to obtain a feature vector;
the prediction subunit is further configured to process the feature vector with a softmax function, to obtain a probability output vector; wherein the probability output vector comprises probability values, under preset categories, of the association relationship between a target entity and a corresponding attribute in the training text;
and the apparatus further comprises: a comparison unit, configured to perform a cross-entropy operation on the probability output vector and a manually annotated category of the training text, to obtain a loss function;
an optimization unit, configured to optimize the loss function;
an updating unit, configured to update first parameters according to the optimized loss function, until the probability output vector obtained by predicting the feature vector of the training text with the updated parameters is equivalent to the manually annotated category of the training text; wherein the first parameters comprise the softmax function and the vector of each word segment in the part-of-speech sequence of the training text; and
a construction unit, configured to take updated second parameters as parameters in the prediction model of entity association relationships; wherein the second parameters comprise the softmax function.
A storage medium, comprising a stored program, wherein, when the program runs, a device where the storage medium is located is controlled to execute the analysis method of entity association relationships according to any one of the above.
A processor, configured to run a program, wherein, when the program runs, the analysis method of entity association relationships according to any one of the above is executed.
Through the above technical solutions, in the analysis method and related apparatus of entity association relationships provided by the present invention, the text to be predicted is segmented to obtain its part-of-speech sequence, a vector is then obtained for each word segment in the part-of-speech sequence, and the vector of each word segment is predicted by the prediction model of entity association relationships, so that a prediction result of the association relationship between an entity and a corresponding attribute in the text to be predicted can be obtained. Because this process obtains the part-of-speech sequence and the vector of each word segment automatically from the text to be predicted, rather than by manually selecting words and extracting word features, it solves the problem that manual word selection and manually provided word features impair the accuracy of the detection result of the association relationship between entities and attributes.
The above is merely an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the specification, and in order to make the above and other objects, features, and advantages of the present invention more comprehensible, specific embodiments of the present invention are set forth below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of the construction process of the prediction model of entity association relationships disclosed by an embodiment of the present invention;
Fig. 2 shows a flow chart of a specific implementation of step S102 disclosed by an embodiment of the present invention;
Fig. 3 shows a flow chart of the analysis method of entity association relationships disclosed by an embodiment of the present invention;
Fig. 4 shows a flow chart of a specific implementation of step S303 disclosed by an embodiment of the present invention;
Fig. 5 shows a flow chart of a specific implementation of step S304 disclosed by an embodiment of the present invention;
Fig. 6 shows a schematic structural diagram of the analysis apparatus of entity association relationships disclosed by an embodiment of the present invention;
Fig. 7 shows a schematic structural diagram of the generation unit disclosed by an embodiment of the present invention;
Fig. 8 shows a schematic structural diagram of the prediction unit disclosed by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
In the embodiments of the present application, a text to be predicted needs to be predicted using a prediction model of entity association relationships. Therefore, before the analysis method of entity association relationships disclosed in the embodiments of the present application is executed, the prediction model of entity association relationships needs to be constructed first.
Referring to Fig. 1, the construction process of the prediction model of entity association relationships comprises:
S101: performing word segmentation on a training text, to obtain a part-of-speech sequence of the training text.
A training document is prepared, which includes at least one training text. A training text is a user's evaluation sentence about a certain event, person, enterprise, product, or the like.
Each training text in the training document is segmented using open-source tool software, such as LTP (Language Technology Platform, of Harbin Institute of Technology), and the corresponding part-of-speech sequence is obtained, wherein the part-of-speech sequence includes a segmentation sequence, a part-of-speech result, and a dependency sequence. The segmentation sequence includes each word segment obtained after segmenting the training text; the part-of-speech result includes the part of speech of each word segment; and the dependency sequence represents the association relationships between the word segments obtained after the training text is segmented.
For example, for a training text such as "the front face of the Benz is very imposing.", the segmentation sequence obtained by word segmentation is [Benz, 's, front, face, very, imposing, .]; the part-of-speech result is [nz, u, nd, n, a, a, wp], where n denotes a general noun, v a verb, and a an adjective; and the dependency sequence is [ATT, RAD, ATT, SBV, HED, COO, WP], where ATT denotes attribute (attributive modifier), RAD denotes right adjunct (right additional relationship), SBV denotes subject-verb, HED denotes head (the core relationship), COO denotes coordinate (coordination), and WP denotes punctuation.
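The structure of such a part-of-speech sequence can be sketched as three parallel lists, one entry per word segment. This is only an illustration of the data layout described above: the hard-coded lists stand in for the output of a real segmenter such as LTP and are not produced by LTP's actual API.

```python
# Sketch of the three parallel sequences that make up a "part-of-speech
# sequence": word segments, POS tags, and dependency labels. The lists
# below are hard-coded for illustration (an assumption, not LTP output).

def build_pos_sequence(segments, pos_tags, dep_labels):
    """Bundle parallel segmenter outputs into one record per word segment."""
    if not (len(segments) == len(pos_tags) == len(dep_labels)):
        raise ValueError("parallel sequences must have equal length")
    return [
        {"segment": s, "pos": p, "dep": d}
        for s, p, d in zip(segments, pos_tags, dep_labels)
    ]

segments = ["Benz", "'s", "front", "face", "very", "imposing", "."]
pos_tags = ["nz", "u", "nd", "n", "a", "a", "wp"]
dep_labels = ["ATT", "RAD", "ATT", "SBV", "HED", "COO", "WP"]

pos_sequence = build_pos_sequence(segments, pos_tags, dep_labels)
```

Keeping the three sequences aligned per word segment makes the later per-segment vector lookup straightforward.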
S102: obtaining a vector of each word segment in the part-of-speech sequence of the training text.
Each word segment in the part-of-speech sequence of the training text needs to be expressed as a feature vector. Therefore, for each word segment in the part-of-speech sequence of the training text, the vector of that word segment is obtained. The training text includes an entity and an attribute of the entity, and accordingly the part-of-speech sequence obtained after word segmentation includes a word segment corresponding to the entity and a word segment corresponding to the attribute of the entity.
It should also be noted that, for each training text, before the vector of each word segment in its part-of-speech sequence is obtained, it must be ensured that its segment length (the number of word segments) is not excessive. Therefore, the segment length of each training text in the training document is counted, and it is judged whether the training document contains over-long outlier texts. Specifically, the mean and standard deviation of the segment lengths of the training texts are calculated; an over-long outlier text is a training text whose segment length exceeds the mean by more than several multiples of the standard deviation. The specific multiple can be set according to the actual situation.
If it is judged that no over-long outlier text exists in the training document, the segment length of the longest training text in the training document is taken as the length of the part-of-speech sequence of the training document, and each word segment in the part-of-speech sequence of each training text is then obtained. If it is judged that over-long outlier texts exist in the training document, the segment length of the longest training text among the remaining training texts (excluding the over-long outlier texts) is taken as the length of the part-of-speech sequence of the training document, and the over-long outlier texts are truncated according to this length. Specifically, centering on the target entity in such a training text, the text is extended forward and backward until the segment length reaches the length of the part-of-speech sequence of the training document, and the vector of each word segment in the part-of-speech sequence of the truncated text is then obtained.
For example, a training document contains 10 training texts with differing segment lengths, the longest being 50; then 50 is taken as the length of the part-of-speech sequence of the training document. If the training document contains a training text whose segment length is 1000, that training text is an over-long outlier text.
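The length-normalization step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the outlier threshold multiple k and the exact centering rule are not fixed by the description, so the values chosen here (k = 2, a symmetric window around the entity) are illustrative.

```python
# Flag over-long outlier texts whose segment count exceeds mean + k * std,
# take the longest non-outlier length as the sequence length, and truncate
# outliers in a window centred on the target entity.
from statistics import mean, pstdev

def sequence_length(segment_counts, k=2.0):
    """Longest segment count among texts that are not over-long outliers."""
    mu, sigma = mean(segment_counts), pstdev(segment_counts)
    kept = [n for n in segment_counts if n <= mu + k * sigma]
    return max(kept)

def truncate_around_entity(segments, entity_index, max_len):
    """Keep a window of at most max_len segments centred on the entity."""
    if len(segments) <= max_len:
        return segments
    start = max(0, min(entity_index - max_len // 2, len(segments) - max_len))
    return segments[start:start + max_len]
```

With the example in the text (nine ordinary texts with a longest length of 50, plus one of length 1000), the 1000-segment text is discarded as an outlier and 50 becomes the sequence length.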
Optionally, in one implementation of step S102, step S102 includes:
obtaining a word vector of each word segment in the part-of-speech sequence of the training text.
Specifically, each word segment in the part-of-speech sequence of the training text is looked up in a word-vector model, to obtain the word vector of the current word segment in the word-vector model.
The word-vector model is generated by segmenting every sentence in a text corpus using the open-source tool software and then performing word-vector training. The corpus includes an industry corpus and a general corpus, the general corpus being a text corpus detached from industry particularities. The function of the word-vector model is to map words into a space of a certain dimension, in which the similarity between words can be characterized. Meanwhile, low-frequency long-tail words appearing in the corpus (words whose frequency of occurrence across the whole vocabulary is below some threshold) are uniformly denoted as UNK (unknown keyword); UNK shares a single unique word vector in the word-vector model.
If some word segment in the part-of-speech sequence of the training text has no corresponding word vector in the word-vector model, the word vector of that word segment uses the UNK word vector.
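The lookup-with-UNK-fallback just described can be sketched as follows. The tiny in-memory "model" is an illustrative assumption; a real system would load a trained word-vector model.

```python
# Word-vector lookup with a shared UNK fallback for segments that have no
# entry in the word-vector model.
UNK = "UNK"

word_vector_model = {
    "GS8":      [0.9, 0.1, 0.0],
    "interior": [0.2, 0.7, 0.1],
    UNK:        [0.0, 0.0, 0.0],  # single shared vector for low-frequency words
}

def lookup(segment, model=word_vector_model):
    """Return the word vector of a segment, falling back to the UNK vector."""
    return model.get(segment, model[UNK])
```

Every out-of-vocabulary segment thus maps to the same UNK vector, exactly as the description requires.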
Optionally, in another implementation of step S102, referring to Fig. 2, step S102 includes:
S1022: obtaining a word vector of each word segment in the part-of-speech sequence of the training text, and a part-of-speech vector and/or a word-bag vector of each word segment in the part-of-speech sequence of the training text.
For the word segments in the part-of-speech sequence of the training text, differences in part of speech also lead to differences in the prediction result of the association relationship between an entity and a corresponding attribute. Hence, a part-of-speech vector can also be obtained for each word segment in the part-of-speech sequence of the training text.
Specifically, each part of speech is assigned a random vector of a certain dimension. For example, if there are five parts of speech [a, b, c, d, e], then a can be represented by a random vector Va and, similarly, b by a random vector Vb, where the dimensions of Va and Vb can be specified arbitrarily. For each word segment in the part-of-speech sequence of the training text, the corresponding part-of-speech vector can be obtained according to its part of speech.
Similarly, the word bag to which a word segment belongs also affects the prediction result of the association relationship between an entity and a corresponding attribute; in particular, for a word segment in the part-of-speech sequence of the training text for which no corresponding word vector is found in the word-vector model, the word-bag vector of the word segment can comprehensively reflect the word segment. Hence, a word-bag vector can also be obtained for each word segment in the part-of-speech sequence of the training text.
Specifically, the membership relationship between each word segment in the part-of-speech sequence of the training text and the industry-field word bags is encoded, to obtain the word-bag vector of each word segment in the part-of-speech sequence of the training text. For example, it is judged whether each word segment in the part-of-speech sequence of the training text is in the entity word bag, and whether it is in the evaluation-phrase word bag; the judgment results are encoded to obtain the word-bag vector of each word segment.
S1023: combining the word vector of each word segment in the part-of-speech sequence of the training text with the part-of-speech vector and/or word-bag vector of that word segment, to obtain the vector of each word segment in the part-of-speech sequence of the training text.
Specifically, for each word segment in the part-of-speech sequence of the training text, its word vector, part-of-speech vector, and/or word-bag vector are concatenated to form the vector of the word segment.
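Steps S1022 and S1023 can be sketched as follows. This is a minimal illustration consistent with the description above: the part-of-speech vector dimension, the two-bag membership encoding, and the random seed are illustrative assumptions.

```python
# The vector of a word segment is the concatenation of its word vector,
# its part-of-speech vector (a fixed random vector per POS), and its
# word-bag vector (membership in the entity / evaluation-phrase word bags).
import random

random.seed(0)
POS_DIM = 4
pos_vectors = {}  # each part of speech gets one fixed random vector

def pos_vector(pos):
    if pos not in pos_vectors:
        pos_vectors[pos] = [random.uniform(-1, 1) for _ in range(POS_DIM)]
    return pos_vectors[pos]

def word_bag_vector(segment, entity_bag, evaluation_bag):
    """Encode membership in the entity and evaluation-phrase word bags."""
    return [1.0 if segment in entity_bag else 0.0,
            1.0 if segment in evaluation_bag else 0.0]

def segment_vector(segment, pos, word_vec, entity_bag, evaluation_bag):
    """S1023: concatenate word vector + POS vector + word-bag vector."""
    return (word_vec
            + pos_vector(pos)
            + word_bag_vector(segment, entity_bag, evaluation_bag))

v = segment_vector("GS8", "nz", [0.9, 0.1], {"GS8"}, {"imposing"})
```

Because the same POS always maps to the same stored random vector, two segments with identical parts of speech share that component of their vectors.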
S103: performing a network characterization of sequence relationships on a third matrix, to obtain a fourth matrix.
Specifically, the vectors of the word segments in the part-of-speech sequence of the training text are combined to obtain the third matrix. A network characterization of sequence relationships is then performed on the third matrix using a bidirectional LSTM (Bi-LSTM, Long Short-Term Memory) network, to obtain the fourth matrix.
S104: performing weighted-average processing on the fourth matrix according to a weight corresponding to the value at each position of the fourth matrix, to obtain a feature vector.
Specifically, normalization is performed in combination with the attention mechanism of the neural network algorithm, assigning a different weight to each position of the fourth matrix: word segments that do not need much attention have their weights reduced, while word segments that should be attended to have their weights strengthened. The values at the positions of the fourth matrix are then weighted and averaged, to obtain the feature vector.
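The attention-style pooling of step S104 can be sketched as follows: per-row scores are normalized into weights that sum to 1, and the rows of the matrix (one row per word segment) are averaged under those weights. The scoring function here, a dot product against a query vector, is an illustrative assumption; the description only requires that each position receive a normalized weight.

```python
# Attention-weighted average over the rows of a matrix. Each row stands
# for one word segment's Bi-LSTM output; the weights are a normalized
# (softmax) version of per-row attention scores.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(matrix, query):
    """Weighted average of the rows of `matrix` under attention weights."""
    scores = [sum(a * b for a, b in zip(row, query)) for row in matrix]
    weights = softmax(scores)
    dim = len(matrix[0])
    return [sum(w * row[j] for w, row in zip(weights, matrix))
            for j in range(dim)]
```

A useful sanity check: if every row is identical, any weighting returns that row unchanged.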
S105: processing the feature vector with a softmax function, to obtain a probability output vector.
The probability output vector is a two-dimensional vector including the probability values of two categories; the probability value of each category indicates the probability that the association relationship between the word segment of the corresponding entity and the word segment of the corresponding attribute belongs to that category. Specifically, of the two categories, one category is "paired", indicating that the word segment of the entity and the word segment of the attribute have an association relationship; the other category is "unpaired", indicating that they do not.
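Step S105, together with the cross-entropy loss used during construction of the model, can be sketched as follows. The linear layer (weight matrix W and bias b) is an illustrative assumption standing in for the learned parameters; the description itself only specifies the softmax output and the cross-entropy against the manually annotated category.

```python
# Map a feature vector to a two-category ("paired" / "unpaired")
# probability output vector via a linear layer + softmax, and compute the
# cross-entropy loss against a manually annotated label.
import math

def softmax2(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def probability_output(feature_vec, W, b):
    """W has 2 rows (one per category); b is a length-2 bias."""
    logits = [sum(w * x for w, x in zip(row, feature_vec)) + bi
              for row, bi in zip(W, b)]
    return softmax2(logits)

def cross_entropy(probs, label):
    """label: 0 = paired, 1 = unpaired."""
    return -math.log(probs[label])

W = [[1.0, -1.0], [-1.0, 1.0]]  # illustrative learned weights
b = [0.0, 0.0]
probs = probability_output([2.0, 0.0], W, b)
```

Training (steps after S105) iteratively updates parameters such as W, b, and the segment vectors to drive this cross-entropy down until the predicted category matches the annotation.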
Before the feature vector is processed with the softmax function, positive samples and negative samples also need to be obtained from the part-of-speech sequence of the training sample. Specifically, the participle of each entity, and the participle of each attribute belonging to that entity, are manually annotated in the part-of-speech sequence of the training text. Combining the participle of an entity with the participle of an attribute belonging to that entity forms a positive sample. The participle of each entity in the training text is then cross-combined with the participles of the attributes, and the combinations that are not positive samples form the negative-sample set, from which some or all of the negative samples are selected.
For example, the training text is: "I hear the GS8's interior trim smells quite strong. The Outlander's quality is fine, but I just cannot take to its interior trim." In this training text, the first entity is GS8, whose corresponding attribute is interior trim; the second entity is Outlander, whose corresponding attribute is quality. Combining the first entity with its corresponding attribute gives the positive sample (GS8, interior trim). Combining the second entity with its corresponding attribute gives the positive sample (Outlander, quality). Cross-combining the first entity, the second entity, and their corresponding attributes gives the negative-sample set: (GS8, quality) and (Outlander, interior trim).
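The cross-combination in this example can be sketched directly; entity and attribute names are taken from the example above, and the optional selection of only a subset of the negatives is omitted.

```python
from itertools import product

# Annotated (entity, attribute) pairs from the example training text.
positives = [("GS8", "interior trim"), ("Outlander", "quality")]
entities = [e for e, _ in positives]
attributes = [a for _, a in positives]

# Cross-combine every entity with every attribute, then drop the
# annotated pairs: what remains is the negative-sample set.
negatives = [(e, a) for e, a in product(entities, attributes)
             if (e, a) not in positives]
print(negatives)  # [('GS8', 'quality'), ('Outlander', 'interior trim')]
```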
For each sample (positive or negative), the feature vector is processed with the softmax function to obtain the probability output vector for that sample; the probability values of the two classes in the probability output vector indicate, respectively, the probability that the association relationship between the entity and the attribute contained in the sample is "paired" and the probability that it is "unpaired".
It should also be noted that, for each sample in the part-of-speech sequence of the training text, special identifiers are added on both sides of the entity participle and of the attribute participle. A special identifier serves as a special index indicating the positions of the entity and the attribute. For example, in "<e1> Benz <e1> 's <e2> front face <e2> is full of imposing presence", the special identifiers <e1> <e1> and <e2> <e2> mark the participle of the corresponding entity and the participle of the corresponding attribute.
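A minimal sketch of this identifier insertion, assuming the text has already been segmented into a token list and the entity and attribute positions are known; the token list and indices here are hypothetical.

```python
def add_markers(tokens, entity_idx, attribute_idx):
    """Surround the entity participle with <e1>...<e1> and the attribute
    participle with <e2>...<e2> so the model can locate them."""
    out = []
    for i, tok in enumerate(tokens):
        if i == entity_idx:
            out += ["<e1>", tok, "<e1>"]
        elif i == attribute_idx:
            out += ["<e2>", tok, "<e2>"]
        else:
            out.append(tok)
    return out

tokens = ["Benz", "'s", "front face", "looks", "imposing"]
print(add_markers(tokens, 0, 2))
# ['<e1>', 'Benz', '<e1>', "'s", '<e2>', 'front face', '<e2>', 'looks', 'imposing']
```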
While the feature vector is being processed with the softmax function to obtain the probability output vector, the special identifiers added to the part-of-speech sequence of the training text need to be recognized in order to determine, for each sample, the participle of the corresponding entity and the participle of the corresponding attribute.
S106, performing a cross-entropy operation between the probability output vector and the manually annotated class of the training text to obtain a loss function.
Wherein, for each training text in the training corpus, the association relationship between entity and attribute in the training text is manually identified, yielding the manually annotated class of that training text.
The cross-entropy operation between the probability output vector and the manually annotated class of the training text yields the loss function, which expresses the difference between the probability output vector and the manually annotated class of the training text.
S107, optimizing the loss function, and updating first parameters according to the optimized loss function, until the probability output vector predicted from the feature vector obtained with the updated parameters is substantially equivalent to the manually annotated class of the training text.
Wherein, the first parameters include the Bi-LSTM, the attention mechanism of the neural network algorithm, the softmax function, and the vector of each participle in the part-of-speech sequence of the training text.
Specifically, the loss function can be optimized by stochastic gradient descent, the Adam optimization algorithm, or the like, giving the optimized loss function; the updated parameters are then obtained from the optimized loss function by layer-by-layer back-propagation.
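The loss computation and a gradient-descent update can be sketched in miniature. For illustration the sketch descends directly on the two output logits rather than on all first parameters, which stands in for the layer-by-layer back-propagation described above; the learning rate and step count are arbitrary.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy between the probability output vector and the
    manually annotated class (label: 0 = paired, 1 = unpaired)."""
    return -np.log(probs[label])

def sgd_step_logits(logits, label, lr=0.5):
    """One stochastic-gradient-descent step applied to the logits —
    a toy stand-in for updating all first parameters."""
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    grad = probs.copy()
    grad[label] -= 1.0        # d(loss)/d(logits) for softmax + cross entropy
    return logits - lr * grad

logits = np.array([0.0, 0.0])
label = 0                     # annotated class: "paired"
for _ in range(50):
    logits = sgd_step_logits(logits, label)

e = np.exp(logits - logits.max())
probs = e / e.sum()
print(probs[label] > 0.95)  # True: the loss has been driven down
```

After enough steps the predicted probability of the annotated class approaches one, i.e. the probability output vector becomes substantially equivalent to the manual annotation, which is the stopping condition of step S107.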
It should also be noted that, in this step, "equivalent" means: from the perspective of those skilled in the art, the probability output vector can be treated as equivalent to the manually annotated class of the training text.
S108, using the updated second parameters as the parameters in the prediction model of entity association relationships; wherein the second parameters include: the Bi-LSTM, the softmax function, and the attention mechanism of the neural network algorithm.
Based on the prediction model of entity association relationships constructed by the method of the above embodiment, the analysis of entity association relationships can be performed on a text to be predicted. Specifically, referring to Fig. 3, the analysis method of entity association relationships comprises:
S301, obtaining a text to be predicted.
Wherein, the text to be predicted is a user's evaluation sentence about a certain event, person, enterprise, product, or the like. The text to be predicted is obtained in order to analyze the sentiment tendency of the text toward the target entity in the text.
S302, performing word-segmentation processing on the text to be predicted to obtain the part-of-speech sequence of the text to be predicted.
For the text to be predicted, word-segmentation processing is likewise performed using open-source tool software, and the part-of-speech sequence of the corresponding participles is obtained. For the specific implementation of this step, see the content of step S101 in the embodiment corresponding to Fig. 1, which is not repeated here.
S303, obtaining the vector of each participle in the part-of-speech sequence of the text to be predicted.
Optionally, in one implementation of step S303, this step comprises:
obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted.
Optionally, in another implementation of step S303, referring to Fig. 4, this step comprises:
S3031, obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted, and the part-of-speech vector and/or word-bag vector of each participle in the part-of-speech sequence of the text to be predicted.
S3032, combining the word vector of each participle in the part-of-speech sequence of the text to be predicted with the part-of-speech vector and/or word-bag vector of that participle, to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted.
Wherein, for the specific content of the above two implementations, see the specific implementation of step S102 in the embodiment corresponding to Fig. 1, which is not repeated here.
S304, predicting, with the prediction model of entity association relationships, the vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the prediction result of the association relationship between entity and corresponding attribute in the text to be predicted. Wherein, the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by predicting the feature vector of the training text with the neural network algorithm after parameter updating is equivalent to the manual annotation result; the feature vector of the training text is obtained from the vector of each participle of the part-of-speech sequence of the training text.
In the analysis method of entity association relationships disclosed in this embodiment, after word-segmentation processing is performed on the text to be predicted to obtain its part-of-speech sequence, the vector of each participle in that part-of-speech sequence is obtained, and those vectors are predicted by the prediction model of entity association relationships, yielding the prediction result of the association relationship between entity and corresponding attribute in the text to be predicted. Since, in the above process, the part-of-speech sequence is obtained by word-segmentation processing of the text to be predicted, and the vectors of its participles are obtained without manually selecting words or extracting word features, the method solves the problem that manual word selection and manually supplied word features impair the accuracy of the test result for the association relationship between entity and attribute.
Optionally, in another embodiment of the application, referring to Fig. 5, step S304 comprises:
S3041, performing network characterization of sequence relationships on a first matrix to obtain a second matrix; wherein the first matrix comprises: the vector of each participle in the part-of-speech sequence of the text to be predicted.
Wherein, for the specific implementation of this step, see the content of step S103 in the embodiment corresponding to Fig. 1, which is not repeated here.
S3042, performing weighted-average processing on the second matrix according to the weight corresponding to the value at each position of the second matrix, to obtain a feature vector.
Wherein, for the specific implementation of this step, see the content of step S104 in the embodiment corresponding to Fig. 1, which is not repeated here.
S3043, processing the feature vector with a softmax function to obtain a probability output vector.
The probability output vector comprises: the probability values, under two classes, of the association relationship between target entity and corresponding attribute in the text to be predicted.
Wherein, for the specific implementation of this step, see the content of step S105 in the embodiment corresponding to Fig. 1, which is not repeated here.
Another embodiment of the application also discloses an analysis apparatus for entity association relationships; for the specific working process of each unit it comprises, see the content of the embodiment corresponding to Fig. 3. Specifically, referring to Fig. 6, the analysis apparatus for entity association relationships comprises:
Acquiring unit 601, for obtaining a text to be predicted.
Participle unit 602, for performing word-segmentation processing on the text to be predicted to obtain the part-of-speech sequence of the text to be predicted.
Generation unit 603, for obtaining the vector of each participle in the part-of-speech sequence of the text to be predicted.
Optionally, in another embodiment of the application, generation unit 603, referring to Fig. 7, comprises:
First obtaining unit 6031, for obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted.
Alternatively, generation unit 603 comprises: second obtaining unit 6032, for obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted, and the part-of-speech vector and/or word-bag vector of each participle in the part-of-speech sequence of the text to be predicted; and for combining the word vector of each participle in the part-of-speech sequence of the text to be predicted with the part-of-speech vector and/or word-bag vector of that participle, to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted.
Wherein, for the specific working process of each unit in generation unit 603 disclosed in this embodiment, see the content of the embodiment corresponding to Fig. 4 above, which is not repeated here.
Predicting unit 604, for predicting, with the prediction model of entity association relationships, the vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the prediction result of the association relationship between target entity and corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by predicting the feature vector of the training text with the neural network algorithm after parameter updating is equivalent to the manual annotation result; the feature vector of the training text is obtained from the vector of each participle of the part-of-speech sequence of the training text.
Optionally, in another embodiment of the application, predicting unit 604, as shown in Fig. 8, comprises:
Third obtaining unit 6041, for performing network characterization of sequence relationships on a first matrix to obtain a second matrix; wherein the first matrix comprises: the vector of each participle in the part-of-speech sequence of the text to be predicted.
Fourth obtaining unit 6042, for performing weighted-average processing on the second matrix according to the weight corresponding to the value at each position of the second matrix, to obtain a feature vector.
Predicting subunit 6043, for processing the feature vector with a softmax function to obtain a probability output vector; wherein the probability output vector comprises: the probability values, under two classes, of the association relationship between target entity and corresponding attribute in the text to be predicted.
Wherein, for the specific working process of each unit in predicting unit 604 disclosed in this embodiment, see the content of the embodiment corresponding to Fig. 5 above, which is not repeated here.
In this embodiment, word-segmentation processing is performed on the text to be predicted by the participle unit to obtain the part-of-speech sequence, and the vector of each participle in the part-of-speech sequence is obtained by the generation unit, rather than by manually selecting words and extracting word features; this solves the problem that manual word selection and manually supplied word features impair the accuracy of the test result for the association relationship between entity and attribute.
Optionally, in another embodiment of the application, the analysis apparatus for entity association relationships can also perform prediction on a training text to obtain the prediction model of entity association relationships.
Specifically: participle unit 602 is also used to perform word-segmentation processing on the training text to obtain the part-of-speech sequence of the training text.
Generation unit 603 is also used to obtain the vector of each participle in the part-of-speech sequence of the training text.
Third obtaining unit 6041 is also used to perform network characterization of sequence relationships on a third matrix to obtain a fourth matrix; wherein the third matrix comprises: the vector of each participle in the part-of-speech sequence of the training text.
Fourth obtaining unit 6042 is also used to perform weighted-average processing on the fourth matrix according to the weight corresponding to the value at each position of the fourth matrix, to obtain a feature vector.
Predicting subunit 6043 is also used to process the feature vector with a softmax function to obtain a probability output vector; wherein the probability output vector comprises: the probability values, under two classes, of the association relationship between target entity and corresponding attribute in the training text.
Moreover, the analysis apparatus for entity association relationships further comprises: a comparing unit, for performing a cross-entropy operation between the probability output vector and the manually annotated class of the training text to obtain a loss function.
An optimizing unit, for optimizing the loss function.
An updating unit, for updating first parameters according to the optimized loss function, until the probability output vector predicted from the feature vector obtained with the updated parameters is substantially equivalent to the manually annotated class of the training text; wherein the first parameters include the softmax function and the vector of each participle in the part-of-speech sequence of the training text.
A construction unit, for using the updated second parameters as the parameters in the prediction model of entity association relationships; wherein the second parameters include: the softmax function.
Wherein, for the specific working process of each unit in the above embodiment, see the content of the embodiment corresponding to Fig. 1 above, which is not repeated here.
The analysis apparatus for entity association relationships comprises a processor and a memory; the above acquiring unit, participle unit, generation unit, predicting unit, and so on are all stored in the memory as program units, and the processor executes the above program units stored in the memory to realize the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels can be provided; by adjusting kernel parameters, the analysis process of the association relationship between entity and corresponding attribute in the text to be predicted is realized, so as to obtain the prediction result of the association relationship between entity and corresponding attribute in the text to be predicted.
The memory may include forms such as non-volatile memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, e.g. read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the invention provides a storage medium on which a program is stored; when the program is executed by a processor, the analysis method of entity association relationships is realized.
An embodiment of the invention provides a processor for running a program, wherein the analysis method of entity association relationships is executed when the program runs.
An embodiment of the invention provides a device; the device herein can be a server, a PC, a PAD, a mobile phone, etc. The device includes a processor, a memory, and a program stored on the memory and runnable on the processor; when executing the program, the processor realizes the following steps:
An analysis method of entity association relationships, comprising:
obtaining a text to be predicted;
performing word-segmentation processing on the text to be predicted to obtain the part-of-speech sequence of the text to be predicted;
obtaining the vector of each participle in the part-of-speech sequence of the text to be predicted;
predicting, with the prediction model of entity association relationships, the vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the prediction result of the association relationship between entity and corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by predicting the feature vector of the training text with the neural network algorithm after parameter updating is equivalent to the manual annotation result; the feature vector of the training text is obtained from the vector of each participle of the part-of-speech sequence of the training text.
Optionally, the obtaining of the vector of each participle in the part-of-speech sequence of the text to be predicted comprises: obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted;
Alternatively, the obtaining of the vector of each participle in the part-of-speech sequence of the text to be predicted comprises: obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted, and the part-of-speech vector and/or word-bag vector of each participle in the part-of-speech sequence of the text to be predicted; and combining the word vector of each participle with the part-of-speech vector and/or word-bag vector of that participle, to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted.
Optionally, the predicting, with the prediction model of entity association relationships, of the vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the prediction result of the association relationship between target entity and corresponding attribute in the text to be predicted, comprises:
performing network characterization of sequence relationships on a first matrix to obtain a second matrix; wherein the first matrix comprises: the vector of each participle in the part-of-speech sequence of the text to be predicted;
performing weighted-average processing on the second matrix according to the weight corresponding to the value at each position of the second matrix, to obtain a feature vector;
processing the feature vector with a softmax function to obtain a probability output vector; wherein the probability output vector comprises: the probability values, under preset classes, of the association relationship between target entity and corresponding attribute in the text to be predicted.
Optionally, the construction process of the prediction model of entity association relationships comprises:
performing word-segmentation processing on a training text to obtain the part-of-speech sequence of the training text;
obtaining the vector of each participle in the part-of-speech sequence of the training text;
performing network characterization of sequence relationships on a third matrix to obtain a fourth matrix; wherein the third matrix comprises: the vector of each participle in the part-of-speech sequence of the training text;
performing weighted-average processing on the fourth matrix according to the weight corresponding to the value at each position of the fourth matrix, to obtain a feature vector;
processing the feature vector with a softmax function to obtain a probability output vector; wherein the probability output vector comprises: the probability values, under preset classes, of the association relationship between target entity and corresponding attribute in the training text;
performing a cross-entropy operation between the probability output vector and the manually annotated class of the training text to obtain a loss function;
optimizing the loss function, and updating first parameters according to the optimized loss function, until the probability output vector predicted from the feature vector obtained with the updated parameters is equivalent to the manually annotated class of the training text; wherein the first parameters include the softmax function and the vector of each participle in the part-of-speech sequence of the training text;
using the updated second parameters as the parameters in the prediction model of entity association relationships; wherein the second parameters include: the softmax function.
The present invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
An analysis method of entity association relationships, comprising:
obtaining a text to be predicted;
performing word-segmentation processing on the text to be predicted to obtain the part-of-speech sequence of the text to be predicted;
obtaining the vector of each participle in the part-of-speech sequence of the text to be predicted;
predicting, with the prediction model of entity association relationships, the vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the prediction result of the association relationship between entity and corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by predicting the feature vector of the training text with the neural network algorithm after parameter updating is equivalent to the manual annotation result; the feature vector of the training text is obtained from the vector of each participle of the part-of-speech sequence of the training text.
Optionally, the obtaining of the vector of each participle in the part-of-speech sequence of the text to be predicted comprises: obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted;
Alternatively, the obtaining of the vector of each participle in the part-of-speech sequence of the text to be predicted comprises: obtaining the word vector of each participle in the part-of-speech sequence of the text to be predicted, and the part-of-speech vector and/or word-bag vector of each participle in the part-of-speech sequence of the text to be predicted; and combining the word vector of each participle with the part-of-speech vector and/or word-bag vector of that participle, to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted.
Optionally, the predicting, with the prediction model of entity association relationships, of the vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the prediction result of the association relationship between target entity and corresponding attribute in the text to be predicted, comprises:
performing network characterization of sequence relationships on a first matrix to obtain a second matrix; wherein the first matrix comprises: the vector of each participle in the part-of-speech sequence of the text to be predicted;
performing weighted-average processing on the second matrix according to the weight corresponding to the value at each position of the second matrix, to obtain a feature vector;
processing the feature vector with a softmax function to obtain a probability output vector; wherein the probability output vector comprises: the probability values, under two classes, of the association relationship between target entity and corresponding attribute in the text to be predicted.
Optionally, the construction process of the prediction model of entity association relationships comprises:
performing word-segmentation processing on a training text to obtain the part-of-speech sequence of the training text;
obtaining the vector of each participle in the part-of-speech sequence of the training text;
performing network characterization of sequence relationships on a third matrix to obtain a fourth matrix; wherein the third matrix comprises: the vector of each participle in the part-of-speech sequence of the training text;
performing weighted-average processing on the fourth matrix according to the weight corresponding to the value at each position of the fourth matrix, to obtain a feature vector;
processing the feature vector with a softmax function to obtain a probability output vector; wherein the probability output vector comprises: the probability values, under preset classes, of the association relationship between target entity and corresponding attribute in the training text;
performing a cross-entropy operation between the probability output vector and the manually annotated class of the training text to obtain a loss function;
optimizing the loss function, and updating first parameters according to the optimized loss function, until the probability output vector predicted from the feature vector obtained with the updated parameters is equivalent to the manually annotated class of the training text; wherein the first parameters include the softmax function and the vector of each participle in the part-of-speech sequence of the training text;
using the updated second parameters as the parameters in the prediction model of entity association relationships; wherein the second parameters include: the softmax function.
It should be understood by those skilled in the art that embodiments of the application may be provided as a method, a system, or a computer program product. Therefore, the application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
The application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of guiding a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
The memory may include forms such as non-volatile memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, e.g. read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can realize information storage by any method or technology. The information can be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are only embodiments of the present application and are not intended to limit the application. Various changes and modifications to the application may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application shall fall within the scope of the claims of the application.

Claims (10)

1. A method for analyzing an entity association relationship, characterized by comprising:
obtaining a text to be predicted;
performing word segmentation on the text to be predicted to obtain a part-of-speech sequence of the text to be predicted;
obtaining a vector of each word segment in the part-of-speech sequence of the text to be predicted;
predicting the vector of each word segment in the part-of-speech sequence of the text to be predicted using a prediction model of entity association relationships, to obtain a prediction result of the association relationship between an entity and a corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating parameters in a neural network algorithm until the prediction result obtained by predicting a feature vector of a training text using the neural network algorithm with the updated parameters is equal to a manual annotation result; and the feature vector of the training text is obtained from the vector of each word segment of the part-of-speech sequence of the training text.
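The pipeline of claim 1 (segment the text, look up a vector per segment, feed the vectors to a trained model, read off class probabilities) can be sketched as follows. This is an illustrative toy, not the patent's actual model: the embedding table, class names, whitespace tokenizer, and plain averaging step are all stand-in assumptions.

```python
import math

# Hypothetical toy embedding table and relation classes (illustrative only; a
# real system would use trained word vectors and a trained neural network).
EMBEDDINGS = {"battery": [0.9, 0.1], "life": [0.8, 0.2], "good": [0.1, 0.9]}
CLASSES = ["associated", "not-associated"]

def segment(text):
    # Stand-in for a real word segmenter (e.g. a Chinese tokenizer); splitting
    # on whitespace keeps the sketch self-contained.
    return text.split()

def predict(text, weights):
    # 1) segment the text, 2) look up a vector per word segment,
    # 3) pool into one feature vector, 4) softmax over relation classes.
    vecs = [EMBEDDINGS.get(w, [0.0, 0.0]) for w in segment(text)]
    feat = [sum(col) / len(vecs) for col in zip(*vecs)]
    logits = [sum(f * w for f, w in zip(feat, row)) for row in weights]
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return dict(zip(CLASSES, probs))

result = predict("battery life good", [[1.0, 0.0], [0.0, 1.0]])
```

The claims leave the pooling and classifier details to the description; claim 3 refines them into a sequence-relation network plus a weighted average.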
2. The method according to claim 1, wherein obtaining the vector of each word segment in the part-of-speech sequence of the text to be predicted comprises: obtaining a word vector of each word segment in the part-of-speech sequence of the text to be predicted;
alternatively, obtaining the vector of each word segment in the part-of-speech sequence of the text to be predicted comprises: obtaining the word vector of each word segment in the part-of-speech sequence of the text to be predicted, and a part-of-speech vector and/or a word-bag vector of each word segment in the part-of-speech sequence of the text to be predicted; and combining the word vector of each word segment with the part-of-speech vector and/or the word-bag vector of that word segment, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted.
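Claim 2's combination step can be read as vector concatenation, which is one natural interpretation; the claim does not fix the exact operation, and all dimensions and values below are made up for illustration.

```python
# Sketch of claim 2: the vector for a word segment may be the word vector
# alone, or the word vector combined (here: concatenated) with a
# part-of-speech vector and/or a word-bag vector.
def combine(word_vec, pos_vec=None, bag_vec=None):
    out = list(word_vec)
    if pos_vec is not None:
        out += list(pos_vec)   # e.g. a one-hot part-of-speech tag
    if bag_vec is not None:
        out += list(bag_vec)   # e.g. a bag-of-words indicator
    return out

# 2-dim word vector + 3-dim POS one-hot + 2-dim bag vector -> 7 dimensions.
v = combine([0.2, 0.7], pos_vec=[1, 0, 0], bag_vec=[0, 1])
```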
3. The method according to claim 1, wherein predicting the vector of each word segment in the part-of-speech sequence of the text to be predicted using the prediction model of entity association relationships, to obtain the prediction result of the association relationship between a target entity and a corresponding attribute in the text to be predicted, comprises:
performing network characterization of sequence relationships on a first matrix to obtain a second matrix; wherein the first matrix comprises: the vector of each word segment in the part-of-speech sequence of the text to be predicted;
performing weighted averaging on the second matrix according to the weight corresponding to the value at each position in the second matrix, to obtain a feature vector;
processing the feature vector using a softmax function to obtain a probability output vector; wherein the probability output vector comprises: probability values, under preset categories, of the association relationship between the target entity and the corresponding attribute in the text to be predicted.
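The last two steps of claim 3 can be sketched as attention-style pooling followed by softmax. The claim does not name the sequence encoder (an RNN/LSTM is a common choice for "network characterization of sequence relationships"), so the second matrix below is simply taken as given, with made-up values; the scoring vector and classifier weights are likewise assumptions.

```python
import math

def softmax(xs):
    m = max(xs)                      # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(matrix, score_vec):
    # One weight per row (one row per word segment): score each row from its
    # values, normalize the scores with softmax, then mix the rows.
    scores = softmax([sum(v * w for v, w in zip(row, score_vec)) for row in matrix])
    dims = len(matrix[0])
    return [sum(a * row[d] for a, row in zip(scores, matrix)) for d in range(dims)]

# "Second matrix": output of some sequence encoder, one row per word segment.
second = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
feature = attention_pool(second, [1.0, 0.0])
# Probability output vector over two preset categories.
probs = softmax([sum(f * w for f, w in zip(feature, row)) for row in [[2.0, 0.0], [0.0, 2.0]]])
```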
4. The method according to claim 1, wherein the construction process of the prediction model of entity association relationships comprises:
performing word segmentation on a training text to obtain a part-of-speech sequence of the training text;
obtaining a vector of each word segment in the part-of-speech sequence of the training text;
performing network characterization of sequence relationships on a third matrix to obtain a fourth matrix; wherein the third matrix comprises: the vector of each word segment in the part-of-speech sequence of the training text;
performing weighted averaging on the fourth matrix according to the weight corresponding to the value at each position in the fourth matrix, to obtain a feature vector;
processing the feature vector using a softmax function to obtain a probability output vector; wherein the probability output vector comprises: probability values, under preset categories, of the association relationship between the target entity and the corresponding attribute in the training text;
performing a cross-entropy operation on the probability output vector and the manual annotation category of the training text to obtain a loss function;
optimizing the loss function, and updating first parameters according to the optimized loss function, until the probability output vector obtained by predicting the feature vector of the training text using the updated parameters is equivalent to the manual annotation category of the training text; wherein the first parameters comprise the softmax function and the vector of each word segment in the part-of-speech sequence of the training text;
and taking updated second parameters as parameters in the prediction model of entity association relationships; wherein the second parameters comprise: the softmax function.
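One minimal reading of claim 4's training loop — cross-entropy between the softmax output and the annotated class, with parameters updated iteratively until the prediction matches the annotation — can be sketched with a plain gradient-descent optimizer. The feature vector, label, learning rate, and iteration count are toy assumptions; the patent does not specify the optimizer.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, label):
    # Loss from claim 4: cross entropy of the probability output vector
    # against the manually annotated category.
    return -math.log(probs[label])

# Toy feature vector for one "training text" and its annotated class index.
feature, label = [0.6, 0.4], 0
weights = [[0.0, 0.0], [0.0, 0.0]]   # "first parameters", updated iteratively

for _ in range(200):                  # iterate until prediction fits annotation
    logits = [sum(f * w for f, w in zip(feature, row)) for row in weights]
    probs = softmax(logits)
    # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot(label)).
    for k, row in enumerate(weights):
        grad = probs[k] - (1.0 if k == label else 0.0)
        for d in range(len(row)):
            row[d] -= 0.5 * grad * feature[d]

final = softmax([sum(f * w for f, w in zip(feature, row)) for row in weights])
```

After training, the probability mass concentrates on the annotated class, mirroring the claim's stopping condition that the output become equivalent to the manual annotation.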
5. A device for analyzing an entity association relationship, characterized by comprising:
an acquiring unit, configured to obtain a text to be predicted;
a word segmentation unit, configured to perform word segmentation on the text to be predicted to obtain a part-of-speech sequence of the text to be predicted;
a generation unit, configured to obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted;
a prediction unit, configured to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted using a prediction model of entity association relationships, to obtain a prediction result of the association relationship between a target entity and a corresponding attribute in the text to be predicted; wherein the prediction model of entity association relationships is constructed based on a first principle; the first principle comprises: iteratively updating parameters in a neural network algorithm, so that the prediction result obtained by predicting a feature vector of a training text using the neural network algorithm with the updated parameters is equal to a manual annotation result; and the feature vector of the training text is obtained from the vector of each word segment of the part-of-speech sequence of the training text.
6. The device according to claim 5, wherein the generation unit comprises:
a first obtaining unit, configured to obtain a word vector of each word segment in the part-of-speech sequence of the text to be predicted;
or comprises: a second obtaining unit, configured to obtain the word vector of each word segment in the part-of-speech sequence of the text to be predicted, and a part-of-speech vector and/or a word-bag vector of each word segment in the part-of-speech sequence of the text to be predicted; and to combine the word vector of each word segment with the part-of-speech vector and/or the word-bag vector of that word segment, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted.
7. The device according to claim 5, wherein the prediction unit comprises:
a third obtaining unit, configured to perform network characterization of sequence relationships on a first matrix to obtain a second matrix; wherein the first matrix comprises: the vector of each word segment in the part-of-speech sequence of the text to be predicted;
a fourth obtaining unit, configured to perform weighted averaging on the second matrix according to the weight corresponding to the value at each position in the second matrix, to obtain a feature vector;
a prediction subunit, configured to process the feature vector using a softmax function to obtain a probability output vector; wherein the probability output vector comprises: probability values, under preset categories, of the association relationship between the target entity and the corresponding attribute in the training text.
8. The device according to claim 5, wherein the word segmentation unit is further configured to perform word segmentation on a training text to obtain a part-of-speech sequence of the training text;
the generation unit is further configured to obtain a vector of each word segment in the part-of-speech sequence of the training text;
the third obtaining unit is further configured to perform network characterization of sequence relationships on a third matrix to obtain a fourth matrix; wherein the third matrix comprises: the vector of each word segment in the part-of-speech sequence of the training text;
the fourth obtaining unit is further configured to perform weighted averaging on the fourth matrix according to the weight corresponding to the value at each position in the fourth matrix, to obtain a feature vector;
the prediction subunit is further configured to process the feature vector using a softmax function to obtain a probability output vector; wherein the probability output vector comprises: probability values, under preset categories, of the association relationship between the target entity and the corresponding attribute in the training text;
and the device further comprises: a comparison unit, configured to perform a cross-entropy operation on the probability output vector and the manual annotation category of the training text to obtain a loss function;
an optimization unit, configured to optimize the loss function;
an updating unit, configured to update first parameters according to the optimized loss function, until the probability output vector obtained by predicting the feature vector of the training text using the updated parameters is equivalent to the manual annotation category of the training text; wherein the first parameters comprise the softmax function and the vector of each word segment in the part-of-speech sequence of the training text;
and a construction unit, configured to take updated second parameters as parameters in the prediction model of entity association relationships; wherein the second parameters comprise: the softmax function.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a device where the storage medium is located is controlled to execute the method for analyzing an entity association relationship according to any one of claims 1-4.
10. A processor, characterized in that the processor is configured to run a program, wherein, when the program runs, the method for analyzing an entity association relationship according to any one of claims 1-4 is executed.
CN201810217272.5A 2018-03-16 2018-03-16 Entity association relation analysis method and related device Active CN110276066B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810217272.5A CN110276066B (en) 2018-03-16 2018-03-16 Entity association relation analysis method and related device
PCT/CN2019/073664 WO2019174422A1 (en) 2018-03-16 2019-01-29 Method for analyzing entity association relationship, and related apparatus

Publications (2)

Publication Number Publication Date
CN110276066A true CN110276066A (en) 2019-09-24
CN110276066B CN110276066B (en) 2021-07-27

Family

ID=67907352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810217272.5A Active CN110276066B (en) 2018-03-16 2018-03-16 Entity association relation analysis method and related device

Country Status (2)

Country Link
CN (1) CN110276066B (en)
WO (1) WO2019174422A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111192682A (en) * 2019-12-25 2020-05-22 上海联影智能医疗科技有限公司 Image exercise data processing method, system and storage medium
CN111325016A (en) * 2020-02-04 2020-06-23 深圳证券信息有限公司 Text processing method, system, device and medium
CN111462893A (en) * 2020-03-13 2020-07-28 云知声智能科技股份有限公司 Chinese medical record auxiliary diagnosis method and system for providing diagnosis basis
CN111523318A (en) * 2020-04-02 2020-08-11 言图科技有限公司 Chinese phrase analysis method, system, storage medium and electronic equipment
CN111611799A (en) * 2020-05-07 2020-09-01 北京智通云联科技有限公司 Dictionary and sequence labeling model based entity attribute extraction method, system and equipment
CN111859965A (en) * 2020-06-11 2020-10-30 北京三快在线科技有限公司 Entity recognition model training method, entity recognition method and device
CN113468878A (en) * 2021-07-13 2021-10-01 腾讯科技(深圳)有限公司 Part-of-speech tagging method and device, electronic equipment and storage medium
CN113553841A (en) * 2020-04-26 2021-10-26 顺丰科技有限公司 Word characterization method and device, electronic equipment and storage medium

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704576B (en) * 2019-09-30 2022-07-01 北京邮电大学 Text-based entity relationship extraction method and device
CN110837731A (en) * 2019-10-12 2020-02-25 创新工场(广州)人工智能研究有限公司 Word vector training method and device
CN112733869B (en) * 2019-10-28 2024-05-28 中移信息技术有限公司 Method, device, equipment and storage medium for training text recognition model
CN110795934B (en) * 2019-10-31 2023-09-19 北京金山数字娱乐科技有限公司 Sentence analysis model training method and device and sentence analysis method and device
CN111104791B (en) * 2019-11-14 2024-02-20 北京金堤科技有限公司 Industry information acquisition method and device, electronic equipment and medium
CN110908709B (en) * 2019-11-25 2023-05-02 中山大学 Code submission annotation prediction method based on code modification key class judgment
CN111008279B (en) * 2019-11-27 2023-11-14 云知声智能科技股份有限公司 Entity relation extraction method and device
CN111027291B (en) * 2019-11-27 2024-03-26 达观数据有限公司 Method and device for adding mark symbols in text and method and device for training model, and electronic equipment
CN111079433B (en) * 2019-11-29 2023-10-27 北京奇艺世纪科技有限公司 Event extraction method and device and electronic equipment
CN111160034B (en) * 2019-12-31 2024-02-27 东软集团股份有限公司 Entity word labeling method, device, storage medium and equipment
CN111145906B (en) * 2019-12-31 2024-04-30 清华大学 Project judging method, related device and readable storage medium
CN111210233B (en) * 2020-01-02 2023-12-26 联想(北京)有限公司 User characteristic determining method and device and electronic equipment
CN111192692B (en) * 2020-01-02 2023-12-08 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
CN111444714B (en) * 2020-02-29 2023-04-07 新华三大数据技术有限公司 Text analysis device and method and model training method
CN111460807B (en) * 2020-03-13 2024-03-12 平安科技(深圳)有限公司 Sequence labeling method, device, computer equipment and storage medium
CN111626291B (en) * 2020-04-07 2023-04-25 上海交通大学 Image visual relationship detection method, system and terminal
CN111476035B (en) * 2020-05-06 2023-09-05 中国人民解放军国防科技大学 Chinese open relation prediction method, device, computer equipment and storage medium
CN111832290B (en) * 2020-05-25 2024-04-02 北京三快在线科技有限公司 Model training method and device for determining text relevance, electronic equipment and readable storage medium
CN111611810B (en) * 2020-05-29 2023-08-04 河北数云堂智能科技有限公司 Multi-tone word pronunciation disambiguation device and method
CN111694945A (en) * 2020-06-03 2020-09-22 北京北大软件工程股份有限公司 Legal association recommendation method and device based on neural network
CN111860981B (en) * 2020-07-03 2024-01-19 航天信息(山东)科技有限公司 Enterprise national industry category prediction method and system based on LSTM deep learning
CN112069818B (en) * 2020-08-06 2024-05-24 北京捷通华声科技股份有限公司 Triplet prediction model generation method, relation triplet extraction method and relation triplet extraction device
CN112001178A (en) * 2020-08-27 2020-11-27 广东工业大学 Long-tail entity identification and disambiguation method
CN112016299B (en) * 2020-08-31 2023-11-14 支付宝(杭州)信息技术有限公司 Method and device for generating dependency syntax tree by using neural network and executed by computer
CN112131366B (en) * 2020-09-23 2024-02-09 腾讯科技(深圳)有限公司 Method, device and storage medium for training text classification model and text classification
CN112445876B (en) * 2020-11-25 2023-12-26 中国科学院自动化研究所 Entity alignment method and system for fusing structure, attribute and relationship information
CN114548102A (en) * 2020-11-25 2022-05-27 株式会社理光 Method and device for labeling sequence of entity text and computer readable storage medium
CN112560463B (en) * 2020-12-15 2023-08-04 中国平安人寿保险股份有限公司 Text multi-labeling method, device, equipment and storage medium
CN112560434B (en) * 2020-12-16 2024-05-28 北京百度网讯科技有限公司 Method, device, equipment and medium for determining element attribute conflict in text
CN112966808A (en) * 2021-01-25 2021-06-15 咪咕音乐有限公司 Data analysis method, device, server and readable storage medium
CN112835798B (en) * 2021-02-03 2024-02-20 广州虎牙科技有限公司 Clustering learning method, testing step clustering method and related devices
CN113535912B (en) * 2021-05-18 2023-12-26 北京邮电大学 Text association method and related equipment based on graph rolling network and attention mechanism
CN113535973B (en) * 2021-06-07 2023-06-23 中国科学院软件研究所 Event relation extraction and language-to-language relation analysis method and device based on knowledge mapping
CN113569559B (en) * 2021-07-23 2024-02-02 北京智慧星光信息技术有限公司 Short text entity emotion analysis method, system, electronic equipment and storage medium
CN113722439B (en) * 2021-08-31 2024-01-09 福州大学 Cross-domain emotion classification method and system based on antagonism class alignment network
CN113792539B (en) * 2021-09-15 2024-02-20 平安科技(深圳)有限公司 Entity relationship classification method and device based on artificial intelligence, electronic equipment and medium
CN114047693B (en) * 2021-10-22 2023-12-22 合肥工业大学 Self-adaptive prevention method and system for fire disaster of automobile battery facing charging process
CN114462383B (en) * 2022-04-12 2022-07-08 江西少科智能建造科技有限公司 Method, system, storage medium and equipment for obtaining design specification of building drawing
CN114841755A (en) * 2022-05-30 2022-08-02 北京百度网讯科技有限公司 Method and device for generating file, electronic equipment and storage medium
CN117057345B (en) * 2023-10-11 2024-01-30 腾讯科技(深圳)有限公司 Role relation acquisition method and related products

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899304A (en) * 2015-06-12 2015-09-09 北京京东尚科信息技术有限公司 Named entity identification method and device
US20160148116A1 (en) * 2014-11-21 2016-05-26 International Business Machines Corporation Extraction of semantic relations using distributional relation detection
CN106407211A (en) * 2015-07-30 2017-02-15 富士通株式会社 Method and device for classifying semantic relationships among entity words
CN106649275A (en) * 2016-12-28 2017-05-10 成都数联铭品科技有限公司 Relation extraction method based on part-of-speech information and convolutional neural network
US20170139984A1 (en) * 2015-11-13 2017-05-18 International Business Machines Corporation Method And System For Semantic-Based Queries Using Word Vector Representation
CN106855853A (en) * 2016-12-28 2017-06-16 成都数联铭品科技有限公司 Entity relation extraction system based on deep neural network
CN106886516A (en) * 2017-02-27 2017-06-23 竹间智能科技(上海)有限公司 The method and device of automatic identification statement relationship and entity
CN107239446A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of intelligence relationship extracting method based on neutral net Yu notice mechanism
CN107562752A (en) * 2016-06-30 2018-01-09 富士通株式会社 The method, apparatus and electronic equipment classified to the semantic relation of entity word

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9406020B2 (en) * 2012-04-02 2016-08-02 Taiger Spain Sl System and method for natural language querying
CN106970981B (en) * 2017-03-28 2021-01-19 北京大学 Method for constructing relation extraction model based on transfer matrix

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Lijia et al.: "Extraction of attribute relations between domain concept entities based on the LM algorithm", Journal of Chinese Information Processing (《中文信息学报》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111192682A (en) * 2019-12-25 2020-05-22 上海联影智能医疗科技有限公司 Image exercise data processing method, system and storage medium
CN111192682B (en) * 2019-12-25 2024-04-09 上海联影智能医疗科技有限公司 Image exercise data processing method, system and storage medium
CN111325016A (en) * 2020-02-04 2020-06-23 深圳证券信息有限公司 Text processing method, system, device and medium
CN111325016B (en) * 2020-02-04 2024-02-02 深圳证券信息有限公司 Text processing method, system, equipment and medium
CN111462893B (en) * 2020-03-13 2023-08-04 云知声智能科技股份有限公司 Chinese medical record auxiliary diagnosis method and system for providing diagnosis basis
CN111462893A (en) * 2020-03-13 2020-07-28 云知声智能科技股份有限公司 Chinese medical record auxiliary diagnosis method and system for providing diagnosis basis
CN111523318A (en) * 2020-04-02 2020-08-11 言图科技有限公司 Chinese phrase analysis method, system, storage medium and electronic equipment
CN113553841A (en) * 2020-04-26 2021-10-26 顺丰科技有限公司 Word characterization method and device, electronic equipment and storage medium
CN113553841B (en) * 2020-04-26 2024-02-20 顺丰科技有限公司 Word characterization method, word characterization device, electronic equipment and storage medium
CN111611799B (en) * 2020-05-07 2023-06-02 北京智通云联科技有限公司 Entity attribute extraction method, system and equipment based on dictionary and sequence labeling model
CN111611799A (en) * 2020-05-07 2020-09-01 北京智通云联科技有限公司 Dictionary and sequence labeling model based entity attribute extraction method, system and equipment
CN111859965A (en) * 2020-06-11 2020-10-30 北京三快在线科技有限公司 Entity recognition model training method, entity recognition method and device
CN113468878A (en) * 2021-07-13 2021-10-01 腾讯科技(深圳)有限公司 Part-of-speech tagging method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2019174422A1 (en) 2019-09-19
CN110276066B (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN110276066A (en) The analysis method and relevant apparatus of entity associated relationship
CN110287477A (en) Entity emotion analysis method and relevant apparatus
CN109543190A (en) A kind of intension recognizing method, device, equipment and storage medium
CN109299476A (en) Question answering method and device, electronic equipment and storage medium
CN110209817A (en) Training method, device and the text handling method of text-processing model
CN108280462A (en) A kind of model training method and device, electronic equipment
CN105701120A (en) Method and apparatus for determining semantic matching degree
CN108959474B (en) Entity relation extraction method
CN109919252A (en) The method for generating classifier using a small number of mark images
CN108334910A (en) A kind of event detection model training method and event detecting method
CN111898020A (en) Knowledge learning system recommendation method, device and medium based on BERT and LSTM
CN109582774A (en) Natural language classification method, device, equipment and storage medium
CN106844330B (en) The analysis method and device of article emotion
CN106445908A (en) Text identification method and apparatus
CN112884569A (en) Credit assessment model training method, device and equipment
CN109346079A (en) Voice interactive method and device based on Application on Voiceprint Recognition
CN112464106B (en) Object recommendation method and device
CN110765352B (en) User interest identification method and device
CN108228869A (en) The method for building up and device of a kind of textual classification model
CN109409527A (en) Data processing method, device, system and storage medium
Osorio et al. Distributed evolutionary algorithms with adaptive migration period
CN108460475A Poor student's prediction technique and device based on network playing by students behavior
CN111523308B (en) Chinese word segmentation method and device and computer equipment
CN113807494A (en) Model training method and device, electronic equipment and storage medium thereof
CN117332090B (en) Sensitive information identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant