CN109241535B - Multi-semantic supervised word vector training method and device - Google Patents

Multi-semantic supervised word vector training method and device

Info

Publication number
CN109241535B
CN109241535B
Authority
CN
China
Prior art keywords
vector
word
word vector
center word
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811083181.3A
Other languages
Chinese (zh)
Other versions
CN109241535A (en)
Inventor
李健铨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co., Ltd
Original Assignee
Beijing Shenzhou Taiyue Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenzhou Taiyue Software Co Ltd
Priority to CN201811083181.3A
Publication of CN109241535A publication Critical patent/CN109241535A/en
Application granted
Publication of CN109241535B publication Critical patent/CN109241535B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The embodiments of the present application provide a multi-semantic supervised word vector training method and device. During word vector training, first, a weighted semantic vector is generated for each sense of the center word from the sememe vectors of the sememes that the sense contains; then, the weighted semantic vectors of all senses of the center word are summed with weights to generate a guide vector; next, a distance loss function from the guide vector to the word vector of the center word is constructed; finally, the distance loss function supervises the iterative update direction of the center word's word vector during word vector model training, so that the word vector of the center word is drawn toward the multiple senses of the center word during training. This solves the problem that word vectors obtained by prior-art training methods fall short in expressing the polarity and category of a word and cannot express the word's true meaning.

Description

Multi-semantic supervised word vector training method and device
Technical field
This application relates to the field of natural language processing, and in particular to a multi-semantic supervised word vector training method and device.
Background technique
With the development of natural language processing technology, intelligent question answering systems built on it have also been widely applied. A common example is the chat robot, which can automatically generate a corresponding reply according to the chat content entered by the user.
In the prior art, intelligent question answering systems can be divided by technical means into retrieval-based and generative systems. Retrieval-based methods require a predefined knowledge base storing candidate replies, together with heuristic methods that select a suitable reply according to the input and its context, and therefore cannot generate new reply text. The heuristics may be simple rule-based expression matching or a complex combination of machine learning methods. Generative question answering systems, by contrast, are not limited to existing knowledge when generating an answer after receiving an input sentence from the user.
In the field of natural language processing, before natural language can be handed to a machine learning algorithm for processing, it must first be mathematized; the word vector is one way of mathematizing natural language. In the prior art, word vectors trained with models such as CBOW and Skip-gram form a word vector space in which, apart from magnitude and sign (polarity, direction) in the semantic space, the distribution of the word vector space approximates the distribution of the semantic space.
In the prior art, word vectors trained with models represented by Skip-gram are weak in expressive ability. As a result, when computing the similarity of words, whether the similarity is expressed by the Euclidean distance or the cosine distance of the word vectors, the following problems always exist: the distance between words of opposite meaning may be smaller than the distance between words of the same meaning, e.g. 'increase' and 'decrease'; the similarity computed for words of the same class, e.g. 'apple' and 'banana', is not guaranteed to be accurate; and words of different classes, e.g. fruit words and animal words, cannot be distinguished. It can be seen that word vectors obtained by prior-art training methods fall short in expressing the polarity and category of a word and cannot express the word's true meaning.
Summary of the invention
The embodiments of the present application provide a multi-semantic supervised word vector training method and device to solve the above problems of the prior art.
In a first aspect, an embodiment of the present application provides a multi-semantic supervised word vector training method, comprising:
generating a weighted semantic vector for each sense of a center word from the sememe vectors of the sememes that each sense contains;
performing a weighted summation of the weighted semantic vectors of all senses of the center word to generate a guide vector;
constructing a distance loss function from the guide vector to the word vector of the center word;
supervising, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training.
In a second aspect, an embodiment of the present application provides a multi-semantic supervised word vector training device, comprising:
a weighted semantic vector expression module, configured to generate a weighted semantic vector for each sense of a center word from the sememe vectors of the sememes that each sense contains;
a guide vector expression module, configured to perform a weighted summation of the weighted semantic vectors of all senses of the center word to generate a guide vector;
a distance loss construction module, configured to construct a distance loss function from the guide vector to the word vector of the center word;
a supervision module, configured to supervise, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training.
From the above technical solution, the embodiments of the present application provide a multi-semantic supervised word vector training method and device. During word vector training, first, a weighted semantic vector is generated for each sense of the center word from the sememe vectors of the sememes that the sense contains; then, the weighted semantic vectors of all senses of the center word are summed with weights to generate a guide vector; next, a distance loss function from the guide vector to the word vector of the center word is constructed; finally, the distance loss function supervises the iterative update direction of the center word's word vector during word vector model training, so that the word vector of the center word is drawn toward the multiple senses of the center word during training. This solves the problem that word vectors obtained by prior-art training methods fall short in expressing the polarity and category of a word and cannot express the word's true meaning.
Detailed description of the invention
In order to explain the technical solutions of the application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, for those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative labor.
Fig. 1 is a schematic diagram of the basic structure of a prior-art word vector model;
Fig. 2 is a flowchart of a multi-semantic supervised word vector training method according to an embodiment of the present application;
Fig. 3 is a flowchart of step S1 of a multi-semantic supervised word vector training method according to an embodiment of the present application;
Fig. 4 is a flowchart of step S2 of a multi-semantic supervised word vector training method according to an embodiment of the present application;
Fig. 5 is a flowchart of step S21 of a multi-semantic supervised word vector training method according to an embodiment of the present application;
Fig. 6 is a flowchart of step S4 of a multi-semantic supervised word vector training method according to an embodiment of the present application;
Fig. 7 is a transfer flowchart of a multi-semantic supervised word vector training method according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a multi-semantic supervised word vector training device according to an embodiment of the present application.
Specific embodiment
In order to enable those skilled in the art to better understand the technical solutions in the application, the technical solutions in the embodiments of the application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are merely a part, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the application without creative work shall fall within the protection scope of the application.
In the field of natural language processing, before natural language can be handed to a machine learning algorithm for processing, it must first be mathematized; the word vector is one way of representing natural language mathematically.
The One-hot word vector is one such representation. It is a high-dimensional word vector whose dimensionality equals the number of words in the dictionary: the words in the dictionary are arranged in a certain order, each dimension corresponds to one word, and therefore in a One-hot word vector exactly one dimension is 1 and the remaining dimensions are 0.
For example, suppose the dictionary contains 1000 words, 'apple' is the 3rd word in this dictionary and 'banana' the 4th. Then, for the words in this dictionary, the One-hot word vector is a 1000-dimensional vector, in which:
apple = [0, 0, 1, 0, 0, 0, 0, ...]
banana = [0, 0, 0, 1, 0, 0, 0, ...]
One-hot word vectors appear to realize the mathematization of natural language. However, application fields such as text matching and text classification generally involve computing semantic similarity between words, and each One-hot dimension represents a word independently, so no semantic similarity relation between words can be embodied. For example, 'apple' and 'banana' both denote fruit, but if the cosine distance between word vectors is used to express semantic similarity, the cosine similarity of 'apple' and 'banana' under the One-hot representation is 0, expressing no semantic relation at all.
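For concreteness, a minimal Python sketch of this limitation (the dictionary size and word positions follow the example above; the code is illustrative, not part of the application):

```python
import numpy as np

vocab_size = 1000
apple = np.zeros(vocab_size)
apple[2] = 1.0    # "apple" is the 3rd word in the dictionary
banana = np.zeros(vocab_size)
banana[3] = 1.0   # "banana" is the 4th word

# Cosine similarity between one-hot vectors of two different words is always 0,
# so the representation expresses no semantic relation between them.
cos_sim = apple @ banana / (np.linalg.norm(apple) * np.linalg.norm(banana))
print(cos_sim)  # 0.0
```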
Since One-hot word vectors cannot express similarity relations between words, those skilled in the art mainly use Distributed Representation word vectors to mathematize natural language. A word vector of this kind is low-dimensional and can be regarded as constructing a low-dimensional word vector space in which each word is a point; the more similar in meaning two words are, the closer they lie in the space. Such a word vector takes the following form: [0.792, -0.177, -0.107, 0.109, -0.542, ...].
In the prior art, those skilled in the art obtain low-dimensional word vectors of the above form by training Word2vec-related word vector models. These models essentially construct a semantic mapping from words to words: the center word or the context words are used as the input of the word vector model, and the context words or the center word as its output, so as to train the word vectors. It should be added that 'center word' and 'context word' are relative concepts; a token can be a center word and, at the same time, a context word of other tokens.
Specifically, Fig. 1 shows the basic structure of a prior-art word vector model. As shown in Fig. 1, this basic structure can be regarded as a neural network of two oppositely arranged layers, comprising an input layer, a hidden layer and an output layer, where W1 denotes the input-to-hidden weights of the word vector model and W2 the hidden-to-output weights. At present there are mainly two prior-art word vector training models: the Skip-gram model, which takes the word vector of the center word as input and predicts the word vectors of the context words, and the CBOW model, which takes the word vectors of the context words as input and predicts the word vector of the center word.
Training a word vector model is an iterative process. In each round of iteration, the predicted word vector obtained at the output layer differs from the word vector of the center word or context words fed to the input layer; according to this error, the hidden-layer weights of the model are updated with the back-propagation algorithm in each round, thereby realizing continuous iterative updating of the word vectors.
From the above prior art it can be seen that when the word vector model is the Skip-gram model, the training results of the context words' word vectors are drawn toward the word vector of the center word. Word vectors trained by the prior art therefore exhibit the following situation:
For example, the training samples contain the following corpus:
Product sales volume increased by 15 percentage points compared with last year
Product price declined by 10 percentage points compared with last year
In the samples above, the context words 'increased' and 'declined' have opposite meanings. However, since in the prior art the word vectors of context words are drawn toward the word vector of the center word (e.g. the center word 'last year'), the word vectors of 'increased' and 'declined' obtained by prior-art training may lie close together, even closer than their respective synonyms, so the distinctiveness of two semantically opposite words cannot be embodied. Moreover, words of the same class, e.g. 'apple' and 'banana', may have center words that do not belong to the same class (e.g. verbs such as 'plant' and 'pick'), so the trained word vectors of 'apple' and 'banana' are not guaranteed to be close in distance.
It can be seen that the word vector training methods of the prior art cannot effectively express the polarity of words in the word vector representation; that is, similar words cannot express their similarity, and dissimilar words cannot express their distinctiveness. Word vectors trained by the prior art therefore cannot approach the true meanings of words well.
At present, word vectors are widely used in artificial intelligence tasks such as text classification and voice response. For example, in the field of text classification, those skilled in the art collect a large amount of corpus text and annotate it with class labels; the annotated corpus text is then segmented, and the word vector sequence of each text is obtained from the word vector of each token; the word vector sequences and the class labels are then input into a neural-network-based classification model, such as a Sequence to Sequence model, to train it and give it text classification ability. The classification model can be regarded as containing a word vector space whose dimensionality equals that of the word vectors; each word vector corresponds to a coordinate point in this space, and the word vector sequences of the corpus texts of each class correspond to sets of coordinate points concentrated in respective regions. When the classification model is used for text classification, the word vector sequence of the text to be classified is input into the model; the model judges in which region of the word vector space the sequence is distributed and which class's region it is nearest to, and thereby gives the predicted classification result.
In such a classification model, if the word vectors used are of low quality (for example, if they cannot effectively express the polarity of words), word vectors of very different meaning may lie close together in the word vector space, or word vectors of similar meaning may lie far apart, reducing the quality of the classification model. The quality of the word vectors is therefore a basic condition determining the accuracy of text classification or voice response.
In order to solve the problems in the prior art, the embodiments of the present application provide a multi-semantic supervised word vector training method and device.
To facilitate understanding of the technical solutions of the application by those skilled in the art, some technical concepts involved in the application are explained before the detailed description of the technical solutions.
First, training word vectors requires sample corpora; a sample corpus may be one or more text fragments, sentences, and the like. Before training word vectors, the sample corpus must first be segmented into tokens. Then, in the segmentation result of the sample corpus, if one of the tokens is taken as the center word, the C tokens before it and the C tokens after it (C being an integer greater than or equal to 1) are called the context words of that center word.
Specifically, a word window value C is preset to define the range and number of context words; C is an integer and C ≥ 1. When the numbers of tokens both before and after the center word are greater than C, 2C context words are obtained; when the number of tokens before or after the center word is less than C, all tokens in that direction are taken, in which case the number of context words is less than 2C.
Illustratively, the center word is 'apple' and the token sequence is: I / want-to-buy / one / apple / computer.
When C = 1, the context words are 'one' and 'computer'.
When C = 2, the context words 'want-to-buy' and 'one' are obtained from before 'apple', and the context word 'computer' from after it; the context words obtained from the token sequence are therefore 'want-to-buy', 'one' and 'computer'.
Further, every token, including center words and context words, may have multiple senses, and each sense can be further divided into multiple atomic meanings. An atomic meaning, also called a sememe, is the smallest unit of meaning that cannot be further divided. The senses and sememes of center words and context words can be obtained from HowNet. HowNet is a common-sense knowledge base that takes the concepts represented by Chinese and English words as its objects of description, and whose basic content is the relationships between concepts and between the attributes of concepts. In HowNet, a sememe is the most basic, indivisible unit of meaning; a word may have several senses, and each sense may contain several sememes. For example, the center word 'apple' has two senses, 'computer' and 'fruit', of which the sense 'computer' contains sememes such as 'PatternValue', 'able', 'bring' and 'SpeBrand'. In HowNet, the number of sememes is extremely limited compared with the number of words, and the senses of words can be subdivided into a few sememes, so sememes overlap across the senses of different words. Therefore, generating a token's semantic vectors from sememe vectors can not only express the token's true senses but also embody the relations between the senses of different tokens.
The technical solutions provided by the embodiments of the present application are described below with reference to the drawings.
The following is a method embodiment of the present application.
Referring to Fig. 2, a flowchart of a multi-semantic supervised word vector training method according to an embodiment of the present application, as shown in Fig. 2 the method comprises the following steps:
Step S1: generate the weighted semantic vector of each sense from the sememe vectors of the sememes contained in each sense of the center word.
The sememe vectors are generated at random before word vector model training starts and are continuously updated by iteration during training. The weighted semantic vector of a sense can be obtained by a weighted summation of the sememe vectors of the sememes the sense contains.
In HowNet, the number of sememes is very small (about 2000), so the weighted semantic vectors generated from sememe vectors can express closeness relations between senses well. For example, if weighted semantic vector A is generated from the three sememe vectors a, b and c, and weighted semantic vector B from the three sememe vectors a, d and e, the two weighted semantic vectors share the sememe vector a; the senses corresponding to these two weighted semantic vectors therefore contain a close component.
With further reference to Fig. 3, a flowchart of step S1 of a multi-semantic supervised word vector training method according to an embodiment of the present application: in this embodiment or some other optional embodiments of the application, step S1 may comprise the following steps:
Step S11: set the sememe weights according to the number of sememes in each sense of the center word.
The basic idea of this embodiment is to determine the sememe weights from the number of sememes each sense of the center word possesses; that is, the more sememes a sense contains, the smaller the weight of each sememe, and the fewer sememes, the larger the weight. Illustratively, suppose the center word has N senses in total, the j-th sense (1 ≤ j ≤ N) contains M sememes, and the total weight of the sememes of each sense is 1. Then the weight of each sememe in the j-th sense is 1/M; in other words, the sememes within a sense are given identical weights, and the value of each sememe weight equals the reciprocal of the number of sememes.
Step S12: perform, according to the sememe weights, a weighted summation of the sememe vectors within each sense to generate the weighted semantic vector of each sense.
The basic idea of this embodiment is: first, the sememe vectors are randomly initialized; then each sememe vector is multiplied by its corresponding sememe weight to obtain the weighted sememe vector; finally, the weighted sememe vectors are summed to obtain the weighted semantic vector.
Illustratively, let $x_k^{(j)}$ denote the vector of the $k$-th sememe of the $j$-th sense of the center word; the weighted sememe vector is then $\frac{1}{M}\,x_k^{(j)}$. If the weighted semantic vector of the $j$-th sense of the center word is denoted $S_j$, it is calculated as:

$$S_j = \sum_{k=1}^{M} \frac{1}{M}\, x_k^{(j)}$$
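The calculation of step S12 can be sketched as follows (a minimal illustration in Python; the dimensionality and the random sememe vectors are assumptions for the example, not values from the application):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 200                            # assumed word-vector dimensionality
sememes = rng.normal(size=(3, dim))  # M = 3 randomly initialized sememe vectors
                                     # of one sense of the center word

M = sememes.shape[0]
weight = 1.0 / M                     # each sememe weighted by 1/M (step S11)

# Weighted semantic vector of the sense: S_j = sum_k (1/M) * x_k  (step S12)
S_j = weight * sememes.sum(axis=0)
```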
It is easy to understand that in prior-art word vector training methods, the context words are usually taken as the semantics of the center word and the center word is drawn toward the context words, or the center word is taken as the semantics of the context words and the context words are drawn toward the center word. Since the word vectors of the center word and the context words are all generated by random initialization, the word vectors trained by the prior art cannot express the true senses of the center word and the context words, and the quality of the word vectors is therefore low. In the present application, by contrast, the weighted semantic vector of the center word is obtained by weighting sememe vectors and can express the true senses of the center word.
Step S2: perform a weighted summation of the weighted semantic vectors of all senses of the center word to generate a guide vector.
In this embodiment, by performing a weighted summation of the weighted semantic vectors of all senses of the center word, the generated guide vector can express the multiple senses of the center word. If the guide vector generated in this embodiment is used to guide the training of the center word's word vector, that word vector can express the multiple senses of the center word, improving the word vector's ability to express the polarity and category of the word.
With further reference to Fig. 4, a flowchart of step S2 of a multi-semantic supervised word vector training method according to an embodiment of the present application: in this embodiment or some other optional embodiments of the application, step S2 may comprise the following steps:
Step S21: generate the context vector of the center word from the word vectors of the context words.
A token generally has multiple senses, and which sense the token expresses in a specific sentence bears a certain relation to its context; what the context vector in this embodiment expresses is exactly the context of the center word.
With further reference to Fig. 5, a flowchart of step S21 of a multi-semantic supervised word vector training method according to an embodiment of the present application: in this embodiment or some other optional embodiments of the application, step S21 may comprise the following steps:
Step S211: determine a preset number of context words of the center word according to a preset window size.
Specifically, a word window value C is preset to define the range and number of context words; C is an integer and C ≥ 1. When the numbers of tokens both before and after the center word are greater than C, 2C context words are obtained; when the number of tokens before or after the center word is less than C, all tokens in that direction are taken, in which case the number of context words is less than 2C.
Illustratively, the center word is 'apple' and the token sequence is: I / want-to-buy / one / apple / computer.
When C = 1, the context words are 'one' and 'computer'.
When C = 2, the context words 'want-to-buy' and 'one' are obtained from before 'apple', and the context word 'computer' from after it; the context words obtained from the token sequence are therefore 'want-to-buy', 'one' and 'computer'.
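A small illustrative sketch of the window extraction in Python (the token sequence follows the example above; the code is not part of the application):

```python
def context_words(tokens, center, C):
    """Take up to C tokens on each side of the center word, truncating
    at the sentence boundary as described above."""
    left = tokens[max(0, center - C):center]
    right = tokens[center + 1:center + 1 + C]
    return left + right

tokens = ["I", "want-to-buy", "one", "apple", "computer"]
print(context_words(tokens, 3, 1))  # ['one', 'computer']
print(context_words(tokens, 3, 2))  # ['want-to-buy', 'one', 'computer']
```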
Step S212: perform a weighted summation of the word vectors of the context words to generate the context vector.
In this embodiment, context weights may be set for weighting the word vectors of the context words.
As one optional implementation, the same context weight may be set for all context words; that is, when there are Q context words, the context weight of each context word is 1/Q.
As another optional implementation, different context weights may be set according to the distance between each context word and the center word; that is, a larger context weight is set for context words closer to the center word, and a smaller context weight for context words farther away.
For example, the center word is 'apple', the token sequence is 'I / want-to-buy / one / apple / computer', and the context words are 'want-to-buy', 'one' and 'computer'. 'One' and 'computer' are adjacent to the center word 'apple', so their distance may be taken as 1; 'want-to-buy' is separated from 'apple' by one token, so its distance may be taken as 2. Therefore a larger context weight, e.g. 0.35, is set for 'one' and 'computer', and a smaller context weight, e.g. 0.3, for 'want-to-buy'. It is easy to understand that the specific values of the different context weights set for the context words serve only as schematic examples and do not constitute a specific limitation of this embodiment; under the inspiration of the technical idea provided by this embodiment, those of ordinary skill in the art can set context weights that meet their needs according to the number of context words and their distances from the center word, and such designs do not depart from the spirit and scope of protection of the application.
Once the context weights are determined, the context vector can be generated with the following formula:

$$T_c = T'_1 \times H_1 + \cdots + T'_Q \times H_Q$$

where $T_c$ is the context vector of the center word, $Q$ is the number of context words, $T'_1, \ldots, T'_Q$ are the word vectors of the $Q$ context words, and $H_1, \ldots, H_Q$ are their context weights.

When $H_1, \ldots, H_Q$ are equal, i.e. the context weight of each context word is $1/Q$, the formula above can be written as:

$$T_c = \frac{1}{Q} \sum_{i=1}^{Q} T'_i$$

where $T'_i$ is the word vector of the $i$-th context word of the center word, $1 \le i \le Q$.
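Step S212 then reduces to a weighted sum; an illustrative sketch (the weights follow the 0.3/0.35 example above, and the random word vectors are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, dim = 3, 200
T = rng.normal(size=(Q, dim))    # word vectors of "want-to-buy", "one", "computer"
H = np.array([0.3, 0.35, 0.35])  # distance-based context weights; np.full(Q, 1/Q) also works

T_c = H @ T                      # T_c = H_1*T'_1 + ... + H_Q*T'_Q
```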
Step S22: obtain, respectively, the attention coefficient between the context vector and each weighted semantic vector of the center word.
Specifically, the attention coefficient of each weighted semantic vector can be generated with the following formula:

$$\alpha_j = \frac{e^{T_c \cdot S_j}}{\sum_{p=1}^{N} e^{T_c \cdot S_p}}$$

where the center word has $N$ senses; $\alpha_j$ denotes the attention coefficient of the $j$-th weighted semantic vector of the center word, $1 \le j \le N$; $T_c$ is the context vector; $S_j$ is the $j$-th weighted semantic vector of the center word; $S_p$ is the $p$-th weighted semantic vector of the center word, $1 \le p \le N$; $T_c \cdot S_j$ is the similarity between the context vector of the center word and the weighted semantic vector of the $j$-th sense; $e^{T_c \cdot S_j}$ raises $e$ to the similarity corresponding to the $j$-th sense of the center word; and $\sum_{p=1}^{N} e^{T_c \cdot S_p}$ sums the exponentials of the similarities corresponding to each sense of the center word.
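In code, the attention coefficients amount to a softmax over the similarity scores; a minimal illustrative sketch in Python (placeholder inputs, not from the application):

```python
import numpy as np

def attention_coefficients(T_c, S):
    """Softmax of the dot-product similarity between the context vector T_c
    and each weighted semantic vector S[j] of the center word (step S22)."""
    scores = S @ T_c                   # one similarity score per sense
    e = np.exp(scores - scores.max())  # max-shift for numerical stability
    return e / e.sum()                 # alpha_j, summing to 1 over the N senses
```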
Step S23: perform, according to the attention coefficients, a weighted summation of the weighted semantic vectors of the center word to generate the guide vector.
The basic idea of this embodiment is to weight the weighted semantic vectors of the center word with the attention coefficients and sum the weighted results, taking the summation result as the guide vector.
Specifically, the guide vector can be generated with the following formula:

$$W'_t = \sum_{j=1}^{N} \alpha_j S_j$$

where $W'_t$ is the guide vector; $N$ is the number of senses of the center word; $\alpha_j$ is the attention coefficient of the $j$-th weighted semantic vector of the center word; and $S_j$ is the weighted semantic vector of the $j$-th sense of the center word.
In this embodiment, the attention coefficients are calculated from the weighted semantic vectors of the center word and the context vector of the center word, embodying the degree of influence of the context on each sense of the center word; the guide vector is further obtained by the attention-weighted summation of the weighted semantic vectors of the center word. Therefore, while expressing the true senses of the center word, the guide vector can also express the influence of the center word's context on its meaning.
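Combining steps S22 and S23, a self-contained illustrative sketch (N, the dimensionality and the vectors are placeholder assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, dim = 2, 200                # e.g. "apple" with two senses
S = rng.normal(size=(N, dim))  # weighted semantic vectors of the senses
T_c = rng.normal(size=dim)     # context vector from step S21

scores = S @ T_c               # similarity of the context to each sense
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()           # attention coefficients (step S22)

guide_vector = alpha @ S       # W'_t = sum_j alpha_j * S_j  (step S23)
```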
Step S3: construct a distance loss function from the guide vector to the word vector of the center word.
In order to overcome the defects of word vectors obtained by prior-art training methods in expressing the polarity and category of words, this embodiment does not use the word vectors of the context words as the iteration target of the center word's word vector; instead, it uses the guide vector generated in step S2 and constructs a loss function on the distance from the word vector of the center word to the guide vector. Since the guide vector is derived from the multiple weighted semantic vectors of the center word, the distance loss function constructed in this embodiment establishes a connection between the center word's word vector and its multiple true senses, guiding the word vector of the center word to update iteratively in the direction of those true senses. The word vectors obtained by training in this embodiment can therefore express the true meaning of the center word, making up for the prior art's defects in expressing the polarity and category of words.
Specifically, the cosine distance between the word vector of the center word and the guide vector is obtained, and this cosine distance is used as the distance loss function.
The cosine distance function may take the following form:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $\cos\theta$ is the cosine distance; $A_i$ is the value of the $i$-th dimension of the guide vector; $B_i$ is the value of the $i$-th dimension of the center word's word vector; and $n$ is the preset dimensionality of the word vectors, e.g. $n = 200$ when the word vectors are 200-dimensional.
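A direct transcription of the formula (illustrative Python, not part of the application):

```python
import numpy as np

def cosine_distance(A, B):
    """cos(theta) between guide vector A and the center word's word vector B,
    as in the formula above; used here as the distance loss function."""
    return (A @ B) / (np.sqrt((A ** 2).sum()) * np.sqrt((B ** 2).sum()))
```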
Step S4: supervise, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training.
The prior art uses the word vectors of the context words to supervise the iterative update direction of the center word's word vector during word vector model training. The basic idea of this embodiment is to substitute the distance loss function for the word vectors of the context words, so that the word vector of the center word iterates toward the guide vector during word vector model training, and the word vector of the center word generated by training is drawn toward the multiple true senses of the center word. This solves the problem that word vectors obtained by prior-art training methods fall short in expressing the polarity and category of words and cannot embody the true meaning of words.
With further reference to Fig. 6, a flowchart of step S4 of a multi-semantic supervised word vector training method according to an embodiment of the present application: in this embodiment or some other optional embodiments of the application, step S4 may comprise the following steps:
Step S41: obtain the partial derivative of the distance loss function with respect to each connection weight in the word vector model.
The basic structure of word vector models such as CBOW and Skip-gram can be regarded as a neural network of two oppositely connected layers, in which the hidden-layer weights of the word vector model are the connection weights of the network nodes; training word vectors is exactly the process of continuously updating these connection weights by iteration so that the output of the network tends to reduce the distance loss function. The basic idea of this embodiment is therefore: in order to update these connection weights, first obtain the partial derivative of the loss with respect to each connection weight in the word vector model.
Step S42: update the connection weights according to the partial derivatives.
The partial derivative of each connection weight reflects that connection weight's influence on the distance loss function. In step S42, the partial derivatives of the connection weights are back-propagated using the chain rule, thereby obtaining the influence on the distance loss of the connection weights from the output layer to the hidden layer and from the hidden layer to the input layer of the word vector model.
Specifically, a learning rate of the neural network may be set during the updating of the connection weights: a larger learning rate can be set at the beginning of iteration, and the learning rate is then continuously decayed during the iterative process, preventing a learning speed so fast that the updating of the connection weights falls into random jitter, local minima or divergence.
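A minimal sketch of such a decaying learning-rate schedule (the initial rate, decay factor and floor are assumed values, not taken from the application):

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.normal(size=200)                  # one connection-weight vector
lr, decay, lr_floor = 0.025, 0.999, 1e-4  # assumed schedule parameters

for step in range(1000):
    grad = rng.normal(size=200)           # stand-in for the back-propagated
                                          # partial derivative of the loss
    w -= lr * grad                        # gradient step on the connection weight
    lr = max(lr * decay, lr_floor)        # decay prevents jitter and divergence
```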
Step S43: in the next round of iteration, update the word vector of the center word using the updated connection weights.
Through the continuous updating of the connection weights of the word vector model, the word vector of the center word is also continuously updated during training, finally generating a word vector that can be used in a production environment.
In addition, prior-art word vector models usually take the word vector of the center word as input and the word vectors of the context words as output, or take the word vectors of the context words as input and the word vector of the center word as output. Since the word vectors of the center word and the context words are randomly generated, such word vectors are defective in expressing both the meaning and the context of words. In order to overcome the above defects, in the technical solution provided by this embodiment of the application, the word vector model takes the word vectors of the context words as input and the output vector as output.
Specifically, the output vector in this embodiment is generated, on the basis of the attention coefficients produced in steps S21 and S22, by the following step:
Step S24: perform, according to the attention coefficients, a weighted summation of the semantic vectors of the center word to generate the output vector of the center word.
The basic idea of this embodiment is to weight the semantic vectors of the center word with the attention coefficients and sum the weighted results, taking the summation result as the output vector of the center word. The semantic vectors of the center word and the weighted semantic vectors of the center word are different concepts: the semantic vectors of the center word are generated by random initialization and updated by iteration during the training of the word vector model, whereas the weighted semantic vectors of the center word are obtained by weighting sememe vectors.
Specifically, the output vector of the center word can be generated with the following formula:

$$W_t = \sum_{j=1}^{N} \alpha_j V_j$$

where $W_t$ is the output vector of the center word; $N$ is the number of senses of the center word; $\alpha_j$ is the attention coefficient of the $j$-th weighted semantic vector of the center word; and $V_j$ is the semantic vector of the $j$-th sense of the center word.
In this embodiment, the output vector can express the influence of the center word's context on the center word's meaning. Therefore, using the output vector generated by the application as the output of the word vector model enables the trained word vector of the center word to express the polarity and category of the word well while also expressing the influence of context on meaning, so the quality of the word vectors is higher.
Further, since the sememe vectors, semantic vectors and word vectors are initialized during word vector training and the values of these vectors change continuously in the iterative process, the whole word vector model is a time-varying model of continuous renewal and learning. In order to make the word vector model tend to stability, so that the word vectors steadily iterate and update toward the guide vector, this embodiment may further include a transfer process after step S4.
Referring to Fig. 7, a transfer flowchart of a multi-semantic supervised word vector training method according to an embodiment of the present application, the specific steps are as follows:
Step S61: after the word vector model completes a preset number of iterations, extract the word vectors and sememe vectors generated by the word vector model.
Specifically, after the word vector model completes a certain number of iterations, the word vectors and sememe vectors generated by its training are already able to reach the standard for use in production environments (e.g. for text classification, intelligent question answering and the like). At this point, in this embodiment, the word vectors and sememe vectors generated by the word vector model are extracted as the material for the model's transfer learning.
Step S62: use the extracted word vectors and sememe vectors as input parameters for continued training of the word vector model.
Specifically, in this embodiment the extracted word vectors and sememe vectors serve as the input parameters for the continued training of the word vector model, replacing the randomly generated initial parameters used when training of the word vector model began.
In addition, the extracted word vectors and sememe vectors can also be transferred to other word vector models as their initial parameters, which can greatly save word vector training time and improve the quality of the word vectors generated by training.
Step S63: when the word vector model continues training, fix the values of the sememe vectors so that the word vector model only updates the values of the word vectors during iterative updating.
Specifically, since the semantic vectors and word vectors are generated from the sememe vectors, fixing the values of the sememe vectors amounts to fixing the foundation of the word vector model; the word vector model then tends to stabilize, so that the word vectors can steadily update toward the guide vector, improving the quality of the word vectors.
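A hedged sketch of the transfer steps S61 to S63 in PyTorch-style Python (the embedding sizes and names are illustrative assumptions; the application does not prescribe a framework):

```python
import torch

dim, n_sememes, n_words = 200, 2000, 50000
sememe_vectors = torch.nn.Embedding(n_sememes, dim)
word_vectors = torch.nn.Embedding(n_words, dim)

# Steps S61/S62: the vectors extracted from the trained model would be loaded
# here as the input parameters for continued training, e.g. via
# sememe_vectors.load_state_dict(...) and word_vectors.load_state_dict(...).

# Step S63: fix the sememe vectors so that continued training only updates
# the word vectors.
sememe_vectors.weight.requires_grad = False
optimizer = torch.optim.SGD(word_vectors.parameters(), lr=0.01)
```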
From the above technical solution, the embodiments of the present application provide a multi-semantic supervised word vector training method. During word vector training, first, a weighted semantic vector is generated for each sense of the center word from the sememe vectors of the sememes that the sense contains; then, the weighted semantic vectors of all senses of the center word are summed with weights to generate a guide vector; next, a distance loss function from the guide vector to the word vector of the center word is constructed; finally, the distance loss function supervises the iterative update direction of the center word's word vector during word vector model training, so that the word vector of the center word is drawn toward the multiple senses of the center word during training. This solves the problem that word vectors obtained by prior-art training methods fall short in expressing the polarity and category of a word and cannot express the word's true meaning.
The following is a device embodiment of the present application, which can be used to execute the method embodiment of the present application; the device includes software modules for executing each step of the method embodiment. For details not disclosed in the device embodiment of the application, please refer to the method embodiment of the application.
Referring to Fig. 8, a schematic diagram of a multi-semantic supervised word vector training device according to an embodiment of the present application, as shown in Fig. 8 the device comprises:
a weighted semantic vector expression module 71, configured to generate a weighted semantic vector for each sense of a center word from the sememe vectors of the sememes that each sense contains;
a guide vector expression module 72, configured to perform a weighted summation of the weighted semantic vectors of all senses of the center word to generate a guide vector;
a distance loss construction module 73, configured to construct a distance loss function from the guide vector to the word vector of the center word;
a supervision module 74, configured to supervise, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training.
From the above technical solution, the embodiments of the present application provide a multi-semantic supervised word vector training device. During word vector training, first, a weighted semantic vector is generated for each sense of the center word from the sememe vectors of the sememes that the sense contains; then, the weighted semantic vectors of all senses of the center word are summed with weights to generate a guide vector; next, a distance loss function from the guide vector to the word vector of the center word is constructed; finally, the distance loss function supervises the iterative update direction of the center word's word vector during word vector model training, so that the word vector of the center word is drawn toward the multiple senses of the center word during training. This solves the problem that word vectors obtained by prior-art training methods fall short in expressing the polarity and category of a word and cannot express the word's true meaning.
The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.
The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. The application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
It should be noted that, in this document, relational terms such as 'first' and 'second' are used merely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms 'include', 'comprise' or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Other embodiments of the application will readily occur to those skilled in the art after considering the specification and practicing the application disclosed herein. This application is intended to cover any variations, uses or adaptive changes of the application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed in the application. The specification and examples are to be regarded as illustrative only; the true scope and spirit of the application are pointed out by the following claims.
It should be understood that the application is not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (9)

1. A multi-semantic supervised word vector training method, characterized by comprising:
generating a weighted semantic vector for each sense of a center word from the sememe vectors of the sememes that each sense contains;
performing a weighted summation of the weighted semantic vectors of all senses of the center word to generate a guide vector;
constructing a distance loss function from the guide vector to the word vector of the center word;
supervising, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training;
wherein performing the weighted summation of the weighted semantic vectors of all senses of the center word to generate the guide vector comprises:
generating a context vector of the center word from the word vectors of the context words;
obtaining, respectively, the attention coefficient between the context vector and each weighted semantic vector of the center word;
performing, according to the attention coefficients, a weighted summation of the weighted semantic vectors of the center word to generate the guide vector.
2. The method according to claim 1, characterized in that generating the weighted semantic vector of each sense from the sememe vectors of the sememes contained in each sense of the center word comprises:
setting the sememe weight of each sense according to the number of sememes in each sense of the center word;
performing, according to the sememe weights, a weighted summation of the sememe vectors within each sense to generate the weighted semantic vector of each sense.
3. The method according to claim 1, characterized in that generating the context vector of the center word from the word vectors of the context words comprises:
determining a preset number of context words of the center word according to a preset window size;
performing a weighted summation of the word vectors of the context words to generate the context vector.
4. The method according to claim 1, characterized in that constructing the distance loss function from the guide vector to the word vector of the center word comprises:
obtaining the cosine distance between the word vector of the center word and the guide vector, and using the cosine distance as the distance loss function.
5. The method according to claim 1, characterized in that after supervising, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training, the method further comprises:
after the word vector model completes a preset number of iterations, extracting the word vectors and sememe vectors generated by the word vector model;
using the extracted word vectors and sememe vectors as input parameters for continued training of the word vector model;
when the word vector model continues training, fixing the values of the sememe vectors so that the word vector model only updates the values of the word vectors during iterative updating.
6. The method according to claim 1 or 3, characterized by further comprising:
performing, according to the attention coefficients, a weighted summation of the semantic vectors of the center word to generate an output vector of the center word;
wherein the word vector model takes the word vectors of the context words as input and the output vector as output.
7. The method according to claim 1, characterized in that supervising, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training comprises:
obtaining the partial derivative of the distance loss function with respect to each connection weight in the word vector model;
updating the connection weights according to the partial derivatives;
in the next round of iteration, updating the word vector of the center word using the updated connection weights.
8. The method according to claim 2, characterized in that the sememe weight is the reciprocal of the number of sememes contained in the sense.
9. A multi-semantic supervised word vector training device, characterized by comprising:
a weighted semantic vector expression module, configured to generate a weighted semantic vector for each sense of a center word from the sememe vectors of the sememes that each sense contains;
a guide vector expression module, configured to perform a weighted summation of the weighted semantic vectors of all senses of the center word to generate a guide vector;
a distance loss construction module, configured to construct a distance loss function from the guide vector to the word vector of the center word;
a supervision module, configured to supervise, according to the distance loss function, the iterative update direction of the center word's word vector during word vector model training;
wherein the guide vector expression module, in performing the weighted summation of the weighted semantic vectors of all senses of the center word to generate the guide vector, is configured to:
generate a context vector of the center word from the word vectors of the context words;
obtain, respectively, the attention coefficient between the context vector and each weighted semantic vector of the center word;
perform, according to the attention coefficients, a weighted summation of the weighted semantic vectors of the center word to generate the guide vector.
CN201811083181.3A 2018-09-17 2018-09-17 Multi-semantic supervised word vector training method and device Active CN109241535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811083181.3A CN109241535B (en) 2018-09-17 2018-09-17 Multi-semantic supervised word vector training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811083181.3A CN109241535B (en) 2018-09-17 2018-09-17 Multi-semantic supervised word vector training method and device

Publications (2)

Publication Number Publication Date
CN109241535A CN109241535A (en) 2019-01-18
CN109241535B (en) 2019-08-27

Family

ID=65059522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811083181.3A Active CN109241535B (en) Multi-semantic supervised word vector training method and device

Country Status (1)

Country Link
CN (1) CN109241535B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532395B (en) * 2019-05-13 2021-09-28 南京大学 Semantic embedding-based word vector improvement model establishing method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699819A (en) * 2015-03-26 2015-06-10 浪潮集团有限公司 Sememe classification method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372086B (en) * 2015-07-23 2019-12-03 华中师范大学 A kind of method and apparatus obtaining term vector
CN107239443A (en) * 2017-05-09 2017-10-10 清华大学 The training method and server of a kind of term vector learning model

Also Published As

Publication number Publication date
CN109241535A (en) 2019-01-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190118

Assignee: Zhongke Dingfu (Beijing) Science and Technology Development Co., Ltd.

Assignor: Beijing Shenzhou Taiyue Software Co., Ltd.

Contract record no.: X2019990000215

Denomination of invention: A multi-semantic supervised word vector training method and device

Granted publication date: 20190827

License type: Exclusive License

Record date: 20191127

TR01 Transfer of patent right

Effective date of registration: 20200702

Address after: 230000 zone B, 19th floor, building A1, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Patentee after: Dingfu Intelligent Technology Co., Ltd

Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building block A Room 601

Patentee before: BEIJING ULTRAPOWER SOFTWARE Co.,Ltd.