CN109117480A - Word prediction method, device, computer equipment and storage medium - Google Patents

Word prediction method, device, computer equipment and storage medium

Info

Publication number
CN109117480A
Authority
CN
China
Prior art keywords
word
vocabulary
predicted
possibility
current
Prior art date
Legal status
Granted
Application number
CN201810942238.4A
Other languages
Chinese (zh)
Other versions
CN109117480B (en)
Inventor
黄羿衡
苏丹
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910740458.3A (published as CN110377916B)
Priority to CN201810942238.4A (published as CN109117480B)
Publication of CN109117480A
Application granted
Publication of CN109117480B
Status: Active

Classifications

    • G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns, e.g. bagging or boosting
    • G06F40/211 — Handling natural language data; natural language analysis; parsing; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/253 — Handling natural language data; natural language analysis; grammatical analysis; style critique


Abstract

This application discloses a word prediction method, device, computer equipment and storage medium. In the method, a computer equipment obtains a current word used for prediction and first context information carried by the word sequence preceding the current word. Based on the current word and the first context information, it determines the probabilities that the word to be predicted following the current word belongs to each of multiple different domains. For each domain, based on the current word and the first context information, it determines a first possibility that each word in a vocabulary is the word to be predicted. Then, according to the probabilities that the word to be predicted belongs to the multiple different domains and the per-domain first possibilities, it determines a second possibility that each word in the vocabulary is the word to be predicted. The scheme of this application improves the accuracy of predicting which word appears after a given word, which in turn helps improve the precision of predicting the occurrence probability of a sentence.

Description

Word prediction method, device, computer equipment and storage medium
Technical field
This application relates to the technical field of data processing, and in particular to a word prediction method, device, computer equipment and storage medium.
Background technique
Language models are widely used in fields such as speech recognition and machine translation. The role of a language model is to compute the probability that a sentence appears, so as to pick out, from multiple candidate sentences, the sentence that best conforms to human language. Taking a speech recognition scenario as an example, an input voice may be recognized as multiple candidate sentences, some of which contain erroneous words or grammar and do not conform to human language; in that case, a language model is needed to output the probability that each candidate sentence is reasonable.
When a language model determines the occurrence probability of a sentence to be predicted, it needs to take each word of that sentence in turn as the current word used for prediction, and determine, for each word in the language model's vocabulary, the probability that it is the next word after the current word (i.e., the word to be predicted). At present, however, the accuracy with which language models predict the probability that each vocabulary word is the word to be predicted after the current word is generally low, so the accuracy with which language models determine sentence occurrence probabilities is also low.
Summary of the invention
In view of this, this application provides a word prediction method, device, computer equipment and storage medium, to improve the accuracy of predicting the probability of the next word after a given word.
To achieve the above object, in one aspect, this application provides a word prediction method, comprising:
obtaining a current word used for prediction and first context information carried by the word sequence preceding the current word;
based on the current word and the first context information, determining the probabilities that the word to be predicted following the current word belongs to each of multiple different domains;
for each domain, based on the current word and the first context information, determining a first possibility that each word in a vocabulary is the word to be predicted, the first possibility being the possibility that a word in the vocabulary is the word to be predicted in the case where the word to be predicted belongs to that domain, the vocabulary being a pre-constructed set comprising multiple words;
according to the probabilities that the word to be predicted belongs to the multiple different domains and the first possibility, for each domain, that each word in the vocabulary is the word to be predicted, determining a second possibility that each word in the vocabulary is the word to be predicted.
In one possible implementation, the vocabulary is a high-frequency vocabulary composed of the most frequently used words in a total vocabulary, the total vocabulary being a pre-constructed set comprising multiple words, where the total number of words in the total vocabulary exceeds the total number of words in the high-frequency vocabulary.
The method further comprises:
based on the current word and the first context information, determining a third possibility that each word in a low-frequency vocabulary is the word to be predicted, the low-frequency vocabulary being composed of the words in the total vocabulary that do not belong to the high-frequency vocabulary;
according to the second possibility that each word in the high-frequency vocabulary is the word to be predicted and the third possibility that each word in the low-frequency vocabulary is the word to be predicted, constructing the possibility that each word in the total vocabulary is the word to be predicted.
In another aspect, this application also provides a word prediction device, comprising:
an input acquiring unit, configured to obtain a current word used for prediction and first context information carried by the word sequence preceding the current word;
a domain prediction unit, configured to determine, based on the current word and the first context information, the probabilities that the word to be predicted following the current word belongs to each of multiple different domains;
a first prediction unit, configured to determine, for each domain and based on the current word and the first context information, a first possibility that each word in a vocabulary is the word to be predicted, the first possibility being the possibility that a word in the vocabulary is the word to be predicted in the case where the word to be predicted belongs to that domain, the vocabulary being a pre-constructed set comprising multiple words;
a second prediction unit, configured to determine a second possibility that each word in the vocabulary is the word to be predicted, according to the probabilities that the word to be predicted belongs to the multiple different domains and the first possibility, for each domain, that each word in the vocabulary is the word to be predicted.
In another aspect, this application also provides a computer equipment, comprising:
a processor and a memory;
wherein the processor is configured to execute the program stored in the memory;
the memory is configured to store a program, the program being at least configured to:
obtain a current word used for prediction and first context information carried by the word sequence preceding the current word;
based on the current word and the first context information, determine the probabilities that the word to be predicted following the current word belongs to each of multiple different domains;
for each domain, based on the current word and the first context information, determine a first possibility that each word in a vocabulary is the word to be predicted, the first possibility being the possibility that a word in the vocabulary is the word to be predicted in the case where the word to be predicted belongs to that domain, the vocabulary being a pre-constructed set comprising multiple words;
according to the probabilities that the word to be predicted belongs to the multiple different domains and the first possibility, for each domain, that each word in the vocabulary is the word to be predicted, determine a second possibility that each word in the vocabulary is the word to be predicted.
In yet another aspect, this application also provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the word prediction method described in any of the above items.
It can be seen that, in the embodiments of this application, after the current word used for prediction is obtained, the probabilities that the word to be predicted after the current word (the next word after the current word) belongs to multiple different domains can be analyzed according to the current word and the context information carried by the word sequence preceding it, and the possibility that each word in the vocabulary is the word to be predicted can be determined separately for the case where the word to be predicted belongs to each domain. Because the domain of the word to be predicted affects the possibility that each vocabulary word is that word, combining the probabilities that the word to be predicted belongs to each domain with the per-domain possibilities improves the accuracy of predicting which vocabulary word is the next word after the current word, and thereby helps improve the precision of predicting the occurrence probability of the sentence to which the current word belongs.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic structural diagram of a word prediction system of this application;
Fig. 2 shows a schematic flowchart of a word prediction method in an embodiment of this application;
Fig. 3 shows another schematic flowchart of a word prediction method in an embodiment of this application;
Fig. 4 shows a schematic composition diagram of a language model implementing word prediction in an embodiment of this application;
Fig. 5 shows a schematic flowchart of training a language model in this application;
Fig. 6 shows another schematic flowchart of a word prediction method of this application;
Fig. 7 shows another schematic flowchart of a word prediction method of this application;
Fig. 8 shows a schematic composition diagram of another language model applicable to this application;
Fig. 9 shows a schematic diagram of an application scenario to which the word prediction method of this application is applicable;
Fig. 10 shows a schematic structural diagram of a word prediction device of this application;
Fig. 11 shows another schematic structural diagram of a word prediction device of this application;
Fig. 12 shows a schematic composition diagram of a computer equipment applicable to this application.
Specific embodiment
The scheme of the embodiments of this application is suitable for predicting, given the current word in a sentence, the possibility that each word in the vocabulary is the next word after the current word and can form a sentence with it, so as to improve the accuracy of predicting the probability of the next word after the current word, and in turn help improve the accuracy of predicting the occurrence probability of the sentence composed of the current word and the next word.
The inventors of this application found through research that a word may belong to one or more different domains; for example, word A may be a word of the industrial domain, of the agricultural domain, of the scientific and technological domain, and so on. Correspondingly, the next word after the current word may also belong to one or more domains, and when the domain of the next word differs, the probability distribution over the vocabulary words for the word to be predicted also differs. Existing language models do not consider the domain of the next word during prediction, which inevitably lowers the precision of the predicted probability distribution over vocabulary words.
To improve prediction precision, the inventors of this application, when predicting the next word after the current word, consider the domains the next word may belong to, predict for each of multiple domains the possibility that each vocabulary word is the next word, and combine these to determine the probability distribution over vocabulary words, so that the final probability distribution is more accurate.
To facilitate understanding of the scheme of this application, the scenarios in which the scheme is used are introduced first. For example, see Fig. 1, which shows a schematic structural diagram of a word prediction system used in the scheme of this application.
As can be seen from Fig. 1, the word prediction system may include a computer equipment 101 and a data statistics server 102.
The computer equipment 101 can obtain a sentence to be predicted determined by speech recognition, machine translation or input method technology, and determine from the sentence to be predicted the current word to be analyzed, so as to predict, based on the current word, the possibility that each word in the vocabulary is the word to be predicted, i.e., the next word after the current word in the sentence.
For example, in the field of speech recognition, after a voice signal input by a user is converted into multiple candidate sentence texts, in order to determine the probability that each candidate sentence text is a correct sentence, i.e., conforms to human language, each word of each candidate sentence text is taken in turn as the current word, and the probability that each vocabulary word is the next word after the current word in the candidate sentence text is predicted. In this way, from the composition of the candidate sentence text and the predicted probability of the next word after each of its words, the probability that the candidate sentence text is a correct sentence can be determined. For example, for the candidate sentence text "很高兴" ("very glad"): if the predicted probability that "高" is the next word after "很" is 0.5, and the probability that "兴" is the next word after "很高" is 0.9, then the occurrence probability of "很高兴" can be 0.5 × 0.9 = 0.45.
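To make the chain-rule scoring above concrete, the following is a minimal sketch, not part of the application itself; the predict_next callback is a hypothetical stand-in for the language model.

```python
# Minimal sketch (an assumption, not the application's implementation):
# scoring a candidate sentence as the product of next-word probabilities.
def sentence_probability(words, predict_next):
    """Chain rule: P(w1..wn) ~ product over i of P(w_i | w_1..w_{i-1})."""
    prob = 1.0
    for i in range(1, len(words)):
        prob *= predict_next(words[:i], words[i])
    return prob

# Toy probabilities reproducing the example: P("高"|"很") = 0.5,
# P("兴"|"很高") = 0.9, so "很高兴" scores 0.5 * 0.9 = 0.45.
toy = {(("很",), "高"): 0.5, (("很", "高"), "兴"): 0.9}
print(sentence_probability(
    ["很", "高", "兴"],
    lambda ctx, w: toy.get((tuple(ctx), w), 0.0)))  # 0.45
```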
For the field of machine translation, the application and process of predicting the next word are similar to those of speech recognition.
As another example, in the input method field, it may be necessary to analyze, based on the sentence already input by the user (which may be a single word), the sequence of candidates the user may want to input next. In that case, the last word of the sentence is taken as the current word, and the possibility that each vocabulary word can form a sentence with the current word is predicted, so that the input method can choose, according to the prediction result, which candidate words to display and in what order.
As can be seen from Fig. 1, a language model can be deployed in the computer equipment 101. A sentence to be analyzed, determined by speech recognition, machine translation or an input method, can be input into the language model in the computer equipment 101, so that the probability of the next word after each word of the sentence is analyzed based on the language model, and then a candidate sentence is determined from the sentence to be analyzed, or the occurrence probability of the sentence to be analyzed (i.e., the probability that it conforms to human language) is analyzed.
The language model may be a neural-network-based language model or another type of language model, which is not limited here.
The data statistics server 102 can collect the words commonly used by different users and send them to the computer equipment, so that the computer equipment can determine the composition of the vocabulary; alternatively, it can generate the vocabulary from the collected commonly used words and feed it back to the computer equipment.
It can be understood that the computer equipment may be a server in a speech recognition, machine translation or input method system, or a stand-alone device with data processing capability.
With reference to the above commonalities, a word prediction method of an embodiment of this application is introduced below. See Fig. 2, which shows a schematic flowchart of an embodiment of a word prediction method of this application. The method of this embodiment can be applied to the computer equipment of this application, and may include:
S201: obtaining a current word used for prediction and first context information carried by the word sequence preceding the current word.
In the embodiments of this application, the word currently used for predicting the possible next word is called the current word. Considering that the next word after the current word may be any word in the vocabulary, and this application needs to predict, for each vocabulary word, the possibility that it follows the current word and can form a sentence with it, this application calls the next word after the current word the word to be predicted.
The current word may be a word used for prediction in a sentence to be predicted. The current word may consist of a single character string, e.g. a single Chinese character, or of multiple character strings, e.g. a phrase composed of multiple Chinese characters. Correspondingly, the word to be predicted may also consist of one or more character strings.
It can be understood that the way the current word is obtained may differ across application scenarios.
For example, in one possible implementation, the word currently used for prediction may be determined in the sentence to be predicted according to the order of its words. In this case, the sentence to be predicted is itself a sentence whose occurrence probability needs to be predicted; it consists of multiple words, and each word in turn needs to serve as the current word. For example, the sentence to be predicted may be a candidate sentence obtained by speech recognition or machine translation, and the current word may be the word at any position in the candidate sentence, with the current word being at different positions at different moments.
In another possible implementation, the last word of the sentence to be predicted is obtained as the current word used for prediction. In that case, the language model may need to predict the possibility of the candidate sentence formed by the sentence to be predicted and the next word after the current word. For example, the sentence to be predicted may be the sentence currently being input through an input method; at the current moment it may not be a complete sentence, e.g. only one character or word, or an incomplete sentence composed of multiple words. In order to predict which words may be the next word after the last word of the sentence to be predicted, and with what possibility, the last word of the sentence is taken as the current word.
It can be understood that the word to be predicted after the current word (the next word), and the possibilities of the candidates, could be predicted directly based on the current word alone, but the precision would be very low. To guarantee prediction precision, in the embodiments of this application the prediction can also draw on the context information corresponding to the word sequence used for prediction before the current moment.
The word sequence before the current word may be the word sequence composed of one or more words preceding the current word in the sentence to be predicted; the word sequence may also be empty. For example, if the current word is the first word used for prediction, i.e. the first word of the sentence to be predicted, or the sentence to be predicted only contains the current word, then the word sequence before the current word is empty, and in that case the context information carried by the word sequence is also empty.
The context information characterizes the semantic relations between words; the context relation of a word sequence is the semantic relation between the words in the sequence. For ease of distinction, the embodiments of this application call the context information carried by the word sequence before the current word the first context information.
Optionally, the current word can be represented by a word vector, and the first context information can also be represented by a vector.
S202: based on the current word and the first context information, determining the probabilities that the word to be predicted following the current word belongs to each of multiple different domains.
As found by the inventors of this application, the word to be predicted after the current word may belong to one or more domains, and different domains of the word to be predicted affect the possibility that each vocabulary word is that word. Therefore, in the embodiments of this application, the probabilities that the word to be predicted belongs to multiple domains can be determined based on the current word and the first context information of the word sequence before the current word.
It can be understood that, based on the current word and the first context information, the degree to which the word to be predicted after the current word tends towards each domain can be analyzed, and this tendency can be reflected by a probability.
For example, in one possible case, the mapping degree between different semantic relations and different domains can be analyzed in advance; then, based on the semantic relation characterized by the current word and the first context information, the mapping degree between the word to be predicted and each domain can be obtained.
In another possible case, a domain distribution model can be trained in advance using multiple sentence samples. Then, according to the current word and the first context information, and using the domain distribution model, the probabilities that the word to be predicted after the current word belongs to the multiple different domains can be predicted.
The domain distribution model can be set as needed; e.g., it can be a recurrent neural network model, such as a Long Short-Term Memory (LSTM) model.
The domain distribution model may be independent of the language model. In that case, training the domain distribution model with multiple sentence samples can mean training a preset network model with those samples and taking the trained network model as the domain distribution model. For example, since the order of the words in each sentence sample is fixed, the network model can be trained based on the word order in the sentence samples and the labeled domain of each word, until the difference between the domains output by the network model and the actual labels meets the requirement.
Optionally, in order to improve the precision of the predicted probabilities that the word to be predicted belongs to different domains, the domain distribution model can be a model included in the language model; in this way, it can be trained together with the language model on multiple sentence samples. This part will be introduced in the content below; a sketch of such a domain head follows.
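As a rough illustration only — the application leaves the architecture of the domain distribution model open (e.g. it may be an LSTM) — the sketch below uses a simple learned linear projection of the context vector s(t) followed by a softmax; the projection, its shapes, and all names are assumptions.

```python
import numpy as np

def domain_probabilities(s_t, W_domain, b_domain):
    """Map context vector s(t) of shape (d,) to probabilities over
    n domains via an assumed projection W_domain of shape (n, d)."""
    scores = W_domain @ s_t + b_domain
    exp = np.exp(scores - scores.max())  # shift by max for stability
    return exp / exp.sum()               # probabilities sum to 1

# Example: d = 4 context dimensions, n = 3 domains, random weights.
rng = np.random.default_rng(0)
print(domain_probabilities(rng.normal(size=4),
                           rng.normal(size=(3, 4)),
                           np.zeros(3)))
```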
S203: for each domain, based on the current word and the first context information, determining a first possibility that each word in the vocabulary is the word to be predicted.
The vocabulary is constructed in advance and is a set including multiple words; it may also be called a word list or a corpus. The words in the vocabulary can be the words users may use daily, determined through big data analysis and the like; e.g., the vocabulary may include 200,000 words that users may use daily.
For each domain, the first possibility is the possibility that a vocabulary word is the word to be predicted, in the case where the word to be predicted belongs to that domain. For a given domain, the first possibility of a word in the vocabulary can also be regarded as the possibility that this word is the word to be predicted, given that the word to be predicted belongs to that domain.
The first possibility can be expressed in various ways, such as a numerical value or a grade. For example, if the first possibility is a numerical value, then the larger the value of a word's first possibility, the higher the degree of possibility that this vocabulary word is the word to be predicted.
It can be understood that, for each domain, since there are multiple words in the vocabulary, multiple first possibilities are obtained, one per word, so each domain corresponds to a first possibility distribution. The first possibility distribution corresponding to a domain is, in effect, the distribution of the first possibilities that the vocabulary words are the word to be predicted, in the case where the word to be predicted belongs to that domain.
The first possibility distribution includes the first possibility of each vocabulary word being the word to be predicted; e.g., the first possibility distribution can be a vector in which each element points to one word in the vocabulary, the value of the element being the first possibility that the pointed-to word is the word to be predicted.
Unlike current language models, which do not consider domains and directly predict a single possibility distribution over vocabulary words, this application predicts, for each of the multiple different domains the word to be predicted may belong to, a separate possibility distribution over the vocabulary words.
In order for the language model to predict, for multiple domains, the first possibility that each vocabulary word is the word to be predicted, multiple estimation functions can be set in the language model, and during training on multiple sentence samples, different estimation functions can be made to correspond to different domains. Each estimation function predicts, based on the current word and the first context information, the first possibility that each vocabulary word is the word to be predicted; because the estimation functions correspond to different domains, the first possibilities they predict differ, i.e., different estimation functions output different first possibility distributions.
The language model can be obtained by training different network models, and the estimation functions in the language model may differ depending on the chosen network model. E.g., for a neural-network-based language model, the estimation function can be the logits function in the output layer, which converts the word vector of the current word and the vector of the first context information into a vector with the same dimension as the vocabulary size — the commonly mentioned logits. Each output logit represents the logarithm of the ratio of the probability that an event occurs to the probability that it does not occur; correspondingly, the logit corresponding to each word is that word's first possibility.
S204: according to the probabilities that the word to be predicted belongs to the multiple different domains and the first possibility, for each domain, that each vocabulary word is the word to be predicted, determining a second possibility that each vocabulary word is the word to be predicted.
For example, the weight of each domain can be determined from the probability that the word to be predicted belongs to it, e.g., taking the probability that the word to be predicted belongs to a domain as that domain's weight. Then, based on the weights of the domains, the first possibilities of the vocabulary words in each domain are weighted and summed; the weighted sum is the second possibility that each vocabulary word is the word to be predicted.
To illustrate, suppose there are two domains, industrial and agricultural, and the vocabulary contains 3 words {"ball", "TV", "going out"}. Suppose the probability that the word to be predicted belongs to the industrial domain is 0.6 and to the agricultural domain is 0.4, and the size of the first possibility is characterized by a numerical value. Suppose that, in the case where the word to be predicted belongs to the industrial domain, the first possibilities of the vocabulary words are {"ball" = 1.2, "TV" = 1.5, "going out" = 0.2}; and in the case where the word to be predicted belongs to the agricultural domain, they are {"ball" = 0.8, "TV" = 1.6, "going out" = 0.4}. Then the first result is obtained by multiplying the industrial-domain probability 0.6 by the first possibility distribution corresponding to the industrial domain, and the second result by multiplying the agricultural-domain probability 0.4 by the first possibility distribution corresponding to the agricultural domain; the two results are then added, specifically as follows:
0.6 × {"ball" = 1.2, "TV" = 1.5, "going out" = 0.2} + 0.4 × {"ball" = 0.8, "TV" = 1.6, "going out" = 0.4} = {"ball" = 0.6×1.2 + 0.4×0.8 = 1.04, "TV" = 0.6×1.5 + 0.4×1.6 = 1.54, "going out" = 0.6×0.2 + 0.4×0.4 = 0.28}. That is to say, the second possibility value of "ball" being the word to be predicted is 1.04, that of "TV" is 1.54, and that of "going out" is 0.28.
Of course, the above is only one implementation of combining the probabilities that the word to be predicted belongs to different domains with the per-domain first possibilities of the vocabulary words, so as to determine the second possibility of each word; other implementations are possible in practical applications and are not limited here.
It can be understood that, in the case where the second possibility is expressed as a numerical value, the differing magnitudes of the second possibilities of the vocabulary words make it hard to compare intuitively the degree of possibility of each vocabulary word being the word to be predicted. Therefore, optionally, the second possibilities of the vocabulary words can be normalized, to obtain the probability distribution in which each vocabulary word is the word to be predicted. The probability distribution includes the probability of each vocabulary word being the word to be predicted, and after normalization the probabilities of all vocabulary words sum to one.
There may be several ways to normalize the second possibilities corresponding to the words of the vocabulary. To illustrate with one of them, the second possibilities of the vocabulary words can be processed with a softmax function, so as to output the normalized probability distribution. For example, suppose the vocabulary contains C words; then the C words correspond to C second possibilities, with the second possibility of the i-th of the C words denoted v_i. The probability S_i obtained by normalizing v_i with the softmax function is expressed as follows:

S_i = e^{v_i} / Σ_{j=1}^{C} e^{v_j}   (formula one)
Of course, the above is only an illustration using one normalization method; normalization by other normalization functions is equally applicable to this embodiment.
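Under the same toy numbers, a minimal softmax matching formula one above (an illustrative sketch, not the application's code):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))  # max-shift keeps the exponentials stable
    return e / e.sum()

# Normalizing the second possibilities from the running example:
print(softmax(np.array([1.04, 1.54, 0.28])).round(3))
# [0.321 0.529 0.15 ] -- probabilities for {ball, TV, going out}, sum to 1
```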
As can be seen from the above, in the embodiments of this application, after the current word used for prediction is obtained, the probabilities that the word to be predicted after the current word (the next word) belongs to multiple different domains can be analyzed according to the current word and the context information carried by the preceding word sequence, and the possibility that each vocabulary word is the word to be predicted can be determined separately for each domain. Because the domain of the word to be predicted affects these possibilities, combining the per-domain probabilities with the per-domain possibilities to determine the overall possibility of each vocabulary word improves the accuracy of predicting the next word after the current word, which helps improve the precision of predicting the occurrence probability of the sentence to which the current word belongs.
It can be understood that, to improve prediction precision, after the current word and the first context information are obtained, second context information characterizing the semantic relation between the current word and the word sequence preceding it can also be determined based on them. The second context information can reflect the semantic association between the words of the sentence composed of the current word and its preceding word sequence.
Correspondingly, in step S202 above, the probabilities that the word to be predicted belongs to the different domains can be determined based on the second context information; in step S203, the first possibility of each vocabulary word being the word to be predicted can be determined according to the second context information.
Further, when predicting based on the next current word, the second context information can be input into the language model together with that next current word as input information, to improve prediction precision.
For ease of understanding, the case where the language model includes a domain distribution model and estimation functions corresponding to multiple different domains is taken as an example. See Fig. 3, which shows another implementation flow of the word prediction method of the embodiments of this application; the flow is suitable for the computer equipment of this application. The flow may include:
S301: obtaining the word vector w(t) of the current word used for prediction, and the first context information s(t-1) most recently determined by the pre-trained language model.
A word vector is also referred to as a word embedding.
It can be understood that the current word used for prediction differs at different moments in the language model, and when the language model relies on the semantic relations between adjacent words, the first context information most recently determined by the language model is in fact the first context information carried by the word sequence before the current word.
Context information can be represented by a vector. For ease of distinction, the first context information carried by the word sequence before the current word is denoted s(t-1), and the second context information, which characterizes the semantic relation between the current word and its preceding word sequence, is denoted s(t) below.
In this embodiment, the language model includes a domain distribution model for determining the domain of the next word after the current word, and estimation functions respectively corresponding to multiple different domains. In this case, the language model, together with the domain distribution model and the estimation functions in it, is trained uniformly on multiple sentence samples.
S302: the language model converts the current word w(t) and the first context information s(t-1) into second context information s(t) characterizing the semantic relation between the current word and its preceding word sequence.
For example, the current word w(t) and the first context information s(t-1) can be converted according to a preset functional relation, obtaining the second context information s(t).
For example, s(t) can be calculated by the following formula:

s(t) = sigmoid(U w(t) + W s(t-1))   (formula two)

where sigmoid is the set activation function, and U and W are preset matrices, which can be determined during the training of the language model.
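Formula two corresponds to the following sketch of a single recurrent step (matrix shapes and names are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_update(w_t, s_prev, U, W):
    """Formula two: s(t) = sigmoid(U w(t) + W s(t-1)).
    w_t: word vector (d_in,); s_prev: previous context (d_h,);
    U: (d_h, d_in); W: (d_h, d_h)."""
    return sigmoid(U @ w_t + W @ s_prev)
```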
Take a Recurrent Neural Network Based Language Model (RNNLM) as an example. See Fig. 4, which shows a partial composition diagram of an RNNLM.
As can be seen from Fig. 4, in addition to the word vector w(t) of the current word used for prediction, the input of the language model also includes the output vector s(t-1) of the hidden layer preceding the hidden layer corresponding to the current word. Here s(t-1) in fact carries the semantic relations between the words of the word sequence input into the language model before the current word. Correspondingly, the word vector w(t) and the previous hidden layer output vector s(t-1) are input into the current hidden layer corresponding to the current word, obtaining the current hidden layer output vector s(t); what s(t) characterizes is exactly the semantic relation between the current word expressed by w(t) and the words of the word sequence before the current word.
S303: the language model inputs the second context information s(t) into the domain distribution model, so as to determine, through the domain distribution model, the probabilities that the word to be predicted after the current word belongs to the different domains.
Optionally, the domain distribution model can be an LSTM recurrent network model. For the specific way in which the domain distribution model determines the probabilities that the word to be predicted belongs to different domains, refer to the related introduction of the preceding embodiments.
S304: the second context information s(t) is separately input into the estimation function corresponding to each domain, and the first possibility distribution output by each estimation function is obtained.
The first possibility distribution includes the first possibility that each vocabulary word is the word to be predicted. E.g., the first possibility distribution can be a vector with the same dimension as the number of words in the vocabulary, the values of its different dimensions indicating the possibility values of the corresponding vocabulary words being the word to be predicted.
It should be noted that the order of steps S303 and S304 is not limited to that shown in Fig. 3; in practical applications, the two steps may be performed simultaneously, or step S304 may be performed before step S303.
S305: a weighted summation is performed based on the probability corresponding to each domain and the first possibility distribution output by the estimation function corresponding to each domain, obtaining a second possibility distribution.
The second possibility distribution includes the second possibility that each vocabulary word is the word to be predicted.
For example, suppose there are n domains, and correspondingly n estimation functions can be provided, each corresponding to one domain. Suppose the probability that the word to be predicted belongs to the i-th domain is denoted p_i, and the first possibility distribution output by the estimation function corresponding to the i-th domain is denoted by the vector l_i, with i taking values from 1 to n. Then the second possibility distribution P_l, in which each vocabulary word is the word to be predicted, can be obtained by the following formula:

P_l = Σ_{i=1}^{n} p_i · l_i   (formula three)
For ease of understanding, the RNNLM model is still taken as the example language model, illustrated with reference to Fig. 4.
As can be seen from Fig. 4, the language model further comprises the domain distribution model and, unlike the conventional setting of a single estimation function, n estimation functions, where n is the number of domains and can be set as needed; these n estimation functions correspond to different domains. In Fig. 4, the estimation functions are logits functions.
As shown in Fig. 4, after the current hidden layer corresponding to the current word outputs the hidden layer output vector s(t), s(t) is input not only into the domain distribution model but also, separately, into the multiple estimation functions.
The domain distribution model can, based on s(t), analyze the probabilities that the next word to be predicted after the current word belongs to the domains. In Fig. 4, the probability corresponding to each domain is treated as the weight for the subsequent weighted summation: the probability that the word to be predicted belongs to the first domain is denoted weight 1, the probability that it belongs to the second domain is denoted weight 2, and so on, with the probability that it belongs to the n-th domain denoted weight n.
The logits function corresponding to each domain outputs a logits vector. The logits are in fact an unnormalized probability distribution: each dimension of the logits vector represents one word in the vocabulary, different dimensions representing different words, and the value of each dimension indicates the degree of possibility that the corresponding vocabulary word is the word to be predicted, in the case where the word to be predicted belongs to that domain. In Fig. 4, the logits output by the estimation function of the first domain are denoted logits1, the logits output by the estimation function of the second domain are denoted logits2, and so on, with the logits output by the estimation function of the n-th domain denoted logitsn.
Correspondingly, in order to combine the probabilities that the word to be predicted belongs to the domains with the first possibility distributions (logits) output by the estimation functions corresponding to the domains, and so determine the second possibility distribution in which each vocabulary word is the word to be predicted, a weighted summation needs to be performed on the logits output by the estimation functions of the domains, based on the weights of the domains, obtaining the weighted logits.
S306: the multiple second possibilities included in the second possibility distribution are normalized, obtaining the probability distribution characterizing each vocabulary word being the word to be predicted.
It can be understood that the second possibilities in the second possibility distribution form an unnormalized probability distribution. Although a larger second possibility value for a vocabulary word does mean a higher probability of that word being the word to be predicted, the second possibility distribution is not the probability distribution conventionally output by a language model as its final result, and based on it alone it may be hard to see intuitively which words have high probability of being the word to be predicted; therefore, optionally, the values of the multiple second possibilities in the second possibility distribution also need to be normalized.
For example, with reference to Fig. 4, the output layer of the language model includes, in addition to the multiple logits functions (estimation functions) mentioned above, a softmax function; the estimation functions corresponding to the multiple domains and the softmax function together belong to the output layer of the RNNLM.
As can be seen from Fig. 4, after the multiple logits output by the multiple estimation functions are weighted and summed, the weighted logits can be input into the softmax function; in this way, the softmax function can output the probability distribution of each vocabulary word being the word to be predicted. A sketch of this output layer follows.
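Putting the pieces of Fig. 4's output layer together — n per-domain logits heads mixed by the domain weights, then softmax — a hedged end-to-end sketch (all weights, shapes and names are illustrative assumptions, not the application's code):

```python
import numpy as np

def predict_next_word(s_t, heads, W_domain, b_domain):
    """s_t: context vector (d,); heads: list of n (W_i, b_i) pairs,
    W_i: (V, d), one logits head per domain; returns P over V words."""
    dom_scores = W_domain @ s_t + b_domain
    dom = np.exp(dom_scores - dom_scores.max())
    dom /= dom.sum()                               # domain weights 1..n
    mixed = sum(p * (W @ s_t + b)                  # weighted logits
                for p, (W, b) in zip(dom, heads))
    e = np.exp(mixed - mixed.max())
    return e / e.sum()                             # softmax over vocab

# Example with d = 8, n = 2 domains and a 5-word vocabulary.
rng = np.random.default_rng(1)
heads = [(rng.normal(size=(5, 8)), np.zeros(5)) for _ in range(2)]
probs = predict_next_word(rng.normal(size=8), heads,
                          rng.normal(size=(2, 8)), np.zeros(2))
print(probs, probs.sum())  # probabilities over the 5 words, sum ~= 1.0
```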
It can be understood that, for a sentence to be predicted, the language model takes each of its words in turn as the current word used for prediction, and, after predicting for each current word the probability distribution of the vocabulary words being the word to be predicted after it, the language model can also determine the occurrence probability of the sentence to be predicted according to the predicted probability distributions. Alternatively, for a sentence to be predicted, after the language model predicts the probability distribution of the vocabulary words being the word to be predicted after the current word, it can also determine, according to the probability distribution, the occurrence probability of each candidate sentence composed of the sentence to be predicted and each vocabulary word. This application does not limit the specific implementation of determining the occurrence probability of the sentence to be predicted, or of the candidate sentences formed by the sentence to be predicted and the vocabulary words.
It can be understood that, in the case where the domain distribution model and the multiple estimation functions are set in the language model, the language model can be trained with multiple sentence samples, and when training of the language model is completed, the domain distribution model and the estimation functions in it are also trained. In that case, the language model, the domain distribution model and the estimation functions are trained on the same sentence samples, which helps ensure that the domain information predicted by the domain distribution model matches the domain information characterized by the estimation functions in the language model, and helps improve prediction precision. Compared with setting up a domain distribution model independently of the language model and training it separately, setting the domain distribution function inside the language model yields higher accuracy of the predicted probabilities of the vocabulary words being the word to be predicted.
To facilitate understanding of the process of training the language model, one way of training it is introduced below. See Fig. 5, which shows a schematic implementation flow of training the language model; the flow may include:
S501: obtaining multiple sentence samples for training.
Each sentence sample includes one or more words, and the order of the words in each sentence sample is fixed.
S502: for each sentence sample, inputting the sentence sample into the language model, and obtaining the occurrence probability of the sentence sample predicted by the language model.
S503: judging, according to the occurrence probabilities of the sentence samples predicted by the language model, whether the prediction accuracy of the language model meets the requirement; if so, training is completed; if not, adjusting the relevant parameters of the language model, of the domain distribution function in the language model and of each estimation function, and returning to step S502.
It can be understood that the sentence samples are all sentences that conform to human language, and the position of each word in a sample is fixed; therefore, the higher the occurrence probability the language model predicts for a sentence sample, the higher the prediction accuracy of the model. Correspondingly, the accuracy of the language model's predictions can ultimately be analyzed from the occurrence probabilities it predicts for the sentence samples.
It can be understood that, in the case where the prediction accuracy of the language model meets the requirement, then for each current word in a sentence sample, the accuracy of the probabilities, predicted by the domain distribution model in the language model, that the word to be predicted after the current word belongs to each domain is also satisfactory; correspondingly, the first possibility distributions estimated by the estimation functions corresponding to the domains are also satisfactory. Therefore, when training of the language model is completed, the domain distribution model and the estimation functions in it are also trained.
It should be noted that Fig. 5 is only meant to facilitate understanding of the process of training the language model and gives a simple introduction to one training approach; it can be understood that, in practical applications, other ways of training the language model are possible, and the case where the language model is trained by other means is equally applicable to this application.
In above word prediction technique embodiment, since the quantity of word in vocabulary is larger, in this way, for multiple fields point Corresponding first possibility distrabtion of vocabulary is not predicted (word each in vocabulary is belonging respectively to the of word to be predicted for ease of description One possibility is known as corresponding first possibility distrabtion of vocabulary), and multiple first possibility distrabtions predicted are added Read group total is weighed, is necessarily required to occupy the memory increased, and since data processing amount is big, also influences whether predetermined speed, cause Forecasting efficiency is relatively low.
In order to further reduce memory occupation and improve prediction efficiency while guaranteeing prediction precision, the inventors of the present application found through research that, although the number of words in the vocabulary is large, the number of words commonly used by people is relatively small. For example, the vocabulary may reach the order of 200,000 words, but the words commonly used by people may number only about 10,000 to 20,000. Based on this, the first possibility distributions in which these common words belong to the word to be predicted can be predicted per field only for the common words in the vocabulary, with the corresponding weighted summation performed; for the remaining rarely used words, per-field prediction is unnecessary, and it suffices to directly predict the possibility that these rarely used words belong to the word to be predicted. In this way, since the usage frequency of rarely used words is low, only a little prediction precision is lost, while memory occupation can be greatly reduced and prediction efficiency improved.
Based on the above findings, the present application may also divide the total vocabulary containing all the words into two parts, specifically according to the frequency with which the words are used by users: one part is a high-frequency vocabulary, and the other part is a low-frequency vocabulary. The total vocabulary is a pre-constructed vocabulary containing multiple words, and the total number of words in the total vocabulary is greater than the total number of words in the high-frequency vocabulary and, naturally, also greater than the total number of words in the low-frequency vocabulary. In that case, the total vocabulary is equivalent to the vocabulary of the preceding embodiments. The high-frequency vocabulary is composed of multiple words with higher usage frequency in the total vocabulary; for example, the words may be sorted from high to low by usage frequency, and the words ranked in the first specific positions in the total vocabulary are taken as the words in the high-frequency vocabulary. Correspondingly, the low-frequency vocabulary is composed of the words in the total vocabulary that do not belong to the high-frequency vocabulary. It can be seen that the usage frequency of words in the low-frequency vocabulary is lower than that of words in the high-frequency vocabulary. The usage frequency of each word can be obtained through statistical analysis of data, for example by determining the number of times users use each word.
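As a concrete illustration, the following sketch partitions a pre-constructed total vocabulary into a high-frequency part and a low-frequency part according to usage counts obtained from statistical analysis; the cutoff of 20,000 words is an illustrative assumption consistent with the order of magnitude mentioned above.

```python
from collections import Counter

def split_vocabulary(usage_counts: Counter, high_freq_size: int = 20000):
    """Sort the total vocabulary by usage frequency (high to low) and take the
    words ranked in the first `high_freq_size` positions as the high-frequency
    vocabulary; all remaining words form the low-frequency vocabulary."""
    ranked = [word for word, _ in usage_counts.most_common()]
    high_freq_vocab = ranked[:high_freq_size]
    low_freq_vocab = ranked[high_freq_size:]
    return high_freq_vocab, low_freq_vocab
```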
Correspondingly, different processing can be carried out for the words in the high-frequency vocabulary and the words in the low-frequency vocabulary. For example, referring to Fig. 6, which illustrates another schematic flowchart of a word prediction method of the present application, the method of this embodiment can be applied to the computer equipment of the present application and may include:
S601, obtain the current word used for prediction and the first contextual information that the word sequence before the current word has.
S602, based on the current word and the first contextual information, determine the probabilities that the word to be predicted after the current word respectively belongs to multiple different fields.
The above steps S601 and S602 may refer to the related introduction of the preceding embodiments; details are not described herein.
S603, for each field, based on the current word and the first contextual information, determine the first possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted.
Wherein, for each field, the first possibility is the possibility that a word in the high-frequency vocabulary belongs to the word to be predicted in the case where the word to be predicted belongs to that field.
For ease of distinction, the words in the high-frequency vocabulary may be called high-frequency words, and the words in the low-frequency vocabulary may be called low-frequency words. It is understood that, in the embodiments of the present application, the first possibility that each high-frequency word belongs to the word to be predicted is predicted, for the case where the word to be predicted belongs to each field, only for the high-frequency vocabulary. In this way, the number of high-frequency words for which the first possibility needs to be calculated per field is relatively small, which helps reduce memory occupation, reduce the amount of data processing, and improve data processing efficiency.
Wherein, the high-frequency vocabulary differs from the vocabulary of the preceding embodiments only in the number of words, and the process of predicting the first possibility that each word in the high-frequency vocabulary belongs to the word to be predicted is identical to the process of predicting the first possibility that each word in the vocabulary belongs to the word to be predicted in the preceding embodiments; for details, refer to the related introduction of the preceding embodiments, which is not repeated here.
Similar to the preceding embodiments, optionally, after step S601, the second contextual information used to characterize the semantic relation between the current word and the word sequence before the current word may also be determined based on the current word and the first contextual information. Correspondingly, the first possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted can be determined based on the second contextual information.
S604, according to the probabilities that the word to be predicted respectively belongs to the multiple different fields and the first possibility that each word in the high-frequency vocabulary corresponding to each field respectively belongs to the word to be predicted, determine the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted.
Since the number of words in the high-frequency vocabulary is small, the number of first possibilities calculated for the high-frequency words is small; that is, the first possibility distribution containing the first possibility corresponding to each high-frequency word is of small size. Accordingly, in determining the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted, the amount of data to be processed is also relatively small, which helps reduce memory occupation and improve processing efficiency.
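A minimal sketch of the weighted summation in S604, assuming the per-field first possibility distributions over the high-frequency vocabulary are stacked as rows of a matrix and the field probabilities form a vector:

```python
import numpy as np

def second_possibility(field_probs: np.ndarray, first_distributions: np.ndarray) -> np.ndarray:
    """field_probs: shape (n_fields,), probabilities that the word to be predicted
    belongs to each field; first_distributions: shape (n_fields, vocab_size), one
    first possibility distribution over the high-frequency vocabulary per field.
    Returns the weighted sum: one second possibility per high-frequency word."""
    return field_probs @ first_distributions   # shape (vocab_size,)
```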
S605, based on the current word and the first contextual information, determine the third possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted.
Wherein, for ease of distinction, the possibility that a word in the low-frequency vocabulary belongs to the word to be predicted is called the third possibility.
It is understood that, although the usage frequency of each word in the low-frequency vocabulary is relatively low, a word in the low-frequency vocabulary may still be a word in the sentence to be predicted, or may form a new sentence with the sentence to be predicted.
For example, take the candidate sentence recognized from speech, "a winged Chong sky is grey", as an example. In order to predict the occurrence probability of this candidate sentence, the probability of each word given the word sequence before it in the candidate sentence needs to be predicted. Among these, the probability that "Chong" is the next word after "winged" must be predicted, and "Chong" belongs to the low-frequency vocabulary; therefore, only by predicting, according to "winged" and the contextual information corresponding to "one", the possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted (i.e., the third possibility) can the probability that "Chong" is the next word after "winged" be obtained.
Wherein, the manner of determining the third possibility corresponding to each word in the low-frequency vocabulary does not actually consider the field to which the next word after the current word belongs; rather, the possibility that each word in the low-frequency vocabulary belongs to the word to be predicted is determined directly based on the current word and the first contextual information.
S606, according to the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted and the third possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted, construct the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
It is understood that the dimension of the total vocabulary is the sum of the number of words in the high-frequency vocabulary and the number of words in the low-frequency vocabulary, and there is no overlap between the words in the high-frequency vocabulary and those in the low-frequency vocabulary. Therefore, the second possibility that each word in the high-frequency vocabulary belongs to the word to be predicted can be combined with the third possibility that each word in the low-frequency vocabulary belongs to the word to be predicted, so as to construct the possibility that each word in the high-frequency vocabulary and the low-frequency vocabulary belongs to the word to be predicted, that is, to obtain the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
For example, suppose the high-frequency vocabulary contains word 1 and word 2, the second possibility that word 1 belongs to the word to be predicted is possibility 1, and the second possibility that word 2 belongs to the word to be predicted is possibility 2; and suppose the low-frequency vocabulary contains word 3, word 4, word 5, word 6 and word 7, whose third possibilities of belonging to the word to be predicted are, in order, possibility 3, possibility 4, possibility 5, possibility 6 and possibility 7. Combining the two parts, the possibilities that the words in the total vocabulary belong to the word to be predicted may be: word 1: possibility 1; word 2: possibility 2; word 3: possibility 3; word 4: possibility 4; word 5: possibility 5; word 6: possibility 6; word 7: possibility 7.
S607, normalize the possibilities that the words in the total vocabulary respectively belong to the word to be predicted, to obtain the probability distribution in which each word in the total vocabulary respectively belongs to the word to be predicted.
Wherein, step S607 is an optional step. Its purpose is to normalize the possibility that each word in the total vocabulary belongs to the word to be predicted so as to obtain the probability distribution in which each word in the total vocabulary belongs to the word to be predicted, thereby intuitively showing the relative magnitude of the possibility that different words in the vocabulary belong to the word to be predicted.
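The following sketch covers S606 and S607 together: the second possibilities over the high-frequency vocabulary and the third possibilities over the low-frequency vocabulary are concatenated in total-vocabulary order and then normalized; treating the possibilities as logits fed to a softmax is an assumption consistent with the Fig. 8 embodiment described below.

```python
import numpy as np

def total_probability_distribution(second_poss_high: np.ndarray,
                                   third_poss_low: np.ndarray) -> np.ndarray:
    """S606: since the high- and low-frequency vocabularies do not overlap, their
    possibility vectors can simply be concatenated to cover the total vocabulary.
    S607 (optional): normalize so the values form a probability distribution."""
    total_poss = np.concatenate([second_poss_high, third_poss_low])  # S606
    exp = np.exp(total_poss - total_poss.max())                      # stable softmax
    return exp / exp.sum()                                           # S607
```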
Similar to the preceding embodiments, in the case where the total vocabulary is divided into a high-frequency vocabulary and a low-frequency vocabulary, the language model may also include the field distributed model and the estimating functions corresponding to the multiple fields. The difference is that, in this language model, the estimating functions corresponding to the multiple fields perform possibility prediction for the words in the high-frequency vocabulary; moreover, the language model further includes an estimating function corresponding to the low-frequency vocabulary, which performs possibility prediction for the words in the low-frequency vocabulary. Specifically, refer to Fig. 7, which illustrates another schematic flowchart of a word prediction method of the present application; this embodiment is applicable to the computer equipment of the present application, and the method may include:
S701, obtain the word vector w(t) of the current word used for prediction, and the first contextual information s(t-1) most recently determined by the pre-trained language model.
S702, the language model converts the current word w(t) and the first contextual information s(t-1) into the second contextual information s(t), which characterizes the semantic relation between the current word and the word sequence before the current word.
S703, the language model inputs the second contextual information s(t) into the field distributed model, so as to determine, through the field distributed model, the probabilities that the word to be predicted after the current word respectively belongs to the different fields.
The above steps S701 to S703 may refer to the related introduction of the preceding embodiments; details are not described herein.
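As an illustration of S701 to S703 for an RNNLM-style model, the sketch below computes s(t) from w(t) and s(t-1) and then maps s(t) to per-field probabilities; the tanh recurrence and the softmax-based field distributed model are simplifying assumptions in the spirit of a basic RNN, not a form mandated by the present application.

```python
import numpy as np

def rnn_step(w_t: np.ndarray, s_prev: np.ndarray,
             U: np.ndarray, W: np.ndarray) -> np.ndarray:
    """S702: convert w(t) and s(t-1) into the second contextual information s(t)."""
    return np.tanh(U @ w_t + W @ s_prev)

def field_probabilities(s_t: np.ndarray, V: np.ndarray) -> np.ndarray:
    """S703: field distributed model mapping s(t) to probabilities over n fields."""
    scores = V @ s_t
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()
```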
S704, the language model separately inputs the second contextual information s(t) into the high-frequency estimating function corresponding to each field, and obtains the first possibility distribution output by each high-frequency estimating function.
Wherein, the first possibility distribution includes the first possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted. For example, the first possibility distribution may be a vector whose dimension equals the number of words in the high-frequency vocabulary, where the values of different dimensions of the vector indicate the possibility values that different words in the high-frequency vocabulary belong to the word to be predicted.
Wherein, considering that the estimating function corresponding to each field performs possibility prediction only for the words in the high-frequency vocabulary, for ease of distinction from the estimating function corresponding to the low-frequency vocabulary, the estimating function corresponding to each field is called a high-frequency estimating function, and the estimating function corresponding to the low-frequency vocabulary is called the low-frequency estimating function.
S705, perform weighted summation based on the probability corresponding to each field and the first possibility distribution output by the high-frequency estimating function corresponding to each field, to obtain the second possibility distribution.
Wherein, the second possibility distribution includes the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted.
For ease of understanding, the language model is still taken to be an RNNLM model, illustrated in conjunction with Fig. 8. As can be seen from Fig. 8, the language model includes the field distributed model and n high-frequency estimating functions corresponding to n different fields, where n is the number of fields. In Fig. 8, all estimating functions are assumed to be logits functions, so the high-frequency estimating functions can be described as high-frequency logits functions.
As shown in Fig. 8, the process of determining the current hidden-layer output vector s(t) is identical to that of Fig. 4. After s(t) is obtained, s(t) can be input into the field distributed model and also separately input into each high-frequency estimating function.
Similar to Fig. 4, in Fig. 8 the probability corresponding to each field is denoted as a weight; correspondingly, the probability that the word to be predicted belongs to the first field is expressed as weight 1, the probability that it belongs to the second field as weight 2, and so on, with the probability that the word to be predicted belongs to the n-th field expressed as weight n.
Correspondingly, the high-frequency logits function corresponding to each field can output a vector of logits. The logits output by the high-frequency estimating function corresponding to the first field are expressed as logits1, those output by the high-frequency estimating function corresponding to the second field as logits2, and so on, with the logits output by the high-frequency estimating function of the n-th field expressed as logitsn.
Further, weighted summation is performed based on the weight of each field and the logits output by the high-frequency estimating function of each field, to obtain the weighted high-frequency logits; the high-frequency logits represent the second possibility distribution in which each word in the high-frequency vocabulary belongs to the word to be predicted.
S706, the language model inputs the second contextual information s(t) into the low-frequency estimating function corresponding to the low-frequency vocabulary, and obtains the third possibility distribution output by the low-frequency estimating function.
Wherein, the third possibility distribution includes the possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted.
As in Fig. 8, in addition to the field distributed model and the n high-frequency estimating functions corresponding to the n different fields, the language model further includes the estimating function used to predict the possibility that each word in the low-frequency vocabulary belongs to the word to be predicted, i.e., the low-frequency estimating function.
Correspondingly, the s(t) output by the hidden layer can also be input into the low-frequency estimating function, which, based on s(t), predicts the logits of the possibility that each word in the low-frequency vocabulary belongs to the word to be predicted; for ease of distinction, the logits output by the low-frequency estimating function are expressed as the low-frequency logits.
Optionally, considering that the number of words in the low-frequency vocabulary is large, if the low-frequency estimating function were used directly to calculate the possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted, the estimating function would need to output a vector of large dimension, which would affect its computational efficiency. In order to improve the computational efficiency of the estimating function, dimensionality reduction can be performed on the low-frequency vocabulary. Accordingly, the multiple words in the low-frequency vocabulary can be divided into m groups, where m is a natural number greater than or equal to 2 and can be set as needed. Each group includes multiple words, and the sums of the usage frequencies of the words in the respective groups are equal.
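One simple way to realize such a grouping, sketched under the assumption that a greedy partition by cumulative frequency is acceptable, is to walk the low-frequency words in order and close a group once its running frequency sum reaches 1/m of the total:

```python
def group_low_freq_words(words_with_freq, m: int):
    """Divide the low-frequency vocabulary into m groups whose total usage
    frequencies are (approximately) equal. words_with_freq is a list of
    (word, frequency) pairs."""
    total = sum(freq for _, freq in words_with_freq)
    target = total / m
    groups, current, running = [], [], 0.0
    for word, freq in words_with_freq:
        current.append(word)
        running += freq
        if running >= target and len(groups) < m - 1:
            groups.append(current)        # group reached its share of frequency
            current, running = [], 0.0
    groups.append(current)                # last group takes the remainder
    return groups
```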
Correspondingly, m dimensionality reduction matrices (not shown in Fig. 8) are also respectively provided between the hidden layer and the low-frequency estimating function corresponding to the low-frequency vocabulary, with each dimensionality reduction matrix corresponding to one group.
Taking one group as an example: after the hidden layer outputs the second contextual information s(t), s(t) can be passed through the dimensionality reduction matrix corresponding to that group to reduce the dimension of s(t). Then, the language model inputs the reduced s(t) into the low-frequency estimating function, which, based on the reduced s(t), predicts the possibilities that the multiple low-frequency words in the group respectively belong to the word to be predicted, obtaining the possibility distribution corresponding to the group; the vector dimension of this possibility distribution is identical to the dimension of the reduced s(t).
For example, the possibility distribution logits_i output for the i-th group partitioned from the low-frequency vocabulary can be expressed as follows:

logits_i = (proj_i · s(t) + biasp_i) × tail_i + bias_i    (Formula 4)

wherein i is a natural number from 1 to m; proj_i is the dimensionality reduction matrix corresponding to the i-th group; tail_i represents the vectors of the words in the i-th group; biasp_i is a first preset bias vector; and bias_i is a second preset bias vector.
The above is illustrated by taking one group as an example; each group is operated on in the above manner. Since the number of words differs between groups, the dimensions of the dimensionality reduction matrices corresponding to different groups also differ, but the vector dimensions of the final possibility distributions of the groups are all the same. For example, suppose s(t) is of dimension 1024, group 1 contains 10,000 words, and group 2 contains 20,000 words; then the dimensionality reduction matrix corresponding to group 1 may map to 512 dimensions, and correspondingly the vector dimension of the possibility distribution corresponding to group 1 is 512; the dimensionality reduction matrix corresponding to group 2 may be of dimension 216, and the possibility distribution corresponding to group 2 is likewise 512-dimensional.
Correspondingly, based on the possibility distribution vectors corresponding to the groups, a distribution characterizing the possibility that each word in all the groups respectively belongs to the word to be predicted can be constructed.
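A minimal numpy sketch of Formula 4 for a single group i; all shapes are illustrative assumptions (in particular, tail_i is taken as a matrix whose columns are the vectors of the words in group i):

```python
import numpy as np

def group_logits(s_t, proj_i, bias_p_i, tail_i, bias_i):
    """Formula 4: logits_i = (proj_i . s(t) + biasp_i) x tail_i + bias_i.
    proj_i:   (d_reduced, d_hidden)   dimensionality reduction matrix for group i
    tail_i:   (d_reduced, n_words_i)  vectors of the words in group i
    bias_p_i: (d_reduced,)            first preset bias vector
    bias_i:   (n_words_i,)            second preset bias vector"""
    reduced = proj_i @ s_t + bias_p_i    # reduce the dimension of s(t)
    return reduced @ tail_i + bias_i     # possibility logits for group i's words
```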
S707, based on the second possibilities corresponding to the high-frequency vocabulary and the third possibilities corresponding to the low-frequency vocabulary, construct the total possibility distribution characterizing the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
Wherein, the total possibility distribution includes the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
As shown in Fig. 8, the dimension of the low-frequency logits differs from that of the high-frequency logits, and the low-frequency logits and the high-frequency logits can each only characterize the possibility that part of the words in the total vocabulary belong to the word to be predicted. Therefore, combining the low-frequency logits and the high-frequency logits into a vector whose dimension equals the number of words in the total vocabulary is, in effect, constructing logits that characterize the possibility that each word in the total vocabulary respectively belongs to the word to be predicted; for distinction, the constructed logits are called the total logits.
S708, normalize the possibilities, included in the total possibility distribution, corresponding to the multiple words, to obtain the probability distribution characterizing the probability that each word in the total vocabulary respectively belongs to the word to be predicted.
As shown in Fig. 8, the total logits used to characterize the possibility that each word in the total vocabulary respectively belongs to the word to be predicted are input to a softmax function. Correspondingly, the probability distribution output by the softmax function includes the probability that each word in the total vocabulary belongs to the word to be predicted.
To facilitate understanding of the scheme of the present application, an application scenario to which the embodiments of the present application are applicable is introduced below. See Fig. 9, which illustrates a schematic composition diagram of an application scenario to which the word prediction method of the present application is applicable.
Fig. 9 takes a speech recognition scenario as an example. As seen from Fig. 9, the scenario includes a speech recognition system, which includes: a computer equipment 901, a data statistics server 902 and a speech recognition server 903.
Wherein, the computer equipment may be the server in the speech recognition system used to analyze the occurrence probabilities of candidate sentences, and the language model mentioned in any one of the above embodiments of the present application may be preset in the computer equipment.
The data statistics server can provide the basis for the computer equipment to determine the vocabulary.
It is understood that Fig. 9 illustrates the case where the speech recognition server and the computer equipment carrying the language model are two independent devices; however, in practical applications, the computer equipment and the speech recognition server may be the same device.
As seen from Fig. 9, the user terminal 904 can send the voice to be recognized, e.g., input by a user, to the speech recognition server 903, as shown in step S91.
The speech recognition server 903 can convert the voice to be recognized into multiple candidate statement texts that may correspond to it. In order to determine which of these candidate statement texts conform better to human language, that is, which candidate sentences are statement texts of relatively high recognition accuracy, the speech recognition server can send the multiple candidate statement texts converted from the voice to be recognized to the computer equipment 901, as shown in step S92 in Fig. 9.
Correspondingly, according to the scheme introduced in the preceding embodiments, the computer equipment 901 can, for each candidate statement text, successively take each word in the candidate statement text as the current word and predict the probability distribution in which each word in the vocabulary belongs to the word to be predicted after the current word. Then, based on the predicted probability distributions and in combination with the words in the candidate statement text, the occurrence probability of the candidate statement text can be analyzed, as shown in step S93.
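A sketch of how step S93 can compose the per-word predictions into a sentence score; the helper predict_next_word_probs, which returns a mapping from each word in the vocabulary to its probability given the prefix, and the log-space accumulation are illustrative assumptions:

```python
import math

def sentence_log_probability(words, predict_next_word_probs):
    """Occurrence (log-)probability of a candidate statement text: the product of
    each word's probability given the word sequence before it, accumulated in
    log space for numerical stability."""
    log_prob = 0.0
    for i, word in enumerate(words):
        probs = predict_next_word_probs(words[:i])   # distribution over the vocabulary
        log_prob += math.log(probs[word])            # probability of the actual next word
    return log_prob
```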
Then, the computer equipment 901 sends the predicted occurrence probability of each candidate statement text to the speech recognition server 903, as shown in step S94.
The speech recognition server 903 sorts the candidate statement texts by occurrence probability from high to low, and returns the sorted candidate statement texts to the user terminal, which helps the user quickly select the statement text corresponding to the voice, as shown in step S95.
It is understood that Fig. 9 illustrates only one application scenario; in practical applications, there may be many application scenarios to which the schemes of the embodiments of the present application are applicable, and the present application is not limited in this respect.
On the other hand, the present application further provides a word prediction apparatus. For example, referring to Fig. 10, which illustrates a schematic structural diagram of an embodiment of a word prediction apparatus of the present application, the word prediction apparatus of the present application is applicable to the computer equipment of the present application, and the apparatus may include:
an input acquiring unit 1001, configured to obtain the current word used for prediction and the first contextual information that the word sequence before the current word has;
a domain prediction unit 1002, configured to determine, based on the current word and the first contextual information, the probabilities that the word to be predicted after the current word respectively belongs to multiple different fields;
a first predicting unit 1003, configured to, for each field, determine, based on the current word and the first contextual information, the first possibility that each word in a vocabulary respectively belongs to the word to be predicted, the first possibility being the possibility that a word in the vocabulary belongs to the word to be predicted in the case where the word to be predicted belongs to the field, and the vocabulary being a pre-constructed set containing multiple words;
a second predicting unit 1004, configured to determine, according to the probabilities that the word to be predicted respectively belongs to the multiple different fields and the first possibility that each word in the vocabulary corresponding to each field respectively belongs to the word to be predicted, the second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
Optionally, the apparatus may further include:
a normalization unit, configured to, after the second predicting unit determines the second possibility that each word in the vocabulary respectively belongs to the word to be predicted, normalize the second possibilities that the words in the vocabulary respectively belong to the word to be predicted, to obtain the probability distribution in which each word in the vocabulary respectively belongs to the word to be predicted.
Optionally, in order to reduce memory occupation during prediction and improve prediction efficiency, in the word prediction apparatus of the present application, the vocabulary in the first predicting unit and the second predicting unit is a high-frequency vocabulary, the high-frequency vocabulary being composed of multiple words with higher usage frequency in a total vocabulary. In that case, refer to Fig. 11, which illustrates a schematic structural diagram of another embodiment of a word prediction apparatus of the present application; the apparatus of this embodiment differs from that of the preceding embodiment in that it may further include:
a third predicting unit 1005, configured to determine, based on the current word and the first contextual information, the third possibility that each word in a low-frequency vocabulary respectively belongs to the word to be predicted, the low-frequency vocabulary being composed of multiple words in the total vocabulary that do not belong to the high-frequency vocabulary, the total vocabulary being a pre-constructed set containing multiple words, and the total number of words in the total vocabulary being greater than the total number of words in the high-frequency vocabulary;
a prediction combining unit 1006, configured to construct, according to the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted and the third possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted, the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
Optionally, in the above apparatus embodiments, the apparatus may further include:
a context conversion unit, configured to, after the input acquiring unit obtains the current word and the first contextual information, determine, based on the current word and the first contextual information, the second contextual information used to characterize the semantic relation between the current word and the word sequence before the current word;
the domain prediction unit being specifically configured to determine, based on the second contextual information, the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields;
and the first predicting unit being specifically configured to, for each field, determine, based on the second contextual information, the first possibility that each word in the vocabulary respectively belongs to the word to be predicted.
Further, the domain prediction unit includes:
a domain prediction subunit, configured to determine, using a pre-trained field distributed model, the probabilities that the word to be predicted after the current word respectively belongs to multiple different fields, the field distributed model being obtained by training based on multiple sentence samples.
In one implementation, the input acquiring unit may include:
an input acquiring subunit, configured to obtain the word vector of the current word used for prediction and the first contextual information most recently determined by a pre-trained language model, the language model including the field distributed model and the estimating functions corresponding to the multiple different fields, wherein the language model, and the field distributed model and the estimating functions within it, are obtained through unified training with multiple sentence samples;
Correspondingly, the first predicting unit includes:
a first prediction subunit, configured to separately input the second contextual information into the estimating function corresponding to each field, and obtain the first possibility distribution output by each estimating function, the first possibility distribution including the first possibility that each word in the vocabulary respectively belongs to the word to be predicted.
Optionally, the second predicting unit is specifically configured to perform weighted summation based on the probability corresponding to each field and the first possibility distribution output by the estimating function corresponding to each field, to obtain the second possibility distribution, the second possibility distribution including the second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
For ease of understanding, refer to Fig. 12, which illustrates a schematic structural diagram of a computer equipment in the embodiments of the present application. In Fig. 12, the computer equipment may include: a processor 1201, a memory 1202, a communication interface 1203, an input unit 1204, a display 1205 and a communication bus 1206.
The processor 1201, the memory 1202, the communication interface 1203, the input unit 1204 and the display 1205 communicate with one another through the communication bus 1206.
In the embodiments of the present application, the processor 1201 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device, etc.
The processor may call the program stored in the memory 1202; specifically, the processor may execute the operations performed on the computer equipment side in Fig. 1 and Fig. 9 above.
The memory 1202 is configured to store one or more programs, which may include program code including computer operation instructions. In the embodiments of the present application, the memory at least stores a program for realizing the following functions:
obtaining the current word used for prediction and the first contextual information that the word sequence before the current word has;
based on the current word and the first contextual information, determining the probabilities that the word to be predicted after the current word respectively belongs to multiple different fields;
for each field, based on the current word and the first contextual information, determining the first possibility that each word in a vocabulary respectively belongs to the word to be predicted, the first possibility being the possibility that a word in the vocabulary belongs to the word to be predicted in the case where the word to be predicted belongs to the field, and the vocabulary being a pre-constructed set containing multiple words;
according to the probabilities that the word to be predicted respectively belongs to the multiple different fields and the first possibility that each word in the vocabulary corresponding to each field respectively belongs to the word to be predicted, determining the second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
In one possible implementation, the memory 1202 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function, an image playing function, etc.); the data storage area may store data created during use of the computer.
In addition, the memory 1202 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage component.
The communication interface 1203 may be an interface of a communication module, such as an interface of a GSM module.
The computer equipment may further include the display 1205 and the input unit 1204; the display includes a display panel, such as a touch display panel; the input unit may be a touch sensing unit, a keyboard, etc.
Certainly, the computer equipment structure shown in Fig. 12 does not constitute a limitation on the computer equipment in the embodiments of the present application; in practical applications, the computer equipment may include more or fewer components than those shown in Fig. 12, or combine certain components.
On the other hand, the present application further provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the word prediction method described in any one of the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts between the embodiments may refer to each other. Since the apparatus embodiments are basically similar to the method embodiments, they are described relatively simply; for related details, see the corresponding explanation in the method embodiments.
Finally, it should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device including that element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention will not be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above are merely preferred embodiments of the present invention. It should be noted that, for those skilled in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A word prediction method, characterized by comprising:
obtaining a current word used for prediction and first contextual information that a word sequence before the current word has;
based on the current word and the first contextual information, determining probabilities that a word to be predicted after the current word respectively belongs to multiple different fields;
for each field, based on the current word and the first contextual information, determining a first possibility that each word in a vocabulary respectively belongs to the word to be predicted, the first possibility being the possibility that a word in the vocabulary belongs to the word to be predicted in the case where the word to be predicted belongs to the field, and the vocabulary being a pre-constructed set containing multiple words;
according to the probabilities that the word to be predicted respectively belongs to the multiple different fields and the first possibility that each word in the vocabulary corresponding to each field respectively belongs to the word to be predicted, determining a second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
2. The word prediction method according to claim 1, wherein the vocabulary is a high-frequency vocabulary composed of multiple words with higher usage frequency in a total vocabulary, the total vocabulary is a pre-constructed set containing multiple words, and the total number of words in the total vocabulary is greater than the total number of words in the high-frequency vocabulary;
the method further comprising:
based on the current word and the first contextual information, determining a third possibility that each word in a low-frequency vocabulary respectively belongs to the word to be predicted, the low-frequency vocabulary being composed of the words in the total vocabulary that do not belong to the high-frequency vocabulary;
according to the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted and the third possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted, constructing the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
3. The word prediction method according to claim 1, wherein, after the determining of the second possibility that each word in the vocabulary respectively belongs to the word to be predicted, the method further comprises:
normalizing the second possibilities that the words in the vocabulary respectively belong to the word to be predicted, to obtain a probability distribution in which each word in the vocabulary respectively belongs to the word to be predicted.
4. The word prediction method according to any one of claims 1 to 3, wherein, after the obtaining of the current word used for prediction and the first contextual information that the word sequence before the current word has, the method further comprises:
based on the current word and the first contextual information, determining second contextual information used to characterize the semantic relation between the current word and the word sequence before the current word;
the determining, based on the current word and the first contextual information, of the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields comprising:
based on the second contextual information, determining the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields;
and the determining, for each field and based on the current word and the first contextual information, of the first possibility that each word in the vocabulary respectively belongs to the word to be predicted comprising:
for each field, based on the second contextual information, determining the first possibility that each word in the vocabulary respectively belongs to the word to be predicted.
5. The word prediction method according to claim 4, wherein the determining of the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields comprises:
using a pre-trained field distributed model, determining the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields, the field distributed model being obtained by training based on multiple sentence samples.
6. The word prediction method according to claim 5, wherein the obtaining of the current word used for prediction and the first contextual information that the word sequence before the current word has comprises:
obtaining a word vector of the current word used for prediction and the first contextual information most recently determined by a pre-trained language model, the language model including the field distributed model and the estimating functions corresponding to the multiple different fields, wherein the language model, and the field distributed model and the estimating functions within it, are obtained through unified training with multiple sentence samples;
and the determining, for each field and based on the second contextual information, of the first possibility that each word in the vocabulary respectively belongs to the word to be predicted comprises:
separately inputting the second contextual information into the estimating function corresponding to each field, and obtaining a first possibility distribution output by each estimating function, the first possibility distribution including the first possibility that each word in the vocabulary respectively belongs to the word to be predicted.
7. The word prediction method according to claim 6, wherein the determining of the second possibility that each word in the vocabulary respectively belongs to the word to be predicted comprises:
performing weighted summation based on the probability corresponding to each field and the first possibility distribution output by the estimating function corresponding to each field, to obtain a second possibility distribution, the second possibility distribution including the second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
8. A word prediction apparatus, characterized by comprising:
an input acquiring unit, configured to obtain a current word used for prediction and first contextual information that a word sequence before the current word has;
a domain prediction unit, configured to determine, based on the current word and the first contextual information, probabilities that a word to be predicted after the current word respectively belongs to multiple different fields;
a first predicting unit, configured to, for each field, determine, based on the current word and the first contextual information, a first possibility that each word in a vocabulary respectively belongs to the word to be predicted, the first possibility being the possibility that a word in the vocabulary belongs to the word to be predicted in the case where the word to be predicted belongs to the field, and the vocabulary being a pre-constructed set containing multiple words;
a second predicting unit, configured to determine, according to the probabilities that the word to be predicted respectively belongs to the multiple different fields and the first possibility that each word in the vocabulary corresponding to each field respectively belongs to the word to be predicted, a second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
9. The word prediction apparatus according to claim 8, wherein the vocabulary in the first predicting unit and the second predicting unit is a high-frequency vocabulary composed of multiple words with higher usage frequency in a total vocabulary, the total vocabulary is a pre-constructed set containing multiple words, and the total number of words in the total vocabulary is greater than the total number of words in the high-frequency vocabulary;
the apparatus further comprising:
a third predicting unit, configured to determine, based on the current word and the first contextual information, a third possibility that each word in a low-frequency vocabulary respectively belongs to the word to be predicted, the low-frequency vocabulary being composed of the words in the total vocabulary that do not belong to the high-frequency vocabulary;
a prediction combining unit, configured to construct, according to the second possibility that each word in the high-frequency vocabulary respectively belongs to the word to be predicted and the third possibility that each word in the low-frequency vocabulary respectively belongs to the word to be predicted, the possibility that each word in the total vocabulary respectively belongs to the word to be predicted.
10. The word prediction apparatus according to claim 8, further comprising:
a normalization unit, configured to, after the second predicting unit determines the second possibility that each word in the vocabulary respectively belongs to the word to be predicted, normalize the second possibilities that the words in the vocabulary respectively belong to the word to be predicted, to obtain a probability distribution in which each word in the vocabulary respectively belongs to the word to be predicted.
11. The word prediction apparatus according to any one of claims 8 to 10, further comprising:
a context conversion unit, configured to, after the input acquiring unit obtains the current word and the first contextual information, determine, based on the current word and the first contextual information, second contextual information used to characterize the semantic relation between the current word and the word sequence before the current word;
the domain prediction unit being specifically configured to determine, based on the second contextual information, the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields;
and the first predicting unit being specifically configured to, for each field, determine, based on the second contextual information, the first possibility that each word in the vocabulary respectively belongs to the word to be predicted.
12. The word prediction apparatus according to claim 11, wherein the domain prediction unit comprises:
a domain prediction subunit, configured to determine, using a pre-trained field distributed model, the probabilities that the word to be predicted after the current word respectively belongs to the multiple different fields, the field distributed model being obtained by training based on multiple sentence samples.
13. The word prediction apparatus according to claim 12, wherein the input acquiring unit comprises:
an input acquiring subunit, configured to obtain a word vector of the current word used for prediction and the first contextual information most recently determined by a pre-trained language model, the language model including the field distributed model and the estimating functions corresponding to the multiple different fields, wherein the language model, and the field distributed model and the estimating functions within it, are obtained through unified training with multiple sentence samples;
and the first predicting unit comprises:
a first prediction subunit, configured to separately input the second contextual information into the estimating function corresponding to each field, and obtain a first possibility distribution output by each estimating function, the first possibility distribution including the first possibility that each word in the vocabulary respectively belongs to the word to be predicted.
14. A computer equipment, characterized by comprising:
a processor and a memory;
wherein the processor is configured to execute a program stored in the memory;
and the memory is configured to store the program, the program being at least used for:
obtaining a current word used for prediction and first contextual information that a word sequence before the current word has;
based on the current word and the first contextual information, determining probabilities that a word to be predicted after the current word respectively belongs to multiple different fields;
for each field, based on the current word and the first contextual information, determining a first possibility that each word in a vocabulary respectively belongs to the word to be predicted, the first possibility being the possibility that a word in the vocabulary belongs to the word to be predicted in the case where the word to be predicted belongs to the field, and the vocabulary being a pre-constructed set containing multiple words;
according to the probabilities that the word to be predicted respectively belongs to the multiple different fields and the first possibility that each word in the vocabulary corresponding to each field respectively belongs to the word to be predicted, determining a second possibility that each word in the vocabulary respectively belongs to the word to be predicted.
15. A storage medium, characterized in that the storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the word prediction method according to any one of claims 1 to 7.
CN201810942238.4A 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium Active CN109117480B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910740458.3A CN110377916B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium
CN201810942238.4A CN109117480B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810942238.4A CN109117480B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910740458.3A Division CN110377916B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109117480A true CN109117480A (en) 2019-01-01
CN109117480B CN109117480B (en) 2022-05-27

Family

ID=64852831

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810942238.4A Active CN109117480B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium
CN201910740458.3A Active CN110377916B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910740458.3A Active CN110377916B (en) 2018-08-17 2018-08-17 Word prediction method, word prediction device, computer equipment and storage medium

Country Status (1)

Country Link
CN (2) CN109117480B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032644A (en) * 2019-04-03 2019-07-19 人立方智能科技有限公司 Language model pre-training method
CN110222578A (en) * 2019-05-08 2019-09-10 腾讯科技(深圳)有限公司 The method and apparatus of confrontation test picture talk system
CN111444710A (en) * 2019-01-15 2020-07-24 阿里巴巴集团控股有限公司 Word segmentation method and word segmentation device
CN111783431A (en) * 2019-04-02 2020-10-16 北京地平线机器人技术研发有限公司 Method and device for predicting word occurrence probability by using language model and training language model
CN113051936A (en) * 2021-03-16 2021-06-29 昆明理工大学 Method for enhancing Hanyue neural machine translation based on low-frequency word representation
WO2021127987A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Polyphonic character prediction method and disambiguation method, apparatuses, device and computer readable storage medium
CN113221870A (en) * 2021-05-28 2021-08-06 北京有竹居网络技术有限公司 OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765239B (en) * 2019-10-29 2023-03-28 腾讯科技(深圳)有限公司 Hot word recognition method, device and storage medium
CN111680519B (en) * 2020-04-28 2023-04-07 平安科技(深圳)有限公司 Text translation method and device based on dimension reduction barrel model
CN111639160A (en) * 2020-05-29 2020-09-08 达闼机器人有限公司 Domain identification method, interaction method, electronic device and storage medium
CN114942986B (en) * 2022-06-21 2024-03-19 平安科技(深圳)有限公司 Text generation method, text generation device, computer equipment and computer readable storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8136050B2 (en) * 2003-11-21 2012-03-13 Nuance Communications, Inc. Electronic device and user interface and input method therefor
CN101645066B (en) * 2008-08-05 2011-08-24 北京大学 Method for monitoring novel words on Internet
CN103870001B (en) * 2012-12-11 2018-07-10 百度国际科技(深圳)有限公司 Method and electronic device for generating input method candidates
CN103544246A (en) * 2013-10-10 2014-01-29 清华大学 Method and system for constructing multi-emotion dictionary for internet
US9785630B2 (en) * 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
CN105550173A (en) * 2016-02-06 2016-05-04 北京京东尚科信息技术有限公司 Text correction method and device
US11061948B2 (en) * 2016-09-22 2021-07-13 Verizon Media Inc. Method and system for next word prediction
CN108304424B (en) * 2017-03-30 2021-09-07 腾讯科技(深圳)有限公司 Text keyword extraction method and text keyword extraction device
CN107424612B (en) * 2017-07-28 2021-07-06 北京搜狗科技发展有限公司 Processing method, apparatus and machine-readable medium
CN107506414B (en) * 2017-08-11 2020-01-07 武汉大学 Code recommendation method based on a long short-term memory (LSTM) network
CN107908616B (en) * 2017-10-18 2022-01-28 北京京东尚科信息技术有限公司 Method and device for predicting trend words

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103869999A (en) * 2012-12-11 2014-06-18 百度国际科技(深圳)有限公司 Method and device for sorting candidate items generated by input method
WO2017082624A1 (en) * 2015-11-09 2017-05-18 박태운 Sentence prediction input system
CN107621891A (en) * 2017-09-28 2018-01-23 北京新美互通科技有限公司 Text input method, device and electronic device
CN108334496A (en) * 2018-01-30 2018-07-27 中国科学院自动化研究所 Human-computer dialogue understanding method, system and related device for a specific domain

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUETING ZHUANG ET AL.: "Probabilistic Word Selection via Topic Modeling", 《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》, vol. 27, no. 6, 1 June 2015 (2015-06-01), pages 1643 - 1655, XP011579826, DOI: 10.1109/TKDE.2014.2377727 *
张萍 et al.: "Word Association Research: Past, Present and Future" (词汇联想研究: 前世、今生、未来), 《外语教学理论与实践》 (Foreign Language Learning Theory and Practice), no. 4, 31 December 2017 (2017-12-31), pages 1 - 12 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444710A (en) * 2019-01-15 2020-07-24 阿里巴巴集团控股有限公司 Word segmentation method and word segmentation device
CN111444710B (en) * 2019-01-15 2023-04-18 阿里巴巴集团控股有限公司 Word segmentation method and word segmentation device
CN111783431A (en) * 2019-04-02 2020-10-16 北京地平线机器人技术研发有限公司 Method and device for predicting word occurrence probability with a language model and for training the language model
CN111783431B (en) * 2019-04-02 2024-05-24 北京地平线机器人技术研发有限公司 Method and device for predicting word occurrence probability with a language model and for training the language model
CN110032644A (en) * 2019-04-03 2019-07-19 人立方智能科技有限公司 Language model pre-training method
CN110222578A (en) * 2019-05-08 2019-09-10 腾讯科技(深圳)有限公司 Method and apparatus for adversarial testing of an image captioning system
CN110222578B (en) * 2019-05-08 2022-12-27 腾讯科技(深圳)有限公司 Method and apparatus for adversarial testing of an image captioning system
WO2021127987A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Polyphonic character prediction method and disambiguation method, apparatuses, device and computer readable storage medium
CN113051936A (en) * 2021-03-16 2021-06-29 昆明理工大学 Method for enhancing Chinese-Vietnamese neural machine translation based on low-frequency word representation
CN113221870A (en) * 2021-05-28 2021-08-06 北京有竹居网络技术有限公司 OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal
CN113221870B (en) * 2021-05-28 2024-05-24 北京有竹居网络技术有限公司 OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal

Also Published As

Publication number Publication date
CN110377916B (en) 2022-12-16
CN110377916A (en) 2019-10-25
CN109117480B (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN109117480A (en) Word prediction technique, device, computer equipment and storage medium
CN110209806B (en) Text classification method, text classification device and computer readable storage medium
CN109948143B (en) Answer extraction method of community question-answering system
CN110019732B (en) Intelligent question answering method and related device
CN106855853A (en) Entity relation extraction system based on deep neural network
CN108363790A (en) Method, apparatus, device and storage medium for assessment
CN110287312A (en) Text similarity calculation method and device, computer equipment, and computer storage medium
CN108804677A (en) Deep learning question classification method and system combining a multi-layer attention mechanism
CN109271493A (en) Language text processing method, device and storage medium
CN111382573A (en) Method, apparatus, device and storage medium for answer quality assessment
CN108268439B (en) Text emotion processing method and device
CN108108347B (en) Dialogue mode analysis system and method
CN108304373A (en) Semantic dictionary construction method and device, storage medium and electronic device
CN110019822B (en) Few-sample relation classification method and system
CN106529525A (en) Chinese and Japanese handwritten character recognition method
CN110287311A (en) File classification method and device, storage medium, computer equipment
CN104778283A (en) User occupation classification method and system based on microblog
CN109558591A (en) Chinese event detection method and device
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN116644148A (en) Keyword recognition method and device, electronic equipment and storage medium
Balaji et al. Text summarization using NLP technique
CN117763126A (en) Knowledge retrieval method, device, storage medium and apparatus
CN113723077A (en) Sentence vector generation method and device based on a bidirectional representation model, and computer equipment
CN110287396A (en) Text matching technique and device
CN110334353B (en) Analysis method, device, equipment and storage medium for word sequence recognition performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant