CN109117480A - Word prediction method, apparatus, computer device and storage medium - Google Patents
Word prediction method, apparatus, computer device and storage medium
- Publication number: CN109117480A
- Application number: CN201810942238.4A
- Authority: CN (China)
- Prior art keywords: word, vocabulary, predicted, possibility, current
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214: Physics > Computing; calculating or counting > Electric digital data processing > Pattern recognition > Analysing > Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation > Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F40/211: Physics > Computing; calculating or counting > Electric digital data processing > Handling natural language data > Natural language analysis > Parsing > Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/253: Physics > Computing; calculating or counting > Electric digital data processing > Handling natural language data > Natural language analysis > Grammatical analysis; style critique
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a word prediction method, apparatus, computer device and storage medium. In the method, the computer device obtains a current word used for prediction and first context information carried by the word sequence preceding the current word. Based on the current word and the first context information, it determines the probabilities that the word to be predicted after the current word belongs to each of multiple different domains. For each domain, based on the current word and the first context information, it determines a first likelihood that each word in a vocabulary is the word to be predicted. Then, according to the probabilities that the word to be predicted belongs to the multiple different domains and the per-domain first likelihoods of the vocabulary words, it determines a second likelihood that each word in the vocabulary is the word to be predicted. The scheme of this application can improve the accuracy of predicting the occurrence probability of the next word after a given word, which in turn helps improve the precision of predicting the occurrence probability of a sentence.
Description
Technical field
This application relates to the field of data processing technology, and in particular to a word prediction method, apparatus, computer device and storage medium.
Background
Language models are widely used in fields such as speech recognition and machine translation. The job of a language model is to compute the probability that a sentence occurs, so as to pick out, from multiple candidate sentences, the one that best conforms to human language. For example, in a speech recognition scenario, the input speech may be recognized as multiple candidate sentences, some of which contain erroneous words or grammar and do not conform to human language; in that case, a language model is needed to output the probability that each candidate sentence is reasonable.
When a language model determines the occurrence probability of a sentence to be predicted, it needs to take a word of that sentence as the current word used for prediction and determine, for each word in the language model's vocabulary, the probability that it is the next word after the current word (i.e., the word to be predicted). At present, however, the accuracy with which language models predict the probability that each vocabulary word is the word to be predicted is generally low, so the accuracy with which they determine sentence occurrence probabilities is also low.
Summary of the invention
In view of this, this application provides a word prediction method, apparatus, computer device and storage medium, to improve the accuracy of predicting the occurrence probability of the next word after a given word.
To achieve the above object, in one aspect, this application provides a word prediction method, comprising:
obtaining a current word used for prediction and first context information carried by the word sequence preceding the current word;
based on the current word and the first context information, determining the probabilities that the word to be predicted after the current word belongs to each of multiple different domains;
for each domain, based on the current word and the first context information, determining a first likelihood that each word in a vocabulary is the word to be predicted, the first likelihood being the likelihood that the vocabulary word is the word to be predicted in the case where the word to be predicted belongs to that domain, and the vocabulary being a pre-built set containing multiple words;
according to the probabilities that the word to be predicted belongs to the multiple different domains and the first likelihoods of the vocabulary words for each domain, determining a second likelihood that each word in the vocabulary is the word to be predicted.
In one possible implementation, the vocabulary is a high-frequency vocabulary composed of multiple words in a total vocabulary that have a relatively high frequency of use, the total vocabulary being a pre-built set containing multiple words whose total number of words exceeds the total number of words in the high-frequency vocabulary; and the method further comprises:
based on the current word and the first context information, determining a third likelihood that each word in a low-frequency vocabulary is the word to be predicted, the low-frequency vocabulary being composed of the multiple words in the total vocabulary that do not belong to the high-frequency vocabulary;
according to the second likelihoods of the words in the high-frequency vocabulary and the third likelihoods of the words in the low-frequency vocabulary, constructing the likelihood that each word in the total vocabulary is the word to be predicted.
In another aspect, this application also provides a word prediction apparatus, comprising:
an input acquiring unit, configured to obtain a current word used for prediction and first context information carried by the word sequence preceding the current word;
a domain prediction unit, configured to determine, based on the current word and the first context information, the probabilities that the word to be predicted after the current word belongs to each of multiple different domains;
a first prediction unit, configured to determine, for each domain and based on the current word and the first context information, a first likelihood that each word in a vocabulary is the word to be predicted, the first likelihood being the likelihood that the vocabulary word is the word to be predicted in the case where the word to be predicted belongs to that domain, and the vocabulary being a pre-built set containing multiple words;
a second prediction unit, configured to determine a second likelihood that each word in the vocabulary is the word to be predicted, according to the probabilities that the word to be predicted belongs to the multiple different domains and the first likelihoods of the vocabulary words for each domain.
In another aspect, this application also provides a computer device, comprising a processor and a memory, wherein the processor is configured to execute a program stored in the memory, and the memory is configured to store a program that at least performs:
obtaining a current word used for prediction and first context information carried by the word sequence preceding the current word;
based on the current word and the first context information, determining the probabilities that the word to be predicted after the current word belongs to each of multiple different domains;
for each domain, based on the current word and the first context information, determining a first likelihood that each word in a vocabulary is the word to be predicted, the first likelihood being the likelihood that the vocabulary word is the word to be predicted in the case where the word to be predicted belongs to that domain, and the vocabulary being a pre-built set containing multiple words;
according to the probabilities that the word to be predicted belongs to the multiple different domains and the first likelihoods of the vocabulary words for each domain, determining a second likelihood that each word in the vocabulary is the word to be predicted.
In yet another aspect, this application also provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the word prediction method of any of the above items.
As can be seen, in the embodiments of this application, after the current word used for prediction is obtained, the probabilities that the word to be predicted (the next word after the current word) belongs to multiple different domains can be analyzed according to the current word and the context information carried by the word sequence preceding it, and the likelihood that each vocabulary word is the word to be predicted can be determined separately for the case where the word to be predicted belongs to each domain. Since the domain of the word to be predicted affects the likelihood that each vocabulary word is that word, combining the probabilities that the word to be predicted belongs to the respective domains with the per-domain likelihoods of the vocabulary words, and thereby comprehensively determining each vocabulary word's likelihood of being the word to be predicted, improves the accuracy of predicting which vocabulary word is the next word after the current word, and thus helps improve the precision of predicting the occurrence probability of the sentence to which the current word belongs.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only embodiments of this application; a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 shows a schematic structural diagram of a word prediction system of this application;
Fig. 2 shows a schematic flowchart of a word prediction method in an embodiment of this application;
Fig. 3 shows another schematic flowchart of a word prediction method in an embodiment of this application;
Fig. 4 shows a schematic composition diagram of a language model implementing word prediction in an embodiment of this application;
Fig. 5 shows a schematic flowchart of training a language model in this application;
Fig. 6 shows another schematic flowchart of a word prediction method of this application;
Fig. 7 shows yet another schematic flowchart of a word prediction method of this application;
Fig. 8 shows a schematic composition diagram of another language model applicable to this application;
Fig. 9 shows a schematic diagram of an application scenario to which the word prediction method of this application applies;
Fig. 10 shows a schematic structural diagram of a word prediction apparatus of this application;
Fig. 11 shows another schematic structural diagram of a word prediction apparatus of this application;
Fig. 12 shows a schematic composition diagram of a computer device applicable to this application.
Detailed description of the embodiments
The scheme of the embodiments of this application is suitable for predicting, for the current word of a sentence, the likelihood that each word in a vocabulary is the next word after the current word and can form a sentence with it, so as to improve the accuracy of the predicted probability of the next word after the current word and, in turn, the accuracy of the predicted occurrence probability of the sentence formed by the current word and the next word.
The inventors of this application found through research that a word may belong to one or more different domains; for example, word A may be a word of the industrial domain, or a word of the agricultural domain, or a word of the scientific and technical domain, and so on. Correspondingly, the next word after the current word may also belong to one or several domains, and when the domain of the next word differs, the probability distribution over the vocabulary words being the word to be predicted differs as well. Existing language models do not consider the domain of the next word during prediction, which inevitably lowers the precision of the predicted probability distribution over the vocabulary words being the next word.
In order to improve prediction precision, the inventors, in the course of predicting the next word after the current word, consider the domains the next word may belong to, predict for each of multiple domains the likelihood that each vocabulary word is the next word, and comprehensively determine the probability distribution over the vocabulary words being the next word, so that the finally obtained probability distribution is more accurate.
To facilitate understanding of the scheme of this application, the scenarios in which it is used are introduced first. For example, see Fig. 1, which shows a schematic structural diagram of a word prediction system used by the scheme of this application.
As seen from Fig. 1, the word prediction system may include a computer device 101 and a data statistics server 102.
The computer device 101 can obtain a sentence to be predicted determined by speech recognition, machine translation or an input method; determine from it the current word to be analyzed; and, based on the current word, predict the likelihood that each vocabulary word is the word to be predicted, i.e., the next word after the current word in that sentence.
For example, taking the field of speech recognition: after the speech signal input by a user is converted into multiple candidate sentence texts, in order to determine the probability that each candidate text is a correct sentence, i.e., the probability that it conforms to human language, each word of the candidate text is taken in turn as the current word, and the probability that each vocabulary word is the next word after the current word in the candidate text is predicted. Then, according to the words composing the candidate text and the predicted probabilities of the next word after each of them, the probability that the candidate text is a correct sentence is determined comprehensively. For example, for the candidate text "being very glad": if the predicted probability that "high" is the next word after "very" is 0.5, and the probability that "glad" is the next word after "high" (that is, after "very high") is 0.9, then the occurrence probability of "being very glad" can be 0.45.
In the field of machine translation, the application and process of predicting the next word are similar to those in the field of speech recognition.
As another example, in the input method field, it may be necessary to analyze, based on the words of the sentence a user has input (the sentence may be a single word), the sequence of candidates the user may want to input next. In that case, the last word of the sentence is taken as the current word, and the likelihood that each vocabulary word can form a sentence with the current word is predicted, so that the input method can, according to the prediction result, choose the candidate words to display and their display order.
As seen from Fig. 1, a language model can be deployed in the computer device 101. A sentence to be analyzed, determined by speech recognition, machine translation or an input method, can be input into the language model in the computer device 101, so that the probability of the next word occurring after each word of the sentence is analyzed based on the language model; candidate sentences are then determined based on the sentence to be analyzed, or the occurrence probability of the sentence to be analyzed (i.e., the probability that it conforms to human language) is analyzed.
The language model may be a neural-network-based language model, or another type of language model; no restriction is imposed here.
The data statistics server can collect the multiple words commonly used by different users and send them to the computer device, so that the computer device determines the composition of the vocabulary; alternatively, it can generate the vocabulary from the collected commonly used words and feed it back to the computer device.
It can be understood that the computer device may be a server in a speech recognition, machine translation or input method system, or a standalone device with data processing capability.
With the above common features in mind, a word prediction method of an embodiment of this application is introduced below. See Fig. 2, which shows a schematic flowchart of one embodiment of a word prediction method of this application. The method of this embodiment can be applied to the computer device of this application, and may include:
S201: obtain a current word used for prediction and first context information carried by the word sequence preceding the current word.
In the embodiments of this application, the word currently used to predict the likelihood of the next word occurring is called the current word. Considering that the next word after the current word may be any word in the vocabulary, and that this application needs to predict the likelihood that each vocabulary word follows the current word and can form a sentence with it, this application calls the next word after the current word, which needs to be predicted, the word to be predicted.
The current word may be a word used for prediction in a sentence to be predicted. The current word may consist of a single character string, for example one Chinese character; it may also consist of multiple character strings, for example a phrase composed of multiple Chinese characters. Correspondingly, the word to be predicted may also consist of one or more character strings.
It can be understood that the way the current word is obtained may differ across application scenarios.
For example, in one possible implementation, the word currently used for prediction may be determined from the sentence to be predicted according to the order of its words. In this case, the sentence to be predicted is itself the sentence whose occurrence probability needs to be predicted; it is composed of multiple words, and each word of it needs to serve in turn as the current word. For example, the sentence to be predicted may be a candidate sentence obtained by speech recognition or machine translation, and the current word may be the word at any position in the candidate sentence; at different moments, the current word is the word at a different position in the candidate sentence.
In another possible implementation, the last word of the sentence to be predicted is obtained as the current word used for prediction. In this case, the language model may need to predict the likelihood of the candidate sentences composed of the sentence to be predicted and the next word after the current word. For example, the sentence to be predicted may be the sentence currently being input through an input method; at the current moment it may not be a complete sentence (it may be only one character or word, or an incomplete sentence composed of multiple words), and in order to predict which word the next word after its last word may be, and with what likelihood, the last word of the sentence to be predicted needs to be taken as the current word.
It can be understood that which words the word to be predicted (the next word) may be, and their respective likelihoods, could be predicted directly based on the current word of the sentence to be predicted alone, but the precision would be very low. To guarantee prediction precision, in the embodiments of this application the prediction can also draw on the context information corresponding to the word sequence used for prediction before the current moment.
The word sequence before the current word may be a word sequence composed of one or more words preceding the current word in the sentence to be predicted; the word sequence may also be empty. For example, if the current word is the first word used for prediction, i.e., the first word of the sentence to be predicted, or the sentence to be predicted contains only one word, then the word sequence before the current word is empty, and in this case the context information carried by the word sequence is also empty.
Context information characterizes the semantic relations between words, so the context relation of a word sequence is the semantic relation between the words in the word sequence. For ease of distinction, the embodiments of this application call the context information carried by the word sequence before the current word the first context information.
Optionally, the current word can be represented by a word vector, and the first context information can also be represented by a vector.
S202: based on the current word and the first context information, determine the probabilities that the word to be predicted after the current word belongs to each of multiple different domains.
As the inventors of this application found through research, the word to be predicted after the current word may belong to one or several domains, and the domain to which the word to be predicted belongs affects the likelihood that each vocabulary word is the word to be predicted. Therefore, in the embodiments of this application, the probabilities that the word to be predicted belongs to the multiple domains can be determined based on the current word and the first context information of the word sequence before it.
It can be understood that the degree to which the word to be predicted tends toward each domain can be analyzed based on the current word and the first context information, and this degree of tendency can be reflected by a probability.
For example, in one possible case, the degree of mapping between different semantic relations and different domains can be analyzed in advance; in this way, based on the semantic relation characterized by the current word and the first context information, the degree of mapping between the word to be predicted and each domain can be obtained.
In another possible case, a domain distribution model can be trained in advance, the domain distribution model being obtained by training on multiple sentence samples. Then, according to the current word and the first context information, and using the domain distribution model, the probabilities that the word to be predicted after the current word belongs to the multiple different domains can be predicted.
The domain distribution model can be chosen as needed; for example, it can be a recurrent neural network model, such as a Long Short-Term Memory (LSTM) model.
The domain distribution model may be independent of the language model. In that case, training the domain distribution model with multiple sentence samples may consist in training a preset network model with those samples and taking the trained network model as the domain distribution model. For example, since the order of the words in a sentence sample is fixed, the network model can be trained based on the word order in the samples and the annotated domain of each word, until the difference between the domains output by the network model and the actually annotated domains meets the requirement.
Optionally, in order to improve the precision of the predicted probabilities that the word to be predicted belongs to the different domains, the domain distribution model can be a model included in the language model; in this way, it can be trained together with the language model in the course of training the language model on multiple sentence samples. This part will be introduced in the content below.
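By way of illustration only, the following is a minimal sketch of what a domain distribution step could look like, assuming the domain scores are a learned linear projection of a context vector followed by a softmax; the shapes, names and random parameters are hypothetical stand-ins, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_DOMAINS = 3   # hypothetical number of domains
HIDDEN = 8      # hypothetical size of the context vector

# Hypothetical trained projection from the context vector to domain scores.
W_domain = rng.normal(size=(N_DOMAINS, HIDDEN))

def domain_probabilities(s_t: np.ndarray) -> np.ndarray:
    """Map a context vector to a probability distribution over domains."""
    scores = W_domain @ s_t
    scores -= scores.max()           # for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

s_t = rng.normal(size=HIDDEN)        # stand-in for the real context vector
print(domain_probabilities(s_t))     # three probabilities summing to 1
```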
S203: for each domain, based on the current word and the first context information, determine the first likelihood that each word in the vocabulary is the word to be predicted.
The vocabulary is built in advance and is a set including multiple words. The vocabulary may sometimes also be called a word list, a corpus, and so on. The words in the vocabulary can be the words that users may use daily, determined through big-data analysis and the like; for example, the vocabulary may include 200,000 words that users may use daily.
For each domain, the first likelihood is the likelihood that a vocabulary word is the word to be predicted in the case where the word to be predicted belongs to that domain. For a given domain, the first likelihood of a word in the vocabulary can thus also be regarded as the likelihood that this word is the word to be predicted when the word to be predicted belongs to that domain.
The first likelihood can be represented in various ways, such as a numeric value or a grade. For example, if the first likelihood is a numeric value, then the larger the value of a word's first likelihood, the higher the degree of likelihood that this vocabulary word is the word to be predicted.
It can be understood that, for each domain, since there are multiple words in the vocabulary, multiple first likelihoods are obtained, one per word, and each domain corresponds to one first likelihood distribution. The first likelihood distribution corresponding to a domain is in fact the distribution of the first likelihoods that the vocabulary words are the word to be predicted, in the case where the word to be predicted belongs to that domain.
The first likelihood distribution includes the first likelihood of each word in the vocabulary. For example, the first likelihood distribution can be a vector in which each element points to one word in the vocabulary, and the specific value of the element is the first likelihood that the pointed-to word is the word to be predicted.
Unlike current language models, which do not consider domains and directly predict a single likelihood distribution over the vocabulary words being the word to be predicted, this application predicts the likelihoods of the vocabulary words separately for each case in which the word to be predicted belongs to one of the multiple different domains.
In order to enable the language model to predict, for each of the multiple domains, the first likelihood that each vocabulary word is the word to be predicted, multiple estimation functions can be set in the language model, and in the course of training the language model on multiple sentence samples, different estimation functions can be made to correspond to different domains through training. Each estimation function predicts, based on the current word and the first context information, the first likelihood that each vocabulary word is the word to be predicted; but because the domains corresponding to the estimation functions differ, the first likelihoods they predict are not identical, i.e., different estimation functions predict different first likelihood distributions.
The language model can be obtained by training different network models, and depending on the network model chosen for the language model, the estimation functions in it may also differ. For example, if the language model is a neural-network-based language model, the estimation function can be the logits function in the output layer of the language model, which converts the word vector of the current word and the vector of the first context information into a vector with the same dimension as the vocabulary size, i.e., the commonly so-called logits. A logit output by the logits function represents the logarithm of the ratio between the probability that an event occurs and the probability that it does not occur. Correspondingly, the logit corresponding to each word is exactly that word's first likelihood.
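A hedged sketch of the per-domain estimation functions as n separate logits heads over a shared context vector; again the shapes and random weights are hypothetical stand-ins rather than the patent's trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

N_DOMAINS = 3   # hypothetical number of domains
VOCAB = 5       # toy vocabulary size
HIDDEN = 8      # hypothetical size of the context vector

# One hypothetical trained logits head per domain: each maps the context
# vector to a vocabulary-sized vector of first likelihoods (logits).
heads = [rng.normal(size=(VOCAB, HIDDEN)) for _ in range(N_DOMAINS)]

def per_domain_logits(s_t):
    """Return one first likelihood distribution (logits over the vocabulary)
    per domain, all computed from the same context vector."""
    return [W @ s_t for W in heads]

s_t = rng.normal(size=HIDDEN)
for i, logits in enumerate(per_domain_logits(s_t)):
    print(f"domain {i}: {np.round(logits, 2)}")
```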
S204: according to the probabilities that the word to be predicted belongs to the multiple different domains and the first likelihoods of the vocabulary words for each domain, determine the second likelihood that each word in the vocabulary is the word to be predicted.
For example, the weight of each domain can be determined from the probabilities that the word to be predicted belongs to the respective domains, e.g., the probability that the word to be predicted belongs to a domain is used as that domain's weight. Correspondingly, based on the weights of the domains, a weighted sum of the first likelihoods of each vocabulary word across the domains is computed, and the weighted-sum results are exactly the second likelihoods that the vocabulary words are the word to be predicted.
To illustrate: suppose there are two domains, industry and agriculture, and the vocabulary contains three words, {"ball", "TV", "going out"}. Suppose the probability that the word to be predicted belongs to the industrial domain is 0.6 and the probability that it belongs to the agricultural domain is 0.4, and the magnitude of the first likelihood is characterized by a numeric value. Suppose that, in the case where the word to be predicted belongs to the industrial domain, the first likelihoods of the vocabulary words are {"ball" = 1.2, "TV" = 1.5, "going out" = 0.2}; and in the case where it belongs to the agricultural domain, they are {"ball" = 0.8, "TV" = 1.6, "going out" = 0.4}. Then the first likelihood distribution corresponding to the industrial domain is multiplied by its probability 0.6 to obtain a first result; the first likelihood distribution corresponding to the agricultural domain is multiplied by its probability 0.4 to obtain a second result; and the first result and the second result are added, specifically expressed as follows:
0.6 * {"ball" = 1.2, "TV" = 1.5, "going out" = 0.2} + 0.4 * {"ball" = 0.8, "TV" = 1.6, "going out" = 0.4} = {"ball" = 0.6*1.2 + 0.4*0.8 = 1.04, "TV" = 0.6*1.5 + 0.4*1.6 = 1.54, "going out" = 0.6*0.2 + 0.4*0.4 = 0.28}.
That is to say, the second likelihood value of "ball" being the word to be predicted is 1.04, the second likelihood value of "TV" is 1.54, and the second likelihood value of "going out" is 0.28.
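The same weighted sum, reproduced as a few lines of illustrative code (the numbers are the toy values of the example above, not model outputs):

```python
import numpy as np

vocab = ["ball", "TV", "going out"]

# Probabilities that the word to be predicted belongs to each domain (S202).
domain_probs = np.array([0.6, 0.4])        # industry, agriculture

# Per-domain first likelihoods over the vocabulary (S203).
first_likelihoods = np.array([
    [1.2, 1.5, 0.2],                       # if the domain is industry
    [0.8, 1.6, 0.4],                       # if the domain is agriculture
])

# S204: weighted sum across domains gives the second likelihoods.
second_likelihoods = domain_probs @ first_likelihoods
print(dict(zip(vocab, np.round(second_likelihoods, 2))))
# {'ball': 1.04, 'TV': 1.54, 'going out': 0.28}
```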
Of course, the above is only one implementation of comprehensively determining the second likelihood of each vocabulary word by combining the probabilities that the word to be predicted belongs to the different domains with the per-domain first likelihoods of the vocabulary words; other implementations are possible in practical applications, and no restriction is imposed here.
It can be understood that, in the case where the second likelihoods are represented by numeric values, considering that the numeric magnitudes of the second likelihoods differ across vocabulary words, it is hard to compare intuitively the degrees of likelihood that the vocabulary words are the word to be predicted. Therefore, optionally, the second likelihoods of the vocabulary words can also be normalized to obtain the probability distribution over the vocabulary words being the word to be predicted. The probability distribution includes the probability that each vocabulary word is the word to be predicted, and after normalization the probabilities of all vocabulary words being the word to be predicted sum to one.
There may be several ways to normalize the second likelihoods of the vocabulary words. Taking one way as an example, a softmax function can be used to process the second likelihoods of the vocabulary words, so as to output the probability distribution obtained after normalization. For example, suppose the vocabulary includes C words, so the C words correspond to C second likelihoods, where the second likelihood of the i-th of the C words is denoted v_i; the probability S_i obtained by normalizing v_i with the softmax function is expressed as follows:

$$S_i = \frac{e^{v_i}}{\sum_{j=1}^{C} e^{v_j}}$$

Of course, this is only an illustration using one normalization method as an example; normalization by other normalization functions applies equally to the present embodiment.
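A minimal numerically stable softmax matching the formula above, applied to the toy second likelihoods of the earlier example (illustrative only):

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    """Normalize second likelihoods into a probability distribution."""
    e = np.exp(v - v.max())   # subtracting the max improves numerical stability
    return e / e.sum()

second_likelihoods = np.array([1.04, 1.54, 0.28])
probs = softmax(second_likelihoods)
print(np.round(probs, 3), probs.sum())   # the probabilities sum to 1.0
```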
As can be seen from the above, in the embodiments of this application, after the current word used for prediction is obtained, the probabilities that the word to be predicted after the current word (the next word after the current word) belongs to multiple different domains can be analyzed according to the current word and the context information carried by the word sequence before it, and the likelihood that each vocabulary word is the word to be predicted can be determined separately for the case where the word to be predicted belongs to each domain. Since the domain of the word to be predicted affects the likelihood that each vocabulary word is that word, combining the probabilities that the word to be predicted belongs to the respective domains with the likelihoods of the vocabulary words in the case where the word to be predicted belongs to the different domains, and thereby comprehensively determining each vocabulary word's likelihood of being the word to be predicted, can improve the accuracy of predicting which vocabulary word is the next word after the current word, and thus helps improve the precision of predicting the occurrence probability of the sentence to which the current word belongs.
It can be understood that, in order to improve prediction precision, after the current word and the first context information are obtained, second context information can also be determined based on the current word and the first context information, the second context information characterizing the semantic relations between the current word and the word sequence before it. The second context information can reflect the semantic association relations between the words of the sentence composed of the current word and the word sequence preceding it.
Correspondingly, in step S202 above, the probabilities that the word to be predicted belongs to the different domains can be determined based on the second context information; in step S203, the first likelihood that each vocabulary word is the word to be predicted can be determined according to the second context information.
Further, when prediction is performed based on the next current word, the second context information can be input into the language model together with the next current word as input information, to improve prediction precision.
For ease of understanding, the following description takes as an example a language model that includes a domain distribution model and estimation functions corresponding to multiple different domains. For example, see Fig. 3, which shows another implementation flowchart of the word prediction method of the embodiments of this application; the flow is suitable for the computer device of this application. The flow may include:
S301: obtain the word vector w(t) of the current word used for prediction, and the first context information s(t-1) most recently determined by the pre-trained language model.
A word vector is also referred to as a word embedding.
It can be understood that the current word used for prediction differs at different moments in the language model, and when the language model needs to rely on the semantic relations between adjacent words, the first context information most recently determined by the language model is in fact the first context information carried by the word sequence before the current word.
Context information can be represented by a vector. For ease of distinction, the first context information carried by the word sequence before the current word is denoted s(t-1), and the second context information characterizing the semantic relations between the current word and the word sequence before it, introduced below, is denoted s(t).
In the present embodiment, the language model includes a domain distribution model for determining the domain of the next word after the current word, and estimation functions respectively corresponding to the multiple different domains. In this case, the language model and the domain distribution model and estimation functions within it are trained jointly on multiple sentence samples.
S302: the language model converts the current word w(t) and the first context information s(t-1) into second context information s(t) characterizing the semantic relations between the current word and the word sequence before it.
For example, the current word w(t) and the first context information s(t-1) can be converted according to a preset functional relation to obtain the second context information s(t). For instance, s(t) can be calculated by the following formula two:

$$s(t) = \mathrm{sigmoid}\big(U\,w(t) + W\,s(t-1)\big) \qquad \text{(formula two)}$$

where sigmoid is the set function, and U and W are preset parameter matrices that can be determined in the course of training the language model.
Take a language model based on a recurrent neural network (Recurrent Neural Network Based Language Model, RNNLM) as an example. See Fig. 4, which shows a partial composition diagram of this RNNLM language model.
As seen from Fig. 4, in addition to the word vector w(t) of the current word used for prediction, the input part of the language model also includes the previous hidden-layer output vector s(t-1), i.e., the output of the hidden layer preceding the hidden layer corresponding to the current word. s(t-1) is in fact the semantic relation carried by the words of the word sequence input to the language model before the current word. Correspondingly, the word vector w(t) and the previous hidden-layer output vector s(t-1) are input into the current hidden layer corresponding to the current word, so as to obtain the current hidden-layer output vector s(t); what s(t) characterizes is exactly the semantic relations between the current word represented by the word vector w(t) and the words of the word sequence before it.
S303: the language model inputs the second context information s(t) into the domain distribution model, so as to determine, through the domain distribution model, the probabilities that the word to be predicted after the current word belongs to the different domains.
Optionally, the domain distribution model can be an LSTM recurrent network model. For the specific way the domain distribution model determines the probabilities that the word to be predicted belongs to the different domains, refer to the related introduction in the preceding embodiments.
S304: input the second context information s(t) separately into the estimation function corresponding to each domain, and obtain the first likelihood distribution output by each estimation function.
The first likelihood distribution includes the first likelihood that each vocabulary word is the word to be predicted. For example, the first likelihood distribution can be a vector whose dimension equals the number of words in the vocabulary, and the values of its different dimensions indicate the likelihood values of the different vocabulary words being the word to be predicted.
It should be noted that the order of step S303 and step S304 is not limited to that shown in Fig. 3; in practical applications, the two steps may also be performed simultaneously, or step S304 may be performed first and step S303 afterwards.
S305: compute a weighted sum based on the probability corresponding to each domain and the first likelihood distribution output by each domain's estimation function, to obtain the second likelihood distribution.
The second likelihood distribution includes the second likelihood that each vocabulary word is the word to be predicted.
For example, suppose there are n domains; correspondingly, n estimation functions can be provided, each corresponding to one domain. Denote by p_i the probability that the word to be predicted belongs to the i-th domain, and by the vector l_i the first likelihood distribution output by the estimation function of the i-th domain, with i ranging from 1 to n. Then the comprehensively determined second likelihood distribution P_l over the vocabulary words being the word to be predicted can be obtained by the following formula three:

$$P_l = \sum_{i=1}^{n} p_i \, l_i \qquad \text{(formula three)}$$
For ease of understanding, the description continues with an RNNLM as the language model, in conjunction with Fig. 4.
As seen from Fig. 4, the language model further includes the domain distribution model, and, unlike the conventional setting of only one estimation function, n estimation functions are set in the language model, where n is the number of domains and can be set as needed; the n estimation functions correspond to different domains. In Fig. 4, the estimation functions are logits functions.
As shown in Fig. 4, after the current hidden layer corresponding to the current word outputs the hidden-layer output vector s(t), s(t) is not only input into the domain distribution model but is also input separately into the multiple estimation functions.
The domain distribution model can, based on s(t), analyze the probabilities that the next word to be predicted after the current word belongs to the respective domains. In Fig. 4, the probability corresponding to each domain serves as the weight of the subsequent weighted computation; therefore the probability that the word to be predicted belongs to the first domain is denoted weight 1, the probability that it belongs to the second domain is denoted weight 2, and so on, with the probability that it belongs to the n-th domain denoted weight n.
The logits function corresponding to each domain can output a logits vector. The logits are in fact an unnormalized probability distribution; each dimension of the logits vector represents one word in the vocabulary, with different dimensions representing different words, and correspondingly the value of each dimension indicates the degree of likelihood that the vocabulary word represented by that dimension is the word to be predicted, in the case where the word to be predicted belongs to that domain. In Fig. 4, the logits output by the estimation function of the first domain are denoted logits1, the logits output by the estimation function of the second domain are denoted logits2, and so on, with the logits output by the estimation function of the n-th domain denoted logitsn.
Correspondingly, in order to combine the probabilities that the word to be predicted belongs to the respective domains with the first likelihood distributions (logits) output by the corresponding estimation functions, and thereby determine the second likelihood distribution over the vocabulary words being the word to be predicted, the logits output by the estimation functions need to be weighted by the weights of the respective domains and summed, to obtain the weighted logits.
S306: normalize the multiple second likelihoods included in the second likelihood distribution, to obtain the probability distribution characterizing the probability that each vocabulary word is the word to be predicted.
It can be understood that the second likelihoods in the second likelihood distribution form an unnormalized probability distribution. Thus, although a larger value of a vocabulary word's second likelihood does indicate a higher probability that the word is the word to be predicted, the second likelihood distribution is not the probability distribution conventionally output at the end by a language model, and based on it it may be difficult to see intuitively which words have a high probability of being the word to be predicted. Therefore, optionally, the values of the multiple second likelihoods in the second likelihood distribution also need to be normalized.
For example, in conjunction with Fig. 4: besides the multiple logits functions (estimation functions) mentioned above, the output layer of the language model in Fig. 4 also has a softmax function; the estimation functions corresponding to the multiple domains and the softmax function together belong to the output layer of this RNNLM language model.
As seen from Fig. 4, the multiple logits output by the multiple estimation functions are weighted and summed, and after the weighted logits are obtained they can be input into the softmax function; in this way, the softmax function can output the probability distribution over the vocabulary words being the word to be predicted.
It can be understood that, for a sentence to be predicted, the language model takes each word of the sentence in turn as the current word used for prediction, and, after the language model predicts for each current word the probability distribution over the vocabulary words being the word to be predicted after that current word, it can also determine the occurrence probability of the sentence to be predicted according to the predicted probability distributions. Alternatively, for a sentence to be predicted, after the language model predicts the probability distribution over the vocabulary words being the word to be predicted after the current word, it can also determine, according to that probability distribution, the occurrence probability of each candidate sentence composed of the sentence to be predicted and one of the vocabulary words. This application imposes no restriction on the specific implementation for determining the occurrence probability of the sentence to be predicted, or of the candidate sentences respectively formed by the sentence to be predicted and the vocabulary words.
It can be understood that, in the case where the domain distribution model and the multiple estimation functions are set inside the language model, the language model can be trained on multiple sentence samples, and when the training of the language model is complete, the domain distribution model and the estimation functions in the language model are also trained. In this case, the language model, the domain distribution model and the estimation functions are trained on the same sentence samples, which helps ensure that the domain information predicted by the domain distribution model matches the domain information characterized by the estimation functions in the language model, and helps improve prediction precision. Compared with setting up a domain distribution model independently of the language model and training it separately, setting the domain distribution function inside the language model yields higher accuracy in the predicted probabilities that the vocabulary words are the word to be predicted.
To facilitate understanding of the process of training the language model, one way of training it is introduced below. For example, see Fig. 5, which shows a schematic flowchart of one implementation of training the language model. The flow may include:
S501: obtain multiple sentence samples for training.
Each sentence sample includes one or more words, and the order of the words in each sentence sample is fixed.
S502: for each sentence sample, input the sentence sample into the language model and obtain the occurrence probability of the sentence sample predicted by the language model.
S503: according to the occurrence probabilities of the sentence samples predicted by the language model, judge whether the prediction accuracy of the language model meets the requirement; if yes, training is complete; if not, adjust the relevant parameters of the language model, of the domain distribution function in the language model, and of each estimation function, and return to step S502.
It can be understood that the sentence samples are all sentences that conform to human language. For each sentence sample, the position of each word in the sample is fixed; therefore, the higher the occurrence probability the language model predicts for a sentence sample, the higher the accuracy of the model's prediction. Correspondingly, the accuracy of the language model's predictions can ultimately be analyzed from the occurrence probabilities it predicts for the sentence samples.
It can be understood that, in the case where the prediction accuracy of the language model meets the requirement, for each current word in a sentence sample, the accuracy of the probabilities, predicted by the domain distribution model in the language model, that the word to be predicted after that current word belongs to the respective domains is also satisfactory; correspondingly, the first likelihood distributions estimated by the estimation functions corresponding to the respective domains are also satisfactory. Therefore, when the training of the language model is complete, the domain distribution model and the estimation functions in it are also trained.
It should be noted that Fig. 5 is intended only to facilitate understanding of the process of training the language model, and gives a brief introduction of one training approach; it can be understood that, in practical applications, other ways of training the language model are possible, and the case of training the language model by other means applies equally to this application.
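To make S502 concrete, the following toy end-to-end sketch scores one sentence sample's occurrence probability by chaining the steps described above (formula two, domain weights, weighted logits, softmax); all shapes and random parameters are hypothetical stand-ins, and a real trainer would then adjust them per S503, e.g. by gradient ascent on the log probability:

```python
import numpy as np

rng = np.random.default_rng(3)
VOCAB, HIDDEN, N_DOMAINS = 5, 8, 2   # toy sizes

U = rng.normal(size=(HIDDEN, VOCAB))                  # one-hot word -> hidden
W = rng.normal(size=(HIDDEN, HIDDEN))
heads = rng.normal(size=(N_DOMAINS, VOCAB, HIDDEN))   # per-domain logits heads
W_dom = rng.normal(size=(N_DOMAINS, HIDDEN))          # domain distribution model

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(word_id, s):
    """One prediction step: formula two, then the domain-weighted logits."""
    w = np.eye(VOCAB)[word_id]                        # toy word vector
    s = 1.0 / (1.0 + np.exp(-(U @ w + W @ s)))        # formula two
    p_dom = softmax(W_dom @ s)                        # domain probabilities
    logits = np.tensordot(p_dom, heads @ s, axes=1)   # formula three
    return softmax(logits), s                         # S306 normalization

def sentence_log_prob(ids):
    """S502: log occurrence probability of one sentence sample."""
    log_p, s = 0.0, np.zeros(HIDDEN)
    for cur, nxt in zip(ids, ids[1:]):
        probs, s = step(cur, s)
        log_p += np.log(probs[nxt])
    return log_p

print(sentence_log_prob([0, 2, 4, 1]))   # parameters would be tuned per S503
```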
In above word prediction technique embodiment, since the quantity of word in vocabulary is larger, in this way, for multiple fields point
Corresponding first possibility distrabtion of vocabulary is not predicted (word each in vocabulary is belonging respectively to the of word to be predicted for ease of description
One possibility is known as corresponding first possibility distrabtion of vocabulary), and multiple first possibility distrabtions predicted are added
Read group total is weighed, is necessarily required to occupy the memory increased, and since data processing amount is big, also influences whether predetermined speed, cause
Forecasting efficiency is relatively low.
In order to be further reduced EMS memory occupation, and improve forecasting efficiency on the basis of guaranteeing to predict precision, this
Apply for that inventor studies discovery: the quantity of word is larger in vocabulary, however the quantity of the common word of people is relatively fewer, e.g., word
The rank of 200,000 words may be reached in table, but the commonly used word of people may only have 1 ten thousand to 2 ten thousand or so.It, can be with based on this
The first possibility distrabtion that these everyday words of domain prediction will be divided to belong to word to be predicted only for the everyday words in vocabulary, and
Carry out corresponding weighted sum;And for these remaining seldom words, then it may not need a point domain prediction, only need to directly predict
These a possibility that seldom word belongs to word to be predicted out.In this way, being equivalent to only since the frequency of use of seldom word is lower
Less prediction precision is had lost, but EMS memory occupation can be greatly reduced, and improve forecasting efficiency.
Based on the above findings, this application may also divide the total vocabulary containing all words into two parts according to the frequency with which each word is used by users: one part is a high-frequency vocabulary, and the other part is a low-frequency vocabulary. The total vocabulary is a pre-constructed vocabulary containing multiple words; the total number of words in the total vocabulary exceeds the total number of words in the high-frequency vocabulary and, of course, also exceeds the total number of words in the low-frequency vocabulary. In this case, the total vocabulary corresponds to the vocabulary of the preceding embodiments. The high-frequency vocabulary consists of the words in the total vocabulary with higher usage frequencies; for example, the words in the total vocabulary may be sorted by usage frequency from high to low, and the words ranked before a certain position taken as the high-frequency vocabulary. Correspondingly, the low-frequency vocabulary consists of the words in the total vocabulary that do not belong to the high-frequency vocabulary; the usage frequency of any word in the low-frequency vocabulary is therefore lower than that of the words in the high-frequency vocabulary. The usage frequency of each word may be obtained through statistical analysis of data, for example by determining the number of times users use each word.
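A minimal sketch of this split (the cutoff of 20,000 is an illustrative assumption drawn from the example figures above, not a prescribed value):

```python
from collections import Counter

def split_vocabulary(usage_counts, high_freq_size=20_000):
    """Split the total vocabulary into a high-frequency part and a
    low-frequency part by ranking words on their usage counts."""
    ranked = [word for word, _ in Counter(usage_counts).most_common()]
    return ranked[:high_freq_size], ranked[high_freq_size:]

# Example: a 200,000-word total vocabulary yields a 20,000-word
# high-frequency vocabulary and a 180,000-word low-frequency vocabulary.
```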
Correspondingly, the words in the high-frequency vocabulary and the words in the low-frequency vocabulary can be processed differently. For example, refer to Fig. 6, which shows another flow diagram of a word prediction method of this application. The method of this embodiment can be applied to the computer equipment of this application and may include:
S601: obtain the current word used for prediction and the first contextual information of the word sequence before the current word.
S602: based on the current word and the first contextual information, determine the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains.
For steps S601 and S602, refer to the related introduction of the preceding embodiments; details are not repeated here.
S603: for each domain, based on the current word and the first contextual information, determine the first possibility that each word in the high-frequency vocabulary belongs to the word to be predicted.
Here, for each domain, the first possibility is the possibility that a word in the high-frequency vocabulary belongs to the word to be predicted in the case that the word to be predicted belongs to that domain.
For ease of distinction, the words in the high-frequency vocabulary may be called high-frequency words, and the words in the low-frequency vocabulary low-frequency words. It can be understood that, in this embodiment of the application, the first possibilities of belonging to the word to be predicted are predicted per domain only for the high-frequency words in the high-frequency vocabulary. The number of words whose first possibilities must be computed per domain is therefore relatively small, which reduces memory usage, reduces the amount of data to be processed, and improves data processing efficiency.
The high-frequency vocabulary differs from the vocabulary of the preceding embodiments only in the number of words, and the process of predicting the first possibility that each word in the high-frequency vocabulary belongs to the word to be predicted is identical to the corresponding process in the preceding embodiments; refer to the related introduction there, which is not repeated here.
Similar to the preceding embodiments, optionally, after step S601, the second contextual information characterizing the semantic relation between the current word and the word sequence before it may also be determined based on the current word and the first contextual information. Correspondingly, the first possibility that each word in the high-frequency vocabulary belongs to the word to be predicted may be determined based on this second contextual information.
S604: according to the probabilities that the word to be predicted belongs respectively to the multiple different domains, and the first possibilities that each word in the high-frequency vocabulary corresponding to each domain belongs to the word to be predicted, determine the second possibility that each word in the high-frequency vocabulary belongs to the word to be predicted.
Since the number of words in the high-frequency vocabulary is small, the number of first possibilities to compute is also small; that is, the first possibility distributions containing the first possibility of each high-frequency word are small. Accordingly, the amount of data to process when determining the second possibility that each word in the high-frequency vocabulary belongs to the word to be predicted is relatively small, which reduces memory usage and improves processing efficiency.
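A minimal numpy sketch of this weighted summation (the array shapes and probability values are illustrative assumptions):

```python
import numpy as np

n_domains, high_freq_size = 3, 20_000

# Probabilities that the word to be predicted belongs to each domain.
domain_probs = np.array([0.6, 0.3, 0.1])                  # (n_domains,)
# One first possibility distribution over the high-frequency vocabulary per domain.
first_possibilities = np.random.rand(n_domains, high_freq_size)

# Second possibility of each high-frequency word: the per-domain first
# possibilities, weighted by the probability of each domain and summed.
second_possibility = domain_probs @ first_possibilities   # (high_freq_size,)
```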
S605: based on the current word and the first contextual information, determine the third possibility that each word in the low-frequency vocabulary belongs to the word to be predicted.
Here, for ease of distinction, the possibility that a word in the low-frequency vocabulary belongs to the word to be predicted is called the third possibility.
It can be understood that, although the usage frequency of each word in the low-frequency vocabulary is relatively low, a word in the low-frequency vocabulary may still be a word in the sentence to be predicted, or may form a new sentence together with the sentence to be predicted.
For example, take a candidate sentence recognized from speech, "a winged Chong sky is grey". To predict the occurrence probability of this candidate sentence, the probability of each word given the word sequence before it must be predicted. In particular, the probability that "Chong" is the next word after "winged" must be predicted, and "Chong" belongs to the low-frequency vocabulary. Therefore, the probability that "Chong" is the next word after "winged" can only be obtained by predicting, based on "winged" and the contextual information corresponding to "a", the possibility (i.e., the third possibility) that each word in the low-frequency vocabulary belongs to the word to be predicted.
Note that the way each third possibility is determined does not actually consider the domain to which the next word after the current word belongs; instead, the possibility that each word in the low-frequency vocabulary belongs to the word to be predicted is determined directly based on the current word and the first contextual information.
S606: according to the second possibilities that each word in the high-frequency vocabulary belongs to the word to be predicted, and the third possibilities that each word in the low-frequency vocabulary belongs to the word to be predicted, construct the possibility that each word in the total vocabulary belongs to the word to be predicted.
It can be understood that the dimension of the total vocabulary is the sum of the number of words in the high-frequency vocabulary and the number of words in the low-frequency vocabulary, and that the two vocabularies do not overlap. Therefore, the second possibilities of the words in the high-frequency vocabulary can be combined with the third possibilities of the words in the low-frequency vocabulary, constructing the possibilities that cover every word in both vocabularies, i.e., the possibility that each word in the total vocabulary belongs to the word to be predicted.
For example, suppose the high-frequency vocabulary contains word 1 and word 2, whose second possibilities of belonging to the word to be predicted are possibility 1 and possibility 2, respectively; and suppose the low-frequency vocabulary contains words 3 to 7, whose third possibilities of belonging to the word to be predicted are possibilities 3 to 7, respectively. Combining the two parts, the possibilities that each word in the total vocabulary belongs to the word to be predicted are: word 1: possibility 1; word 2: possibility 2; word 3: possibility 3; word 4: possibility 4; word 5: possibility 5; word 6: possibility 6; word 7: possibility 7.
S607: normalize the possibilities that each word in the total vocabulary belongs to the word to be predicted, obtaining the probability distribution of the words in the total vocabulary belonging to the word to be predicted.
Step S607 is optional; its purpose is to normalize the possibilities into a probability distribution, from which the relative degree to which different words in the vocabulary are likely to be the word to be predicted can be read off directly.
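A minimal sketch of steps S606 and S607 together (the softmax normalization shown here is one common choice; the patent does not fix a particular normalization):

```python
import numpy as np

def combine_and_normalize(second_possibility, third_possibility):
    """Concatenate the high-frequency second possibilities and the
    low-frequency third possibilities into one vector over the total
    vocabulary (S606), then normalize it into a probability
    distribution (S607)."""
    total = np.concatenate([second_possibility, third_possibility])
    shifted = np.exp(total - total.max())   # numerically stable softmax
    return shifted / shifted.sum()
```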
Similar to the preceding embodiments, when the total vocabulary is divided into a high-frequency vocabulary and a low-frequency vocabulary, the language model may also include a domain distribution model and an estimation function for each of the multiple domains. The difference is that in this language model the estimation function of each domain performs possibility prediction only for the words in the high-frequency vocabulary, and the language model additionally includes an estimation function corresponding to the low-frequency vocabulary, which performs possibility prediction for the words in the low-frequency vocabulary. Specifically, refer to Fig. 7, which shows another flow diagram of a word prediction method of this application. This embodiment is applicable to the computer equipment of this application, and the method may include:
S701: obtain the word vector w(t) of the current word used for prediction, and the first contextual information s(t-1) most recently determined by the pre-trained language model.
S702: the language model converts the current word w(t) and the first contextual information s(t-1) into the second contextual information s(t), which characterizes the semantic relation between the current word and the word sequence before it.
S703: the language model inputs the second contextual information s(t) into the domain distribution model, which determines the probabilities that the word to be predicted after the current word belongs respectively to the different domains.
For steps S701 to S703, refer to the related introduction of the preceding embodiments; details are not repeated here.
S704: the language model inputs the second contextual information s(t) separately into the high-frequency estimation function corresponding to each domain, and obtains the first possibility distribution output by each high-frequency estimation function.
Here, each first possibility distribution contains the first possibility that each word in the high-frequency vocabulary belongs to the word to be predicted. For example, a first possibility distribution may be a vector whose dimension equals the number of words in the high-frequency vocabulary, with the value of each dimension indicating the possibility that the corresponding high-frequency word belongs to the word to be predicted.
Considering that the estimation function of each domain performs possibility prediction only for the words in the high-frequency vocabulary, and to distinguish it from the estimation function corresponding to the low-frequency vocabulary, the estimation function of each domain is called a high-frequency estimation function, and the estimation function corresponding to the low-frequency vocabulary is called a low-frequency estimation function.
S705: perform a weighted summation of the first possibility distributions output by the high-frequency estimation functions of the domains, weighted by the probability corresponding to each domain, to obtain the second possibility distribution.
The second possibility distribution contains the second possibility that each word in the high-frequency vocabulary belongs to the word to be predicted.
For ease of understanding, the language model is again taken to be an RNNLM, illustrated in conjunction with Fig. 8. As can be seen from Fig. 8, the language model includes a domain distribution model and n high-frequency estimation functions corresponding to n different domains, where n is the number of domains. In Fig. 8 every estimation function is assumed to be a logits function, so the high-frequency estimation functions are labeled high-frequency logits functions.
As shown in Fig. 8, the process of determining the hidden layer output vector s(t) is the same as in Fig. 4. After s(t) is obtained, it is input into the domain distribution model and also separately into each high-frequency estimation function.
Similar to Fig. 4, in Fig. 8 the probability corresponding to each domain is denoted as a weight: the probability that the word to be predicted belongs to the first domain is weight 1, the probability that it belongs to the second domain is weight 2, and so on, with the probability that it belongs to the n-th domain being weight n.
Correspondingly, the high-frequency logits function of each domain outputs a logits vector: the logits output by the high-frequency estimation function of the first domain is denoted logits1, that of the second domain logits2, and so on, with that of the n-th domain denoted logitsn.
Further, a weighted summation of the logits output by the high-frequency estimation function of each domain, weighted by the weight of each domain, yields the weighted high-frequency logits, which represent the second possibility distribution of the words in the high-frequency vocabulary belonging to the word to be predicted.
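A minimal PyTorch sketch of this part of the architecture (all layer sizes are illustrative assumptions; the patent does not specify the form of the estimation functions, so linear heads stand in for the high-frequency logits functions):

```python
import torch
import torch.nn as nn

class DomainMixtureHead(nn.Module):
    """Hidden state s(t) -> domain weights and per-domain high-frequency
    logits, combined by weighted summation (steps S703-S705)."""
    def __init__(self, hidden_dim=1024, n_domains=4, high_freq_size=20_000):
        super().__init__()
        self.domain_model = nn.Linear(hidden_dim, n_domains)
        self.high_freq_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, high_freq_size) for _ in range(n_domains)]
        )

    def forward(self, s_t):                       # s_t: (batch, hidden_dim)
        # Domain distribution model: probability of each domain.
        weights = torch.softmax(self.domain_model(s_t), dim=-1)  # (batch, n_domains)
        # One logits vector over the high-frequency vocabulary per domain.
        per_domain = torch.stack(
            [head(s_t) for head in self.high_freq_heads], dim=1
        )                                         # (batch, n_domains, high_freq_size)
        # Weighted high-frequency logits: sum over domains.
        return torch.einsum("bn,bnv->bv", weights, per_domain)
```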
S706: the language model inputs the second contextual information s(t) into the low-frequency estimation function corresponding to the low-frequency vocabulary, and obtains the third possibility distribution output by the low-frequency estimation function.
The third possibility distribution contains the possibility that each word in the low-frequency vocabulary belongs to the word to be predicted.
As shown in Fig. 8, besides the domain distribution model and the n high-frequency estimation functions of the n different domains, the language model also includes an estimation function that predicts the possibility that each word in the low-frequency vocabulary belongs to the word to be predicted, i.e., the low-frequency estimation function.
Correspondingly, the s(t) output by the hidden layer is also input into the low-frequency estimation function, which, based on s(t), predicts the logits of the possibilities that each word in the low-frequency vocabulary belongs to the word to be predicted; for ease of distinction, the logits output by the low-frequency estimation function are denoted low-frequency logits.
Optionally, considering that the number of words in the low-frequency vocabulary is large, directly using one estimation function to compute the possibilities that all low-frequency words belong to the word to be predicted would require it to output a high-dimensional vector, which affects its computational efficiency. To improve the computational efficiency of the estimation function, the low-frequency vocabulary can be reduced in dimension: the words in the low-frequency vocabulary are divided into m groups, where m is a natural number greater than or equal to 2 that can be set as needed. Each group contains multiple words, and the sums of the usage frequencies of the words in each group are equal.
Correspondingly, m dimensionality-reduction matrices (not shown in Fig. 8) are set between the hidden layer and the low-frequency estimation function corresponding to the low-frequency vocabulary, one matrix per group.
Taking one group as an example: after the hidden layer outputs the second contextual information s(t), s(t) is passed through the dimensionality-reduction matrix of that group, reducing its dimension. The language model then inputs the reduced s(t) into the low-frequency estimation function, which, based on the reduced s(t), predicts the possibilities that the low-frequency words in the group belong to the word to be predicted, yielding the possibility distribution of the group; the vector dimension of this possibility distribution is the same as the dimension of the reduced s(t).
For example, the possibility distribution logits_i output for the i-th group of the low-frequency vocabulary can be expressed as:
logits_i = (proj_i · s(t) + biasp_i) × tail_i + bias_i (formula four);
where i is a natural number from 1 to m; proj_i is the dimensionality-reduction matrix corresponding to the i-th group; tail_i denotes the matrix of the vectors of the words in the i-th group; biasp_i is a first preset bias vector; and bias_i is a second preset bias vector.
The above takes one group as an example; each group is processed in the same way. Since different groups contain different numbers of words, the dimensions of their dimensionality-reduction matrices also differ, but the vector dimensions of the possibility distributions of all groups are the same. For example, suppose s(t) has 1024 dimensions, group 1 contains 10,000 words, and group 2 contains 20,000 words; then the dimensionality-reduction matrix of group 1 may be 512-dimensional, so the possibility distribution of group 1 is a 512-dimensional vector, while the dimensionality-reduction matrix of group 2 may be 216-dimensional, and the possibility distribution corresponding to group 2 is likewise a 512-dimensional vector.
Correspondingly, based on the vectors of the possibility distributions of the groups, a distribution characterizing the possibility that each word in all the groups belongs to the word to be predicted can be constructed.
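A numpy sketch of formula four for a single group (all shapes are illustrative assumptions; read literally, the formula maps the reduced context through the group's word-vector matrix, yielding one value per word in the group, and the sketch follows that reading):

```python
import numpy as np

hidden_dim, reduced_dim, group_size = 1024, 512, 10_000

s_t = np.random.randn(hidden_dim)                  # second contextual information s(t)
proj_i = np.random.randn(reduced_dim, hidden_dim)  # dimensionality-reduction matrix of group i
biasp_i = np.random.randn(reduced_dim)             # first preset bias vector
tail_i = np.random.randn(reduced_dim, group_size)  # vectors of the words in group i
bias_i = np.random.randn(group_size)               # second preset bias vector

# Formula four: logits_i = (proj_i . s(t) + biasp_i) x tail_i + bias_i
logits_i = (proj_i @ s_t + biasp_i) @ tail_i + bias_i   # one logit per word in group i
```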
S707: based on the second possibilities corresponding to the high-frequency vocabulary and the third possibilities corresponding to the low-frequency vocabulary, construct the total possibility distribution characterizing the possibility that each word in the total vocabulary belongs to the word to be predicted.
The total possibility distribution contains the possibility that each word in the total vocabulary belongs to the word to be predicted.
As shown in Fig. 8, the dimensions of the low-frequency logits and the high-frequency logits differ, and each of them can only characterize the possibilities of part of the words in the total vocabulary. Therefore, combining the low-frequency logits and the high-frequency logits into one vector whose dimension equals the number of words in the total vocabulary in effect constructs a logits vector characterizing the possibility that each word in the total vocabulary belongs to the word to be predicted; for ease of distinction, the constructed logits is called the total logits.
S708: normalize the possibilities of the words contained in the total possibility distribution, obtaining the probability distribution characterizing the probability that each word in the total vocabulary belongs to the word to be predicted.
As shown in Fig. 8, the total logits characterizing the possibility that each word in the total vocabulary belongs to the word to be predicted is input into a softmax function; correspondingly, the probability distribution output by the softmax function contains the probability that each word of the total vocabulary belongs to the word to be predicted.
To facilitate understanding of the scheme of this application, an application scenario to which the embodiments of this application are applicable is introduced. See Fig. 9, which shows a schematic composition diagram of an application scenario applicable to the word prediction method of this application.
Fig. 9 takes a speech recognition scenario as an example. As can be seen from Fig. 9, the scenario includes a speech recognition system comprising: computer equipment 901, a data statistics server 902, and a speech recognition server 903.
The computer equipment can be the server in the speech recognition system that analyzes the occurrence probabilities of candidate sentences, and the language model mentioned in any of the above embodiments of this application can be preset in it. The data statistics server can provide the basis for the computer equipment to determine the vocabulary.
It can be understood that Fig. 9 illustrates the case where the speech recognition server and the computer equipment carrying the language model are two independent devices; in practical applications, however, the computer equipment and the speech recognition server may be the same device.
As can be seen from Fig. 9, the voice to be recognized, e.g., input by a user, can be sent by the user terminal 904 to the speech recognition server 903, as shown in step S91.
The speech recognition server 903 can convert the voice to be recognized into multiple possible candidate statement texts. To determine which of these candidate statement texts best conforms to human language, i.e., which candidate sentence is the statement text with relatively high recognition accuracy, the speech recognition server sends the multiple candidate statement texts converted from the voice to be recognized to the computer equipment 901, as shown in step S92 of Fig. 9.
Correspondingly, following the scheme introduced in the preceding embodiments, the computer equipment 901 takes each word of each candidate statement text in turn as the current word and predicts the probability distribution of each word in the vocabulary belonging to the word to be predicted after that current word. Then, based on the predicted probability distributions and the words in the candidate statement text, the occurrence probability of the candidate statement text can be computed, as shown in step S93.
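A minimal sketch of step S93 (the `next_word_probabilities` callable is a hypothetical stand-in for the per-step prediction described above; log-space accumulation is one common way to avoid numeric underflow):

```python
import math

def candidate_sentence_score(words, next_word_probabilities):
    """Occurrence probability of a candidate statement text, computed by
    chaining the predicted probability of each word given the word
    sequence before it."""
    score = 0.0
    for t in range(1, len(words)):
        probs = next_word_probabilities(words[:t])   # distribution over the vocabulary
        score += math.log(probs.get(words[t], 1e-12))
    return score

# Candidates are then ranked from high to low by this score (steps S94-S95).
```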
Then, the computer equipment 901 sends the predicted occurrence probability of each candidate statement text to the speech recognition server 903, as shown in step S94.
The speech recognition server 903 sorts the candidate statement texts by occurrence probability from high to low and returns the sorted candidate statement texts to the user terminal, so that the user can quickly select the statement text corresponding to the voice, as shown in step S95.
It can be understood that Fig. 9 is only one example of an application scenario; in practical applications there are many possible scenarios to which the scheme of the embodiments of this application is applicable, and this application places no restriction on this.
On the other hand, this application also provides a word prediction device. See Fig. 10, which shows a schematic diagram of the composed structure of an embodiment of a word prediction device of this application. The word prediction device is applicable to the computer equipment of this application and may include:
an input acquiring unit 1001, configured to obtain the current word used for prediction and the first contextual information of the word sequence before the current word;
a domain prediction unit 1002, configured to determine, based on the current word and the first contextual information, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains;
a first prediction unit 1003, configured to determine, for each domain, based on the current word and the first contextual information, the first possibility that each word in the vocabulary belongs to the word to be predicted, where the first possibility is the possibility that a word in the vocabulary belongs to the word to be predicted in the case that the word to be predicted belongs to the domain, and the vocabulary is a pre-constructed set containing multiple words;
a second prediction unit 1004, configured to determine, according to the probabilities that the word to be predicted belongs respectively to the multiple different domains and the first possibilities that each word in the vocabulary corresponding to each domain belongs to the word to be predicted, the second possibility that each word in the vocabulary belongs to the word to be predicted.
Optionally, the device may also include:
a normalization unit, configured to normalize, after the second prediction unit determines the second possibilities that each word in the vocabulary belongs to the word to be predicted, those second possibilities, obtaining the probability distribution of each word in the vocabulary belonging to the word to be predicted.
Optionally, to reduce memory usage during prediction and improve prediction efficiency, in the word prediction device of this application the vocabulary in the first prediction unit and the second prediction unit is a high-frequency vocabulary, which consists of the words in the total vocabulary with higher usage frequencies. In that case, see Fig. 11, which shows a schematic diagram of the composed structure of another embodiment of a word prediction device of this application. The device of this embodiment differs from the device of the preceding embodiment in that it may also include:
a third prediction unit 1005, configured to determine, based on the current word and the first contextual information, the third possibility that each word in the low-frequency vocabulary belongs to the word to be predicted, where the low-frequency vocabulary consists of the words in the total vocabulary that do not belong to the high-frequency vocabulary, the total vocabulary is a pre-constructed set containing multiple words, and the total number of words in the total vocabulary exceeds the total number of words in the high-frequency vocabulary;
a prediction combining unit 1006, configured to construct, according to the second possibilities that each word in the high-frequency vocabulary belongs to the word to be predicted and the third possibilities that each word in the low-frequency vocabulary belongs to the word to be predicted, the possibility that each word in the total vocabulary belongs to the word to be predicted.
Optionally, in the above device embodiments, the device may also include:
a context conversion unit, configured to determine, after the input acquiring unit obtains the current word and the first contextual information, based on the current word and the first contextual information, the second contextual information characterizing the semantic relation between the current word and the word sequence before it;
the domain prediction unit is then specifically configured to determine, based on the second contextual information, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains;
and the first prediction unit is specifically configured to determine, for each domain, based on the second contextual information, the first possibility that each word in the vocabulary belongs to the word to be predicted.
Further, the domain prediction unit may include:
a domain prediction subunit, configured to determine, using a pre-trained domain distribution model, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains, where the domain distribution model is obtained by training on multiple sentence samples.
In one implementation, the input acquiring unit may include:
an input acquiring subunit, configured to obtain the word vector of the current word used for prediction and the first contextual information most recently determined by the pre-trained language model, where the language model includes the domain distribution model and the estimation functions corresponding to the multiple different domains, and the language model, together with the domain distribution model and the estimation functions within it, is obtained by unified training on multiple sentence samples;
correspondingly, the first prediction unit includes:
a first prediction subunit, configured to input the second contextual information separately into the estimation function corresponding to each domain and to obtain the first possibility distribution output by each estimation function, where a first possibility distribution contains the first possibility that each word in the vocabulary belongs to the word to be predicted.
Optionally, the second prediction unit is specifically configured to perform a weighted summation of the first possibility distributions output by the estimation functions corresponding to the domains, weighted by the probability corresponding to each domain, to obtain the second possibility distribution, which contains the second possibility that each word in the vocabulary belongs to the word to be predicted.
For ease of understanding, see Fig. 12, which shows a schematic diagram of the composed structure of computer equipment in the embodiments of this application. As shown in Fig. 12, the computer equipment may include: a processor 1201, a memory 1202, a communication interface 1203, an input unit 1204, a display 1205, and a communication bus 1206.
The processor 1201, the memory 1202, the communication interface 1203, the input unit 1204, and the display 1205 communicate with each other through the communication bus 1206.
In the embodiments of this application, the processor 1201 can be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
The processor can call the program stored in the memory 1202; specifically, the processor can execute the operations performed on the computer equipment side in Figs. 1 to 9.
The memory 1202 stores one or more programs; a program may include program code, and the program code includes computer operation instructions. In the embodiments of this application, the memory at least stores a program for realizing the following functions:
obtaining the current word used for prediction and the first contextual information of the word sequence before the current word;
based on the current word and the first contextual information, determining the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains;
for each domain, based on the current word and the first contextual information, determining the first possibility that each word in the vocabulary belongs to the word to be predicted, where the first possibility is the possibility that a word in the vocabulary belongs to the word to be predicted in the case that the word to be predicted belongs to the domain, and the vocabulary is a pre-constructed set containing multiple words;
according to the probabilities that the word to be predicted belongs respectively to the multiple different domains and the first possibilities that each word in the vocabulary corresponding to each domain belongs to the word to be predicted, determining the second possibility that each word in the vocabulary belongs to the word to be predicted.
In one possible implementation, the memory 1202 may include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area can store data created during use of the computer.
In addition, the memory 1202 may include high-speed random access memory, and may also include nonvolatile memory, such as at least one magnetic disk storage device, flash memory device, or other solid-state storage component.
The communication interface 1203 can be an interface of a communication module, such as an interface of a GSM module.
The computer equipment can also include the display 1205 and the input unit 1204; the display includes a display panel, such as a touch display panel, and the input unit can be a touch sensing unit, a keyboard, or the like.
Of course, the computer equipment structure shown in Fig. 12 does not constitute a limitation on the computer equipment in the embodiments of this application; in practical applications, the computer equipment may include more or fewer components than shown in Fig. 12, or combine certain components.
On the other hand, this application also provides a storage medium storing computer-executable instructions that, when loaded and executed by a processor, realize the word prediction method described in any one of the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. Since the device embodiments are basically similar to the method embodiments, their description is relatively simple; for relevant details, see the corresponding parts of the method embodiments.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or equipment that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or equipment. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or equipment that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to realize or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above is only the preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (15)
1. A word prediction method, characterized by comprising:
obtaining a current word used for prediction and first contextual information of a word sequence before the current word;
based on the current word and the first contextual information, determining probabilities that a word to be predicted after the current word belongs respectively to multiple different domains;
for each domain, based on the current word and the first contextual information, determining a first possibility that each word in a vocabulary belongs to the word to be predicted, wherein the first possibility is the possibility that a word in the vocabulary belongs to the word to be predicted in the case that the word to be predicted belongs to the domain, and the vocabulary is a pre-constructed set containing multiple words;
according to the probabilities that the word to be predicted belongs respectively to the multiple different domains and the first possibilities that each word in the vocabulary corresponding to each domain belongs to the word to be predicted, determining a second possibility that each word in the vocabulary belongs to the word to be predicted.
2. The word prediction method according to claim 1, wherein the vocabulary is a high-frequency vocabulary consisting of words with higher usage frequencies in a total vocabulary, the total vocabulary being a pre-constructed set containing multiple words whose total number of words exceeds the total number of words in the high-frequency vocabulary;
the method further comprising:
based on the current word and the first contextual information, determining a third possibility that each word in a low-frequency vocabulary belongs to the word to be predicted, the low-frequency vocabulary consisting of the words in the total vocabulary that do not belong to the high-frequency vocabulary;
according to the second possibilities that each word in the high-frequency vocabulary belongs to the word to be predicted and the third possibilities that each word in the low-frequency vocabulary belongs to the word to be predicted, constructing the possibility that each word in the total vocabulary belongs to the word to be predicted.
3. The word prediction method according to claim 1, further comprising, after determining the second possibility that each word in the vocabulary belongs to the word to be predicted:
normalizing the second possibilities that each word in the vocabulary belongs to the word to be predicted, obtaining a probability distribution of each word in the vocabulary belonging to the word to be predicted.
4. The word prediction method according to any one of claims 1 to 3, further comprising, after obtaining the current word used for prediction and the first contextual information of the word sequence before the current word:
based on the current word and the first contextual information, determining second contextual information characterizing a semantic relation between the current word and the word sequence before the current word;
wherein determining the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains comprises:
determining, based on the second contextual information, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains;
and wherein determining, for each domain, based on the current word and the first contextual information, the first possibility that each word in the vocabulary belongs to the word to be predicted comprises:
determining, for each domain, based on the second contextual information, the first possibility that each word in the vocabulary belongs to the word to be predicted.
5. The word prediction method according to claim 4, wherein determining the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains comprises:
determining, using a pre-trained domain distribution model, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains, the domain distribution model being obtained by training on multiple sentence samples.
6. The word prediction method according to claim 5, wherein obtaining the current word used for prediction and the first contextual information of the word sequence before the current word comprises:
obtaining a word vector of the current word used for prediction and first contextual information most recently determined by a pre-trained language model, the language model including the domain distribution model and estimation functions corresponding to the multiple different domains, wherein the language model, together with the domain distribution model and the estimation functions within it, is obtained by unified training on multiple sentence samples;
and wherein determining, for each domain, based on the second contextual information, the first possibility that each word in the vocabulary belongs to the word to be predicted comprises:
inputting the second contextual information separately into the estimation function corresponding to each domain, and obtaining a first possibility distribution output by each estimation function, the first possibility distribution containing the first possibility that each word in the vocabulary belongs to the word to be predicted.
7. The word prediction method according to claim 6, wherein determining the second possibility that each word in the vocabulary belongs to the word to be predicted comprises:
performing a weighted summation of the first possibility distributions output by the estimation functions corresponding to the domains, weighted by the probability corresponding to each domain, to obtain a second possibility distribution containing the second possibility that each word in the vocabulary belongs to the word to be predicted.
8. A word prediction device, characterized by comprising:
an input acquiring unit, configured to obtain a current word used for prediction and first contextual information of a word sequence before the current word;
a domain prediction unit, configured to determine, based on the current word and the first contextual information, probabilities that a word to be predicted after the current word belongs respectively to multiple different domains;
a first prediction unit, configured to determine, for each domain, based on the current word and the first contextual information, a first possibility that each word in a vocabulary belongs to the word to be predicted, wherein the first possibility is the possibility that a word in the vocabulary belongs to the word to be predicted in the case that the word to be predicted belongs to the domain, and the vocabulary is a pre-constructed set containing multiple words;
a second prediction unit, configured to determine, according to the probabilities that the word to be predicted belongs respectively to the multiple different domains and the first possibilities that each word in the vocabulary corresponding to each domain belongs to the word to be predicted, a second possibility that each word in the vocabulary belongs to the word to be predicted.
9. The word prediction device according to claim 8, wherein the vocabulary in the first prediction unit and the second prediction unit is a high-frequency vocabulary consisting of words with higher usage frequencies in a total vocabulary, the total vocabulary being a pre-constructed set containing multiple words whose total number of words exceeds the total number of words in the high-frequency vocabulary;
the device further comprising:
a third prediction unit, configured to determine, based on the current word and the first contextual information, a third possibility that each word in a low-frequency vocabulary belongs to the word to be predicted, the low-frequency vocabulary consisting of the words in the total vocabulary that do not belong to the high-frequency vocabulary;
a prediction combining unit, configured to construct, according to the second possibilities that each word in the high-frequency vocabulary belongs to the word to be predicted and the third possibilities that each word in the low-frequency vocabulary belongs to the word to be predicted, the possibility that each word in the total vocabulary belongs to the word to be predicted.
10. The word prediction device according to claim 8, further comprising:
a normalization unit, configured to normalize, after the second prediction unit determines the second possibilities that each word in the vocabulary belongs to the word to be predicted, the second possibilities that each word in the vocabulary belongs to the word to be predicted, obtaining a probability distribution of each word in the vocabulary belonging to the word to be predicted.
11. The word prediction device according to any one of claims 8 to 10, further comprising:
a context conversion unit, configured to determine, after the input acquiring unit obtains the current word and the first contextual information, based on the current word and the first contextual information, second contextual information characterizing a semantic relation between the current word and the word sequence before the current word;
wherein the domain prediction unit is specifically configured to determine, based on the second contextual information, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains;
and the first prediction unit is specifically configured to determine, for each domain, based on the second contextual information, the first possibility that each word in the vocabulary belongs to the word to be predicted.
12. The word prediction device according to claim 11, wherein the domain prediction unit comprises:
a domain prediction subunit, configured to determine, using a pre-trained domain distribution model, the probabilities that the word to be predicted after the current word belongs respectively to multiple different domains, the domain distribution model being obtained by training on multiple sentence samples.
13. The word prediction device according to claim 12, wherein the input acquiring unit comprises:
an input acquiring subunit, configured to obtain a word vector of the current word used for prediction and first contextual information most recently determined by a pre-trained language model, the language model including the domain distribution model and estimation functions corresponding to the multiple different domains, wherein the language model, together with the domain distribution model and the estimation functions within it, is obtained by unified training on multiple sentence samples;
and the first prediction unit comprises:
a first prediction subunit, configured to input the second contextual information separately into the estimation function corresponding to each domain and to obtain a first possibility distribution output by each estimation function, the first possibility distribution containing the first possibility that each word in the vocabulary belongs to the word to be predicted.
14. Computer equipment, characterized by comprising:
a processor and a memory;
wherein the processor is configured to execute a program stored in the memory;
and the memory is configured to store a program at least for:
obtaining a current word used for prediction and first contextual information of a word sequence before the current word;
based on the current word and the first contextual information, determining probabilities that a word to be predicted after the current word belongs respectively to multiple different domains;
for each domain, based on the current word and the first contextual information, determining a first possibility that each word in a vocabulary belongs to the word to be predicted, wherein the first possibility is the possibility that a word in the vocabulary belongs to the word to be predicted in the case that the word to be predicted belongs to the domain, and the vocabulary is a pre-constructed set containing multiple words;
according to the probabilities that the word to be predicted belongs respectively to the multiple different domains and the first possibilities that each word in the vocabulary corresponding to each domain belongs to the word to be predicted, determining a second possibility that each word in the vocabulary belongs to the word to be predicted.
15. A storage medium storing computer-executable instructions that, when loaded and executed by a processor, realize the word prediction method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910740458.3A CN110377916B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
CN201810942238.4A CN109117480B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810942238.4A CN109117480B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910740458.3A Division CN110377916B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109117480A true CN109117480A (en) | 2019-01-01 |
CN109117480B CN109117480B (en) | 2022-05-27 |
Family
ID=64852831
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810942238.4A Active CN109117480B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
CN201910740458.3A Active CN110377916B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910740458.3A Active CN110377916B (en) | 2018-08-17 | 2018-08-17 | Word prediction method, word prediction device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109117480B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032644A (en) * | 2019-04-03 | 2019-07-19 | 人立方智能科技有限公司 | Language model pre-training method |
CN110222578A (en) * | 2019-05-08 | 2019-09-10 | 腾讯科技(深圳)有限公司 | The method and apparatus of confrontation test picture talk system |
CN111444710A (en) * | 2019-01-15 | 2020-07-24 | 阿里巴巴集团控股有限公司 | Word segmentation method and word segmentation device |
CN111783431A (en) * | 2019-04-02 | 2020-10-16 | 北京地平线机器人技术研发有限公司 | Method and device for predicting word occurrence probability by using language model and training language model |
CN113051936A (en) * | 2021-03-16 | 2021-06-29 | 昆明理工大学 | Method for enhancing Hanyue neural machine translation based on low-frequency word representation |
WO2021127987A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Polyphonic character prediction method and disambiguation method, apparatuses, device and computer readable storage medium |
CN113221870A (en) * | 2021-05-28 | 2021-08-06 | 北京有竹居网络技术有限公司 | OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765239B (en) * | 2019-10-29 | 2023-03-28 | 腾讯科技(深圳)有限公司 | Hot word recognition method, device and storage medium |
CN111680519B (en) * | 2020-04-28 | 2023-04-07 | 平安科技(深圳)有限公司 | Text translation method and device based on dimension reduction barrel model |
CN111639160A (en) * | 2020-05-29 | 2020-09-08 | 达闼机器人有限公司 | Domain identification method, interaction method, electronic device and storage medium |
CN114942986B (en) * | 2022-06-21 | 2024-03-19 | 平安科技(深圳)有限公司 | Text generation method, text generation device, computer equipment and computer readable storage medium |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8136050B2 (en) * | 2003-11-21 | 2012-03-13 | Nuance Communications, Inc. | Electronic device and user interface and input method therefor |
CN101645066B (en) * | 2008-08-05 | 2011-08-24 | 北京大学 | Method for monitoring new words on the Internet |
CN103870001B (en) * | 2012-12-11 | 2018-07-10 | 百度国际科技(深圳)有限公司 | Method and electronic device for generating input method candidates |
CN103544246A (en) * | 2013-10-10 | 2014-01-29 | 清华大学 | Method and system for constructing a multi-emotion dictionary for the Internet |
US9785630B2 (en) * | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
CN105550173A (en) * | 2016-02-06 | 2016-05-04 | 北京京东尚科信息技术有限公司 | Text correction method and device |
US11061948B2 (en) * | 2016-09-22 | 2021-07-13 | Verizon Media Inc. | Method and system for next word prediction |
CN108304424B (en) * | 2017-03-30 | 2021-09-07 | 腾讯科技(深圳)有限公司 | Text keyword extraction method and text keyword extraction device |
CN107424612B (en) * | 2017-07-28 | 2021-07-06 | 北京搜狗科技发展有限公司 | Processing method, apparatus and machine-readable medium |
CN107506414B (en) * | 2017-08-11 | 2020-01-07 | 武汉大学 | Code recommendation method based on long short-term memory network |
CN107908616B (en) * | 2017-10-18 | 2022-01-28 | 北京京东尚科信息技术有限公司 | Method and device for predicting trend words |
2018
- 2018-08-17 CN CN201810942238.4A patent/CN109117480B/en active Active
- 2018-08-17 CN CN201910740458.3A patent/CN110377916B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103869999A (en) * | 2012-12-11 | 2014-06-18 | 百度国际科技(深圳)有限公司 | Method and device for sorting candidate items generated by input method |
WO2017082624A1 (en) * | 2015-11-09 | 2017-05-18 | 박태운 | Sentence prediction input system |
CN107621891A (en) * | 2017-09-28 | 2018-01-23 | 北京新美互通科技有限公司 | Text input method, device and electronic equipment |
CN108334496A (en) * | 2018-01-30 | 2018-07-27 | 中国科学院自动化研究所 | Human-computer dialogue understanding method, system and related device for specific domains |
Non-Patent Citations (2)
Title |
---|
YUETING ZHUANG ET AL.: "Probabilistic Word Selection via Topic Modeling", IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 6, 1 June 2015 (2015-06-01), pages 1643-1655, XP011579826, DOI: 10.1109/TKDE.2014.2377727 *
ZHANG Ping et al.: "Word Association Research: Past, Present, and Future" (词汇联想研究: 前世、今生、未来), Foreign Language Learning Theory and Practice (外语教学理论与实践), no. 4, 31 December 2017 (2017-12-31), pages 1-12 *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444710A (en) * | 2019-01-15 | 2020-07-24 | 阿里巴巴集团控股有限公司 | Word segmentation method and word segmentation device |
CN111444710B (en) * | 2019-01-15 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Word segmentation method and word segmentation device |
CN111783431A (en) * | 2019-04-02 | 2020-10-16 | 北京地平线机器人技术研发有限公司 | Method and device for predicting word occurrence probability with a language model and for training the language model |
CN111783431B (en) * | 2019-04-02 | 2024-05-24 | 北京地平线机器人技术研发有限公司 | Method and device for predicting word occurrence probability with a language model and for training the language model |
CN110032644A (en) * | 2019-04-03 | 2019-07-19 | 人立方智能科技有限公司 | Language model pre-training method |
CN110222578A (en) * | 2019-05-08 | 2019-09-10 | 腾讯科技(深圳)有限公司 | Method and apparatus for adversarial testing of an image captioning system |
CN110222578B (en) * | 2019-05-08 | 2022-12-27 | 腾讯科技(深圳)有限公司 | Method and apparatus for adversarial testing of an image captioning system |
WO2021127987A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Polyphonic character prediction method and disambiguation method, apparatuses, device and computer readable storage medium |
CN113051936A (en) * | 2021-03-16 | 2021-06-29 | 昆明理工大学 | Method for enhancing Chinese-Vietnamese neural machine translation based on low-frequency word representation |
CN113221870A (en) * | 2021-05-28 | 2021-08-06 | 北京有竹居网络技术有限公司 | OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal |
CN113221870B (en) * | 2021-05-28 | 2024-05-24 | 北京有竹居网络技术有限公司 | OCR (optical character recognition) method, device, storage medium and equipment for mobile terminal |
Also Published As
Publication number | Publication date |
---|---|
CN110377916B (en) | 2022-12-16 |
CN110377916A (en) | 2019-10-25 |
CN109117480B (en) | 2022-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109117480A (en) | Word prediction technique, device, computer equipment and storage medium | |
CN110209806B (en) | Text classification method, text classification device and computer readable storage medium | |
CN109948143B (en) | Answer extraction method of community question-answering system | |
CN110019732B (en) | Intelligent question answering method and related device | |
CN106855853A (en) | Entity relation extraction system based on deep neural network | |
CN108363790A (en) | Method, apparatus, device and storage medium for evaluation | |
CN110287312A (en) | Text similarity calculation method and device, computer equipment and computer storage medium | |
CN108804677A (en) | Deep learning question classification method and system combining a multi-layer attention mechanism | |
CN109271493A (en) | Language text processing method, device and storage medium | |
CN111382573A (en) | Method, apparatus, device and storage medium for answer quality assessment | |
CN108268439B (en) | Text emotion processing method and device | |
CN108108347B (en) | Dialogue mode analysis system and method | |
CN108304373A (en) | Semantic dictionary construction method and device, storage medium and electronic device | |
CN110019822B (en) | Few-shot relation classification method and system | |
CN106529525A (en) | Chinese and Japanese handwritten character recognition method | |
CN110287311A (en) | Text classification method and device, storage medium, computer equipment | |
CN104778283A (en) | User occupation classification method and system based on microblog | |
CN109558591A (en) | Chinese event detection method and device | |
CN110852071B (en) | Knowledge point detection method, device, equipment and readable storage medium | |
CN116644148A (en) | Keyword recognition method and device, electronic equipment and storage medium | |
Balaji et al. | Text summarization using NLP technique | |
CN117763126A (en) | Knowledge retrieval method, device, storage medium and apparatus | |
CN113723077A (en) | Sentence vector generation method and device based on bidirectional characterization model and computer equipment | |
CN110287396A (en) | Text matching technique and device | |
CN110334353B (en) | Analysis method, device, equipment and storage medium for word sequence recognition performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||