Summary of the invention
One or more embodiments of this specification describe a training method and device for a neural network model for question-answer matching, which can reduce resource consumption and increase processing speed while still accurately recognizing user questions.
In a first aspect, a training method for a neural network model for question-answer matching is provided. The method includes:
obtaining each user question in a sample set and the classification label corresponding to each user question;
using a trained first neural network model, predicting a first probability score of each user question for each class, where the number of layers of the first neural network model is N;
using a second neural network model to be trained, predicting a second probability score of each user question for each class, where the number of layers of the second neural network model is M, and M < N;
obtaining a first loss function according to the second probability scores and the first probability scores;
obtaining a second loss function according to the second probability scores and the classification labels of the user questions;
combining the first loss function and the second loss function to obtain a total loss function;
training the second neural network model according to the total loss function to obtain a preliminarily trained second neural network model.
In a possible embodiment, the first neural network model is trained in advance in the following manner: using each user question and its corresponding classification label as a set of training samples, the first neural network model is trained to obtain the trained first neural network model.
In a possible embodiment, obtaining the first loss function according to the second probability scores and the first probability scores includes:
dividing the second probability scores by a predefined parameter and then normalizing them to obtain a first output value for each user question;
obtaining the first loss function according to the first output value of each user question and the first probability score of each user question, where the first probability score is likewise obtained by dividing by the predefined parameter and then normalizing.
In a possible embodiment, obtaining the second loss function according to the second probability scores and the classification labels of the user questions includes:
normalizing the second probability scores to obtain a second output value for each user question;
obtaining the second loss function according to the second output value of each user question and the classification label of each user question.
In a possible embodiment, combining the first loss function and the second loss function to obtain the total loss function includes:
multiplying the first loss function by a first weight, multiplying the second loss function by a second weight, and summing the two to obtain the total loss function, where the first weight is greater than the second weight.
In a possible embodiment, after the preliminarily trained second neural network model is obtained, the method further includes: using each user question and its corresponding classification label as a set of training samples, continuing to train the preliminarily trained second neural network model to obtain a further trained second neural network model.
Further, the method also includes: using the further trained second neural network model to predict the class to which a current user question belongs.
In a possible embodiment, the second neural network model to be trained is a pre-trained bidirectional context prediction model, and its pre-training tasks include a cloze (masked word prediction) task and a next-sentence judgment task.
In a possible embodiment, the number of layers of the second neural network model is 2.
In a second aspect, a training device for a neural network model for question-answer matching is provided. The device includes:
an acquiring unit, configured to obtain each user question in a sample set and the classification label corresponding to each user question;
a first predicting unit, configured to predict, using a trained first neural network model, a first probability score of each user question for each class, where the number of layers of the first neural network model is N;
a second predicting unit, configured to predict, using a second neural network model to be trained, a second probability score of each user question for each class, where the number of layers of the second neural network model is M, and M < N;
a first comparing unit, configured to obtain a first loss function according to the second probability scores predicted by the second predicting unit and the first probability scores predicted by the first predicting unit;
a second comparing unit, configured to obtain a second loss function according to the second probability scores predicted by the second predicting unit and the classification labels of the user questions obtained by the acquiring unit;
a combining unit, configured to combine the first loss function obtained by the first comparing unit with the second loss function obtained by the second comparing unit to obtain a total loss function;
a first training unit, configured to train the second neural network model according to the total loss function obtained by the combining unit, to obtain a preliminarily trained second neural network model.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect.
In a fourth aspect, a computing device is provided, including a memory and a processor, where executable code is stored in the memory; when the processor executes the executable code, the method of the first aspect is implemented.
The method and device provided by the embodiments of this specification differ from the usual way of training a question-answer matching model: when the second neural network model is trained, the prediction results of the trained first neural network model are used. The first neural network model has a more complex structure than the second neural network model, and by introducing its prediction results to guide the training of the second neural network model, knowledge transfer is achieved, so that the second neural network model can reduce resource consumption and increase processing speed while still accurately recognizing user questions. In other words, this way of training the question-answer matching model saves a large amount of computing resources while the model performance remains essentially unchanged.
Specific embodiment
The solution provided by this specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation scenario of one embodiment disclosed in this specification. The scenario involves the training of a neural network model for question-answer matching; this neural network model may also be called a question-answer matching model. There has long been a tension between the accuracy and the processing speed with which a question-answer matching model recognizes user questions. If a large model with more layers (big model) is used, the accuracy of user question recognition is higher, but the processing speed is slow; if a small model with fewer layers (small model) is used, the processing speed is fast, but the accuracy of user question recognition is low. Because a question-answer matching model is usually applied to real-time answering of user questions by a customer-service robot, it has high requirements on both the accuracy of user question recognition and the processing speed. For this contradiction, the embodiments of this specification propose a solution: the idea of knowledge distillation is introduced into the training of the question-answer matching model, so that both the accuracy and the processing speed with which the trained small model recognizes user questions can meet the demand.
Knowledge distillation introduces a soft target related to the teacher network as part of the total loss, so as to guide the training of the student network and achieve knowledge transfer. The teacher network is complex but has excellent inference performance; the student network is simplified and has low complexity.
As shown in Fig. 1, after the prediction output of the teacher network (i.e., the large model) is divided by a preset parameter T (divided by T), it is normalized (e.g., by a softmax transformation) to obtain a softened probability distribution, i.e., the soft target, for example si = [0.1, 0.6, ..., 0.1], whose values lie between 0 and 1 and whose distribution is relatively flat. The larger the preset parameter T, the flatter the distribution; if T is too small, the probabilities of wrong classes may be amplified, introducing unnecessary noise. The hard target is the true label of the sample and can be represented as a one-hot vector, e.g., yi = [0, 1, ..., 0]. The total loss is designed as a weighted average of the cross entropies corresponding to the soft target and the hard target. The larger the weighting coefficient λ of the soft-target cross entropy, the more the transfer relies on the contribution of the teacher network; this is very necessary in the early stage of training, as it helps the student network recognize simple samples more easily, but in the later stage of training the weight of the soft target should be reduced appropriately so that the true labels help in recognizing difficult samples. In addition, the inference performance of the teacher network is usually better than that of the student network, while there is no specific restriction on model capacity; the higher the inference accuracy of the teacher network, the more it benefits the learning of the student network.
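To make the softening step concrete, the following is a minimal sketch of how the soft target and hard target described above could be produced. It uses PyTorch, and the logits, the value of T, and the number of classes are illustrative assumptions rather than values taken from this specification.

```python
import torch

# Assumed raw prediction scores (logits) of the teacher network for one user question.
teacher_logits = torch.tensor([[1.0, 4.5, -0.5, 1.0]])
T = 3.0  # preset parameter T; a larger T gives a flatter (softer) distribution

# Soft target: divide by T, then normalize with softmax.
soft_target = torch.softmax(teacher_logits / T, dim=-1)
# Here soft_target is roughly [0.17, 0.55, 0.10, 0.17], much flatter than the
# [0.03, 0.94, 0.01, 0.03] obtained without temperature scaling (T = 1).

# Hard target: the true label of the sample, as a one-hot vector.
hard_target = torch.tensor([[0.0, 1.0, 0.0, 0.0]])
```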
In the embodiments of this specification, knowledge is transferred from the trained large model in order to obtain a small model that is better suited to inference. The trained small model can then be used to perform question-answer matching on user questions, that is, to predict the class of a user question. It can be understood that the input of the model may be a vector representation of the user question.
Fig. 2 shows a flowchart of a training method for a neural network model for question-answer matching according to one embodiment; the method may be based on the application scenario shown in Fig. 1. As shown in Fig. 2, the training method for the neural network model for question-answer matching in this embodiment includes the following steps: step 21, obtaining each user question in a sample set and the classification label corresponding to each user question; step 22, using a trained first neural network model, predicting a first probability score of each user question for each class, where the number of layers of the first neural network model is N; step 23, using a second neural network model to be trained, predicting a second probability score of each user question for each class, where the number of layers of the second neural network model is M, and M < N; step 24, obtaining a first loss function according to the second probability scores and the first probability scores; step 25, obtaining a second loss function according to the second probability scores and the classification labels of the user questions; step 26, combining the first loss function and the second loss function to obtain a total loss function; step 27, training the second neural network model according to the total loss function to obtain a preliminarily trained second neural network model. The specific implementation of each of the above steps is described below.
First, in step 21, each user question in the sample set and the classification label corresponding to each user question are obtained. It can be understood that the classification label corresponds to the hard target in the application scenario shown in Fig. 1; when there are multiple classes, the classification label corresponding to each user question is uniquely determined. For example, the classification labels corresponding to the user questions may be as shown in Table 1.
Table 1: Correspondence between user questions and classification labels

User question | Classification label
User question 1 | Class 1
User question 2 | Class 1
User question 3 | Class 2
User question 4 | Class 3
Referring to Table 1, the classification label corresponding to both user question 1 and user question 2 is class 1; that is, different user questions may correspond to the same classification label, but the classification label corresponding to any given user question is unique.
Then, in step 22, the trained first neural network model is used to predict the first probability score of each user question for each class, where the number of layers of the first neural network model is N. It can be understood that the first neural network model corresponds to the large model in the application scenario shown in Fig. 1, and the first probability score corresponds to the soft target in that scenario.
In one example, the first neural network model is trained in advance in the following manner: using each user question and its corresponding classification label as a set of training samples, the first neural network model is trained to obtain the trained first neural network model.
In one example, the first neural network model is a full bidirectional encoder representations from transformers (BERT) model; it classifies user questions and outputs the knowledge point matched by each user question.
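As a rough sketch of this example, the snippet below fine-tunes a full BERT classifier on labelled user questions. It assumes the Hugging Face transformers library; the checkpoint name, number of classes, and sample data are illustrative assumptions, not part of this specification.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed pre-trained checkpoint and an assumed number of question classes.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
teacher = BertForSequenceClassification.from_pretrained("bert-base-chinese",
                                                        num_labels=3)

questions = ["user question 1", "user question 2"]  # user questions from the sample set
labels = torch.tensor([0, 0])                       # classification labels (see Table 1)

batch = tokenizer(questions, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(teacher.parameters(), lr=2e-5)

outputs = teacher(**batch, labels=labels)  # forward pass returns loss and logits
outputs.loss.backward()                    # one supervised fine-tuning step
optimizer.step()
```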
Then, in step 23, the second neural network model to be trained is used to predict the second probability score of each user question for each class, where the number of layers of the second neural network model is M, and M < N. It can be understood that the second neural network model corresponds to the small model in the application scenario shown in Fig. 1, and the second probability score is the prediction result of the second neural network model to be trained; since the second neural network model has not yet been trained, the second probability score is not as accurate as the first probability score.
In one example, the second neural network model to be trained is a pre-trained bidirectional context prediction model, such as a BERT model, whose pre-training tasks include a cloze (masked word prediction) task and a next-sentence judgment task.
In one example, the number of layers of the second neural network model is 2, for example a 2-layer BERT model, whose consumption of computing resources is approximately one sixth of that of the full BERT model.
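For illustration, a 2-layer student model of this kind could be constructed as sketched below, again assuming the Hugging Face transformers library; in practice the 2-layer model would typically be initialised from pre-trained weights rather than from scratch.

```python
from transformers import BertConfig, BertForSequenceClassification

# A BERT encoder restricted to 2 transformer layers; all other hyperparameters
# are library defaults, and num_labels is an assumed number of question classes.
student_config = BertConfig(num_hidden_layers=2, num_labels=3)
student = BertForSequenceClassification(student_config)

print(student.config.num_hidden_layers)  # 2
```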
Next, in step 24, a first loss function is obtained according to the second probability scores and the first probability scores. It can be understood that the first loss function may be, but is not limited to, a cross-entropy loss (cross entropy loss).
Referring to the application scenario shown in Fig. 1, in one example, the second probability scores are divided by a predefined parameter and then normalized to obtain a first output value for each user question; the first loss function is obtained according to the first output value of each user question and the first probability score of each user question, where the first probability score is the output of a preset layer of the first neural network model, divided by the predefined parameter and then normalized.
Next, in step 25, a second loss function is obtained according to the second probability scores and the classification labels of the user questions. It can be understood that the second loss function may also be, but is not limited to, a cross-entropy loss.
Referring to the application scenario shown in Fig. 1, in one example, the second probability scores are normalized to obtain a second output value for each user question; the second loss function is obtained according to the second output value of each user question and the classification label of each user question.
Then, in step 26, the first loss function and the second loss function are combined to obtain a total loss function. It can be understood that the combination may be, but is not limited to, a weighted sum.
In one example, the first loss function is multiplied by a first weight, the second loss function is multiplied by a second weight, and the two are summed to obtain the total loss function, where the first weight is greater than the second weight.
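Steps 24 to 26 can be summarised by the sketch below, which computes the first (soft) loss, the second (hard) loss, and their weighted sum from the raw outputs of the two models; the function name, the predefined parameter T, and the weights are illustrative assumptions.

```python
import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, labels, T=3.0, w1=0.7, w2=0.3):
    # Step 24: first loss. Both model outputs are divided by the predefined
    # parameter T and normalized; the cross entropy between the two softened
    # distributions is taken.
    first_output = F.log_softmax(student_logits / T, dim=-1)  # first output values
    first_scores = F.softmax(teacher_logits / T, dim=-1)      # first probability scores
    loss1 = -(first_scores * first_output).sum(dim=-1).mean()

    # Step 25: second loss. The normalized student output is compared with the
    # true classification labels using standard cross entropy.
    loss2 = F.cross_entropy(student_logits, labels)

    # Step 26: weighted sum, with the first weight larger than the second.
    return w1 * loss1 + w2 * loss2
```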
Finally, in step 27, the second neural network model is trained according to the total loss function to obtain a preliminarily trained second neural network model. It can be understood that the model can be solved and evaluated by minimizing the loss function.
In one example, after step 27, each user question and its corresponding classification label are used as a set of training samples to continue training the preliminarily trained second neural network model, obtaining a further trained second neural network model.
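A minimal sketch of step 27 and the continued training described in this example follows. It assumes the teacher and student models and the total_loss function from the earlier sketches, plus a data loader yielding tokenised question batches with labels; all of these names are illustrative.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

# Step 27: preliminary training of the student, driven by the total loss.
for batch, labels in train_loader:
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits  # teacher is already trained and kept frozen
    student_logits = student(**batch).logits
    loss = total_loss(student_logits, teacher_logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Continued training: fine-tune the preliminarily trained student on the
# labelled user questions alone (hard labels only, no teacher).
for batch, labels in train_loader:
    loss = F.cross_entropy(student(**batch).logits, labels)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```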
It can be understood that the total loss function is designed as a weighted average of the cross entropies corresponding to the soft target and the hard target; the larger the weighting coefficient of the soft-target cross entropy, the more the transfer relies on the contribution of the teacher network. This is necessary in the early stage of training, as it helps the student network recognize simple samples more easily, but in the later stage of training the weight of the soft target should be reduced appropriately so that the classification labels help in recognizing difficult samples.
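One simple way to realise this schedule is to let the soft-target weight decay over the course of training. The sketch below is one possible heuristic, not something prescribed by this specification.

```python
def soft_weight(epoch, num_epochs, w_start=0.9, w_end=0.3):
    # Linearly anneal the soft-target weight so that the teacher's soft targets
    # dominate early training and the true labels dominate later training.
    frac = epoch / max(num_epochs - 1, 1)
    return w_start + frac * (w_end - w_start)

# Example: with 5 epochs the weight goes 0.9, 0.75, 0.6, 0.45, 0.3.
```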
Further, the further trained second neural network model is used to predict the class to which a current user question belongs.
The method provided by the embodiments of this specification differs from the usual way of training a question-answer matching model: when the second neural network model is trained, the prediction results of the trained first neural network model are used. The first neural network model has a more complex structure than the second neural network model, and by introducing its prediction results to guide the training of the second neural network model, knowledge transfer is achieved, so that the second neural network model can reduce resource consumption and increase processing speed while still accurately recognizing user questions. In other words, this way of training the question-answer matching model saves a large amount of computing resources while the model performance remains essentially unchanged.
According to an embodiment of another aspect, a training device for a neural network model for question-answer matching is also provided. The device is used to perform the training method for a neural network model for question-answer matching provided by the embodiments of this specification. Fig. 3 shows a schematic block diagram of the training device for a neural network model for question-answer matching according to one embodiment. As shown in Fig. 3, the device 300 includes:
an acquiring unit 31, configured to obtain each user question in a sample set and the classification label corresponding to each user question;
a first predicting unit 32, configured to predict, using a trained first neural network model, a first probability score of each user question for each class, where the number of layers of the first neural network model is N;
a second predicting unit 33, configured to predict, using a second neural network model to be trained, a second probability score of each user question for each class, where the number of layers of the second neural network model is M, and M < N;
a first comparing unit 34, configured to obtain a first loss function according to the second probability scores predicted by the second predicting unit 33 and the first probability scores predicted by the first predicting unit 32;
a second comparing unit 35, configured to obtain a second loss function according to the second probability scores predicted by the second predicting unit 33 and the classification labels of the user questions obtained by the acquiring unit 31;
a combining unit 36, configured to combine the first loss function obtained by the first comparing unit 34 with the second loss function obtained by the second comparing unit 35 to obtain a total loss function;
a first training unit 37, configured to train the second neural network model according to the total loss function obtained by the combining unit 36, to obtain a preliminarily trained second neural network model.
Optionally, as one embodiment, the first neural network model is trained in advance in the following manner: using each user question and its corresponding classification label as a set of training samples, the first neural network model is trained to obtain the trained first neural network model.
Optionally, as one embodiment, the first comparing unit 34 is specifically configured to:
divide the second probability scores by a predefined parameter and then normalize them to obtain a first output value for each user question; and
obtain the first loss function according to the first output value of each user question and the first probability score of each user question, where the first probability score is likewise obtained by dividing by the predefined parameter and then normalizing.
Optionally, as one embodiment, the second comparing unit 35 is specifically configured to:
normalize the second probability scores to obtain a second output value for each user question; and
obtain the second loss function according to the second output value of each user question and the classification label of each user question.
Optionally, as one embodiment, the combining unit 36 is specifically configured to multiply the first loss function obtained by the first comparing unit 34 by a first weight, multiply the second loss function obtained by the second comparing unit 35 by a second weight, and sum the two to obtain the total loss function, where the first weight is greater than the second weight.
Optionally, as one embodiment, the device further includes:
a second training unit, configured to, after the first training unit obtains the preliminarily trained second neural network model, use each user question and its corresponding classification label obtained by the acquiring unit 31 as a set of training samples to continue training the preliminarily trained second neural network model obtained by the first training unit, obtaining a further trained second neural network model.
Further, the device further includes:
a predicting unit, configured to predict, using the further trained second neural network model obtained by the second training unit, the class to which a current user question belongs.
Optionally, as one embodiment, the second neural network model to be trained is a pre-trained bidirectional context prediction model, and its pre-training tasks include a cloze (masked word prediction) task and a next-sentence judgment task.
Optionally, as one embodiment, the number of layers of the second neural network model is 2.
The device provided by the embodiments of this specification differs from the usual way of training a question-answer matching model: when the second neural network model is trained, the prediction results of the trained first neural network model are used. The first neural network model has a more complex structure than the second neural network model, and by introducing its prediction results to guide the training of the second neural network model, knowledge transfer is achieved, so that the second neural network model can reduce resource consumption and increase processing speed while still accurately recognizing user questions. In other words, this way of training the question-answer matching model saves a large amount of computing resources while the model performance remains essentially unchanged.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Fig. 2.
According to an embodiment of a further aspect, a computing device is also provided, including a memory and a processor, where executable code is stored in the memory; when the processor executes the executable code, the method described in conjunction with Fig. 2 is implemented.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further describe in detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.