CN110427466B - Training method and device for neural network model for question-answer matching

Training method and device for neural network model for question-answer matching

Info

Publication number
CN110427466B
CN110427466B
Authority
CN
China
Prior art keywords
neural network
network model
loss function
trained
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910507153.8A
Other languages
Chinese (zh)
Other versions
CN110427466A (en)
Inventor
马良庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201910507153.8A priority Critical patent/CN110427466B/en
Publication of CN110427466A publication Critical patent/CN110427466A/en
Application granted granted Critical
Publication of CN110427466B publication Critical patent/CN110427466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiments of the specification provide a training method and device for a neural network model for question-answer matching. The method includes: acquiring the classification label corresponding to each user question in a sample set; predicting a first probability score of each user question on each classification using a trained first neural network model; predicting a second probability score of each user question on each classification using a second neural network model to be trained, where the number of layers of the second neural network model is smaller than that of the first neural network model; obtaining a first loss function according to the second probability score and the first probability score; obtaining a second loss function according to the second probability score and the classification label of each user question; combining the first loss function and the second loss function into a total loss function; and training the second neural network model according to the total loss function, so that resource consumption can be reduced and processing speed increased while user questions are still identified accurately.

Description

Training method and device for neural network model for question-answer matching
Technical Field
One or more embodiments of the present specification relate to the field of computers, and more particularly, to a training method and apparatus for a neural network model for question-answer matching.
Background
Natural language processing (NLP) is the field that studies theories and methods for enabling effective communication between humans and computers in natural language. A typical NLP application is question-answer matching of user questions, which enables a customer service robot to answer a user question based on the matching result.
In a customer service robot system, a neural network model for question-answer matching usually has a complex structure in order to identify user questions accurately; it therefore consumes a large amount of computing resources and processes requests slowly, which can lead to service timeouts.
An improved scheme is therefore desired that can reduce resource consumption and increase processing speed while still identifying user questions accurately.
Disclosure of Invention
One or more embodiments of the present disclosure describe a training method and apparatus for a neural network model for question-answer matching, which can reduce resource consumption and increase processing speed on the basis of accurately identifying a user question.
In a first aspect, a training method for a neural network model for question-answer matching is provided, the method comprising:
acquiring classification labels corresponding to all user questions in a sample set;
predicting a first probability score of each user question on each category by using a trained first neural network model, wherein the number of layers of the first neural network model is N;
predicting second probability scores of all user questions on all classifications by using a second neural network model to be trained, wherein the number of layers of the second neural network model is M, and M < N;
obtaining a first loss function according to the second probability score and the first probability score;
obtaining a second loss function according to the second probability score and the classification labels of the questions of each user;
combining the first loss function and the second loss function to obtain a total loss function;
and training the second neural network model according to the total loss function to obtain a primarily trained second neural network model.
In one possible implementation, the first neural network model is pre-trained by:
and training the first neural network model by taking each user question and the classification label corresponding to each user question as a group of training samples to obtain the trained first neural network model.
In a possible implementation manner, the obtaining a first loss function according to the second probability score and the first probability score includes:
dividing the second probability score by a preset parameter, and carrying out normalization processing to obtain a first output value of each user question;
obtaining a first loss function according to the first output value of each user question and the first probability score of each user question; the first probability score is obtained by dividing the output of the first neural network model by the preset parameter and carrying out normalization processing.
In a possible implementation manner, the obtaining a second loss function according to the second probability score and the classification labels of the questions of the respective users includes:
normalizing the second probability score to obtain a second output value of each user question;
and obtaining a second loss function according to the second output value of each user question and the classification label of each user question.
In a possible implementation manner, the combining the first loss function and the second loss function to obtain a total loss function includes:
multiplying the first loss function by a first weight, multiplying the second loss function by a second weight, and summing the two to obtain a total loss function, wherein the first weight is greater than the second weight.
In one possible embodiment, after the obtaining the primarily trained second neural network model, the method further includes:
taking each user question and the classification label corresponding to each user question as a group of training samples, and continuing training the primarily trained second neural network model to obtain a continuously trained second neural network model.
Further, the method further comprises:
and predicting the category to which the current user question belongs by using the second neural network model after continuing training.
In one possible implementation manner, the second neural network model to be trained is a pre-trained contextual omnidirectional prediction model, and the pre-training tasks of the second neural network model include a cloze (fill-in-the-blank) task and a next-sentence judgment task.
In one possible implementation, the number of layers of the second neural network model is 2.
In a second aspect, there is provided a training apparatus for a neural network model for question-answer matching, the apparatus comprising:
the acquisition unit is used for acquiring each user question in the sample set and the classification label corresponding to each user question;
the first prediction unit is used for predicting first probability scores of all user questions on all classifications by using a trained first neural network model, wherein the number of layers of the first neural network model is N;
the second prediction unit is used for predicting second probability scores of all user questions on all classifications by using a second neural network model to be trained, wherein the number of layers of the second neural network model is M, and M < N;
the first comparison unit is used for obtaining a first loss function according to the second probability score predicted by the second prediction unit and the first probability score predicted by the first prediction unit;
the second comparison unit is used for obtaining a second loss function according to the second probability score predicted by the second prediction unit and the classification labels of the user questions acquired by the acquisition unit;
the combining unit is used for combining the first loss function obtained by the first comparing unit with the second loss function obtained by the second comparing unit to obtain a total loss function;
and the first training unit is used for training the second neural network model according to the total loss function obtained by the combining unit to obtain a primarily trained second neural network model.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory having executable code stored therein and a processor which, when executing the executable code, implements the method of the first aspect.
According to the method and the device provided by the embodiments of the specification, unlike the common way of training a question-answer matching model, the prediction result of the trained first neural network model is used when training the second neural network model. The first neural network model has a complex structure relative to the second neural network model, and introducing its prediction result to induce the training of the second neural network model realizes knowledge migration, so that the second neural network model can reduce resource consumption and increase processing speed while still identifying user questions accurately. In other words, this way of training the question-answer matching model saves a large amount of computing resources while the model effect remains essentially unchanged.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation scenario of an embodiment disclosed herein;
FIG. 2 illustrates a flowchart of a training method for a neural network model for question-answer matching, according to one embodiment;
FIG. 3 illustrates a schematic block diagram of a training apparatus for a neural network model for question-answer matching, according to one embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present specification. This implementation scenario involves training a neural network model for question-answer matching, which may also be referred to as a question-answer matching model. For a question-answer matching model, the accuracy of identifying user questions and the processing speed have long been in tension. If a big model (Big Model) with many layers is used, user questions are identified more accurately, but processing is slow; if a small model (Small Model) with few layers is used, processing is fast, but user questions are identified less accurately. A question-answer matching model is typically used by a customer service robot to answer user questions in real time, so both high accuracy and high processing speed are required. The embodiments of the specification address this tension by introducing the idea of knowledge distillation into the training of the question-answer matching model, so that a trained small model can meet the requirements on both accuracy and processing speed.
Knowledge distillation achieves knowledge migration by introducing a soft target associated with the teacher network as part of the total loss function (total loss) to induce training of the student network. The teacher network is complex but has superior inference performance; the student network is simple and has low complexity.
As shown in FIG. 1, the predicted output of the teacher network (i.e., the large model) is divided by a preset parameter T (divided by T) and then normalized (e.g., by a softmax transformation) to obtain a softened probability distribution (i.e., the soft target), e.g., s_i = [0.1, 0.6, …, 0.1]; the values lie between 0 and 1 and the distribution is gentle. The larger the value of the preset parameter T, the gentler the distribution; however, if the value of the preset parameter T is too small, the probability of erroneous classification may be amplified and unnecessary noise may be introduced. The hard target is the true annotation of the sample, which can be represented by a one-hot vector, e.g., y_i = [0, 1, …, 0]. The total loss function (total loss) is designed as a weighted average of the cross entropies corresponding to the soft target and the hard target. The larger the weighting coefficient λ of the cross entropy of the soft target, the more the migration induction relies on the contribution of the teacher network; this is necessary in the early stage of training and helps the student network identify simple samples more easily, but in the later stage of training the proportion of the soft target needs to be reduced appropriately so that the true labels help identify difficult samples. In addition, the inference performance of the teacher network is generally superior to that of the student network; the model capacity is not particularly limited, and the higher the inference precision of the teacher network, the more it benefits the learning of the student network.
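As an illustration only (the logit values and the temperature T = 5 below are made-up numbers, not values from this specification), the following sketch shows how dividing the teacher's predicted output by T before softmax normalization turns a near-one-hot distribution into a softened soft target of the kind written s_i above, while the hard target y_i stays one-hot.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

teacher_logits = np.array([1.0, 4.0, 0.5, 1.0])  # hypothetical teacher outputs for 4 classes
hard_target = np.array([0, 1, 0, 0])             # one-hot true label y_i

plain_probs = softmax(teacher_logits)        # approx. [0.04, 0.89, 0.03, 0.04] -- nearly hard
soft_target = softmax(teacher_logits / 5.0)  # T = 5: approx. [0.21, 0.39, 0.19, 0.21] -- softened s_i
```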
According to the embodiments of the specification, a small model that is better suited for inference is obtained from the trained large model through knowledge migration. The trained small model can then be used for question-answer matching on user questions, i.e., predicting the category to which a user question belongs. It is understood that the input of the model may be a vector representation of the user question.
Fig. 2 illustrates a flowchart of a training method for a neural network model for question-answer matching, which may be based on the application scenario illustrated in fig. 1, according to one embodiment. As shown in fig. 2, the training method for the neural network model for question-answer matching in this embodiment includes the steps of: step 21, acquiring each user question in a sample set and a classification label corresponding to each user question; step 22, predicting a first probability score of each user question on each category by using a trained first neural network model, wherein the number of layers of the first neural network model is N; step 23, predicting second probability scores of all user questions on all classifications by using a second neural network model to be trained, wherein the number of layers of the second neural network model is M, and M < N; step 24, obtaining a first loss function according to the second probability score and the first probability score; step 25, obtaining a second loss function according to the second probability score and the classification labels of the questions of each user; step 26, combining the first loss function and the second loss function to obtain a total loss function; and step 27, training the second neural network model according to the total loss function to obtain a primarily trained second neural network model. Specific implementations of the above steps are described below.
First, in step 21, each user question in the sample set and the classification label corresponding to each user question are obtained. It can be understood that the classification label corresponds to the hard target in the application scenario shown in fig. 1, and when there are multiple classifications, the classification label corresponding to each user question is uniquely determined. For example, the classification labels corresponding to the respective user questions may be as shown in Table 1.
Table one: correspondence table between user question and classification label
Question of user Classification label
User question 1 Classification 1
User question 2 Classification 1
User question 3 Classification 2
User question 4 Classification 3
Referring to Table 1, the classification label corresponding to both user question 1 and user question 2 is Classification 1; that is, different user questions may correspond to the same classification label, but the classification label corresponding to one user question is unique.
Next, at step 22, a first probability score for each user question over each category is predicted using a trained first neural network model, where the number of layers of the first neural network model is N. It is understood that the first neural network model may be understood as a large model in the application scenario shown in fig. 1, and the first probability score may be understood as a soft target in the application scenario shown in fig. 1.
In one example, the first neural network model is pre-trained by:
and training the first neural network model by taking each user question and the classification label corresponding to each user question as a group of training samples to obtain the trained first neural network model.
In one example, the first neural network model classifies user questions using a complete bidirectional encoder representations from transformers (BERT) model and outputs the knowledge points that the user questions match.
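A minimal sketch of how such a teacher could be fine-tuned on the (user question, classification label) samples is given below. It is only an assumption-laden illustration, not code from this specification: the Hugging Face transformers library, the bert-base-chinese checkpoint, the number of classes and the learning rate are all illustrative choices.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_CLASSES = 100  # assumed number of question categories (knowledge points)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
teacher = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=NUM_CLASSES)  # complete (12-layer) BERT classifier
optimizer = torch.optim.AdamW(teacher.parameters(), lr=2e-5)

def teacher_train_step(questions, labels):
    """One supervised step on a batch of (user question, hard label) training samples."""
    batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
    out = teacher(**batch, labels=torch.tensor(labels))  # cross entropy against the hard labels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```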
Then, in step 23, a second probability score of each user question on each category is predicted by using a second neural network model to be trained, wherein the number of layers of the second neural network model is M, and M < N. It will be appreciated that the second neural network model may be understood as a small model in the application scenario shown in fig. 1, and the second probability score may be understood as a predicted result of the second neural network model to be trained, and the second probability score may not be accurate enough with respect to the first probability score since the second neural network model has not been trained yet.
In one example, the second neural network model to be trained is a pre-trained contextual omnidirectional prediction model, such as a BERT model, and its pre-training tasks include a cloze (fill-in-the-blank) task and a next-sentence judgment task.
In one example, the number of layers of the second neural network model is 2, e.g., a 2-layer BERT model, whose consumption of computing resources is about one sixth of that of the complete BERT model.
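A corresponding sketch of constructing the 2-layer student is shown below, under the same illustrative assumptions as the teacher sketch above (transformers library, bert-base-chinese configuration, assumed class count); how the student's weights are initialized is not prescribed by this specification.

```python
from transformers import BertConfig, BertForSequenceClassification

NUM_CLASSES = 100  # assumed, matching the teacher sketch above

student_config = BertConfig.from_pretrained(
    "bert-base-chinese", num_hidden_layers=2, num_labels=NUM_CLASSES)
student = BertForSequenceClassification(student_config)  # 2-layer BERT, roughly 1/6 the compute
```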
And in step 24, a first loss function is obtained according to the second probability score and the first probability score. It is to be appreciated that the first loss function described above may be, but is not limited to, employing a cross entropy loss function (cross entropy loss).
Referring to the application scenario shown in fig. 1, in an example, after dividing the second probability score by a predetermined parameter, performing normalization processing to obtain a first output value of each user question; obtaining a first loss function according to the first output value of each user question and the first probability score of each user question; the first probability score is obtained by dividing the output of the preset level of the first neural network model by the preset parameter and performing normalization processing.
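For illustration, a sketch of this first loss function in PyTorch is given below; the temperature value T = 2 and the tensor values are assumptions, and the cross entropy form follows the description above (both scores divided by the preset parameter and normalized).

```python
import torch
import torch.nn.functional as F

def first_loss(student_logits, teacher_logits, T=2.0):
    """Cross entropy between the softened teacher and student distributions."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)     # softened first probability score
    log_student = F.log_softmax(student_logits / T, dim=-1)  # first output value (log space)
    return -(soft_targets * log_student).sum(dim=-1).mean()

# Made-up logits for a batch of 2 questions over 3 classes:
s = torch.tensor([[0.2, 1.5, -0.3], [1.0, 0.1, 0.4]])
t = torch.tensor([[0.1, 2.5, -0.5], [2.0, 0.3, 0.2]])
print(first_loss(s, t))
```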
And in step 25, obtaining a second loss function according to the second probability score and the classification labels of the questions of each user. It is understood that the second loss function described above may be, but is not limited to, the use of a cross entropy loss function.
Referring to the application scenario shown in fig. 1, in an example, the second probability score is normalized to obtain a second output value of each user question; and obtaining a second loss function according to the second output value of each user question and the classification label of each user question.
And then, in step 26, the first loss function and the second loss function are combined to obtain a total loss function. It is understood that the manner of combining may be, but is not limited to, a weighted summation manner.
In one example, the first loss function is multiplied by a first weight, the second loss function is multiplied by a second weight, and the two are summed to obtain a total loss function, wherein the first weight is greater than the second weight.
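Putting steps 24 to 26 together, a hedged sketch of the total loss is shown below. The first weight 0.7 (and hence the second weight 0.3) and the temperature are illustrative assumptions; the only constraint stated above is that the first weight is greater than the second.

```python
import torch
import torch.nn.functional as F

def total_loss(student_logits, teacher_logits, labels, T=2.0, first_weight=0.7):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    loss_first = -(soft_targets * log_student).sum(dim=-1).mean()  # vs. the teacher's scores
    loss_second = F.cross_entropy(student_logits, labels)          # vs. the hard labels
    return first_weight * loss_first + (1.0 - first_weight) * loss_second
```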
Finally, in step 27, the second neural network model is trained according to the total loss function, so as to obtain a primarily trained second neural network model. It will be appreciated that the model can be solved and evaluated by minimizing the loss function.
In one example, after step 27, training is continued on the preliminarily trained second neural network model using each user question and the classification label corresponding to each user question as a set of training samples, to obtain a second neural network model after continuing training.
It will be appreciated that the total loss function is designed as a weighted average of the cross entropies of the soft and hard targets. A greater weighting coefficient for the cross entropy of the soft target means that migration induction depends more on the teacher network's contribution, which is necessary in early training and helps the student network identify simple samples more easily; later training, however, requires appropriately reducing the proportion of the soft target, allowing the classification labels to help identify difficult samples.
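The two training phases just described can be sketched as follows. The teacher, student, tokenizer and total_loss names are the hypothetical objects from the sketches above, and the optimizer settings are assumptions; none of this is prescribed code from the specification.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

def encode(questions, labels):
    batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
    return batch, torch.tensor(labels)

def distill_step(questions, labels):
    """Phase 1 (steps 22-27): preliminary training induced by the teacher's predictions."""
    batch, y = encode(questions, labels)
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits  # first probability scores
    student_logits = student(**batch).logits      # second probability scores
    loss = total_loss(student_logits, teacher_logits, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

def finetune_step(questions, labels):
    """Phase 2: continue training the preliminarily trained student on the hard labels only."""
    batch, y = encode(questions, labels)
    loss = F.cross_entropy(student(**batch).logits, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```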
Further, predicting the category to which the current user question belongs by using the second neural network model after continuing training.
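Prediction with the continuously trained student can be sketched as below, again reusing the hypothetical student and tokenizer names from the sketches above.

```python
import torch

def predict_category(question: str) -> int:
    batch = tokenizer([question], padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = student(**batch).logits
    return int(logits.argmax(dim=-1))  # index of the predicted classification
```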
According to the method provided by the embodiments of the specification, unlike the common way of training a question-answer matching model, the prediction result of the trained first neural network model is used when training the second neural network model. The first neural network model has a complex structure relative to the second neural network model, and introducing its prediction result to induce the training of the second neural network model realizes knowledge migration, so that the second neural network model can reduce resource consumption and increase processing speed while still identifying user questions accurately. In other words, this way of training the question-answer matching model saves a large amount of computing resources while the model effect remains essentially unchanged.
According to an embodiment of another aspect, there is further provided a training apparatus for a neural network model for question-answer matching, which is used for executing the training method for the neural network model for question-answer matching provided in the embodiments of the present specification. FIG. 3 illustrates a schematic block diagram of a training apparatus for a neural network model for question-answer matching, according to one embodiment. As shown in fig. 3, the apparatus 300 includes:
an obtaining unit 31, configured to obtain each user question in the sample set and a classification label corresponding to each user question;
a first prediction unit 32, configured to predict a first probability score of each user question on each category by using a trained first neural network model, where the number of layers of the first neural network model is N;
a second prediction unit 33, configured to predict a second probability score of each user question on each category by using a second neural network model to be trained, where the number of layers of the second neural network model is M, M < N;
a first comparing unit 34, configured to obtain a first loss function according to the second probability score predicted by the second predicting unit 33 and the first probability score predicted by the first predicting unit 32;
a second comparing unit 35, configured to obtain a second loss function according to the second probability score predicted by the second predicting unit 33 and the classification labels of the question sentences of each user acquired by the acquiring unit 31;
a combining unit 36, configured to combine the first loss function obtained by the first comparing unit 34 with the second loss function obtained by the second comparing unit 35 to obtain a total loss function;
a first training unit 37, configured to train the second neural network model according to the total loss function obtained by the combining unit 36, to obtain a primarily trained second neural network model.
Optionally, as an embodiment, the first neural network model is pre-trained by:
and training the first neural network model by taking each user question and the classification label corresponding to each user question as a group of training samples to obtain the trained first neural network model.
Optionally, as an embodiment, the first comparing unit 34 is specifically configured to:
dividing the second probability score by a preset parameter, and carrying out normalization processing to obtain a first output value of each user question;
obtaining a first loss function according to the first output value of each user question and the first probability score of each user question; the first probability score is obtained by dividing the output of the first neural network model by the preset parameter and carrying out normalization processing.
Optionally, as an embodiment, the second comparing unit 35 is specifically configured to:
normalizing the second probability score to obtain a second output value of each user question;
and obtaining a second loss function according to the second output value of each user question and the classification label of each user question.
Optionally, as an embodiment, the combining unit 36 is specifically configured to multiply the first loss function obtained by the first comparing unit 34 by a first weight, multiply the second loss function obtained by the second comparing unit 35 by a second weight, and sum the two to obtain a total loss function, where the first weight is greater than the second weight.
Optionally, as an embodiment, the apparatus further includes:
the second training unit is configured to, after the first training unit obtains the preliminarily trained second neural network model, use each user question and the classification label corresponding to each user question obtained by the obtaining unit 31 as a set of training samples, and continuously train the preliminarily trained second neural network model obtained by the first training unit to obtain a continuously trained second neural network model.
Further, the apparatus further comprises:
and the predicting unit is used for predicting the category to which the current user question belongs by using the second neural network model which is obtained by the second training unit and is subjected to continuous training.
Optionally, as an embodiment, the second neural network model to be trained is a pre-trained contextual omnidirectional prediction model, and the pre-training tasks of the second neural network model include a cloze (fill-in-the-blank) task and a next-sentence judgment task.
Alternatively, as an embodiment, the number of layers of the second neural network model is 2.
According to the device provided by the embodiments of the specification, unlike the common way of training a question-answer matching model, the prediction result of the trained first neural network model is used when training the second neural network model. The first neural network model has a complex structure relative to the second neural network model, and introducing its prediction result to induce the training of the second neural network model realizes knowledge migration, so that the second neural network model can reduce resource consumption and increase processing speed while still identifying user questions accurately. In other words, this way of training the question-answer matching model saves a large amount of computing resources while the model effect remains essentially unchanged.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method described in connection with fig. 2.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing embodiments further illustrate the objects, technical solutions and principles of the present invention in detail. They are not intended to limit the scope of the invention; any modification, equivalent replacement, improvement, etc. made on the basis of the teachings of the invention shall fall within the protection scope of the invention.

Claims (18)

1. A training method for a neural network model for question-answer matching, the method comprising:
acquiring classification labels corresponding to all user questions in a sample set;
predicting a first probability score of each user question on each category by using a trained first neural network model, wherein the number of layers of the first neural network model is N;
predicting second probability scores of all user questions on all classifications by using a second neural network model to be trained, wherein the number of layers of the second neural network model is M, and M < N;
obtaining a first loss function according to the second probability score and the first probability score;
obtaining a second loss function according to the second probability score and the classification labels of the questions of each user;
combining the first loss function and the second loss function to obtain a total loss function;
training the second neural network model according to the total loss function to obtain a primarily trained second neural network model;
taking each user question and the classification label corresponding to each user question as a group of training samples, and continuing training the primarily trained second neural network model to obtain a continuously trained second neural network model.
2. The method of claim 1, wherein the first neural network model is pre-trained by:
and training the first neural network model by taking each user question and the classification label corresponding to each user question as a group of training samples to obtain the trained first neural network model.
3. The method of claim 1, wherein the deriving a first loss function from the second probability score and the first probability score comprises:
dividing the second probability score by a preset parameter, and carrying out normalization processing to obtain a first output value of each user question;
obtaining a first loss function according to the first output value of each user question and the first probability score of each user question; the first probability score is obtained by dividing the output of the first neural network model by the preset parameter and carrying out normalization processing.
4. The method of claim 1, wherein the deriving a second loss function based on the second probability score and the classification labels of the respective user questions comprises:
normalizing the second probability score to obtain a second output value of each user question;
and obtaining a second loss function according to the second output value of each user question and the classification label of each user question.
5. The method of claim 1, wherein the combining the first loss function with the second loss function results in a total loss function, comprising:
multiplying the first loss function by a first weight, multiplying the second loss function by a second weight, and summing the two to obtain a total loss function, wherein the first weight is greater than the second weight.
6. The method of claim 1, wherein the method further comprises:
and predicting the category to which the current user question belongs by using the second neural network model after continuing training.
7. The method of claim 1, wherein the second neural network model to be trained is a pre-trained contextual omnidirectional predictive model, the pre-training tasks of the second neural network model comprising a cloze (fill-in-the-blank) task and a next-sentence judgment task.
8. The method of claim 1, wherein the second neural network model has a number of layers of 2.
9. A training apparatus for a neural network model for question-answer matching, the apparatus comprising:
the acquisition unit is used for acquiring each user question in the sample set and the classification label corresponding to each user question;
the first prediction unit is used for predicting first probability scores of all user questions on all classifications by using a trained first neural network model, wherein the number of layers of the first neural network model is N;
the second prediction unit is used for predicting second probability scores of all user questions on all classifications by using a second neural network model to be trained, wherein the number of layers of the second neural network model is M, and M < N;
the first comparison unit is used for obtaining a first loss function according to the second probability score predicted by the second prediction unit and the first probability score predicted by the first prediction unit;
the second comparison unit is used for obtaining a second loss function according to the second probability score predicted by the second prediction unit and the classification labels of the user questions acquired by the acquisition unit;
the combining unit is used for combining the first loss function obtained by the first comparing unit with the second loss function obtained by the second comparing unit to obtain a total loss function;
the first training unit is used for training the second neural network model according to the total loss function obtained by the combining unit to obtain a primarily trained second neural network model;
the second training unit is used for taking each user question and the classification label corresponding to each user question obtained by the obtaining unit as a group of training samples, and continuing training the preliminarily trained second neural network model obtained by the first training unit to obtain a continuously trained second neural network model.
10. The apparatus of claim 9, wherein the first neural network model is pre-trained by:
and training the first neural network model by taking each user question and the classification label corresponding to each user question as a group of training samples to obtain the trained first neural network model.
11. The apparatus of claim 9, wherein the first comparing unit is specifically configured to:
dividing the second probability score by a preset parameter, and carrying out normalization processing to obtain a first output value of each user question;
obtaining a first loss function according to the first output value of each user question and the first probability score of each user question; the first probability score is obtained by dividing the output of the first neural network model by the preset parameter and carrying out normalization processing.
12. The apparatus of claim 9, wherein the second comparing unit is specifically configured to:
normalizing the second probability score to obtain a second output value of each user question;
and obtaining a second loss function according to the second output value of each user question and the classification label of each user question.
13. The apparatus according to claim 9, wherein the combining unit is specifically configured to multiply the first loss function obtained by the first comparing unit by a first weight, multiply the second loss function obtained by the second comparing unit by a second weight, and sum the two to obtain a total loss function, where the first weight is greater than the second weight.
14. The apparatus of claim 9, wherein the apparatus further comprises:
and the predicting unit is used for predicting the category to which the current user question belongs by using the second neural network model which is obtained by the second training unit and is subjected to continuous training.
15. The apparatus of claim 9, wherein the second neural network model to be trained is a pre-trained contextual omnidirectional predictive model, the pre-training tasks of the second neural network model comprising a cloze (fill-in-the-blank) task and a next-sentence judgment task.
16. The apparatus of claim 9, wherein the second neural network model has a number of layers of 2.
17. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-8.
18. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-8.
CN201910507153.8A 2019-06-12 2019-06-12 Training method and device for neural network model for question-answer matching Active CN110427466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910507153.8A CN110427466B (en) 2019-06-12 2019-06-12 Training method and device for neural network model for question-answer matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910507153.8A CN110427466B (en) 2019-06-12 2019-06-12 Training method and device for neural network model for question-answer matching

Publications (2)

Publication Number Publication Date
CN110427466A (en) 2019-11-08
CN110427466B (en) 2023-05-26

Family

ID=68407623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910507153.8A Active CN110427466B (en) 2019-06-12 2019-06-12 Training method and device for neural network model for question-answer matching

Country Status (1)

Country Link
CN (1) CN110427466B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991613B (en) * 2019-11-29 2022-08-02 支付宝(杭州)信息技术有限公司 Method and system for training neural network
CN110909146B (en) * 2019-11-29 2022-09-09 支付宝(杭州)信息技术有限公司 Label pushing model training method, device and equipment for pushing question-back labels
CN110909815B (en) * 2019-11-29 2022-08-12 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment
CN111159397B (en) * 2019-12-04 2023-04-18 支付宝(杭州)信息技术有限公司 Text classification method and device and server
CN111078854B (en) * 2019-12-13 2023-10-27 北京金山数字娱乐科技有限公司 Training method and device of question-answer prediction model, and question-answer prediction method and device
CN111199149B (en) * 2019-12-17 2023-10-20 航天信息股份有限公司 Sentence intelligent clarification method and system for dialogue system
CN111274789B (en) * 2020-02-06 2021-07-06 支付宝(杭州)信息技术有限公司 Training method and device of text prediction model
CN111310823B (en) * 2020-02-12 2024-03-29 北京迈格威科技有限公司 Target classification method, device and electronic system
CN111339302A (en) * 2020-03-06 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for training element classification model
CN111428765B (en) * 2020-03-17 2022-08-30 武汉大学 Target detection method based on global convolution and local depth convolution fusion
CN111797895B (en) * 2020-05-30 2024-04-26 华为技术有限公司 Training method, data processing method, system and equipment for classifier
CN111680148B (en) * 2020-08-14 2020-12-01 支付宝(杭州)信息技术有限公司 Method and device for intelligently responding to question of user
CN112434142B (en) * 2020-11-20 2023-04-07 海信电子科技(武汉)有限公司 Method for marking training sample, server, computing equipment and storage medium
CN113515614A (en) * 2021-06-29 2021-10-19 厦门渊亭信息科技有限公司 Knowledge distillation-based attribute identification method, terminal device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977707A (en) * 2017-11-23 2018-05-01 厦门美图之家科技有限公司 A kind of method and computing device for resisting distillation neural network model
CN108009638A (en) * 2017-11-23 2018-05-08 深圳市深网视界科技有限公司 A kind of training method of neural network model, electronic equipment and storage medium
KR20180125905A (en) * 2017-05-16 2018-11-26 삼성전자주식회사 Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN109598331A (en) * 2018-12-04 2019-04-09 北京芯盾时代科技有限公司 A kind of fraud identification model training method, fraud recognition methods and device
CN109816092A (en) * 2018-12-13 2019-05-28 北京三快在线科技有限公司 Deep neural network training method, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102563752B1 (en) * 2017-09-29 2023-08-04 삼성전자주식회사 Training method for neural network, recognition method using neural network, and devices thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180125905A (en) * 2017-05-16 2018-11-26 삼성전자주식회사 Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
CN107977707A (en) * 2017-11-23 2018-05-01 厦门美图之家科技有限公司 A kind of method and computing device for resisting distillation neural network model
CN108009638A (en) * 2017-11-23 2018-05-08 深圳市深网视界科技有限公司 A kind of training method of neural network model, electronic equipment and storage medium
CN109598331A (en) * 2018-12-04 2019-04-09 北京芯盾时代科技有限公司 A kind of fraud identification model training method, fraud recognition methods and device
CN109816092A (en) * 2018-12-13 2019-05-28 北京三快在线科技有限公司 Deep neural network training method, device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on question classification for a BiGRU-based tomato pest and disease question-answering system; Zhao Ming et al.; 《农业机械学报》 (Transactions of the Chinese Society for Agricultural Machinery); 2018-03-27 (No. 05); full text *

Also Published As

Publication number Publication date
CN110427466A (en) 2019-11-08

Similar Documents

Publication Publication Date Title
CN110427466B (en) Training method and device for neural network model for question-answer matching
CN109543039B (en) Natural language emotion analysis method based on deep network
EP3648014A1 (en) Model training method, data identification method and data identification device
CN111259625B (en) Intention recognition method, device, equipment and computer readable storage medium
WO2021143396A1 (en) Method and apparatus for carrying out classification prediction by using text classification model
CN109165692B (en) User character prediction device and method based on weak supervised learning
CN110046248B (en) Model training method for text analysis, text classification method and device
CN110377916B (en) Word prediction method, word prediction device, computer equipment and storage medium
CN111339302A (en) Method and device for training element classification model
CN110852755A (en) User identity identification method and device for transaction scene
CN112508334A (en) Personalized paper combining method and system integrating cognitive characteristics and test question text information
US11461613B2 (en) Method and apparatus for multi-document question answering
CN116992007B (en) Limiting question-answering system based on question intention understanding
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN113297351A (en) Text data labeling method and device, electronic equipment and storage medium
JP2017174004A (en) Sentence meaning classification calculation device, model learning device, method, and program
Costa et al. Automatic classification of computational thinking skills in elementary school math questions
CN116361655A (en) Model training method, standard problem prediction method, device, equipment and medium
CN113705092B (en) Disease prediction method and device based on machine learning
CN115730590A (en) Intention recognition method and related equipment
CN116994695A (en) Training method, device, equipment and storage medium of report generation model
CN112200488B (en) Risk identification model training method and device for business object
CN113449095A (en) Interview data analysis method and device
CN112131889A (en) Intelligent Chinese subjective question scoring method and system based on big data
CN110334353A (en) Analysis method, device, equipment and the storage medium of word order recognition performance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant
GR01 Patent grant