WO2021169478A1 - 神经网络模型的融合训练方法及装置 (Fusion training method and apparatus for neural network models) - Google Patents


Info

Publication number
WO2021169478A1
WO2021169478A1 · PCT/CN2020/134777 · CN2020134777W
Authority
WO
WIPO (PCT)
Prior art keywords
data
prediction
training
neural network
prediction data
Application number
PCT/CN2020/134777
Other languages
English (en)
French (fr)
Other versions
WO2021169478A9 (zh)
Inventor
蒋亮
温祖杰
梁忠平
张家兴
Original Assignee
支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021169478A1 publication Critical patent/WO2021169478A1/zh
Publication of WO2021169478A9 publication Critical patent/WO2021169478A9/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods

Definitions

  • One or more embodiments of this specification relate to the field of data processing technology, and in particular to a method and device for fusion training of neural network models.
  • Deep learning has achieved results far beyond traditional methods in the fields of computer vision and natural language processing, and has become a mainstream method in the field of artificial intelligence.
  • In general, the deeper the neural network, the better the prediction performance is expected to be.
  • In training, a large amount of sample data such as text, images, and videos can be collected, and the neural network model can be trained according to the labels annotated for the sample data, so that the model's predictions on the input data come increasingly close to the annotated labels.
  • One or more embodiments of this specification describe a neural network model fusion training method and device, which can improve the effectiveness of neural network model training, thereby making the neural network model more accurate in business prediction of business data.
  • The specific technical solutions are as follows.
  • In a first aspect, an embodiment provides a fusion training method for a neural network model, executed by a computer.
  • The model training process of the neural network model includes several training periods, and each training period corresponds to one pass of model training using all sample data in the training sample set; the neural network model is used to perform business prediction on input business data.
  • The method includes: obtaining the neural network model to be trained in the current first training period; obtaining first sample data and corresponding first annotation data in the training sample set, inputting the first sample data into the neural network model to be trained, and obtaining first prediction data of the first sample data; when the first training period is not the initial training period, obtaining first target prediction data for the first sample data, wherein the first target prediction data is obtained based on accumulation of first historical prediction data, and the first historical prediction data includes the prediction data, for the first sample data, of the neural network models obtained at the end of the training periods before the first training period; determining a first prediction loss according to the comparison of the first annotation data and the first target prediction data, respectively, with the first prediction data; and updating the neural network model to be trained in a direction that reduces the first prediction loss.
  • In one embodiment, the method further includes: detecting whether the first sample data is the last sample data in the training sample set; if so, determining the updated neural network model to be trained as the first neural network model obtained at the end of the first training period.
  • In one embodiment, the method further includes: inputting the first sample data into the first neural network model to obtain third prediction data; and fusing the third prediction data with the first target prediction data to obtain the target prediction data for the first sample data in the next training period.
  • In one embodiment, the method further includes: when the first training period is the initial training period, directly determining a second prediction loss according to the comparison between the first annotation data and the first prediction data; and updating the neural network model to be trained in a direction that reduces the second prediction loss.
  • In one embodiment, the step of obtaining first target prediction data for the first sample data includes: obtaining second prediction data determined by a second neural network model for the first sample data, wherein the second neural network model is obtained at the end of a second training period, and the second training period is the training period immediately before the first training period; when the second training period is not the initial training period, obtaining second target prediction data for the first sample data, wherein the second target prediction data is obtained based on accumulation of the prediction data, for the first sample data, of the neural network models obtained at the end of the training periods before the second training period; and determining the first target prediction data for the first sample data based on fusion of the second target prediction data and the second prediction data.
  • In one embodiment, the step of determining the first target prediction data for the first sample data based on fusion of the second target prediction data and the second prediction data includes: obtaining a first weight for the second target prediction data and a second weight for the second prediction data; and, based on the first weight and the second weight, computing a weighted average of the second target prediction data and the second prediction data to obtain the first target prediction data for the first sample data.
  • the first weight is less than the second weight.
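As an illustrative sketch (not the patent's implementation), the weighted-average fusion described above can be written as follows; the function name and the weight values are assumptions:

```python
def fuse_predictions(target_pred, pred, w1=1.0, w2=3.0):
    """Weighted average of the accumulated target prediction data (weight w1)
    and the newest prediction data (weight w2); w1 < w2, so the more recent
    predictions dominate the fused result."""
    assert w1 < w2
    total = w1 + w2
    return [(w1 * t + w2 * p) / total for t, p in zip(target_pred, pred)]

fused = fuse_predictions([0.2, 0.8], [0.4, 0.6])
```

A weighted average of two probability distributions remains a probability distribution, so the fused result can still be compared against model predictions.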
  • In one embodiment, the step of obtaining the first target prediction data for the first sample data further includes: when the second training period is the initial training period, determining the first target prediction data for the first sample data based on the second prediction data.
  • In one embodiment, the step of determining the first prediction loss according to the comparison of the first annotation data and the first target prediction data, respectively, with the first prediction data includes: determining a first sub-prediction loss according to the comparison between the first annotation data and the first prediction data; determining a second sub-prediction loss according to the comparison between the first target prediction data and the first prediction data; and determining the first prediction loss as the sum of the first sub-prediction loss and the second sub-prediction loss.
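A minimal sketch of this loss decomposition (names are illustrative; squared error is used here as the sub-loss):

```python
def first_prediction_loss(annotation, target_pred, pred, sub_loss):
    """First prediction loss = sum of the label term (first sub-prediction
    loss) and the teacher term (second sub-prediction loss)."""
    return sub_loss(annotation, pred) + sub_loss(target_pred, pred)

squared_error = lambda a, b: (a - b) ** 2
loss = first_prediction_loss(annotation=1.0, target_pred=0.8, pred=0.9,
                             sub_loss=squared_error)
```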
  • When the first annotation data is an annotation value, the step of determining the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data includes: using one of a squared error function and a logarithmic loss function to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
  • When the first annotation data is an annotation classification, the step of determining the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data includes: using one of KL divergence, cross entropy, and JS divergence to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
  • In one embodiment, the neural network model to be trained includes one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), and a BERT model; the business data includes at least one of text, image, audio, and object data.
  • In a second aspect, an embodiment provides a neural network model fusion training device, deployed in a computer.
  • The model training process of the neural network model includes several training periods, and each training period corresponds to one pass of model training using all sample data in the training sample set; the neural network model is used to perform business prediction on input business data.
  • The device includes: a first acquisition module configured to obtain the neural network model to be trained in the current first training period; a second acquisition module configured to obtain first sample data and corresponding first annotation data in the training sample set, input the first sample data into the neural network model to be trained, and obtain first prediction data of the first sample data; a third acquisition module configured to, when the first training period is not the initial training period, obtain first target prediction data for the first sample data, wherein the first target prediction data is obtained based on accumulation of first historical prediction data, and the first historical prediction data includes the prediction data, for the first sample data, of the neural network models obtained at the end of the training periods before the first training period; a first determination module configured to determine a first prediction loss according to the comparison of the first annotation data and the first target prediction data, respectively, with the first prediction data; and a first update module configured to update the neural network model to be trained in a direction that reduces the first prediction loss.
  • In one embodiment, the device further includes: a first detection module configured to detect whether the first sample data is the last sample data in the training sample set; and a second determination module configured to, when the first sample data is the last sample data in the training sample set, determine the updated neural network model to be trained as the first neural network model obtained at the end of the first training period.
  • In one embodiment, the device further includes: a third determination module configured to input the first sample data into the first neural network model to obtain third prediction data, and to fuse the third prediction data with the first target prediction data to obtain the target prediction data for the first sample data in the next training period.
  • In one embodiment, the device further includes: a fourth determination module configured to, when the first training period is the initial training period, directly determine a second prediction loss according to the comparison between the first annotation data and the first prediction data; and a second update module configured to update the neural network model to be trained in a direction that reduces the second prediction loss.
  • In one embodiment, the third acquisition module is specifically configured to: obtain second prediction data determined by a second neural network model for the first sample data, wherein the second neural network model is obtained at the end of a second training period, and the second training period is the training period immediately before the first training period; when the second training period is not the initial training period, obtain second target prediction data for the first sample data, wherein the second target prediction data is obtained based on accumulation of the prediction data, for the first sample data, of the neural network models obtained at the end of the training periods before the second training period; and determine the first target prediction data for the first sample data based on fusion of the second target prediction data and the second prediction data.
  • In one embodiment, when determining the first target prediction data for the first sample data based on fusion of the second target prediction data and the second prediction data, the third acquisition module is configured to: obtain a first weight for the second target prediction data and a second weight for the second prediction data; and, based on the first weight and the second weight, compute a weighted average of the second target prediction data and the second prediction data to obtain the first target prediction data for the first sample data.
  • the first weight is less than the second weight.
  • In one embodiment, the third acquisition module is further configured to: when the second training period is the initial training period, determine the first target prediction data for the first sample data based on the second prediction data.
  • In one embodiment, the first determination module is specifically configured to: determine a first sub-prediction loss according to the comparison between the first annotation data and the first prediction data; determine a second sub-prediction loss according to the comparison between the first target prediction data and the first prediction data; and determine the first prediction loss as the sum of the first sub-prediction loss and the second sub-prediction loss.
  • When the first annotation data is an annotation value, the first determination module, in determining the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data, is configured to: use one of a squared error function and a logarithmic loss function to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
  • When the first annotation data is an annotation classification, the first determination module, in determining the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data, is configured to: use one of KL divergence, cross entropy, and JS divergence to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
  • In one embodiment, the neural network model to be trained includes one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), and a BERT model; the business data includes at least one of text, image, audio, and object data.
  • In a third aspect, an embodiment provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute any method provided in the first aspect.
  • In a fourth aspect, an embodiment provides a computing device, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, any method provided in the first aspect is implemented.
  • In the methods and devices provided by the embodiments of this specification, the neural network models obtained in the training periods before the first training period can serve as teacher models, their prediction data for the first sample data can be accumulated, and the resulting target prediction data adjusts the model training in the current first training period.
  • During training, the prediction data of the neural network model to be trained should not only be as close as possible to the annotation data, but also as close as possible to the accumulated prediction data.
  • In this way, the several neural network models obtained in earlier stages of training guide the training of the neural network model, which reduces oscillation in the model training process, improves the effectiveness of neural network model training, and makes the neural network model's business predictions on business data more accurate.
  • FIG. 1 is a schematic diagram of a process flow of a model training stage provided by an embodiment
  • FIG. 2 is a schematic flow chart of the model prediction stage provided by an embodiment
  • FIG. 3 is a schematic diagram of the principle of the model training process provided by an embodiment
  • FIG. 4 is a schematic flowchart of a neural network model fusion training method provided by an embodiment
  • FIG. 5 is a schematic block diagram of a neural network model fusion training device provided by an embodiment.
  • A neural network model contains a series of computations and the parameters used in those computations; these parameters can be called model parameters.
  • the processing process related to the neural network model can usually include a model training phase and a model prediction phase.
  • Training a neural network model is a process of continuously adjusting the model parameters so that, when the neural network model predicts on sample data, the prediction data is as consistent as possible with the annotation data.
  • Fig. 1 is a schematic diagram of a process flow of a model training stage provided by an embodiment. Among them, the training sample set contains a large amount of sample data and corresponding labels, and the labels can also be referred to as labeled data.
  • the sample data may include at least one of text, image, audio, and object data.
  • Object data can be understood as data related to physical objects, such as registered user data (such as user attributes and behavior data), urban road data (such as road congestion, road construction, etc.).
  • During training, sample data can be input into the neural network model to obtain prediction data; the prediction data is compared with the label, and the neural network model is updated according to the comparison result.
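The input-predict-compare-update loop can be sketched with a toy one-parameter model (the model, learning rate, and values here are illustrative, not from the patent):

```python
def train_step(w, x, y, lr=0.1):
    """One update of a toy linear model pred = w * x under squared loss."""
    pred = w * x                   # model prediction
    loss = (pred - y) ** 2         # compare prediction with label
    grad = 2 * (pred - y) * x      # d(loss)/d(w)
    return w - lr * grad, loss     # update toward lower loss

w = 0.0
for _ in range(50):
    w, loss = train_step(w, x=1.0, y=2.0)
# w approaches the label-consistent value 2.0
```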
  • the neural network model can be used to make business predictions on the input business data.
  • the service data may include at least one of text, image, audio, and object data.
  • Business predictions can include many types, such as predicting pedestrians, vehicles, and obstacles in images, and predicting text corresponding to audio.
  • Fig. 2 is a schematic flow chart of the model prediction stage provided by an embodiment. Among them, the image is input into the neural network model, and the prediction result output by the neural network model can be obtained, that is, the pedestrian area in the image.
  • Figure 2 is only an example of model business prediction. In actual scenarios, a variety of neural network models can be trained to perform many types of business predictions.
  • the embodiment of this specification provides a fusion training method of a neural network model.
  • the model training process of the neural network model includes several training cycles, and each training cycle corresponds to the process of using all sample data in the training sample set for model training.
  • the model can be adjusted according to the difference between the model's predicted data and the labeled data.
  • To address this, this embodiment introduces teacher models: the historical neural network models obtained in earlier stages of the model training process serve as teacher models for later stages of the training process.
  • The teacher models guide the model training process, so that when determining the prediction loss, not only the difference between the prediction data and the annotation data is considered, but also the difference between the teacher models' prediction data and the prediction data of the neural network model to be trained; this in turn reduces the training oscillation that may occur during model training.
  • Specifically, the historical prediction data determined for the sample data by several historical neural network models can be accumulated, and the accumulated prediction data serves as the target prediction data that guides the model training process.
  • Fig. 3 is a schematic diagram of a principle of the fusion training method provided by the embodiment of this specification.
  • In FIG. 3, "model NN" is used as an abbreviation for "neural network model NN to be trained".
  • Di represents the prediction data determined by the model NN for the sample data Si.
  • Model NN1, model NN2, and model NN3 are the neural network models obtained at the end of training period 1, training period 2, and training period 3, respectively.
  • The model training process may include more training cycles; the specific number of training cycles in the entire process is determined by when the neural network model to be trained meets the convergence condition.
  • In training cycle 1, the model NN determines the prediction data Di of the sample data Si; the loss is determined based on the comparison of the prediction data Di with the annotation data, and the model NN is updated in the direction of reducing the loss.
  • the model NN1 can be obtained at the end of training cycle 1 and the model NN1 can be used as the historical neural network model of the subsequent training cycle. At this time, the sample data Si is input into the model NN1, and the historical prediction data HD1 can be obtained. This process does not update the model NN1.
  • the historical prediction data HD1 can be directly used as the target prediction data TD1, or the accumulation result of the historical prediction data HD1 and the initial prediction data can be used as the target prediction data TD1.
  • In training cycle 2, the model NN determines the prediction data Di of the sample data Si; the loss is determined according to the comparison between the prediction data Di and the annotation data, together with the comparison between the prediction data Di and the target prediction data TD1, and the model NN is updated in the direction of reducing the loss.
  • the model NN2 can be obtained at the end of training cycle 2 and the model NN2 can be used as the historical neural network model of the subsequent training cycle. At this time, the sample data Si is input into the model NN2, and the historical prediction data HD2 can be obtained. This process does not update the model NN2.
  • the accumulation result of the historical prediction data HD2 and the target prediction data TD1 is used as the updated target prediction data TD2.
  • the updated target prediction data TD2 realizes the accumulation of the historical prediction data HD1 and the historical prediction data HD2.
  • In training cycle 3, the model NN determines the prediction data Di of the sample data Si; the loss is determined according to the comparison between the prediction data Di and the annotation data, together with the comparison between the prediction data Di and the target prediction data TD2, and the model NN is updated in the direction of reducing the loss.
  • the model NN3 can be obtained at the end of training cycle 3, and the model NN3 can be used as the historical neural network model of the subsequent training cycle. At this time, the sample data Si is input into the model NN3, and the historical prediction data HD3 can be obtained. This process does not update the model NN3.
  • the accumulation result of the historical prediction data HD3 and the target prediction data TD2 is used as the updated target prediction data TD3.
  • the updated target prediction data TD3 realizes the accumulation of historical prediction data HD1, historical prediction data HD2, and historical prediction data HD3. Thereafter, the process proceeds in sequence until the model NN converges.
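The TD1 → TD2 → TD3 accumulation above can be sketched as a running fusion; the weights and scalar values are illustrative assumptions:

```python
def accumulate_target(prev_target, new_hist_pred, w_hist=0.3, w_new=0.7):
    """TD_k from TD_(k-1) and HD_k; for the first accumulation TD1 = HD1."""
    if prev_target is None:
        return new_hist_pred
    return w_hist * prev_target + w_new * new_hist_pred

td = None
targets = []
for hd in [1.0, 2.0, 3.0]:   # HD1, HD2, HD3 from models NN1, NN2, NN3
    td = accumulate_target(td, hd)
    targets.append(td)       # TD1, TD2, TD3
```

Because each step folds the new historical prediction into the running target, only the latest target needs to be stored, not the full history of teacher predictions.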
  • the fusion training method provided by the embodiment of the present specification will be described in detail in conjunction with the flowchart of FIG. 4.
  • The method is executed by a computer, and the execution subject can be any apparatus, device, platform, or device cluster with computing and processing capabilities.
  • When the first training period is the initial training period, the training process is not guided by any historical neural network model; this is the teacher-free mode.
  • When the first training period is any other training period, the training process is guided by historical neural network models; this is the teacher mode.
  • the model training method can be described through the following steps S410 to S450.
  • Step S410 Obtain the neural network model NN to be trained in the current first training cycle.
  • At this point, the model parameters of the neural network model NN to be trained may already have been updated a number of times, but are not yet accurate enough.
  • the model parameters of the neural network model NN to be trained can be adjusted continuously until the model converges.
  • Step S420 Obtain the first sample data S1 and the corresponding first annotation data X1 in the training sample set, input the first sample data S1 into the neural network model NN to be trained, and obtain the first prediction of the first sample data S1 Data D1.
  • The first sample data S1 may be a single piece of data or multiple pieces (i.e., a batch).
  • the first sample data may be characteristic data used to identify the sample.
  • When the sample is an image, the first sample data may include the pixel values of the image's pixels; when the sample is a registered user, the first sample data may include attribute features and behavior features, where the attribute features may include the user's registration time, gender, occupation, etc., and the behavior features can be extracted from behavior data related to the user.
  • the first annotation data X1 may correspond to different data types, for example, it may be an annotation value or an annotation classification.
  • When the neural network model to be trained is a regression model, the first prediction data D1 is a predicted value; when the neural network model to be trained is a classification model, the first prediction data D1 usually includes the predicted probability distribution over the categories.
  • For example, when there are three classification categories, the first annotation data may be (0,0,1), (0,1,0), or (1,0,0).
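The three-category annotations above are one-hot vectors; a tiny helper (the name is an illustrative assumption):

```python
def one_hot(category_index, num_categories=3):
    """Annotation classification as a one-hot tuple, e.g. index 2 -> (0, 0, 1)."""
    return tuple(1 if i == category_index else 0 for i in range(num_categories))
```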
  • the neural network model NN to be trained can determine the first prediction data D1 of the input first sample data S1 according to the model parameters. When the number of the first sample data S1 is multiple, the first prediction data D1 of each first sample data S1 can be obtained through the neural network model NN to be trained.
  • Step S430 Obtain first target prediction data for the first sample data S1.
  • the first target prediction data may be TD2 in FIG. 3.
  • The first target prediction data is obtained based on accumulation of the first historical prediction data, and the first historical prediction data includes the prediction data, for the first sample data, of the neural network models obtained at the end of the training periods before the first training period. If each neural network model obtained at the end of a training period before the first training period is regarded as a historical neural network model, the first historical prediction data includes the prediction data of several historical neural network models for the first sample data.
  • the training period before training period 3 includes training period 1 and training period 2.
  • the historical neural network models obtained at the end of training period 1 and training period 2 are model NN1 and model NN2, respectively.
  • The prediction data of model NN1 for the first sample data S1 is HD1, and the prediction data of model NN2 for the first sample data S1 is HD2.
  • the prediction data HD1 and the prediction data HD2 may also be referred to as historical prediction data.
  • the first historical prediction data includes prediction data HD1 and prediction data HD2.
  • Step S430 can be performed after the first annotation data X1 is obtained in step S420, either before or after the first sample data S1 is input into the neural network model NN to be trained.
  • Step S440 Determine the first prediction loss Loss1 according to the comparison between the first annotation data X1 and the first target prediction data (for example, TD2) and the first prediction data D1.
  • When the first sample data S1 is a batch containing, for example, sample data S11 and S12, step S440 determines the first prediction loss Loss11 of the sample data S11 and the first prediction loss Loss12 of the sample data S12, and then fuses Loss11 and Loss12 to obtain the fused first prediction loss Loss1.
  • Based on the first prediction loss, the neural network model to be trained can then be updated.
  • In step S450, the neural network model NN to be trained is updated in the direction of reducing the first prediction loss Loss1; updating the model can be understood as adjusting its model parameters so as to reduce the prediction loss.
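Steps S440 and S450 can be sketched with a toy one-parameter model trained under the combined label-plus-teacher loss (the model, learning rate, and values are illustrative assumptions):

```python
def fusion_train_step(w, x, annotation, target_pred, lr=0.1):
    """One update of a toy model pred = w * x under the combined loss
    (pred - annotation)^2 + (pred - target_pred)^2."""
    pred = w * x
    grad = 2 * (pred - annotation) * x + 2 * (pred - target_pred) * x
    return w - lr * grad             # move in the loss-reducing direction

w = 0.0
for _ in range(100):
    w = fusion_train_step(w, x=1.0, annotation=2.0, target_pred=1.5)
# w settles where the label term and the teacher term balance
```

With equal weights on the two terms, the stationary prediction is the midpoint of the annotation and the target prediction, which illustrates how the teacher pulls the model away from fitting the label alone.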
  • Steps S410 to S450 above realize one update of the model, which can be understood as one round of training in the model training process; all the sample data in the training sample set can be used for training in this manner.
  • When the number of training rounds of the neural network model NN to be trained is greater than a preset count threshold, i.e. the model has been trained a sufficient number of times, or when the first prediction loss Loss1 is less than a preset loss threshold, it can be determined that model training is complete and the convergence condition is reached.
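This convergence condition can be sketched as a simple predicate (the threshold values are illustrative assumptions):

```python
def converged(num_rounds, loss, max_rounds=10000, loss_threshold=1e-3):
    """Training is complete once enough rounds have run or the first
    prediction loss has fallen below the loss threshold."""
    return num_rounds > max_rounds or loss < loss_threshold
```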
  • In the embodiments above, the neural network models obtained in the training periods before the first training period can be used as teacher models, their prediction data for the first sample data can be accumulated, and the accumulated target prediction data adjusts the model training in the current first training period.
  • The several neural network models obtained in earlier stages of training thus guide the training of the neural network model, which reduces oscillation during model training, improves the effectiveness of neural network model training, and makes the neural network model's business predictions on business data more accurate.
  • In the model training process, it can also be detected whether the first sample data S1 is the last sample data in the training sample set. If it is, the updated neural network model NN to be trained is determined as the first neural network model obtained at the end of the first training period. For example, when the first training period is training period 3, the neural network model NN3 is obtained at the end of training period 3.
  • the detection operation can be performed periodically according to a preset duration.
  • the first sample data S1 can also be input into the first neural network model to obtain the third prediction data; the third prediction data is fused with the first target prediction data to obtain the next training period Target prediction data for the first sample data.
  • In step S430, when obtaining the first target prediction data for the first sample data S1, the neural network models obtained at the end of the training periods before the first training period may be obtained; the first sample data S1 is input into each of these models to obtain their respective prediction data for S1, and the first target prediction data is determined based on the average of the obtained prediction data.
  • For example, when the first training period is training period 3, the models NN2 and NN1 obtained at the end of the earlier training periods can be obtained; the first sample data S1 is input into models NN2 and NN1 to obtain the prediction data HD2 and HD1 for S1, and the first target prediction data TD2 is obtained based on the average of HD2 and HD1.
  • Alternatively, when step S430 obtains the first target prediction data for the first sample data S1, the implementation shown in the following steps 1a to 3a can be used.
  • Step 1a Obtain second prediction data determined by the second neural network model for the first sample data S1.
  • the second neural network model is obtained at the end of the second training period.
  • the second training period is the previous training period of the first training period.
  • When the first training period is training period 3 in FIG. 3, the second training period is training period 2.
  • the second neural network model is model NN2, and the second prediction data may be HD2.
  • each sample data in the training sample set can be input into the second neural network model in advance to obtain the corresponding prediction data set.
  • each sample data in the training sample set can be input to the model NN2 to obtain the corresponding prediction data set.
  • in step 1a, when obtaining the second prediction data HD2 determined by the model NN2 for the first sample data S1, it suffices to read the saved second prediction data HD2 corresponding to the first sample data S1 from the above prediction data set.
  • the first sample data S1 can also be directly input to the second neural network model NN2, and the second prediction data HD2 of the first sample data S1 can be obtained through the second neural network model NN2.
  • Step 2a Obtain second target prediction data for the first sample data S1.
  • the second target prediction data is obtained based on the accumulation of the prediction data of the first sample data by the neural network model obtained at the end of the training period before the second training period.
  • the second target prediction data may be obtained based on the accumulation of the second historical prediction data.
  • the second historical prediction data includes the prediction data of the first sample data S1 of the neural network model obtained at the end of the training period before the second training period.
  • the training period before training period 2 includes training period 1
  • the neural network model obtained at the end of training period 1 is model NN1.
  • the prediction data of the model NN1 for the first sample data is HD1. Therefore, the second historical prediction data includes the prediction data HD1. That is, the second target prediction data TD1 is obtained based on the accumulation of the prediction data HD1.
  • in this embodiment, the second training period is not the first training period; other training periods therefore exist before it, so the second target prediction data, obtained based on the accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the second training period, can be obtained.
  • Step 3a Determine the first target prediction data for the first sample data S1 based on the fusion of the second target prediction data and the second prediction data.
  • the second target prediction data and the second prediction data can be directly averaged, and the average determined as the first target prediction data. It is also possible to obtain the first weight w1 of the second target prediction data and the second weight w2 of the second prediction data, and to perform a weighted average of the two based on w1 and w2 to obtain the first target prediction data for the first sample data S1.
  • the first weight w1 and the second weight w2 can be preset.
  • still taking the first training period as training period 3 as an example, this step may determine the first target prediction data TD2 for the first sample data S1 based on the fusion of the second target prediction data TD1 and the second prediction data HD2. More specifically, TD1 and HD2 may be directly averaged and the average determined as TD2; alternatively, the first weight w1 of TD1 and the second weight w2 of HD2 may be obtained, and a weighted average of TD1 and HD2 performed based on w1 and w2 to obtain the first target prediction data TD2 for S1.
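As a concrete illustration of this fusion step, the sketch below computes TD2 as a weighted average of TD1 and HD2. The function name and the example weights w1=0.4, w2=0.6 are assumptions for illustration; the text only requires that the weights be preset (and, per the later discussion, that w1 be smaller than w2):

```python
def fuse_target(prev_target, new_prediction, w1=0.4, w2=0.6):
    # Weighted average of the accumulated target prediction data (e.g. TD1)
    # and the newest teacher prediction (e.g. HD2); w1 + w2 = 1 keeps a
    # probability distribution normalized.
    return [w1 * t + w2 * p for t, p in zip(prev_target, new_prediction)]

td1 = [0.2, 0.3, 0.5]       # second target prediction data TD1
hd2 = [0.1, 0.2, 0.7]       # second prediction data HD2 (from model NN2)
td2 = fuse_target(td1, hd2) # first target prediction data TD2
```

Because w1 + w2 = 1, the fused vector remains a valid probability distribution whenever the inputs are.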
  • steps 1a to 3a can be performed after the end of the second training period. For all sample data in the training sample set, the process of steps 1a to 3a is performed to obtain a first target prediction data set covering the entire training sample set. In step S430 of the first training period, the first target prediction data for the first sample data S1 is then obtained directly from the stored first target prediction data set.
  • in the preceding example, the second prediction data HD2 is determined by the model NN2, while the second target prediction data is determined by the historical neural network models before NN2. During training, model NN2 is closer to the neural network model NN to be trained and of higher quality; therefore, when setting the weights, more importance can be attached to HD2 in the accumulation, i.e., the first weight w1 is made smaller than the second weight w2. In this way, newer prediction data has a larger proportion in the first target prediction data, and the model training process is more stable.
  • when the current first training period is the second training period, so that the second training period is the first training period, the specific steps of obtaining the first target prediction data for the first sample data S1 may include: obtaining the second prediction data determined by the second neural network model for S1, and determining the first target prediction data for S1 based on the second prediction data. Specifically, the second prediction data may be directly determined as the first target prediction data for the first sample data S1, or the accumulation result of the second prediction data and initial prediction data may be used as the first target prediction data.
  • in a regression model, the initial prediction data may include a preset value; in a classification model, the initial prediction data may include a uniform probability distribution.
  • for example, when the first training period is training period 2 and the second training period is training period 1, the second prediction data HD1 determined by the second neural network model NN1 for the first sample data S1 can be obtained; based on HD1, the first target prediction data TD1 for S1 is determined.
  • the second prediction data HD1 may be directly determined as the first target prediction data TD1 for the first sample data S1, or the accumulation result of HD1 and the initial prediction data may be used as the first target prediction data TD1.
  • when the second training period is the first training period, no other training period exists before it; therefore, the first target prediction data for the first sample data can be determined directly based on the second prediction data obtained by the second neural network model.
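For this first-period case, a minimal sketch of forming TD1 from HD1, with or without initial prediction data. The function name and the uniform initial distribution are illustrative assumptions:

```python
def initial_target(first_prediction, init=None, w1=0.5, w2=0.5):
    # With no initial prediction data, HD1 itself becomes TD1; otherwise TD1
    # is the accumulation of the initial data (e.g. a uniform distribution
    # for a classification model) and HD1.
    if init is None:
        return list(first_prediction)
    return [w1 * i + w2 * p for i, p in zip(init, first_prediction)]

hd1 = [0.1, 0.3, 0.6]                       # prediction of model NN1 for S1
td1_direct = initial_target(hd1)            # TD1 = HD1
td1_mixed = initial_target(hd1, [1/3] * 3)  # TD1 accumulates a uniform prior
```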
  • in step S440, the step of determining the first prediction loss Loss1 according to the comparison of the first annotation data X1 and the first target prediction data, respectively, with the first prediction data D1 may specifically include the implementation shown in the following steps 1b to 3b.
  • Step 1b Determine the first sub-prediction loss Loss_1 according to the comparison between the first annotation data X1 and the first prediction data D1.
  • when the first annotation data X1 is an annotation value, the first prediction data D1 is a predicted value, i.e., in the training of a regression model. Step 1b may then include: using one of the squared error function and the logarithmic loss function to compare the first annotation data X1 with the first prediction data D1 to obtain the first sub-prediction loss Loss_1.
  • when the first annotation data X1 is an annotation classification, the first prediction data D1 is a predicted classification, i.e., in a classification model. Step 1b may then include: using one of KL divergence, cross entropy, and JS divergence to compare X1 with D1 to obtain the first sub-prediction loss Loss_1.
  • Step 2b Determine the second sub-prediction loss Loss_2 according to the comparison between the first target prediction data and the first prediction data D1.
  • in the training of regression and classification models, this step 2b can also be computed using the loss function corresponding to the one used in step 1b.
  • Step 3b Determine the first prediction loss Loss1 according to the sum of the first sub-prediction loss Loss_1 and the second sub-prediction loss Loss_2.
  • the sum of the first sub-prediction loss Loss_1 and the second sub-prediction loss Loss_2 can be directly determined as the first prediction loss Loss1; alternatively, Loss1 can be determined from the result of applying preset processing to that sum.
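For a classification model, steps 1b to 3b can be sketched as follows. Cross entropy and KL divergence are among the comparison choices the text lists; all numeric values are made up for illustration:

```python
import math

def cross_entropy(label, pred, eps=1e-12):
    # step 1b: compare the annotation data (one-hot) with the prediction
    return -sum(l * math.log(p + eps) for l, p in zip(label, pred))

def kl_divergence(target, pred, eps=1e-12):
    # step 2b: KL divergence between target prediction data and the prediction
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred))

x1 = [0.0, 0.0, 1.0]    # first annotation data X1 (annotation classification)
d1 = [0.1, 0.2, 0.7]    # first prediction data D1
td = [0.05, 0.15, 0.8]  # first target prediction data
loss_1 = cross_entropy(x1, d1)   # first sub-prediction loss
loss_2 = kl_divergence(td, d1)   # second sub-prediction loss
loss1 = loss_1 + loss_2          # step 3b: sum as the first prediction loss
```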
  • in the no-teacher mode, i.e., when the first training period is the first training period (for example, training period 1 in Figure 3), no previous training period exists, so the second prediction loss Loss2 can be determined directly from the comparison between the first annotation data X1 and the first prediction data D1, and the neural network model NN to be trained is updated in the direction that reduces Loss2.
  • the aforementioned neural network model to be trained may include one of Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and the Bidirectional Encoder Representations from Transformers (BERT) model.
  • Fig. 5 is a schematic block diagram of the neural network model fusion training device provided in this embodiment.
  • the device 500 is deployed in a computer, and the device embodiment corresponds to the method embodiment shown in FIGS. 3 to 4.
  • the model training process of the neural network model includes several training cycles, and each training cycle corresponds to a process of model training using all sample data in the training sample set, and the neural network model is used to perform business prediction on the input business data.
  • the device 500 includes the following modules.
  • the first obtaining module 510 is configured to obtain the neural network model to be trained in the current first training period.
  • the second acquisition module 520 is configured to acquire first sample data and corresponding first annotation data in the training sample set, input the first sample data into the neural network model to be trained, and obtain the first prediction data of the first sample data.
  • the third obtaining module 530 is configured to obtain first target prediction data for the first sample data when the first training period is not the first training period; wherein the first target prediction data is obtained based on the accumulation of first historical prediction data, and the first historical prediction data includes the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the first training period.
  • the first determining module 540 is configured to determine the first prediction loss according to the comparison between the first annotation data and the first target prediction data and the first prediction data, respectively.
  • the first update module 550 is configured to update the neural network model to be trained in a direction that reduces the first prediction loss.
  • the device 500 further includes: a first detection module (not shown in the figure), configured to detect whether the first sample data is the last sample data in the training sample set; and a second determination module (not shown in the figure), configured to determine, when the first sample data is the last sample data in the training sample set, the updated neural network model to be trained as the first neural network model obtained at the end of the first training period.
  • the device 500 further includes: a third determination module (not shown in the figure), configured to input the first sample data into the first neural network model to obtain third prediction data;
  • the third prediction data is fused with the first target prediction data to obtain target prediction data for the first sample data in the next training period.
  • the device 500 further includes: a fourth determining module 531, configured to determine, when the first training period is the first training period, the second prediction loss directly according to the comparison between the first annotation data and the first prediction data; and a second update module 541, configured to update the neural network model to be trained in a direction that reduces the second prediction loss.
  • the third acquisition module 530 is specifically configured to: acquire the second prediction data determined by the second neural network model for the first sample data, wherein the second neural network model is obtained at the end of the second training period, which is the previous training period of the first training period; when the second training period is not the first training period, acquire the second target prediction data for the first sample data, wherein the second target prediction data is obtained based on the accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the second training period; and determine the first target prediction data for the first sample data based on the fusion of the second target prediction data and the second prediction data.
  • when the third acquisition module 530 determines the first target prediction data for the first sample data based on the fusion of the second target prediction data and the second prediction data, the operation includes: obtaining the first weight of the second target prediction data and the second weight of the second prediction data; and performing a weighted average of the second target prediction data and the second prediction data based on the first weight and the second weight to obtain the first target prediction data for the first sample data.
  • the first weight is less than the second weight.
  • the third acquisition module 530 is further configured to: when the second training period is the first training period, determine the first target prediction data for the first sample data based on the second prediction data.
  • the first determining module 540 is specifically configured to: determine the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data; and according to the first target prediction data The second sub-prediction loss is determined by comparison with the first prediction data; and the first prediction loss is determined according to the sum of the first sub-prediction loss and the second sub-prediction loss.
  • when the first annotation data is an annotation value, the first determining module 540, in determining the first sub-prediction loss based on the comparison between the first annotation data and the first prediction data, uses one of a squared error function and a logarithmic loss function to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
  • when the first annotation data is an annotation classification, the first determining module 540, in determining the first sub-prediction loss based on the comparison between the first annotation data and the first prediction data, uses one of KL divergence, cross entropy, and JS divergence to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
  • the neural network model to be trained includes one of the DNN, CNN, RNN, and BERT models; the business data includes at least one of text, image, audio, and object data.
  • the foregoing device embodiment corresponds to the method embodiment, and for specific description, please refer to the description of the method embodiment part, which will not be repeated here.
  • the device embodiment is obtained based on the corresponding method embodiment, and has the same technical effect as the corresponding method embodiment. For specific description, please refer to the corresponding method embodiment.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the methods described in FIGS. 3 to 4.
  • a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the methods described in FIGS. 3 to 4 are implemented.
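Putting the pieces together, the following toy sketch shows the overall fusion-training control flow: per-period training with a teacher term from period 2 onward, and end-of-period accumulation of target prediction data. The stand-in `predict` model, the MSE comparison, and all names are assumptions; no real gradient update is performed:

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def predict(label, period):
    # toy stand-in for the network: predictions drift toward the label
    # as training periods progress
    blend = period / (period + 1.0)
    return [blend * l + (1 - blend) / len(label) for l in label]

def train_with_fusion(labels, periods=3, w1=0.4, w2=0.6):
    targets = {}                       # sample index -> target prediction data
    for period in range(1, periods + 1):
        for i, label in enumerate(labels):
            pred = predict(label, period)
            loss = mse(label, pred)    # compare prediction with annotation
            if period > 1:             # with-teacher mode adds a teacher term
                loss += mse(targets[i], pred)
            # ... a real implementation would update model parameters here ...
        # end of period: fold this period's predictions into the targets
        for i, label in enumerate(labels):
            hd = predict(label, period)
            targets[i] = list(hd) if period == 1 else [
                w1 * t + w2 * h for t, h in zip(targets[i], hd)]
    return targets

targets = train_with_fusion([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
```

Since w1 + w2 = 1 and the toy predictions are normalized, each accumulated target remains a probability distribution peaked at the labeled class.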


Abstract

The embodiments of this specification provide a fusion training method and apparatus for a neural network model. The model training process of the neural network model includes several training periods, each corresponding to a process of model training using all sample data in a training sample set; the neural network model is used to perform business prediction on input business data. In the current first training period, when the first training period is not the first one, first target prediction data is obtained based on the accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the first training period; the training process of the neural network model to be trained is then adjusted according to the first target prediction data, and the model is updated.

Description

Fusion training method and apparatus for a neural network model. Technical field
One or more embodiments of this specification relate to the field of data processing technology, and in particular to a fusion training method and apparatus for a neural network model.
Background
Deep learning has achieved results far beyond traditional methods in computer vision and natural language processing, and has become a mainstream method in the field of artificial intelligence. Generally speaking, the deeper the neural network, the better the expected prediction performance. When training a neural network model, a large amount of sample data such as text, images, and videos can be collected, and the model can be trained according to labels annotated for the sample data, so that the model's predictions for the input data gradually approach the annotated labels.
Therefore, an improved solution is desired that can increase the effectiveness of neural network model training and improve the accuracy of business prediction when the neural network model is used to perform business prediction on business data.
Summary
One or more embodiments of this specification describe a fusion training method and apparatus for a neural network model, which can improve the effectiveness of model training and thereby make the model's business predictions on business data more accurate. The specific technical solutions are as follows.
In a first aspect, an embodiment provides a fusion training method for a neural network model, executed by a computer. The model training process of the neural network model includes several training periods, each corresponding to a process of model training using all sample data in a training sample set; the neural network model is used to perform business prediction on input business data. The method includes: obtaining the neural network model to be trained in the current first training period; obtaining first sample data and corresponding first annotation data in the training sample set, inputting the first sample data into the neural network model to be trained, and obtaining first prediction data of the first sample data; when the first training period is not the first training period, obtaining first target prediction data for the first sample data, where the first target prediction data is obtained based on the accumulation of first historical prediction data, and the first historical prediction data includes the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the first training period; determining a first prediction loss according to the comparison of the first annotation data and the first target prediction data, respectively, with the first prediction data; and updating the neural network model to be trained in a direction that reduces the first prediction loss.
In one implementation, the method further includes: detecting whether the first sample data is the last sample data in the training sample set; if so, determining the updated neural network model to be trained as the first neural network model obtained at the end of the first training period.
In one implementation, the method further includes: inputting the first sample data into the first neural network model to obtain third prediction data; and fusing the third prediction data with the first target prediction data to obtain the target prediction data for the first sample data in the next training period.
In one implementation, the method further includes: when the first training period is the first training period, determining a second prediction loss directly according to the comparison between the first annotation data and the first prediction data; and updating the neural network model to be trained in a direction that reduces the second prediction loss.
In one implementation, the step of obtaining the first target prediction data for the first sample data includes: obtaining second prediction data determined by a second neural network model for the first sample data, where the second neural network model is obtained at the end of a second training period, the second training period being the previous training period of the first training period; when the second training period is not the first training period, obtaining second target prediction data for the first sample data, where the second target prediction data is obtained based on the accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the second training period; and determining the first target prediction data for the first sample data based on the fusion of the second target prediction data and the second prediction data.
In one implementation, the step of determining the first target prediction data based on the fusion of the second target prediction data and the second prediction data includes: obtaining a first weight of the second target prediction data and a second weight of the second prediction data; and performing a weighted average of the second target prediction data and the second prediction data based on the first weight and the second weight to obtain the first target prediction data for the first sample data.
In one implementation, the first weight is less than the second weight.
In one implementation, the step of obtaining the first target prediction data for the first sample data further includes: when the second training period is the first training period, determining the first target prediction data for the first sample data based on the second prediction data.
In one implementation, the step of determining the first prediction loss according to the comparison of the first annotation data and the first target prediction data, respectively, with the first prediction data includes: determining a first sub-prediction loss according to the comparison between the first annotation data and the first prediction data; determining a second sub-prediction loss according to the comparison between the first target prediction data and the first prediction data; and determining the first prediction loss according to the sum of the first sub-prediction loss and the second sub-prediction loss.
In one implementation, the first annotation data is an annotation value; the step of determining the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data includes: using one of a squared error function and a logarithmic loss function to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
In one implementation, the first annotation data is an annotation classification; the step of determining the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data includes: using one of KL divergence, cross entropy, and JS divergence to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
In one implementation, the neural network model to be trained includes one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), and a BERT model; the business data includes at least one of text, image, audio, and object data.
In a second aspect, an embodiment provides a fusion training apparatus for a neural network model, deployed in a computer. The model training process of the neural network model includes several training periods, each corresponding to a process of model training using all sample data in a training sample set; the neural network model is used to perform business prediction on input business data. The apparatus includes: a first acquisition module configured to obtain the neural network model to be trained in the current first training period; a second acquisition module configured to obtain first sample data and corresponding first annotation data in the training sample set, input the first sample data into the neural network model to be trained, and obtain first prediction data of the first sample data; a third acquisition module configured to obtain, when the first training period is not the first training period, first target prediction data for the first sample data, where the first target prediction data is obtained based on the accumulation of first historical prediction data, and the first historical prediction data includes the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the first training period; a first determining module configured to determine a first prediction loss according to the comparison of the first annotation data and the first target prediction data, respectively, with the first prediction data; and a first update module configured to update the neural network model to be trained in a direction that reduces the first prediction loss.
In one implementation, the apparatus further includes: a first detection module configured to detect whether the first sample data is the last sample data in the training sample set; and a second determining module configured to determine, when the first sample data is the last sample data in the training sample set, the updated neural network model to be trained as the first neural network model obtained at the end of the first training period.
In one implementation, the apparatus further includes: a third determining module configured to input the first sample data into the first neural network model to obtain third prediction data, and to fuse the third prediction data with the first target prediction data to obtain the target prediction data for the first sample data in the next training period.
In one implementation, the apparatus further includes: a fourth determining module configured to determine, when the first training period is the first training period, a second prediction loss directly according to the comparison between the first annotation data and the first prediction data; and a second update module configured to update the neural network model to be trained in a direction that reduces the second prediction loss.
In one implementation, the third acquisition module is specifically configured to: obtain second prediction data determined by a second neural network model for the first sample data, where the second neural network model is obtained at the end of a second training period, the previous training period of the first training period; when the second training period is not the first training period, obtain second target prediction data for the first sample data, where the second target prediction data is obtained based on the accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the second training period; and determine the first target prediction data for the first sample data based on the fusion of the second target prediction data and the second prediction data.
In one implementation, when the third acquisition module determines the first target prediction data for the first sample data based on the fusion of the second target prediction data and the second prediction data, the operation includes: obtaining a first weight of the second target prediction data and a second weight of the second prediction data; and performing a weighted average of the two based on the first weight and the second weight to obtain the first target prediction data for the first sample data.
In one implementation, the first weight is less than the second weight.
In one implementation, the third acquisition module is further configured to: when the second training period is the first training period, determine the first target prediction data for the first sample data based on the second prediction data.
In one implementation, the first determining module is specifically configured to: determine a first sub-prediction loss according to the comparison between the first annotation data and the first prediction data; determine a second sub-prediction loss according to the comparison between the first target prediction data and the first prediction data; and determine the first prediction loss according to the sum of the first sub-prediction loss and the second sub-prediction loss.
In one implementation, the first annotation data is an annotation value; when the first determining module determines the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data, the operation includes: using one of a squared error function and a logarithmic loss function to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
In one implementation, the first annotation data is an annotation classification; when the first determining module determines the first sub-prediction loss according to the comparison between the first annotation data and the first prediction data, the operation includes: using one of KL divergence, cross entropy, and JS divergence to compare the first annotation data with the first prediction data to obtain the first sub-prediction loss.
In one implementation, the neural network model to be trained includes one of the DNN, CNN, RNN, and BERT models; the business data includes at least one of text, image, audio, and object data.
In a third aspect, an embodiment provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute any method provided in the first aspect.
In a fourth aspect, an embodiment provides a computing device including a memory and a processor; the memory stores executable code, and the processor, when executing the executable code, implements any method provided in the first aspect.
In the method and apparatus provided by the embodiments of this specification, the several neural network models obtained in the training periods before the first training period can serve as teacher models; the prediction data of these teacher models for the first sample data is accumulated, and the model training of the current first training period is adjusted according to the accumulated target prediction data. In training the neural network model, these embodiments not only make the prediction data of the model to be trained as close as possible to the annotation data, but also make it as similar as possible to the accumulated prediction data. This takes into account the guidance that the models obtained in earlier stages of training provide to the model being trained, thereby reducing oscillation during training, improving the effectiveness of model training, and making the model's business predictions on business data more accurate.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a schematic flowchart of the model training stage provided by an embodiment;
Fig. 2 is a schematic flowchart of the model prediction stage provided by an embodiment;
Fig. 3 is a schematic diagram of the principle of the model training process provided by an embodiment;
Fig. 4 is a schematic flowchart of the fusion training method for a neural network model provided by an embodiment;
Fig. 5 is a schematic block diagram of the fusion training apparatus for a neural network model provided by an embodiment.
Detailed description
The solutions provided in this specification are described below with reference to the drawings.
A neural network model comprises a series of operations and the parameters of those operations, which may be called model parameters. Processing related to a neural network model generally includes a model training stage and a model prediction stage. Training a neural network model is the process of continually adjusting the model parameters so that, when the model predicts on sample data, the prediction data is as consistent as possible with the annotation data. Fig. 1 is a schematic flowchart of the model training stage provided by an embodiment. The training sample set contains a large amount of sample data and corresponding labels; a label may also be called annotation data. The sample data may include at least one of text, image, audio, and object data. Object data can be understood as data related to physical entities, such as registered user data (e.g., user attributes and behavior) and urban road data (e.g., road congestion and road construction). In one round of model training, sample data can be input into the neural network model to obtain prediction data; the prediction data is compared with the label, and the model is updated according to the comparison result.
Once the neural network model is trained, it can be used to perform business prediction on input business data. The business data may include at least one of text, image, audio, and object data. Business prediction can take many forms, such as predicting pedestrians, vehicles, and obstacles in an image, or predicting the text corresponding to audio. Fig. 2 is a schematic flowchart of the model prediction stage provided by an embodiment: an image is input into the neural network model, and the model outputs a prediction result, namely the pedestrian region in the image. Fig. 2 is merely one example of model business prediction; in practical scenarios, many kinds of neural network models can be trained to perform many kinds of business prediction.
The embodiments of this specification provide a fusion training method for a neural network model, in which the model training process includes several training periods, each training period corresponding to a process of model training using all sample data in the training sample set.
During model training, the model can be adjusted according to the difference between its prediction data and the annotation data. To reduce problems such as training oscillation that may arise during training, this embodiment introduces teacher models: the historical neural network models obtained in an earlier stage of the training process serve as teacher models for the later stage. The teacher models provide a degree of guidance, so that when determining the prediction loss, not only the difference between the prediction data and the annotation data is considered, but also the difference between the teacher models' prediction data and the prediction data of the model being trained, thereby reducing training oscillation and similar problems.
All of the several historical neural network models obtained in the earlier stage can serve as teacher models. Models closer to the model being trained are of higher quality, while models farther from it differ from it more. To guide the model being trained with several historical models, the historical prediction data of those models for the sample data can be accumulated, and the accumulated prediction data used as target prediction data to guide the training process.
Fig. 3 is a schematic diagram of the principle of the fusion training method provided by the embodiments of this specification. In any training period, all sample data Si in the training sample set is used to train the neural network model NN to be trained. In the following, "model NN" is short for "neural network model NN to be trained". Di denotes the prediction data determined by model NN for sample data Si. Models NN1, NN2, and NN3 are the neural network models obtained at the end of training periods 1, 2, and 3 respectively. Although Fig. 3 shows only four training periods, in practice the training process may contain more. How many training periods the whole process contains can be determined when the training of the model to be trained satisfies a convergence condition.
In training period 1, after the sample data Si is input into model NN, the model determines the prediction data Di of Si; a loss is determined from the comparison between Di and the annotation data, and model NN is updated in the direction that reduces the loss.
At the end of training period 1, model NN1 is obtained, which can serve as a historical neural network model for subsequent training periods. Inputting the sample data Si into model NN1 then yields the historical prediction data HD1; this process does not update model NN1.
The historical prediction data HD1 can be used directly as the target prediction data TD1, or the accumulation result of HD1 and initial prediction data can be used as TD1.
In training period 2, after Si is input into model NN, the model determines the prediction data Di; a loss is determined from the comparison between Di and the annotation data together with the comparison between Di and the target prediction data TD1, and model NN is updated in the direction that reduces the loss.
At the end of training period 2, model NN2 is obtained, which can serve as a historical neural network model for subsequent training periods. Inputting Si into model NN2 yields the historical prediction data HD2; this process does not update model NN2.
The accumulation result of the historical prediction data HD2 and the target prediction data TD1 is used as the updated target prediction data TD2. The updated TD2 thus accumulates the historical prediction data HD1 and HD2.
In training period 3, after Si is input into model NN, the model determines the prediction data Di; a loss is determined from the comparison between Di and the annotation data together with the comparison between Di and the target prediction data TD2, and model NN is updated in the direction that reduces the loss.
At the end of training period 3, model NN3 is obtained, which can serve as a historical neural network model for subsequent training periods. Inputting Si into model NN3 yields the historical prediction data HD3; this process does not update model NN3.
The accumulation result of the historical prediction data HD3 and the target prediction data TD2 is used as the updated target prediction data TD3, which thus accumulates HD1, HD2, and HD3. The process continues in this way until model NN converges.
The above is a brief description of the embodiments of this specification with reference to Fig. 3. The fusion training method provided by the embodiments is described in detail below with reference to the flowchart of Fig. 4. The method is executed by a computer; the executing entity may be any apparatus, device, platform, or device cluster with computing and processing capabilities. For the current first training period: when the first training period is the first one, training proceeds without the guidance of historical neural network models, which is the no-teacher mode; in other training periods such guidance exists, which is the with-teacher mode. In the with-teacher mode, the training method can be described by the following steps S410 to S450.
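The running accumulation TD1 → TD2 → TD3 described above can be sketched as a simple recurrence; the function name and the weights are illustrative assumptions:

```python
def accumulate_targets(history_predictions, w_old=0.4, w_new=0.6):
    # TD_k is the weighted accumulation of the historical prediction data
    # HD_1 .. HD_k, as in the Fig. 3 walkthrough: the first HD becomes TD1,
    # and each later HD is folded in with weight w_new.
    td = None
    for hd in history_predictions:
        td = list(hd) if td is None else [
            w_old * t + w_new * h for t, h in zip(td, hd)]
    return td

# HD1 and HD2 for one sample; TD2 = 0.4 * HD1 + 0.6 * HD2
td2 = accumulate_targets([[1.0, 0.0], [0.0, 1.0]])
```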
Step S410: obtain the neural network model NN to be trained in the current first training period.
Since the first training period is not the first one, the model parameters of NN have already been trained many times but are not yet accurate enough. In this embodiment, the parameters of model NN are adjusted continually until the model converges.
Step S420: obtain the first sample data S1 and the corresponding first annotation data X1 in the training sample set, input S1 into the neural network model NN to be trained, and obtain the first prediction data D1 of S1.
The first sample data S1 may be a single sample or multiple samples (i.e., a batch). The first sample data may be feature data identifying a sample. For example, when the sample is an image, the first sample data may include the pixel values of the image's pixels; when the sample is a registered user, the first sample data may include attribute features and behavior features, where attribute features may include the user's registration time, gender, occupation, etc., and behavior features can be extracted from behavior data related to the user.
In concrete implementations, the first annotation data X1 may correspond to different data types: it may be an annotation value or an annotation classification. When X1 is an annotation value, the model to be trained is a regression model and D1 is a predicted value; when X1 is an annotation classification, the model is a classification model and D1 usually includes a predicted probability distribution over the classes. For example, with three classes, X1 may be (0,0,1), (0,1,0), or (1,0,0).
The neural network model NN to be trained determines the first prediction data D1 of the input first sample data S1 according to its model parameters. When there are multiple first sample data S1, the first prediction data D1 of each can be obtained through model NN separately.
Step S430: obtain the first target prediction data for the first sample data S1. For example, when the first training period is training period 3, the first target prediction data may be TD2 in Fig. 3.
The first target prediction data is obtained based on the accumulation of first historical prediction data, which includes the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the first training period. If those models are regarded as historical neural network models, the first historical prediction data includes the prediction data of several historical neural network models for the first sample data.
When the first training period is training period 3, the training periods before it are training periods 1 and 2, and the historical neural network models obtained at their ends are models NN1 and NN2 respectively. Model NN1's prediction data for the first sample data S1 is HD1, and model NN2's is HD2; HD1 and HD2 may also be called historical prediction data. The first historical prediction data includes HD1 and HD2.
Step S430 may be executed after obtaining the first annotation data X1 in step S420 and before inputting S1 into model NN, or after inputting S1 into model NN.
Step S440: determine the first prediction loss Loss1 according to the comparison of the first annotation data X1 and the first target prediction data (e.g., TD2), respectively, with the first prediction data D1.
When there are multiple first sample data, e.g., for two samples S11 and S12, the first prediction loss Loss11 of S11 and the first prediction loss Loss12 of S12 are each determined in the manner of step S440, and Loss11 and Loss12 are fused to obtain the fused first prediction loss Loss1.
The model to be trained could also be updated based on the comparison between the first prediction data D1 and the first annotation data X1 alone. In this embodiment, however, D1 is made to approach not only X1 but also the first target prediction data as closely as possible, which reduces problems such as overfitting and training oscillation during model training.
Step S450: update the neural network model NN to be trained in the direction that reduces the first prediction loss Loss1. Updating model NN can be understood as adjusting its model parameters so that the prediction loss decreases.
Steps S410 to S450 implement one update of the model, which can be understood as one round of training in the model training process; in this way, all sample data in the training sample set can be used for training.
When the number of training rounds of model NN exceeds a preset threshold, i.e., the model has been trained enough times, or when the first prediction loss Loss1 is smaller than a preset loss threshold, it can be determined that model training is complete and the convergence condition is met.
As can be seen from the above, in this embodiment the several neural network models obtained in the training periods before the first training period can serve as teacher models; the prediction data of these teacher models for the first sample data is accumulated, and the model training of the current first training period is adjusted according to the accumulated target prediction data. In training the neural network model, this embodiment not only makes the prediction data of the model to be trained as close as possible to the annotation data, but also makes it as similar as possible to the accumulated prediction data. This takes into account the guidance of the several models obtained in the earlier training stage, thereby reducing oscillation during training, improving the effectiveness of model training, and making the model's business predictions on business data more accurate.
During model training, it can also be detected whether the first sample data S1 is the last sample data in the training sample set. If so, the updated neural network model NN to be trained is determined as the first neural network model obtained at the end of the first training period. For example, when the first training period is training period 3, neural network model NN3 is obtained at its end. This detection operation may be performed periodically at a preset interval.
After the first neural network model is obtained, the first sample data S1 can also be input into it to obtain third prediction data; the third prediction data is fused with the first target prediction data to obtain the target prediction data for S1 in the next training period.
Specific implementations of several steps of the above embodiment are described below. In one implementation of step S430, when obtaining the first target prediction data for the first sample data S1, the neural network models obtained at the end of the training periods before the first training period can be obtained; S1 is input into these models to obtain their respective prediction data for S1, and the first target prediction data is determined based on the average of the obtained prediction data. For example, when the first training period is training period 3, the models NN2 and NN1 obtained at the end of the training periods before training period 3 can be obtained, S1 is input into models NN2 and NN1 to obtain the prediction data HD2 and HD1 respectively, and the first target prediction data TD2 is obtained based on the average of HD2 and HD1.
In the above implementation, in every training period the sample data must be input into the historical neural network models and several historical prediction data accumulated. To avoid repeated computation and improve processing efficiency, when the first training period is neither the first nor the second training period, i.e., not training period 1 or training period 2, step S430 can obtain the first target prediction data for the first sample data S1 using the implementation shown in the following steps 1a to 3a.
Step 1a: obtain the second prediction data determined by the second neural network model for the first sample data S1, where the second neural network model is obtained at the end of the second training period, which is the previous training period of the first training period. For example, when the first training period is training period 3 in Fig. 3, the second training period is training period 2, the second neural network model is model NN2, and the second prediction data may be HD2.
In this step, each sample data in the training sample set can be input into the second neural network model in advance to obtain a corresponding prediction data set. For example, after training period 2 ends and model NN2 is obtained, each sample data in the training sample set can be input into model NN2 to obtain the corresponding prediction data set.
In step 1a, when obtaining the second prediction data HD2 determined by model NN2 for the first sample data S1, it suffices to read the saved HD2 corresponding to S1 from the above prediction data set.
Alternatively, the first sample data S1 can be input directly into the second neural network model NN2, and the second prediction data HD2 of S1 obtained through model NN2.
Step 2a: obtain the second target prediction data for the first sample data S1.
The second target prediction data is obtained based on the accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training periods before the second training period; that is, it can be obtained based on the accumulation of second historical prediction data, which includes the prediction data for S1 of those models. For example, the training periods before training period 2 include training period 1, and the neural network model obtained at the end of training period 1 is model NN1, whose prediction data for the first sample data is HD1. The second historical prediction data therefore includes HD1; that is, the second target prediction data TD1 is obtained based on the accumulation of HD1.
In this embodiment, the second training period is not the first training period. When the second training period is not the first one, other training periods exist before it, so the second target prediction data, obtained by accumulating the prediction data for the first sample data from the neural network models obtained at the end of those earlier training periods, can be obtained.
Step 3a: determine the first target prediction data for the first sample data S1 based on the fusion of the second target prediction data and the second prediction data.
In this step, the second target prediction data and the second prediction data can be directly averaged, and the average determined as the first target prediction data. Alternatively, the first weight w1 of the second target prediction data and the second weight w2 of the second prediction data can be obtained, and a weighted average of the two performed based on w1 and w2 to obtain the first target prediction data for S1. The weights w1 and w2 can be preset.
Still taking the first training period as training period 3 as an example, this step may determine the first target prediction data TD2 for S1 based on the fusion of the second target prediction data TD1 and the second prediction data HD2. More specifically, TD1 and HD2 may be directly averaged and the average determined as TD2; alternatively, the first weight w1 of TD1 and the second weight w2 of HD2 may be obtained, and a weighted average of TD1 and HD2 performed based on w1 and w2 to obtain TD2.
Steps 1a to 3a can be performed after the end of the second training period. For all sample data in the training sample set, the process of steps 1a to 3a is performed to obtain a first target prediction data set covering the entire training sample set. In step S430 of the first training period, the first target prediction data for S1 is then obtained directly from the stored first target prediction data set.
In the preceding example, the second prediction data HD2 is determined by model NN2, while the second target prediction data is determined by the historical neural network models before NN2. During training, model NN2 is closer to the model NN to be trained and of higher quality; therefore, when setting the weights, more importance can be attached to HD2 in the accumulation, i.e., the first weight w1 is made smaller than the second weight w2. In this way, newer prediction data has a larger proportion in the first target prediction data, and the model training process is more stable.
When the current first training period is the second training period, so that the second training period is the first training period, the specific steps of obtaining the first target prediction data for S1 in step S430 may include:
obtaining the second prediction data determined by the second neural network model for the first sample data S1; and determining the first target prediction data for S1 based on the second prediction data.
Specifically, when determining the first target prediction data for S1 based on the second prediction data, the second prediction data can be directly determined as the first target prediction data for S1, or the accumulation result of the second prediction data and initial prediction data can be used as the first target prediction data.
In a regression model, the initial prediction data may include a preset value; in a classification model, the initial prediction data may include a uniform probability distribution.
For example, when the first training period is training period 2 and the second training period is training period 1, the second prediction data HD1 determined by the second neural network model NN1 for S1 can be obtained, and the first target prediction data TD1 for S1 determined based on HD1.
Specifically, HD1 can be directly determined as the first target prediction data TD1 for S1, or the accumulation result of HD1 and the initial prediction data can be used as TD1.
When the second training period is the first training period, no other training period exists before it, so the first target prediction data for the first sample data can be determined directly based on the second prediction data obtained by the second neural network model.
The specific implementation of the embodiment of Fig. 4 is further described below, still taking the current first training period as training period 3. The step S440 of determining the first prediction loss Loss1 according to the comparison of the first annotation data X1 and the first target prediction data, respectively, with the first prediction data D1 may specifically include the implementation shown in the following steps 1b to 3b.
Step 1b: determine the first sub-prediction loss Loss_1 according to the comparison between the first annotation data X1 and the first prediction data D1.
When X1 is an annotation value, D1 is a predicted value, i.e., in the training of a regression model. Step 1b may then include: using one of the squared error function and the logarithmic loss function to compare X1 with D1 to obtain the first sub-prediction loss Loss_1.
When X1 is an annotation classification, D1 is a predicted classification, i.e., in a classification model. Step 1b may then include: using one of KL divergence, cross entropy, and JS divergence to compare X1 with D1 to obtain the first sub-prediction loss Loss_1.
Step 2b: determine the second sub-prediction loss Loss_2 according to the comparison between the first target prediction data and the first prediction data D1.
In the training of regression and classification models, step 2b can also be computed with the loss function corresponding to the one used in step 1b.
Step 3b: determine the first prediction loss Loss1 according to the sum of the first sub-prediction loss Loss_1 and the second sub-prediction loss Loss_2.
In this step, the sum of Loss_1 and Loss_2 can be directly determined as the first prediction loss Loss1; alternatively, Loss1 can be determined from the result of applying preset processing to that sum.
In the no-teacher mode, i.e., when the first training period is the first training period (for example, training period 1 in Fig. 3), no previous training period exists, so the second prediction loss Loss2 can be determined directly from the comparison between the first annotation data X1 and the first prediction data D1, and the neural network model NN to be trained is updated in the direction that reduces Loss2.
The aforementioned neural network model to be trained may include one of Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and the Bidirectional Encoder Representations from Transformers (BERT) model based on the Transformer architecture.
The above describes specific embodiments of this specification; other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
图5为本实施例提供的神经网络模型的融合训练装置的示意性框图。该装置500部署在计算机中,该装置实施例与图3~图4所示方法实施例相对应。其中,神经网络模型的模型训练过程包括若干训练周期,每个训练周期对应于使用训练样本集中所有样本数据进行模型训练的过程,所述神经网络模型用于对输入的业务数据进行业务预测。装置500包括以下模块。
第一获取模块510,配置为获取当前的第一训练周期的待训练神经网络模型。
第二获取模块520,配置为获取所述训练样本集中的第一样本数据和对应的第一标注数据,将所述第一样本数据输入所述待训练神经网络模型,并得到所述第一样本数据的第一预测数据。
第三获取模块530,配置为当所述第一训练周期不是第一个训练周期时,获取针对所述第一样本数据的第一目标预测数据;其中,所述第一目标预测数据基于对第一历史预测数据的累积而得到,所述第一历史预测数据包括所述第一训练周期之前的训练周期训练结束时得到的神经网络模型对所述第一样本数据的预测数据。
第一确定模块540,配置为根据所述第一标注数据和所述第一目标预测数据分别与所述第一预测数据之间的比较,确定第一预测损失。
第一更新模块550,配置为向使得所述第一预测损失减小的方向,更新所述待训练神经网络模型。
In one implementation, the apparatus 500 further includes: a first detecting module (not shown), configured to detect whether the first sample data is the last sample data in the training sample set; and a second determining module (not shown), configured to, when the first sample data is the last sample data in the training sample set, determine the updated neural network model to be trained as the first neural network model obtained at the end of the first training epoch.
In one implementation, the apparatus 500 further includes: a third determining module (not shown), configured to input the first sample data into the first neural network model to obtain third prediction data, and fuse the third prediction data with the first target prediction data to obtain the target prediction data for the first sample data in the next training epoch.
In one implementation, the apparatus 500 further includes: a fourth determining module 531, configured to, when the first training epoch is the initial training epoch, determine a second prediction loss directly from the comparison between the first labeled data and the first prediction data; and a second updating module 541, configured to update the neural network model to be trained in the direction that reduces the second prediction loss.
In one implementation, the third obtaining module 530 is specifically configured to: obtain second prediction data determined for the first sample data by a second neural network model, where the second neural network model is obtained at the end of a second training epoch, and the second training epoch is the training epoch immediately preceding the first training epoch; when the second training epoch is not the initial training epoch, obtain second target prediction data for the first sample data, where the second target prediction data is obtained based on accumulation of the prediction data for the first sample data produced by the neural network models obtained at the end of the training epochs preceding the second training epoch; and determine the first target prediction data for the first sample data based on fusion of the second target prediction data with the second prediction data.
In one implementation, when determining the first target prediction data for the first sample data based on fusion of the second target prediction data with the second prediction data, the third obtaining module 530 is configured to: obtain a first weight for the second target prediction data and a second weight for the second prediction data; and perform a weighted average of the second target prediction data and the second prediction data based on the first weight and the second weight to obtain the first target prediction data for the first sample data.
In one implementation, the first weight is smaller than the second weight.
In one implementation, the third obtaining module 530 is further configured to: when the second training epoch is the initial training epoch, determine the first target prediction data for the first sample data based on the second prediction data.
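As a hedged sketch only (the default weight values 0.3/0.7 and the function name are illustrative assumptions, not values taken from the specification), the weighted-average fusion of the accumulated target prediction data with the newest epoch's prediction might look like:

```python
def fuse_target_prediction(target_pred, new_pred, w_target=0.3, w_new=0.7):
    """Fuse accumulated target prediction data with the latest prediction data.

    target_pred: target prediction data accumulated over earlier epochs,
                 or None when the previous epoch was the initial one.
    new_pred:    prediction data produced by the model obtained at the end
                 of the previous training epoch.
    w_target:    first weight (accumulated data); kept smaller than w_new so
                 that the more accurate recent epochs dominate the fusion.
    w_new:       second weight (latest prediction data).
    """
    if target_pred is None:
        # The previous epoch was the initial one: use its prediction directly.
        return list(new_pred)
    total = w_target + w_new
    # Weighted average of the two prediction distributions.
    return [(w_target * t + w_new * n) / total
            for t, n in zip(target_pred, new_pred)]
```

Applied once per epoch, this yields an exponential-moving-average-style accumulation in which the contribution of older epochs' predictions decays geometrically.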
In one implementation, the first determining module 540 is specifically configured to: determine a first sub-prediction loss from the comparison between the first labeled data and the first prediction data; determine a second sub-prediction loss from the comparison between the first target prediction data and the first prediction data; and determine the first prediction loss from the sum of the first sub-prediction loss and the second sub-prediction loss.
In one implementation, the first labeled data is a labeled value; when determining the first sub-prediction loss from the comparison between the first labeled data and the first prediction data, the first determining module 540 compares the first labeled data with the first prediction data using one of a squared error function and a log loss function to obtain the first sub-prediction loss.
In one implementation, the first labeled data is a labeled class; when determining the first sub-prediction loss from the comparison between the first labeled data and the first prediction data, the first determining module 540 compares the first labeled data with the first prediction data using one of KL divergence, cross entropy, and JS divergence to obtain the first sub-prediction loss.
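For illustration only, the candidate comparison functions named above (squared error or log loss for a labeled value; KL divergence, cross entropy, or JS divergence for a labeled class) can be sketched as follows. These are textbook definitions, not code from the specification:

```python
import math

def squared_error(y_true, y_pred):
    # Squared error between a labeled value and a predicted value.
    return (y_true - y_pred) ** 2

def log_loss(y_true, p_pred):
    # Log loss for a binary labeled value y_true in {0, 1}.
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

def kl_divergence(p, q):
    # KL divergence D(p || q) between two class distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    # Cross entropy H(p, q) between two class distributions.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # JS divergence: a symmetric, bounded variant of KL divergence.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```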
In one implementation, the neural network model to be trained is one of the DNN, CNN, RNN, and BERT models; the service data includes at least one of text, image, audio, and object data.
The above apparatus embodiments correspond to the method embodiments; for specific details, refer to the description of the method embodiments, which is not repeated here. The apparatus embodiments are derived from the corresponding method embodiments and have the same technical effects; for specific details, refer to the corresponding method embodiments.
In another embodiment of this specification, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the methods described in FIG. 3 and FIG. 4.
In another embodiment of this specification, a computing device is provided, including a memory and a processor; executable code is stored in the memory, and when the processor executes the executable code, the methods described in FIG. 3 and FIG. 4 are implemented.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the storage-medium and computing-device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the embodiments of this application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain in detail the objectives, technical solutions, and beneficial effects of the embodiments of this application. It should be understood that the foregoing are merely specific implementations of the embodiments of this application and are not intended to limit the protection scope of this application; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this application shall fall within the protection scope of this application.

Claims (26)

  1. A fusion training method for a neural network model, executed by a computer, wherein the model training process of the neural network model comprises several training epochs, each training epoch corresponding to a process of performing model training using all sample data in a training sample set, and the neural network model is used for performing service prediction on input service data, the method comprising:
    obtaining a neural network model to be trained for a current first training epoch;
    obtaining first sample data in the training sample set and corresponding first labeled data, inputting the first sample data into the neural network model to be trained, and obtaining first prediction data of the first sample data;
    when the first training epoch is not the initial training epoch, obtaining first target prediction data for the first sample data, wherein the first target prediction data is obtained based on accumulation of first historical prediction data, and the first historical prediction data comprises prediction data for the first sample data produced by neural network models obtained at the end of training epochs preceding the first training epoch;
    determining a first prediction loss according to comparisons of the first prediction data with the first labeled data and with the first target prediction data, respectively; and
    updating the neural network model to be trained in a direction that reduces the first prediction loss.
  2. The method according to claim 1, further comprising:
    detecting whether the first sample data is the last sample data in the training sample set; and
    if so, determining the updated neural network model to be trained as a first neural network model obtained at the end of the first training epoch.
  3. The method according to claim 2, further comprising:
    inputting the first sample data into the first neural network model to obtain third prediction data; and
    fusing the third prediction data with the first target prediction data to obtain target prediction data for the first sample data in the next training epoch.
  4. The method according to claim 1, further comprising:
    when the first training epoch is the initial training epoch, determining a second prediction loss directly from the comparison between the first labeled data and the first prediction data; and
    updating the neural network model to be trained in a direction that reduces the second prediction loss.
  5. The method according to claim 1, wherein the step of obtaining first target prediction data for the first sample data comprises:
    obtaining second prediction data determined for the first sample data by a second neural network model, wherein the second neural network model is obtained at the end of a second training epoch, the second training epoch being the training epoch immediately preceding the first training epoch;
    when the second training epoch is not the initial training epoch, obtaining second target prediction data for the first sample data, wherein the second target prediction data is obtained based on accumulation of the prediction data for the first sample data produced by neural network models obtained at the end of training epochs preceding the second training epoch; and
    determining the first target prediction data for the first sample data based on fusion of the second target prediction data with the second prediction data.
  6. The method according to claim 5, wherein the step of determining the first target prediction data for the first sample data based on fusion of the second target prediction data with the second prediction data comprises:
    obtaining a first weight for the second target prediction data and a second weight for the second prediction data; and
    performing a weighted average of the second target prediction data and the second prediction data based on the first weight and the second weight to obtain the first target prediction data for the first sample data.
  7. The method according to claim 6, wherein the first weight is smaller than the second weight.
  8. The method according to claim 5, wherein the step of obtaining first target prediction data for the first sample data further comprises:
    when the second training epoch is the initial training epoch, determining the first target prediction data for the first sample data based on the second prediction data.
  9. The method according to claim 1, wherein the step of determining a first prediction loss according to comparisons of the first prediction data with the first labeled data and with the first target prediction data, respectively, comprises:
    determining a first sub-prediction loss from the comparison between the first labeled data and the first prediction data;
    determining a second sub-prediction loss from the comparison between the first target prediction data and the first prediction data; and
    determining the first prediction loss from the sum of the first sub-prediction loss and the second sub-prediction loss.
  10. The method according to claim 9, wherein the first labeled data is a labeled value, and the step of determining a first sub-prediction loss from the comparison between the first labeled data and the first prediction data comprises:
    comparing the first labeled data with the first prediction data using one of a squared error function and a log loss function to obtain the first sub-prediction loss.
  11. The method according to claim 9, wherein the first labeled data is a labeled class, and the step of determining a first sub-prediction loss from the comparison between the first labeled data and the first prediction data comprises:
    comparing the first labeled data with the first prediction data using one of KL divergence, cross entropy, and JS divergence to obtain the first sub-prediction loss.
  12. The method according to claim 1, wherein the neural network model to be trained comprises one of a deep neural network DNN, a convolutional neural network CNN, a recurrent neural network RNN, and a BERT model; and
    the service data comprises at least one of text, image, audio, and object data.
  13. A fusion training apparatus for a neural network model, deployed in a computer, wherein the model training process of the neural network model comprises several training epochs, each training epoch corresponding to a process of performing model training using all sample data in a training sample set, and the neural network model is used for performing service prediction on input service data, the apparatus comprising:
    a first obtaining module, configured to obtain a neural network model to be trained for a current first training epoch;
    a second obtaining module, configured to obtain first sample data in the training sample set and corresponding first labeled data, input the first sample data into the neural network model to be trained, and obtain first prediction data of the first sample data;
    a third obtaining module, configured to, when the first training epoch is not the initial training epoch, obtain first target prediction data for the first sample data, wherein the first target prediction data is obtained based on accumulation of first historical prediction data, and the first historical prediction data comprises prediction data for the first sample data produced by neural network models obtained at the end of training epochs preceding the first training epoch;
    a first determining module, configured to determine a first prediction loss according to comparisons of the first prediction data with the first labeled data and with the first target prediction data, respectively; and
    a first updating module, configured to update the neural network model to be trained in a direction that reduces the first prediction loss.
  14. The apparatus according to claim 13, further comprising:
    a first detecting module, configured to detect whether the first sample data is the last sample data in the training sample set; and
    a second determining module, configured to, when the first sample data is the last sample data in the training sample set, determine the updated neural network model to be trained as a first neural network model obtained at the end of the first training epoch.
  15. The apparatus according to claim 14, further comprising:
    a third determining module, configured to input the first sample data into the first neural network model to obtain third prediction data, and fuse the third prediction data with the first target prediction data to obtain target prediction data for the first sample data in the next training epoch.
  16. The apparatus according to claim 13, further comprising:
    a fourth determining module, configured to, when the first training epoch is the initial training epoch, determine a second prediction loss directly from the comparison between the first labeled data and the first prediction data; and
    a second updating module, configured to update the neural network model to be trained in a direction that reduces the second prediction loss.
  17. The apparatus according to claim 13, wherein the third obtaining module is specifically configured to:
    obtain second prediction data determined for the first sample data by a second neural network model, wherein the second neural network model is obtained at the end of a second training epoch, the second training epoch being the training epoch immediately preceding the first training epoch;
    when the second training epoch is not the initial training epoch, obtain second target prediction data for the first sample data, wherein the second target prediction data is obtained based on accumulation of the prediction data for the first sample data produced by neural network models obtained at the end of training epochs preceding the second training epoch; and
    determine the first target prediction data for the first sample data based on fusion of the second target prediction data with the second prediction data.
  18. The apparatus according to claim 17, wherein, when determining the first target prediction data for the first sample data based on fusion of the second target prediction data with the second prediction data, the third obtaining module is configured to:
    obtain a first weight for the second target prediction data and a second weight for the second prediction data; and
    perform a weighted average of the second target prediction data and the second prediction data based on the first weight and the second weight to obtain the first target prediction data for the first sample data.
  19. The apparatus according to claim 18, wherein the first weight is smaller than the second weight.
  20. The apparatus according to claim 17, wherein the third obtaining module is further configured to:
    when the second training epoch is the initial training epoch, determine the first target prediction data for the first sample data based on the second prediction data.
  21. The apparatus according to claim 13, wherein the first determining module is specifically configured to:
    determine a first sub-prediction loss from the comparison between the first labeled data and the first prediction data;
    determine a second sub-prediction loss from the comparison between the first target prediction data and the first prediction data; and
    determine the first prediction loss from the sum of the first sub-prediction loss and the second sub-prediction loss.
  22. The apparatus according to claim 21, wherein the first labeled data is a labeled value, and, when determining the first sub-prediction loss from the comparison between the first labeled data and the first prediction data, the first determining module is configured to:
    compare the first labeled data with the first prediction data using one of a squared error function and a log loss function to obtain the first sub-prediction loss.
  23. The apparatus according to claim 21, wherein the first labeled data is a labeled class, and, when determining the first sub-prediction loss from the comparison between the first labeled data and the first prediction data, the first determining module is configured to:
    compare the first labeled data with the first prediction data using one of KL divergence, cross entropy, and JS divergence to obtain the first sub-prediction loss.
  24. The apparatus according to claim 13, wherein the neural network model to be trained comprises one of a deep neural network DNN, a convolutional neural network CNN, a recurrent neural network RNN, and a BERT model; and
    the service data comprises at least one of text, image, audio, and object data.
  25. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1 to 12.
  26. A computing device, comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method according to any one of claims 1 to 12 is implemented.
PCT/CN2020/134777 2020-02-28 2020-12-09 Fusion training method and device for neural network model WO2021169478A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010131424.7A CN111291886B (zh) 2020-02-28 2020-02-28 Fusion training method and device for neural network model
CN202010131424.7 2020-02-28

Publications (2)

Publication Number Publication Date
WO2021169478A1 true WO2021169478A1 (zh) 2021-09-02
WO2021169478A9 WO2021169478A9 (zh) 2021-10-28

Family

ID=71024581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134777 WO2021169478A1 (zh) 2020-02-28 2020-12-09 神经网络模型的融合训练方法及装置

Country Status (2)

Country Link
CN (1) CN111291886B (zh)
WO (1) WO2021169478A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291886B (zh) * 2020-02-28 2022-02-18 支付宝(杭州)信息技术有限公司 Fusion training method and device for neural network model
CN112669078A (zh) * 2020-12-30 2021-04-16 上海众源网络有限公司 Behavior prediction model training method, apparatus, device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180114110A1 (en) * 2016-10-26 2018-04-26 Samsung Electronics Co., Ltd. Method and apparatus to reduce neural network
CN109598331A (zh) * 2018-12-04 2019-04-09 北京芯盾时代科技有限公司 Fraud recognition model training method, fraud recognition method and device
CN109670588A (zh) * 2017-10-16 2019-04-23 优酷网络技术(北京)有限公司 Neural network prediction method and device
US20190228312A1 (en) * 2018-01-25 2019-07-25 SparkCognition, Inc. Unsupervised model building for clustering and anomaly detection
CN110163368A (zh) * 2019-04-18 2019-08-23 腾讯科技(深圳)有限公司 Mixed-precision-based deep learning model training method, apparatus, and system
US20190295305A1 (en) * 2018-03-20 2019-09-26 Adobe Inc. Retargeting skeleton motion sequences through cycle consistency adversarial training of a motion synthesis neural network with a forward kinematics layer
CN111144567A (zh) * 2019-12-31 2020-05-12 支付宝(杭州)信息技术有限公司 Neural network model training method and device
CN111291886A (zh) * 2020-02-28 2020-06-16 支付宝(杭州)信息技术有限公司 Fusion training method and device for neural network model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805258B (zh) * 2018-05-23 2021-10-12 北京图森智途科技有限公司 Neural network training method and device, and computer server
CN110399742B (zh) * 2019-07-29 2020-12-18 深圳前海微众银行股份有限公司 Training and prediction method and apparatus for a federated transfer learning model
CN110674880B (zh) * 2019-09-27 2022-11-11 北京迈格威科技有限公司 Network training method, apparatus, medium, and electronic device for knowledge distillation


Also Published As

Publication number Publication date
CN111291886A (zh) 2020-06-16
WO2021169478A9 (zh) 2021-10-28
CN111291886B (zh) 2022-02-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 20920978; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 20920978; Country of ref document: EP; Kind code of ref document: A1