WO2022227400A1 - Neural network training method and apparatus, device, and computer storage medium - Google Patents

Neural network training method and apparatus, device, and computer storage medium Download PDF

Info

Publication number
WO2022227400A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
knowledge transfer
samples
current anchor
training
Prior art date
Application number
PCT/CN2021/121379
Other languages
French (fr)
Chinese (zh)
Inventor
葛艺潇
蔡青琳
张潇
朱烽
赵瑞
李鸿升
Original Assignee
商汤集团有限公司
博智感知交互研究中心有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 商汤集团有限公司, 博智感知交互研究中心有限公司 filed Critical 商汤集团有限公司
Publication of WO2022227400A1 publication Critical patent/WO2022227400A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks

Definitions

  • the present disclosure relates to the field of deep learning, and in particular, to a neural network training method, device, device, and computer storage medium.
  • edge devices such as mobile phones and wearable devices need to process deep learning-related tasks locally.
  • edge devices are generally limited by limited resources and power consumption, as well as latency and cost.
  • KD Knowledge Distillation
  • the model compression method based on knowledge distillation transfers the reasoning and prediction ability of a trained, more complex "teacher" model to a simpler "student" model; that is, the "soft label" predicted by the "teacher" model is used as training supervision to guide the training of the "student" model, thereby reducing the computing resources required by the "student" model on the edge device and improving its computing speed.
  • Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium.
  • Embodiments of the present disclosure provide a neural network training method, the method comprising:
  • the loop process includes the following steps: obtaining a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples.
  • for each training sample, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample; efficient training supervision of the neural network is then performed based on the soft labels of the preset number of training samples.
  • Embodiments of the present disclosure provide a neural network training device, including:
  • the acquisition part is configured to perform a loop process until soft labels of a preset number of anchor samples are obtained; wherein the loop process includes the following steps: acquiring a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples;
  • the training part is configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
  • An embodiment of the present disclosure provides a computer device, the computer device includes a processor and a memory storing instructions executable by the processor, and when the instructions are executed by the processor, the above-mentioned neural network training method is implemented.
  • An embodiment of the present disclosure provides a computer-readable storage medium, on which a program is stored and applied to a computer device, and when the program is executed by a processor, the above-mentioned neural network training method is implemented.
  • Embodiments of the present disclosure provide a computer program, including computer-readable codes, which, when the computer-readable codes run in an electronic device and are executed by a processor in the electronic device, implement the above-mentioned neural network training method.
  • Embodiments of the present disclosure provide a computer program product that, when executed on a computer, enables the computer to execute the neural network training method as described above.
  • the computer device may perform a loop process until soft labels of a preset number of anchor samples are obtained, and train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein the loop process includes the following steps: obtaining a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  • for each training sample, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample; efficient training supervision of the neural network is then performed based on the soft labels of the preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples, obtaining effective soft labels while using only a single network.
  • Fig. 1 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-teacher model in the related art
  • Fig. 2 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-student model in the related art
  • FIG. 3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure
  • FIG. 4 is a second implementation flowchart of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram five of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 8 is a sixth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 9 is a seventh schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram eight of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram 9 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram ten of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram eleventh of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 14 is a schematic diagram 1 of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure
  • FIG. 15 is a second schematic diagram of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure
  • FIG. 16 is a schematic diagram of the composition and structure of a neural network training apparatus proposed by an embodiment of the present disclosure.
  • FIG. 17 is a schematic diagram of the composition and structure of a computer device according to an embodiment of the present disclosure.
  • "first/second/third" is only used to distinguish similar objects and does not represent a specific ordering of objects. It is understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein.
  • a large model is often a single complex network or a collection of several networks, which has good performance and generalization ability, while a small model has limited expressive ability due to the small network size. Therefore, the knowledge learned by the large model can be used to guide the training of the small model, so that the small model has the same performance as the large model, but the number of parameters is greatly reduced, thereby achieving model compression and acceleration.
  • Self-Distillation is a special case of knowledge distillation in which a model distills knowledge into itself. It can be seen as first performing ensemble learning on two separate models F1 and F2, and then distilling the ensemble into F2; that is, the teacher model is an ensemble version of the student model, called a Self-Ensemble.
  • soft labels can be enhanced by integrating the knowledge of multiple pre-trained models, for example the multi-teacher version of knowledge distillation (Multi-Model Ensemble via Adversarial Learning, MEAL) and the multi-student version of knowledge distillation (KDCL).
  • MEAL Multi-Model Ensemble via Adversarial Learning
  • KDCL multi-student version of knowledge distillation
  • the proposal of the knowledge distillation algorithm provides a better solution to this problem, namely using the soft probability vector predicted by a pre-trained teacher model, i.e. the "soft label", as training supervision to guide the training of the student model.
  • edge devices such as mobile phones, wearable devices, etc.
  • edge devices that need to process deep learning-related tasks locally are generally limited by limited resources and power consumption, as well as latency and cost.
  • the proposal of knowledge distillation algorithm can promote the wide application of deep learning-based products on edge devices.
  • FIG. 1 is a schematic diagram of the knowledge integration distillation algorithm of the multi-teacher model in the related art.
  • the predicted probabilities of teacher model #1 to teacher model #N for the anchor sample are {p_1, ..., p_N}
  • the predicted probability of the student model for the anchor sample is p_anchor
  • knowledge integration is performed on {p_1, ..., p_N}, and the obtained result is then used as a soft label and transferred by distillation to the student model.
  • FIG. 2 is a schematic diagram of the knowledge integration distillation algorithm of the multi-student model in the related art.
  • the predicted probabilities of student model #1 to student model #N for the anchor sample are {p_1, ..., p_N}; knowledge integration is performed on {p_1, ..., p_N}, and the obtained result is then used as a soft label and transferred to each student model by distillation.
  • Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium.
  • for each training sample, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample, and efficient training supervision is performed for the neural network based on the soft labels of a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples, obtaining effective soft labels while using only a single network.
  • the neural network training method proposed in the embodiments of the present disclosure is applied to computer equipment. Exemplary applications of the computer equipment proposed by the embodiments of the present disclosure will be described below.
  • the computer equipment proposed by the embodiments of the present disclosure may be implemented as a mobile phone terminal, a notebook computer, a tablet computer, a desktop computer, a smart TV, vehicle-mounted equipment, a wearable device, industrial equipment, and the like.
  • FIG. 3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure.
  • the neural network training method executed by a computer device can include the following steps:
  • the knowledge distillation algorithm can be used to generate more robust soft labels for the training samples, so as to improve the performance of the model through efficient training supervision with soft labels.
  • the knowledge distillation algorithm can be used to generate a corresponding soft label for each training sample in the entire training sample set, or to generate a corresponding soft label for each training sample in any batch of training samples.
  • FIG. 4 is a second schematic diagram of the implementation process of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 4 , the cyclic process provided by the embodiment of the present disclosure includes the following steps:
  • the current training sample set may be all data sets used for neural network training, or may be any batch of training data sets in multiple batches of training data sets used for neural network training.
  • the anchor sample refers to a training sample in the current training data set that needs to generate soft labels; the knowledge transfer sample refers to at least one other training sample in the training data set for acting on the soft label generation of the anchor sample.
  • in each cycle, the computer device may determine, as the current anchor sample, any training sample in the training sample set that has not previously been determined as an anchor sample, and determine at least one other sample in the current training sample set, other than the current anchor sample, as a knowledge transfer sample.
  • for different anchor samples, the corresponding knowledge transfer samples are different.
  • for example, if the same batch of samples is {x_1, x_2, x_3, ..., x_R} and x_1 is determined as the current anchor sample, then the remaining {x_2, x_3, ..., x_R} are used as the knowledge transfer samples corresponding to x_1; if x_2 is determined as the current anchor sample, then the remaining {x_1, x_3, ..., x_R} are used as the knowledge transfer samples corresponding to x_2.
  • each training sample in the training sample set, has at least one other training sample with view similarity with it.
  • after the current anchor sample and the at least one knowledge transfer sample corresponding to the anchor sample are determined from the training sample set, the sample similarity between the anchor sample and each knowledge transfer sample, and the "knowledge" to be transferred and integrated based on that similarity, can be determined.
  • the similarity between the current anchor sample and each knowledge transfer sample can be determined by a neural network.
  • the neural network can determine the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample respectively, and then based on the sample feature of the current anchor sample and The sample feature of each knowledge transfer sample determines the sample similarity between the current anchor sample and each knowledge transfer sample.
  • the "knowledge" used for transfer and integration between anchor samples and knowledge transfer samples may be predicted probabilities of samples on tasks such as image classification, object detection, and image segmentation. For example, the probability that a sample belongs to a class on a classification task.
  • the predicted probability of the anchor sample and the predicted probability of each knowledge transfer sample can be determined through the neural network.
  • further, a knowledge transfer and integration operation can be performed based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample, so as to act on the generation of the soft label of the current anchor sample.
  • the degree of "knowledge" influence of each knowledge transfer sample on the anchor sample can be represented based on the similarity between the current anchor sample and that knowledge transfer sample, so that the predicted probability of each knowledge transfer sample is transferred in a weighted manner according to its "knowledge" influence degree; for the at least one knowledge transfer sample corresponding to the current anchor sample, the knowledge transferred from each knowledge transfer sample with its respective degree of influence is integrated, and the integrated knowledge acts jointly on the generation of the soft label of the current anchor sample.
  • in other words, the soft labels of the training samples are no longer generated through knowledge integration between different network models under the knowledge distillation algorithm, but through knowledge integration between different samples under a single network model based on the self-distillation algorithm. In this way, multiple other network models are no longer required to generate soft labels; instead, under a single network model, for each training sample the "dark knowledge" of the other samples is transferred and integrated based on the similarity between the training sample and the other samples, so as to act on the generation of the soft label of that training sample.
  • the objective loss function of the neural network can be updated based on at least the preset number of anchor samples and the robust soft labels of the anchor samples, in order to realize the update and optimization of the neural network.
  • the model training process is no longer based only on the hard labels corresponding to the training samples; instead, the robust soft labels produced by the above self-distillation method, which combines knowledge integration between samples, are used for model training.
  • the embodiments of the present disclosure provide a neural network training method. For each training sample under the neural network, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample; efficient training supervision is then performed for the neural network based on the soft labels of a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples, obtaining effective soft labels while using only a single network.
  • FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure.
  • the method by which the computer device determines, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample further includes the following steps:
  • the neural network-based encoder determines the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample.
  • the neural network is provided with an encoder, and the encoder is configured to perform feature extraction and feature encoding on each training sample to obtain sample features represented in the form of vectors.
  • feature extraction can be performed on the current anchor sample and each knowledge transfer sample respectively by an encoder of a neural network, and the sample features of the current anchor sample and the sample features of each knowledge transfer sample are obtained respectively, and Feature encoding is performed on its sample features, and the sample features are represented in the form of vectors. That is, the sample feature represented by the current anchor sample in the form of a vector and the sample feature represented by the vector form of each knowledge transfer sample are determined by the encoder of the neural network.
  • the view similarity between the current anchor sample and each knowledge transfer sample may be determined based on the sample features.
  • FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • the method for determining the similarity between the current anchor sample and each knowledge transfer sample includes the following steps:
  • S102b1 Perform normalization processing on the sample features of the current anchor sample to obtain the normalized sample features of the current anchor sample.
  • S102b2 Perform normalization processing on the sample features of each knowledge transfer sample to obtain the normalized features of each knowledge transfer sample.
  • S102b3 Perform dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
  • the computer device may firstly normalize the sample features of the current anchor sample and the sample features of each knowledge transfer sample to obtain the normalized sample features of the current anchor sample and the normalized sample features of each knowledge transfer sample, so as to convert the sample features represented in vector form to the same dimension.
  • where F(x_i) is the vector-form sample feature of the i-th anchor sample determined by the encoder of the neural network, F(x_j) is the sample feature of the j-th knowledge transfer sample corresponding to the i-th anchor sample, and ‖·‖_2 denotes l_2 normalization; that is, the similarity can be computed as A(i, j) = ⟨F(x_i)/‖F(x_i)‖_2, F(x_j)/‖F(x_j)‖_2⟩.
  • the similarity between the current anchor sample and each knowledge transfer sample is, in other words, the pairwise sample similarity between each pair of samples.
  • the similarity results are stored in an affinity matrix A; assuming the number of samples in the training sample set is N, the affinity matrix representing the similarities between all samples satisfies A ∈ R^(N×N).
  • a dot product operation can be performed on the normalized sample features of the current anchor sample and each knowledge transfer sample (converted to the same dimension), and the similarity between the two samples can then be determined.
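As an illustration of the normalization and dot-product steps above, a minimal sketch follows; the use of PyTorch, the tensor shapes, and the variable names are assumptions for illustration, not part of the disclosure:

```python
import torch
import torch.nn.functional as F

def affinity_matrix(features: torch.Tensor) -> torch.Tensor:
    """Compute the pairwise similarity (affinity) matrix A in R^{N x N}.

    `features` holds the encoder outputs F(x_i) for one batch, shape (N, d).
    Each feature is l2-normalized, so the dot product of two rows is the
    similarity described above (normalization followed by a dot product).
    """
    normed = F.normalize(features, p=2, dim=1)   # l2-normalize each sample feature
    A = normed @ normed.t()                      # A(i, j) = <F(x_i)/||.||_2, F(x_j)/||.||_2>
    return A

# Minimal usage example with random features (batch of N=4, feature dim d=8).
feats = torch.randn(4, 8)
A = affinity_matrix(feats)
print(A.shape)  # torch.Size([4, 4])
```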
  • the neural network-based classifier determines the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  • the neural network is further provided with a classifier, and the classifier is used to determine the prediction probability corresponding to each training sample.
  • the classifier of the neural network can calculate the predicted probability of the current anchor sample and each knowledge transfer sample through the softmax function.
  • the calculation formula of the sample prediction probability is as follows:
  • formula (2) represents the probability that a training sample belongs to class k, where T is the temperature hyperparameter and K is the total number of classes; z_i denotes the logit vector of the i-th training sample in the training sample set, and the denominator sums the exponentiated logits over all K classes. That is, p_i^k = exp(z_i^k / T) / Σ_{k'=1..K} exp(z_i^{k'} / T).
  • the predicted probability of a training sample in the training sample set can be expressed as p_i = [p_i^1, ..., p_i^K], where the i-th anchor sample satisfies Σ_{k=1..K} p_i^k = 1; that is, the probabilities of the i-th anchor sample belonging to the first category through the K-th category sum to 1.
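A small sketch of the temperature-scaled softmax in formula (2); the logit shapes and the value of T are assumptions chosen only for illustration:

```python
import torch

def predicted_probability(logits: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """Temperature-scaled softmax over classes.

    `logits` has shape (N, K): one logit vector z_i per sample, K classes.
    Returns p with p[i, k] = exp(z_i^k / T) / sum_k' exp(z_i^k' / T),
    so each row sums to 1 as stated above; T is the temperature hyperparameter.
    """
    return torch.softmax(logits / T, dim=1)

# Example: 4 samples, 10 classes; each row of `p` sums to 1 (up to float error).
logits = torch.randn(4, 10)
p = predicted_probability(logits, T=4.0)
print(p.sum(dim=1))
```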
  • the similarity between the current anchor sample and each knowledge transfer sample can be obtained based on the neural network encoder, and the predicted probability of the current anchor sample and each knowledge transfer sample can be obtained by the neural network based classifier.
  • in this way, the generation of the anchor sample's soft label can be further realized based on the sample similarity, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  • FIG. 7 is a schematic diagram 5 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • the method for determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample includes the following steps:
  • the predicted probability of each knowledge transfer sample will be transferred to the current anchor sample with different weight values, so as to be used for the generation of the soft label of the current anchor sample.
  • the method for determining, based on the similarity between the current anchor sample and each knowledge transfer sample, the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample includes the following steps:
  • S103a1 Accumulate at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value.
  • the computer device may first perform an accumulation and summation process on the similarity between the current anchor sample and each knowledge transfer sample to obtain the summation result, that is, the accumulated similarity value, and then for each knowledge transfer sample , combining the similarity between the current anchor sample and each knowledge transfer sample, and the above-mentioned cumulative value of the similarity, calculate the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample through the softmax function.
  • where exp(A(i, j)) is the exponential of the sample similarity A(i, j) between the i-th anchor sample and the j-th knowledge transfer sample, and Σ_{j≠i} exp(A(i, j)) is the accumulated similarity value over all knowledge transfer samples corresponding to the i-th anchor sample; that is, the knowledge transfer parameter can be written as w(i, j) = exp(A(i, j)) / Σ_{j'≠i} exp(A(i, j')).
  • for the current anchor sample, the "knowledge" transfer weight values of all knowledge transfer samples accumulate to 1, that is, Σ_{j≠i} w(i, j) = 1.
  • in this way, the "knowledge" transfer weight value of each knowledge transfer sample to the anchor sample can be obtained by normalizing the sample similarities, and the predicted probability of each knowledge transfer sample is then transferred in a weighted manner according to that weight value.
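A sketch of how these knowledge transfer parameters (the softmax-normalized, off-diagonal affinities) might be computed; the masking strategy and variable names are assumptions made for illustration:

```python
import torch

def knowledge_transfer_parameters(A: torch.Tensor) -> torch.Tensor:
    """Row-wise softmax over the affinity matrix, excluding each sample itself.

    For anchor i, w(i, j) = exp(A(i, j)) / sum_{j' != i} exp(A(i, j')),
    so each row of the returned matrix sums to 1 and the diagonal is 0,
    matching the accumulation-to-1 property stated above.
    """
    A = A.clone()
    A.fill_diagonal_(float("-inf"))      # a sample transfers no knowledge to itself
    return torch.softmax(A, dim=1)       # exp / row-sum of exp over j != i

# Example with a random affinity matrix for a batch of 4 samples.
A = torch.randn(4, 4)
W = knowledge_transfer_parameters(A)
print(W.sum(dim=1))   # each row sums to 1
print(torch.diag(W))  # diagonal entries are 0
```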
  • S103b Determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  • based on the "knowledge" transfer weight value of each knowledge transfer sample to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample, the soft label of the current anchor sample is determined; once the "dark knowledge" transferred by each knowledge transfer sample has been completely integrated into the current anchor sample, an accurate and robust soft label of the current anchor sample can be obtained.
  • the specific method for determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample includes the following steps:
  • a weighted integration transfer may be performed on the "dark knowledge" of the at least one knowledge transfer sample, so as to obtain the initial knowledge transfer probability of the current anchor sample.
  • the predicted probability of each knowledge transfer sample may be weighted based on the "knowledge" transfer weight value of each knowledge transfer sample to the current anchor sample.
  • the knowledge transfer probability of each knowledge transfer sample to the current anchor sample may be first determined based on the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample and the predicted probability of each knowledge transfer sample.
  • the "knowledge" that needs to be weighted and transferred to the current anchor sample for each knowledge transfer sample may be integrated and then transferred.
  • the at least one knowledge transfer probability of the current anchor sample (one for each knowledge transfer sample) can be accumulated to obtain an accumulated value of the knowledge transfer probability, and a knowledge transfer process is then performed based on this accumulated value to obtain the "dark knowledge" transferred from the at least one knowledge transfer sample to the current anchor sample for the first time, that is, the initial knowledge transfer probability.
  • the predicted probabilities of the training samples in the training sample set can be expressed in matrix form as P ∈ R^(N×K), where the i-th row is the predicted probability of the i-th training sample.
  • A ← A ⊙ (1 − I) is used to discard the diagonal entries of the above affinity matrix A, where I is the identity matrix and ⊙ denotes element-wise multiplication; that is, the diagonal of the affinity matrix A is kept all 0. In this way, when calculating the knowledge transfer probability of the current anchor sample, the transferred "dark knowledge" retains only the weighted, integrated predicted probabilities of the at least one knowledge transfer sample.
  • the "dark knowledge" within the same batch can be transferred in parallel; that is, the knowledge transfer probabilities of all samples in the same batch are computed in one matrix operation, in which the rows of the knowledge transfer parameter matrix represent the knowledge transfer parameters when each training sample is used as an anchor sample.
  • in this way, the weighted integration of the predicted probabilities of the other samples can be transferred to the current anchor sample.
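The batched form of this first transfer step might look as follows; treating the transfer parameters and probabilities as matrices is an assumption consistent with the element-wise masking described above:

```python
import torch

def initial_knowledge_transfer(W: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Initial knowledge transfer probability for every anchor in the batch.

    W: (N, N) knowledge transfer parameters with zero diagonal (from the
       previous sketch), so row i weights the other samples j != i.
    P: (N, K) predicted probabilities of all samples in the batch.
    Returns (N, K): row i is the weighted integration of the predicted
    probabilities of anchor i's knowledge transfer samples, i.e. (W @ P)[i].
    """
    return W @ P

# Example: a random transfer-parameter matrix and probability matrix.
W = torch.softmax(torch.randn(4, 4).fill_diagonal_(float("-inf")), dim=1)
P = torch.softmax(torch.randn(4, 10), dim=1)
Q0 = initial_knowledge_transfer(W, P)
print(Q0.shape, Q0.sum(dim=1))  # (4, 10); rows sum to ~1
```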
  • S103b2 Perform a knowledge fusion process based on the initial knowledge transfer probability and the prediction probability of the current anchor sample to obtain the initial soft label of the current anchor sample.
  • a knowledge fusion process can be performed based on the transferred “dark knowledge” and the current “knowledge” of the current anchor sample , to get the initial soft label of the current anchor sample.
  • after a weighted integration transfer is performed on the predicted probabilities of the at least one knowledge transfer sample, the initial soft label of the current anchor sample can be obtained by performing knowledge fusion on the initial knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample.
  • in the above formula, the left-hand side is the initial soft label of the i-th training sample, and ω is a weighting factor, a hyperparameter with ω ∈ [0, 1].
  • Q^T is the matrix containing the initial soft labels of all training samples in the same batch.
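A sketch of the fusion step described above. The exact combination in the disclosure is given by a formula not reproduced here, so the expression below — blending the transferred probabilities and the anchor's own predictions with the weighting factor ω — is an assumption based on the surrounding description, as is the value ω = 0.5:

```python
import torch

def initial_soft_label(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """Fuse transferred "dark knowledge" with each anchor's own prediction.

    Assumed fusion: Q1 = omega * (W @ P) + (1 - omega) * P, i.e. the initial
    knowledge transfer probability weighted by omega plus the anchor's own
    predicted probability weighted by (1 - omega), with omega in [0, 1].
    """
    return omega * (W @ P) + (1.0 - omega) * P

# Example: one fusion step on a batch of 4 samples, 10 classes.
W = torch.softmax(torch.randn(4, 4).fill_diagonal_(float("-inf")), dim=1)
P = torch.softmax(torch.randn(4, 10), dim=1)
Q1 = initial_soft_label(W, P, omega=0.5)
print(Q1.sum(dim=1))  # rows still sum to ~1
```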
  • S103b3 Based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of at least one knowledge transfer sample is less than a preset probability threshold, and obtain the soft label of the current anchor sample.
  • FIG. 10 is a schematic diagram 8 of the implementation flow of the neural network training method proposed by the embodiment of the disclosure. As shown in FIG. 10 , the cyclic process includes the following steps:
  • S103b31 Perform knowledge fusion processing based on the knowledge transfer probability of the current anchor sample and the prediction probability of the current anchor sample, and obtain the soft label of the current anchor sample in the next cycle.
  • where t denotes the t-th transfer-and-integration iteration, and the quantity on the right-hand side of the formula is the soft label of the i-th anchor sample from the previous cycle.
  • in each cycle, a knowledge transfer process is first performed based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain the knowledge transfer probability of the current anchor sample; then a knowledge fusion process is performed based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
  • where t denotes the t-th transfer-and-integration iteration, and the matrix on the right-hand side contains the soft labels of the training samples in the same batch from the previous cycle.
  • the loop process is executed multiple times; that is, the knowledge transfer process and the knowledge fusion process are iterated an unlimited number of times, until the predicted probability contribution of the at least one knowledge transfer sample of each training sample is less than the preset probability threshold.
  • when the remaining predicted probability contribution of the knowledge transfer samples becomes infinitely small and approaches zero, the soft label of each training sample is obtained.
  • the generation function of the soft label of each training sample can be estimated as:
  • the "dark knowledge" transmitted by at least one knowledge transfer sample of each training sample has been completely integrated into each training sample, and the soft label corresponding to each training sample has high accuracy, and the accuracy is close to 100%
  • in other words, the similarity between a training sample and each other sample in the same batch of training samples can be used to weight and integrate the "dark knowledge" of each other sample into the current training sample until the knowledge is fully integrated, and an accurate and robust soft label for each training sample is obtained.
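One way to realize the iterated transfer-and-fusion loop and its limit is sketched below. Both the iterative update and the closed-form expression assume the fusion form used in the previous sketch (Q ← ω·W·Q + (1−ω)·P); the actual generation function in the disclosure (the formula following "can be estimated as") may differ:

```python
import torch

def soft_labels(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5,
                steps: int = 50) -> torch.Tensor:
    """Iterate knowledge transfer + fusion until the update becomes negligible."""
    Q = P.clone()
    for _ in range(steps):
        Q_next = omega * (W @ Q) + (1.0 - omega) * P   # transfer, then fuse
        if torch.max(torch.abs(Q_next - Q)) < 1e-6:     # stand-in for the probability threshold
            return Q_next
        Q = Q_next
    return Q

def soft_labels_closed_form(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """Fixed point of the iteration above: Q = (1 - omega) * (I - omega * W)^{-1} @ P."""
    N = W.shape[0]
    I = torch.eye(N, dtype=W.dtype)
    return (1.0 - omega) * torch.linalg.solve(I - omega * W, P)

# The iterative and closed-form results agree up to numerical precision.
W = torch.softmax(torch.randn(4, 4).fill_diagonal_(float("-inf")), dim=1)
P = torch.softmax(torch.randn(4, 10), dim=1)
print(torch.allclose(soft_labels(W, P), soft_labels_closed_form(W, P), atol=1e-4))
```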
  • FIG. 11 is a schematic diagram 9 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 11 , the neural network training method further includes the following steps:
  • S130 Select a batch of training data subsets that have not been previously selected as the training sample set from the training data set, as the current training sample set.
  • a training data set such as ImageNet (data set)
  • ImageNet data set
  • the training data set is divided into at least one training data subset occupying a smaller capacity, for at least one batch of neural network training.
  • the neural network training may be performed in the form of multiple batches, that is, mini-batches. Furthermore, any batch of training subsets in the multiple batches of training data subsets may be determined as the above-mentioned training sample set, and the soft label of the anchor sample is obtained by performing the self-distillation algorithm of knowledge integration in S101-S103.
  • the self-distillation algorithm of the above-mentioned knowledge integration of S101-S103 may be performed for each training sample in the training sample set, to obtain each training sample in the training sample set The soft label corresponding to the sample.
  • FIG. 12 is a schematic diagram tenth of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure. As shown in FIG. 12 , the neural network training method further includes the following steps:
  • S140 Perform random sampling processing on the training data set to obtain at least one piece of first training data.
  • S160 Use a batch of training data subsets constructed based on at least one first training data and at least one second training data corresponding to each first training data as a current training sample set.
  • each sample has at least one other sample with view similarity with it.
  • a data sampler of this type may be provided, that is, one that implements view-similarity-based sampling of training samples on top of a general random sampling mechanism.
  • the above-mentioned training data set ImageNet can be randomly sampled by this type of data sampler to obtain at least one first training data; and then the artificial hard label corresponding to each first training data is determined.
  • a similarity sampling process is then performed on the remaining data of the training data set; that is, at least one second training data having view similarity with each first training data (i.e., the same hard label) is sampled, a batch of training data subsets is formed from the at least one first training data and the at least one second training data corresponding to each first training data, and this batch of training data subsets is used as the training sample set.
  • multiple batches of training data subsets with view similarity between samples can be selected from the training data set.
  • the training data subset of each batch can be used as the current training sample set.
  • if the number of first training data obtained by random sampling is N, and M second training data with view similarity are selected for each first training data, then the number of samples in the finally formed batch of training data subsets is N × (M + 1).
  • each training sample in the current training sample set therefore has at least one other sample that is visually similar to it, so that knowledge can be transferred between samples in a weighted manner according to the similarity between samples.
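A sketch of such a similarity-based batch sampler. Here "view similarity" is approximated by sharing the same hard label, as the description above suggests; the data representation and helper names are illustrative assumptions:

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

def build_batch(samples: Sequence[Tuple[str, int]], n: int, m: int) -> List[Tuple[str, int]]:
    """Sample a batch of N*(M+1) items: N random anchors plus, for each anchor,
    M other items with the same hard label (assumed to share view similarity).

    `samples` is a list of (sample_id, hard_label) pairs.
    """
    by_label = defaultdict(list)
    for sid, label in samples:
        by_label[label].append(sid)

    firsts = random.sample(list(samples), n)                   # random sampling -> first training data
    batch: List[Tuple[str, int]] = []
    for sid, label in firsts:
        others = [o for o in by_label[label] if o != sid]      # remaining data with the same hard label
        seconds = random.sample(others, min(m, len(others)))   # similarity sampling -> second training data
        batch.append((sid, label))
        batch.extend((o, label) for o in seconds)
    return batch

# Example: a toy data set of 12 items over 3 classes; batch of N=2, M=3.
data = [(f"img_{i}", i % 3) for i in range(12)]
print(len(build_batch(data, n=2, m=3)))  # 8 = N * (M + 1)
```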
  • FIG. 13 is a schematic diagram eleventh of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure.
  • the method for training the neural network also includes the following steps:
  • the relative entropy is the KL divergence (Kullback-Leibler divergence, KLD); the cross entropy and the relative entropy are both used to describe the difference between the distribution of a sample's actual results and the distribution of its predicted results.
  • the training of the neural network can be achieved by calculating two types of differences for the training samples.
  • one takes the artificial hard labels as the distribution of the samples' real results, and the cross entropy is determined based on the difference between the artificial hard labels and the predicted probabilities; the other takes the robust soft labels as the distribution of the samples' "true" results, and the relative entropy is determined based on the difference between the soft labels and the predicted probabilities.
  • the training of the neural network is to minimize the cross entropy and the relative entropy, that is, the distribution of the predicted results of the samples determined by the neural network approximates the distribution of the real results of the samples.
  • since the robust soft labels of the preset number of anchor samples obtained through the above-mentioned cyclic process are used for neural network training and their label accuracy is good, the performance of the correspondingly trained neural network is also improved.
  • the loss function can be written as a cross entropy term plus a weighted KL divergence term, that is, formula (10): L = H(y, p) + λ · KL(q^T ‖ p^T)
  • where p^T is the temperature-scaled prediction probability
  • λ is the weight coefficient
  • T is the temperature
  • KL(q^T ‖ p^T) is the KL divergence
  • the former part is the cross entropy determined according to the hard label and the initial prediction probability
  • the latter part is the KL divergence determined according to the soft label and the initial prediction probability
  • the cross entropy and KL divergence values are calculated based on formula (10), and further minimized to realize the training of the neural network and improve the performance of the network model.
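A sketch of this training objective. The weight λ, the temperature T, and the exact scaling of the KL term are assumptions following common knowledge-distillation practice, not values taken from the disclosure:

```python
import torch
import torch.nn.functional as F

def training_loss(logits: torch.Tensor, hard_labels: torch.Tensor,
                  soft_labels: torch.Tensor, T: float = 4.0,
                  lam: float = 1.0) -> torch.Tensor:
    """Cross entropy w.r.t. hard labels + weighted KL divergence w.r.t. soft labels.

    logits:      (N, K) raw network outputs for the batch.
    hard_labels: (N,) integer class labels.
    soft_labels: (N, K) soft labels generated by the knowledge-integration loop
                 (treated as fixed targets, hence .detach()).
    """
    ce = F.cross_entropy(logits, hard_labels)
    log_p_t = F.log_softmax(logits / T, dim=1)              # temperature-scaled predictions
    kl = F.kl_div(log_p_t, soft_labels.detach(), reduction="batchmean")
    return ce + lam * kl

# Example with random tensors: 4 samples, 10 classes.
logits = torch.randn(4, 10, requires_grad=True)
hard = torch.randint(0, 10, (4,))
soft = torch.softmax(torch.randn(4, 10), dim=1)
loss = training_loss(logits, hard, soft)
loss.backward()
print(float(loss))
```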
  • through highly accurate and robust soft labels, a model can be trained with less data and a higher learning rate.
  • FIG. 14 is a schematic diagram 1 of the principle of the knowledge integration self-distillation algorithm proposed by the embodiment of the present disclosure. As shown in FIG. 14, {x_1, ..., x_N} are the samples other than the anchor sample in the same batch.
  • the predicted probability of the student model for the anchor sample is p_anchor, and the predicted probabilities for {x_1, ..., x_N} are {p_1, ..., p_N}.
  • knowledge integration can be performed on the predicted probabilities {p_1, ..., p_N} to generate a robust soft label for the anchor sample.
  • the knowledge integration proposed in this application adopts only one network: by collecting the samples other than the anchor sample in the same batch, and by dynamically aggregating the "dark knowledge" of the different samples in the same batch, i.e. the knowledge of {x_1, ..., x_N}, robust soft labels are generated and the knowledge is integrated within a single network, which saves memory and time costs to a large extent.
  • FIG. 15 is a second schematic diagram of the principle of the knowledge integration self-distillation algorithm proposed by the embodiment of the present disclosure.
  • the samples in the same batch of training samples include the anchor sample and at least one knowledge transfer sample {x_1, ..., x_N}; the encoder is applied to the anchor sample and each knowledge transfer sample to perform feature encoding respectively, obtaining the sample feature f_anchor of the anchor sample and the sample features {f_1, f_2, f_3, ...} of the at least one knowledge transfer sample, and the similarity between the anchor sample and each knowledge transfer sample is estimated based on these sample features.
  • the predicted probability corresponding to each sample is determined by the classifier of the current student model, including the predicted probability p_anchor of the anchor sample and the predicted probabilities {p_1, ..., p_N} of the at least one knowledge transfer sample.
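Putting the pieces together, here is a minimal sketch of one training step under this scheme. The encoder/classifier architectures, the hyperparameter values (T, ω, λ), and the closed-form fusion used to generate the soft labels are illustrative assumptions, not the disclosure's exact procedure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentModel(nn.Module):
    """Toy encoder + classifier standing in for the single student network."""
    def __init__(self, in_dim: int = 32, feat_dim: int = 16, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Linear(in_dim, feat_dim)        # feature extraction / encoding
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.encoder(x)
        return feats, self.classifier(feats)

def train_step(model, optimizer, x, hard_labels, T=4.0, omega=0.5, lam=1.0):
    feats, logits = model(x)

    # Soft labels are generated without gradient, as fixed targets.
    with torch.no_grad():
        normed = F.normalize(feats, dim=1)
        A = (normed @ normed.t()).fill_diagonal_(float("-inf"))  # pairwise similarities, self excluded
        W = torch.softmax(A, dim=1)                              # knowledge transfer parameters
        P = torch.softmax(logits / T, dim=1)                     # temperature-scaled predictions
        N = P.shape[0]
        Q = (1.0 - omega) * torch.linalg.solve(torch.eye(N) - omega * W, P)  # assumed fusion limit

    # Hard-label cross entropy + soft-label KL divergence (cf. formula (10)).
    loss = F.cross_entropy(logits, hard_labels) + lam * F.kl_div(
        F.log_softmax(logits / T, dim=1), Q, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

model = StudentModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
print(train_step(model, opt, x, y))
```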
  • Table 1 is a comparison of the effectiveness of the knowledge integration distillation algorithms of the multi-teacher/student model, namely MEAL and KDCL, with the knowledge weighted transfer and integrated self-distillation algorithms proposed in the embodiments of the present disclosure:
  • Table 2 is the effectiveness of the self-distillation algorithm of knowledge weighted transfer and integration proposed in the embodiment of the present disclosure on various network architectures:
  • the ResNet-50 architecture is improved from 76.8 to 78.0, and it only takes 3.7% of the time.
  • Table 3 compares the effectiveness of the self-distillation algorithm of knowledge weighted transfer and integration proposed in the embodiment of the present disclosure and the self-distillation method in the related art:
  • the traditional self-distillation methods and a series of label regularization algorithms, such as label smoothing, Tf-KD_reg, BAN, CS-KD and Tf-KD_self, are all based on a single network.
  • the training results of the self-distillation algorithm with weighted knowledge transfer and integration proposed in the embodiments of the present disclosure on the dataset ImageNet all surpass the above-mentioned traditional self-distillation algorithm and label regularization algorithm.
  • for the teacher-free Tf-KD_reg regularization algorithm on the ResNet-50 architecture, the label accuracy is 77.5%, which is still 0.5% lower than that of this scheme.
  • the self-distillation algorithm for weighted transfer and integration of knowledge proposed in the embodiments of the present disclosure can not only save memory and time by realizing knowledge integration in a single network, but can also aggregate knowledge from a group of samples in the same mini-batch to generate equally powerful soft labels.
  • FIG. 16 is a schematic diagram of the composition and structure of a neural network training apparatus proposed in an embodiment of the present disclosure.
  • the neural network training apparatus 10 includes an acquisition part 11, a training part 12, a selection part 13, a sampling part 14 and a determination part 15.
  • the acquisition part 11 is configured to perform a loop process until soft labels of a preset number of anchor samples are obtained; wherein the loop process includes the following steps: acquiring a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples.
  • the training part 12 is configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
  • the training portion 12 is configured to determine the relative entropy of each of the anchor samples based on the soft label of each of the anchor samples and the predicted probability corresponding to the anchor sample; based on each of the anchor samples The hard label of the anchor sample and the predicted probability corresponding to the anchor sample determine the cross entropy of each anchor sample; based on the cross entropy of the preset number of anchor samples and the preset number of The relative entropy of the anchored samples trains the neural network.
  • the neural network includes an encoder and a classifier
  • the acquisition part 11 is configured to determine, based on the encoder of the neural network, the sample feature of the current anchor sample and the sample feature of each of the knowledge transfer samples; determine the similarity between the current anchor sample and each of the knowledge transfer samples based on the sample feature of the current anchor sample and the sample feature of each of the knowledge transfer samples; and determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples.
  • the acquiring part 11 is further configured to perform normalization processing on the sample features of the current anchor sample to obtain the normalized sample features of the current anchor sample;
  • perform normalization processing on the sample features of each of the knowledge transfer samples to obtain the normalized features of each of the knowledge transfer samples; and perform a dot product operation on the normalized sample features of the current anchor sample and the normalized features of each of the knowledge transfer samples to obtain the similarity between the current anchor sample and each of the knowledge transfer samples.
  • the obtaining part 11 is further configured to determine, based on the similarity between the current anchor sample and each of the knowledge transfer samples, the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample; and determine the soft label of the current anchor sample based on the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples.
  • the acquiring part 11 is further configured to perform an accumulation process on the at least one similarity between the current anchor sample and each of the knowledge transfer samples to obtain an accumulated similarity value; and determine the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples and the accumulated similarity value.
  • the acquiring part 11 is further configured to perform a knowledge transfer process based on the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample and the predicted probability of each of the knowledge transfer samples, to obtain the initial knowledge transfer probability of the current anchor sample; perform a knowledge fusion process based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain the initial soft label of the current anchor sample; and, based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample; wherein the loop process includes: in each cycle of the loop process, performing knowledge transfer processing based on the soft label of the current anchor sample obtained in the previous cycle and each of the knowledge transfer parameters to obtain the knowledge transfer probability of the current anchor sample; and performing knowledge fusion processing based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample to obtain the soft label of the current anchor sample for the next cycle.
  • the acquiring part 11 is further configured to determine, based on the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample and the predicted probability of each of the knowledge transfer samples, the knowledge transfer probability of each of the knowledge transfer samples to the current anchor sample; accumulate the at least one knowledge transfer probability of the current anchor sample to obtain an accumulated value of the knowledge transfer probability; and perform a knowledge transfer process based on the accumulated value of the knowledge transfer probability to obtain the initial knowledge transfer probability of the current anchor sample.
  • the obtaining part 11 is further configured to obtain a training data set, the training data set including at least one batch of training data subsets.
  • the selection part 13 is configured to select a batch of the training data subsets that were not previously selected as the training sample set from the training data set, as the current training sample set.
  • the sampling part 14 is configured to perform random sampling processing on the training data set to obtain at least one first training data.
  • the determining part 15 is further configured to determine the hard label corresponding to each of the first training data.
  • the sampling part 14 is further configured to perform similarity sampling processing, based on the hard label of each of the first training data, on the remaining data in the training data set that are not selected as the first training data, to obtain at least one second training data corresponding to each of the first training data.
  • the determining part 15 is further configured to use a batch of the training data subsets constructed from the at least one first training data and the at least one second training data corresponding to each of the first training data as the current training sample set.
  • FIG. 17 is a schematic diagram of the composition structure of the computer device proposed by the embodiment of the present disclosure.
  • the computer device 20 proposed by the embodiment of the present disclosure may further include a processor 21 and a memory 22 storing instructions executable by the processor 21.
  • the computer device 20 may further include a communication interface 23 and a bus 24 for connecting the processor 21, the memory 22 and the communication interface 23.
  • the above-mentioned processor 21 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor.
  • the computer device 20 may also include a memory 22, which may be connected to the processor 21, wherein the memory 22 is used to store executable program codes including computer operating instructions; the memory 22 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least two disk drives.
  • the bus 24 is used to connect the communication interface 23, the processor 21 and the memory 22, and to enable mutual communication among these components.
  • the memory 22 is used to store instructions and data.
  • the above-mentioned processor 21 is configured to perform a loop process until soft labels satisfying a preset number of anchor samples are obtained, and to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein the loop process includes the following steps: acquiring a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples.
  • the above-mentioned memory 22 may be a volatile memory (volatile memory), such as a Random-Access Memory (RAM); or a non-volatile memory (non-volatile memory), such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories, and provides instructions and data to the processor 21.
  • each functional module in this embodiment may be integrated into one unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
  • An embodiment of the present disclosure provides a computer device that can perform a loop process until soft labels satisfying a preset number of anchor samples are obtained, and train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein the loop process includes the following steps: obtaining a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  • In this way, for each training sample under the neural network, the similarity between other samples and that sample and the predicted probabilities of the other samples can be used to assist the generation of the soft label of the training sample, and efficient training supervision can then be performed for the neural network based on the soft labels of the preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples and obtains effective soft labels while using only a single network.
  • Embodiments of the present disclosure provide a computer-readable storage medium on which a program is stored, and when the program is executed by a processor, implements the above-described neural network training method.
  • the program instructions corresponding to a neural network training method in this embodiment may be stored on a storage medium such as an optical disk, a hard disk, a U disk, etc.
  • wherein the loop process includes the following steps: obtaining a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples.
  • the embodiments of the present disclosure further provide a computer program product, where the computer program product includes computer-executable instructions for implementing the steps in the neural network training method proposed by the embodiments of the present disclosure.
  • embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable computer device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the implementation flow diagram and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable computing device, such that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the implementation flow diagram and/or one or more blocks of the block diagram.
  • a loop process is performed until soft labels satisfying a preset number of anchor samples are obtained; the neural network is trained based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample are determined from the current training sample set; based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample are determined; and the soft label of the current anchor sample is determined based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  • the knowledge integration under the self-distillation algorithm is realized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A neural network training method and apparatus, a device, and a computer storage medium. The method comprises: performing a cyclic process until a soft label satisfying a preset number of anchor samples is obtained (S100); training a neural network based at least on the soft label of the preset number of anchor samples, and the preset number of anchor samples (S110), wherein the current anchor sample and at least one knowledge transfer sample are determined from the current training sample set in each period when the cyclic process is executed; on the basis of the neural network, determining the similarity between the current anchor sample and each knowledge transfer sample, a prediction probability of the current anchor sample, and a prediction probability of each knowledge transfer sample (S102); and determining a soft label of the current anchor sample on the basis of the similarity between the current anchor sample and each knowledge transfer sample, the prediction probability of the current anchor sample, and the prediction probability of each knowledge transfer sample (S103). Knowledge integration under a self-distillation algorithm is achieved.

Description

神经网络训练方法和装置、设备,及计算机存储介质Neural network training method and device, equipment, and computer storage medium
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本公开基于申请号为202110462397.6、申请日为2021年04月27日、申请名称为“神经网络训练方法和装置、设备,及计算机存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。The present disclosure is based on the Chinese patent application with the application number of 202110462397.6, the application date of April 27, 2021, and the application name of "neural network training method and device, equipment, and computer storage medium", and requires the priority of the Chinese patent application The entire content of this Chinese patent application is incorporated herein by reference.
技术领域technical field
本公开涉及深度学习领域,尤其涉及一种神经网络训练方法及装置、设备、计算机存储介质。The present disclosure relates to the field of deep learning, and in particular, to a neural network training method, device, device, and computer storage medium.
背景技术Background technique
近年来,边缘设备例如移动电话、可穿戴设备等,都需要在本地处理深度学习相关的任务,然而边缘设备却普遍受限于有限的资源和功耗以及时延和成本。为了推荐基于深度学习的产品在边缘设备上的应用,相关技术提出了知识蒸馏(Knowledge Distillation,KD)这一模型压缩方法。In recent years, edge devices such as mobile phones and wearable devices need to process deep learning-related tasks locally. However, edge devices are generally limited by limited resources and power consumption, as well as latency and cost. In order to recommend the application of products based on deep learning on edge devices, related technologies propose a model compression method called Knowledge Distillation (KD).
其中,基于知识蒸馏的模型压缩方法是将训练好的较复杂的“教师”模型的推理预测能力迁移到较简单的“学生”模型,即将“教师”模型预测的“软标签”作为训练监督,来指导“学生”模型的训练,进而减少“学生”模型在边缘设备端所需的计算资源以及提高其运算速度。Among them, the model compression method based on knowledge distillation is to transfer the reasoning and prediction ability of the trained more complex "teacher" model to the simpler "student" model, that is, the "soft label" predicted by the "teacher" model is used as training supervision. to guide the training of the "student" model, thereby reducing the computing resources required by the "student" model on the edge device and improving its computing speed.
然而,为了进一步获得更为准确的“软标签”以提升“学生”模型的网络性能,相关技术常通过多网络模型的知识整合蒸馏算法以为“学生”模型提供有效的训练监督,复杂性较高,使得训练时间和空间成本较大。However, in order to further obtain a more accurate "soft label" to improve the network performance of the "student" model, related technologies often provide effective training supervision for the "student" model through knowledge integration distillation algorithms of multiple network models, which is highly complex. , which makes the training time and space cost large.
发明内容SUMMARY OF THE INVENTION
本公开实施例提供一种神经网络训练方法和装置、设备,及计算机存储介质。Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium.
本公开的技术方案是这样实现的:The technical solution of the present disclosure is realized as follows:
本公开实施例提供一种神经网络训练方法,所述方法包括:Embodiments of the present disclosure provide a neural network training method, the method comprising:
执行循环过程直至获得满足预设数量的锚定样本的软标签;至少基于所述预设数量的锚定样本的软标签和所述预设数量的锚定样本,对神经网络进行训练;其中,所述循环过程包括以下步骤:获取当前训练样本集,在执行循环过程的每个周期中从所述当前训练样本集中确定所述当前锚定样本和至少一个知识传递样本;其中,所述当前锚定样本为所述当前训练样本集中的任意一个,所述至少一个知识传递样本为所述当前训练样本集中、所述当前锚定样本以外的至少一个其他样本;基于所述神经网络,确定所述当前锚定样本与每一所述知识传递样本之间的相似度、以及所述当前锚定样本的预测概率和每一所述知识传递样本的预测概率;基于所述当前锚定样本与每一所述知识传递样本之间的相似度,所述当前锚定样本的预测概率和每一所述知识传递样本的预测概率,确定所述当前锚定样本的软标签。这样,针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。Performing a looping process until soft labels satisfying a preset number of anchor samples are obtained; training the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein, The loop process includes the following steps: obtaining a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the loop process; wherein the current anchor sample The fixed sample is any one of the current training sample set, and the at least one knowledge transfer sample is at least one other sample other than the current anchor sample in the current training sample set; based on the neural network, determine the The similarity between the current anchor sample and each of the knowledge transfer samples, and the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples; based on the current anchor sample and each The similarity between the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples determine the soft label of the current anchor sample. In this way, for each training sample under the neural network, the similarity between other samples and the sample and the predicted probability of other samples can be used to assist the generation of the soft label of the training sample, and then based on the soft label of the training sample satisfying the preset number Labels perform efficient training supervision for neural networks.
本公开实施例提供一种神经网络训练装置,包括:Embodiments of the present disclosure provide a neural network training device, including:
获取部分,配置为执行循环过程直至获得满足预设数量的锚定样本的软标签;其中,所述循环过程包括以下步骤:获取当前训练样本集,在执行循环过程的每个周期中从所述当前训练样本集中确定所述当前锚定样本和至少一个知识传递样本;其中,所述当前锚定 样本为所述当前训练样本集中的任意一个,所述至少一个知识传递样本为所述当前训练样本集中、所述当前锚定样本以外的至少一个其他样本;基于所述神经网络,确定所述当前锚定样本与每一所述知识传递样本之间的相似度、以及所述当前锚定样本的预测概率和每一所述知识传递样本的预测概率;The acquisition part is configured to perform a loop process until soft labels satisfying a preset number of anchor samples are obtained; wherein, the loop process includes the following steps: acquiring a current training sample set, and performing a loop process from the The current anchor sample and at least one knowledge transfer sample are determined in the current training sample set; wherein, the current anchor sample is any one of the current training sample set, and the at least one knowledge transfer sample is the current training sample At least one other sample other than the current anchor sample in the set; based on the neural network, determine the similarity between the current anchor sample and each of the knowledge transfer samples, and the similarity of the current anchor sample a predicted probability and a predicted probability for each of said knowledge transfer samples;
训练部分,配置为至少基于所述预设数量的锚定样本的软标签和所述预设数量的锚定样本,对神经网络进行训练。The training part is configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
本公开实施例提供一种计算机设备,所述计算机设备包括处理器、存储有所述处理器可执行指令的存储器,当所述指令被所述处理器执行时,实现如上所述的神经网络训练方法。An embodiment of the present disclosure provides a computer device, the computer device includes a processor and a memory storing instructions executable by the processor, when the instructions are executed by the processor, the above-mentioned neural network training is implemented method.
本公开实施例提供一种计算机可读存储介质,其上存储有程序,应用于计算机设备中,所述程序被处理器执行时,实现如上所述的神经网络训练方法。An embodiment of the present disclosure provides a computer-readable storage medium, on which a program is stored and applied to a computer device, and when the program is executed by a processor, the above-mentioned neural network training method is implemented.
本公开实施例提供一种计算机程序,包括计算机可读代码,在所述计算机可读代码在电子设备中运行,被所述电子设备中的处理器执行的情况下,实现如上所述的神经网络训练方法。Embodiments of the present disclosure provide a computer program, including computer-readable codes, which, when the computer-readable codes run in an electronic device and are executed by a processor in the electronic device, implement the above-mentioned neural network training method.
本公开实施例提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如上所述的神经网络训练方法。Embodiments of the present disclosure provide a computer program product that, when executed on a computer, enables the computer to execute the neural network training method as described above.
本公开实施例提出的技术方案,计算机设备可以执行循环过程直至获得满足预设数量的锚定样本的软标签;至少基于预设数量的锚定样本的软标签和预设数量的锚定样本,对神经网络进行训练;其中,循环过程包括以下步骤:获取当前训练样本集,在执行循环过程的每个周期中从当前训练样本集中确定当前锚定样本和至少一个知识传递样本;其中,当前锚定样本为当前训练样本集中的任意一个,至少一个知识传递样本为当前训练样本集中、当前锚定样本以外的至少一个其他样本;基于神经网络,确定当前锚定样本与每一知识传递样本之间的相似度、以及当前锚定样本的预测概率和每一知识传递样本的预测概率;基于当前锚定样本与每一知识传递样本之间的相似度,当前锚定样本的预测概率和每一知识传递样本的预测概率,确定当前锚定样本的软标签。如此,针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。可见,本公开以同一神经网络下跨样本的知识整合代替传统的跨网络的知识整合,在仅利用单个网络的基础上实现了基于样本间相似度的知识整合并获得有效的软标签。In the technical solutions proposed by the embodiments of the present disclosure, the computer device may perform a loop process until soft labels satisfying a preset number of anchor samples are obtained; at least based on the soft labels of the preset number of anchor samples and the preset number of anchor samples, training the neural network; wherein, the cyclic process includes the following steps: obtaining a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the cyclic process; wherein, the current anchor sample The fixed sample is any one of the current training sample set, and at least one knowledge transfer sample is at least one other sample in the current training sample set and the current anchor sample; based on the neural network, determine the relationship between the current anchor sample and each knowledge transfer sample and the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and each knowledge transfer sample Pass the predicted probability of the sample to determine the soft label of the current anchor sample. In this way, for each training sample under the neural network, the similarity between other samples and the sample and the predicted probability of other samples can be used to assist the generation of the soft label of the training sample, and then based on the soft label of the training sample that meets the preset number. Labels perform efficient training supervision for neural networks. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, realizes knowledge integration based on similarity between samples and obtains effective soft labels on the basis of only using a single network.
附图说明Description of drawings
图1为相关技术中多教师模型的知识整合蒸馏算法的原理示意图;Fig. 1 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-teacher model in the related art;
图2为相关技术中多学生模型的知识整合蒸馏算法的原理示意图;Fig. 2 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-student model in the related art;
图3为本公开实施例提出的神经网络训练方法的实现流程示意图一;3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图4为本公开实施例提出的神经网络训练方法的实现流程示意图二;FIG. 4 is a second implementation flowchart of the neural network training method proposed by the embodiment of the present disclosure;
图5为本公开实施例提出的神经网络训练方法的实现流程示意图三;FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图6为本公开实施例提出的神经网络训练方法的实现流程示意图四;FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图7为本公开实施例提出的神经网络训练方法的实现流程示意图五;FIG. 7 is a schematic diagram five of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图8为本公开实施例提出的神经网络训练方法的实现流程示意图六;FIG. 8 is a sixth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图9为本公开实施例提出的神经网络训练方法的实现流程示意图七;FIG. 9 is a seventh schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图10为本公开实施例提出的神经网络训练方法的实现流程示意图八;FIG. 10 is a schematic diagram eight of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图11为本公开实施例提出的神经网络训练方法的实现流程示意图九;FIG. 11 is a schematic diagram 9 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图12为本公开实施例提出的神经网络训练方法的实现流程示意图十;FIG. 12 is a schematic diagram ten of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图13为本公开实施例提出的神经网络训练方法的实现流程示意图十一;FIG. 13 is a schematic diagram eleventh of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图14为本公开实施例提出的知识整合自蒸馏算法的原理示意图一;FIG. 14 is a schematic diagram 1 of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure;
图15为本公开实施例提出的知识整合自蒸馏算法的原理示意图二;FIG. 15 is a second schematic diagram of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure;
图16为本公开实施例提出的神经网络训练装置的组成结构示意图;FIG. 16 is a schematic diagram of the composition and structure of a neural network training apparatus proposed by an embodiment of the present disclosure;
图17为本公开实施例提出的计算机设备的组成结构示意图。FIG. 17 is a schematic diagram of the composition and structure of a computer device according to an embodiment of the present disclosure.
具体实施方式Detailed ways
为了使本公开的目的、技术方案和优点更加清楚,下面将结合附图对本公开作进一步地详细描述,所描述的实施例不应视为对本公开的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本公开保护的范围。In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limitations of the present disclosure, and those skilled in the art will not All other embodiments obtained under the premise of creative work fall within the protection scope of the present disclosure.
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" can be the same or a different subset of all possible embodiments, and Can be combined with each other without conflict.
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本公开实施例能够以除了在这里图示或描述的以外的顺序实施。In the following description, the term "first\second\third" is only used to distinguish similar objects, and does not represent a specific ordering of objects. It is understood that "first\second\third" Where permitted, the specific order or sequence may be interchanged to enable the embodiments of the disclosure described herein to be practiced in sequences other than those illustrated or described herein.
除非另有定义,本文所使用的所有的技术和科学术语与属于本公开的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本公开实施例的目的,不是旨在限制本公开。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing the embodiments of the present disclosure only and is not intended to limit the present disclosure.
对本公开实施例进行进一步详细说明之前,对本公开实施例中涉及的名词和术语进行说明,本发明实施例中涉及的名词和术语适用于如下的解释。Before further describing the embodiments of the present disclosure in detail, the terms and terms involved in the embodiments of the present disclosure are described, and the terms and terms involved in the embodiments of the present disclosure are suitable for the following explanations.
1)知识蒸馏:采取教师-学生(Teacher-Student)模式,将复杂且大的模型作为教师模型即Teacher,学生模型Student结构较为简单,用Teacher来辅助Student模型的训练。旨在通过软标签(soft-target)将从高容量教师模型中学到的“暗”知识转移到学生模型,软标签可以是班级概率,也可以是老师输出的特征表示,其中包含的信息比单标签更完整。1) Knowledge distillation: adopt the teacher-student (Teacher-Student) model, take the complex and large model as the teacher model, namely the teacher, the student model student has a relatively simple structure, and use the teacher to assist the training of the student model. It aims to transfer the "dark" knowledge learned from the high-capacity teacher model to the student model through soft-targets, which can be class probabilities or feature representations output by the teacher, which contain more information than a single model. Labels are more complete.
一般地,大模型往往是单个复杂网络或者是若干网络的集合,拥有良好的性能和泛化能力,而小模型因为网络规模较小,表达能力有限。因此,可以利用大模型学习到的知识去指导小模型训练,使得小模型具有与大模型相当的性能,但是参数数量大幅降低,从而实现模型压缩与加速。Generally, a large model is often a single complex network or a collection of several networks, which has good performance and generalization ability, while a small model has limited expressive ability due to the small network size. Therefore, the knowledge learned by the large model can be used to guide the training of the small model, so that the small model has the same performance as the large model, but the number of parameters is greatly reduced, thereby achieving model compression and acceleration.
2)自蒸馏:知识蒸馏的其中一种特殊情况,自蒸馏(Self-Distillation)则指的是自己蒸馏到自己,可以看作是:首先对两个单独的模型F1,F2进行集成学习,然后蒸馏成F2。即Teacher Model就是Student Model的集成版本,称为自集成(Self-Ensemble)。2) Self-distillation: One of the special cases of knowledge distillation, Self-Distillation refers to self-distillation to itself, which can be seen as: first perform integrated learning on two separate models F1 and F2, and then Distilled to F2. That is, the Teacher Model is an integrated version of the Student Model, called Self-Ensemble.
3)知识整合:通过整合多个预先训练的教师模型的知识来增强软标签,如引入的多老师版的知识蒸馏(Multi-Model Ensemble via Adversarial Learning,MEAL)、多学生版的知识蒸馏KDCL。3) Knowledge integration: Soft labels are enhanced by integrating the knowledge of multiple pre-trained teacher models, such as the introduced multi-teacher version of knowledge distillation (Multi-Model Ensemble via Adversarial Learning, MEAL), multi-student version of knowledge distillation KDCL.
近年来,深度神经网络推动了计算机视觉的快速发展,其中图像分类任务被视为最基本也是最重要的任务之一。目前有大量的工作旨在攻克图像分类任务性能提升的瓶颈,尤其是在大规模的数据集上。In recent years, deep neural networks have promoted the rapid development of computer vision, in which the task of image classification is regarded as one of the most basic and important tasks. There is a lot of work currently aimed at overcoming the bottleneck of performance improvement in image classification tasks, especially on large-scale datasets.
最近研究表明,有监督的图像分类训练的瓶颈来源于不够准确的“硬标签”,即人为标注的单标签(一张图片一个类别,one-hot标签),这一问题导致的学习目标不完善是阻碍分类准确性进一步提高,监督学习存在极大局限性的关键因素。Recent studies have shown that the bottleneck of supervised image classification training comes from inaccurate "hard labels", that is, human-labeled single labels (one image, one category, one-hot label). This problem leads to imperfect learning goals. It is a key factor that hinders the further improvement of classification accuracy, and supervised learning has great limitations.
知识蒸馏算法的提出为该问题提供了一个较好的解决方案,即利用一个预训练的教师模型所预测的软概率矢量,即“软标签”作为训练监督,来指导学生模型的训练。The proposal of the knowledge distillation algorithm provides a better solution to this problem, that is to use the soft probability vector predicted by a pre-trained teacher model, that is, the "soft label" as a training supervision, to guide the training of the student model.
另一方面,需要在本地处理深度学习相关的任务边缘设备,例如移动电话、可穿戴设备等,都普遍受限于有限的资源和功耗以及时延和成本。知识蒸馏算法的提出可以促使基于深度学习的产品在边缘设备上的广泛应用。On the other hand, edge devices, such as mobile phones, wearable devices, etc., that need to process deep learning-related tasks locally are generally limited by limited resources and power consumption, as well as latency and cost. The proposal of knowledge distillation algorithm can promote the wide application of deep learning-based products on edge devices.
众所周知,多个网络的集合通常比集合中的单个网络产生更好的预测。因此,在最先进的方法中,采用多个老师或学生对互补知识进行编码,如通过整合多个预先训练的教师模型的知识来增强软标签,而他们的“综合”软标签是更可靠的学习目标,我们将此类算法称之为知识整合蒸馏算法。It is known that ensembles of multiple networks often yield better predictions than a single network in the ensemble. Therefore, in state-of-the-art methods, multiple teachers or students are employed to encode complementary knowledge, such as by integrating the knowledge of multiple pre-trained teacher models to enhance soft labels, while their "synthetic" soft labels are more reliable Learning objectives, we call such algorithms as knowledge integration distillation algorithms.
示例性的,图1为相关技术中多教师模型的知识整合蒸馏算法的原理示意图,如图1所示,教师模型#1至教师模型#N针对锚定样本的预测概率为{1 1,…,p N},学生模型针对锚定样本的预测概率为p anchor,对{p 1,…,p N}进行知识整合,如加权平均,进而将获得的结果作为软标签,并通过蒸馏的方式迁移至学生模型。 Exemplarily, FIG. 1 is a schematic diagram of the knowledge integration distillation algorithm of the multi-teacher model in the related art. As shown in FIG. 1, the prediction probability of the teacher model #1 to the teacher model #N for the anchor sample is {1 1 ,  … , p N }, the predicted probability of the student model for the anchored sample is p anchor , and knowledge integration is performed on {p 1 , . Migrate to student model.
示例性的,图2为相关技术中多学生模型的知识整合蒸馏算法的原理示意图,如图2所示,学生模型#1至学生模型#N针对锚定样本的预测概率为{p 1,…,p N},对{p 1,…,p N}进行知识整合,进而将获得的结果作为软标签,并通过蒸馏的方式迁移至每个学生模型。 Exemplarily, FIG. 2 is a schematic diagram of the knowledge integration distillation algorithm of the multi-student model in the related art. As shown in FIG. 2 , the predicted probability of the student model #1 to the student model #N for the anchor samples is {p 1 ,  … , p N }, perform knowledge integration on {p 1 , ..., p N }, and then use the obtained result as a soft label and transfer it to each student model by distillation.
然而,虽然知识整合蒸馏算法可以提供有效的训练监督,但是其不得不依赖额外的网络或分支,复杂性较高,大大增大了训练时间和空间成本。鉴于此,如何通过较少的训练时间和空间成本获得有效的软标签,以进行更为准确的网络训练监督是亟待解决的问题,是本公开实施例所要讨论的内容,下面将结合以下具体实施例进行阐述。However, although the knowledge integration distillation algorithm can provide effective training supervision, it has to rely on additional networks or branches, which has high complexity and greatly increases the training time and space cost. In view of this, how to obtain effective soft labels with less training time and space cost for more accurate network training supervision is an urgent problem to be solved, which is the content to be discussed in the embodiments of the present disclosure, and the following specific implementation will be combined example to illustrate.
本公开实施例提供一种神经网络训练方法和装置、设备,及计算机存储介质,通过针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。可见,本公开以同一神经网络下跨样本的知识整合代替传统的跨网络的知识整合,在仅利用单个网络的基础上实现了基于样本间相似度的知识整合并获得有效的软标签。Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium. For each training sample under the neural network, the similarity between other samples and the sample and the predicted probability of other samples can be used to generate The generation of soft labels for the training samples is assisted, and efficient training supervision is performed for the neural network based on soft labels satisfying a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, realizes knowledge integration based on similarity between samples and obtains effective soft labels on the basis of only using a single network.
本公开实施例提出的神经网络训练方法应用于计算机设备中。下面说明本公开实施例提出的计算机设备的示例性应用，本公开实施例提出的计算机设备可以实施为手机终端、笔记本电脑，平板电脑，台式计算机，智能电视、车载设备、可穿戴设备、工业设备等。The neural network training method proposed in the embodiments of the present disclosure is applied to computer equipment. Exemplary applications of the computer equipment proposed by the embodiments of the present disclosure will be described below. The computer equipment proposed by the embodiments of the present disclosure may be implemented as mobile phone terminals, notebook computers, tablet computers, desktop computers, smart TVs, vehicle-mounted equipment, wearable devices, industrial equipment, and so on.
下面,将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述。Below, the technical solutions in the embodiments of the present disclosure will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present disclosure.
本公开一实施例提供了一种神经网络训练方法,图3为本公开实施例提出的神经网络训练方法的实现流程示意图一,如图3所示,在本公开的实施例中,计算机设备执行神经网络训练的方法可以包括以下步骤:An embodiment of the present disclosure provides a neural network training method, and FIG. 3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure. As shown in FIG. 3 , in the embodiment of the present disclosure, a computer device executes The method of neural network training can include the following steps:
S100、执行循环过程直至获得满足预设数量的锚定样本的软标签。S100, a loop process is performed until soft labels satisfying a preset number of anchor samples are obtained.
可以理解的是,训练样本的标签越准确,模型训练效果越好,为了克服人为硬标签在模型训练上带来的瓶颈和缺陷,可以通过知识蒸馏算法为训练样本生成更为鲁棒的软标签,以通过软标签进行高效的训练监督来提升模型的性能。It is understandable that the more accurate the labels of the training samples, the better the model training effect. In order to overcome the bottlenecks and defects caused by artificial hard labels in model training, the knowledge distillation algorithm can be used to generate more robust soft labels for the training samples. , to improve the performance of the model with efficient training supervision through soft labels.
在一些实施例中,可以通过知识蒸馏算法为全部训练样本集中的每个训练样本生成其对应的软标签;或者通过知识蒸馏算法为全部训练样本集中的部分训练样本如某一批次训练样本,生成其对应的软标签。In some embodiments, the knowledge distillation algorithm can be used to generate its corresponding soft label for each training sample in the entire training sample set; Generate its corresponding soft label.
其中,图4为本公开实施例提出的神经网络训练方法的实现流程示意图二,如图4所示,本公开实施例提供的循环过程包括以下步骤:4 is a second schematic diagram of the implementation process of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 4 , the cyclic process provided by the embodiment of the present disclosure includes the following steps:
S101、获取当前训练样本集,在执行循环过程的每个周期中从当前训练样本集中确定当前锚定样本和至少一个知识传递样本;其中,当前锚定样本为当前训练样本集中的任意一个,至少一个知识传递样本为当前训练样本集中、当前锚定样本以外的至少一个其他样本。S101. Obtain a current training sample set, and determine a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the execution cycle; wherein, the current anchor sample is any one of the current training sample set, at least A knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample.
在一些实施例中,当前训练样本集可以是用于进行神经网络训练的全部数据集,也可以是用于进行神经网络训练的多批次训练数据集中的任意一个批次的训练数据集。In some embodiments, the current training sample set may be all data sets used for neural network training, or may be any batch of training data sets in multiple batches of training data sets used for neural network training.
在一些实施例中,锚定样本指当前训练数据集中需要进行软标签生成的训练样本;知识传递样本指训练数据集中、用于作用于锚定样本软标签生成的至少一个其他训练样本。In some embodiments, the anchor sample refers to a training sample in the current training data set that needs to generate soft labels; the knowledge transfer sample refers to at least one other training sample in the training data set for acting on the soft label generation of the anchor sample.
在本公开实施例中,计算机设备在每个周期可以从训练样本集中确定任意一个感兴趣的、之前未被确定为锚定样本的训练样本作为当前锚定样本,并将当前训练样本集中、除当前锚定样本之外的至少一个其他样本确定为知识传递样本。In this embodiment of the present disclosure, the computer device may determine any interesting training sample from the training sample set that has not been previously determined as the anchor sample as the current anchor sample in each cycle, and use the current training sample set, except At least one other sample other than the current anchor sample is determined as a knowledge transfer sample.
其中,在将训练样本集中不同的训练样本作为当前锚定样本时,其对应的知识传递样本是不相同的。例如,同一批次样本为{x 1,x 2,x 3,…,x R},如果将x 1确定为当前锚定样本,那么剩余的{x 2,x 3,…,x R}作为x 1对应的知识传递样本;如果将x 2确定为当前锚定样本,那么剩余的{x 1,x 3,…,x R}作为x 2对应的知识传递样本。 Among them, when different training samples in the training sample set are used as the current anchor samples, the corresponding knowledge transfer samples are different. For example, the same batch of samples is {x 1 , x 2 , x 3 , ..., x R }, if x 1 is determined as the current anchor sample, then the remaining {x 2 , x 3 , ..., x R } are used as The knowledge transfer sample corresponding to x 1 ; if x 2 is determined as the current anchor sample, then the remaining {x 1 , x 3 , . . . , x R } are used as the knowledge transfer sample corresponding to x 2 .
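To make this pairing concrete, a minimal sketch is given below, assuming the batch is simply a Python list; the variable names are illustrative and not taken from the disclosure.

```python
# Minimal sketch: pair each sample in a batch with its knowledge transfer samples.
# The batch is represented as a plain Python list; all names are illustrative assumptions.
batch = ["x1", "x2", "x3", "x4"]  # one batch of training samples

pairs = []
for i, anchor in enumerate(batch):
    # every other sample in the same batch serves as a knowledge transfer sample
    transfer_samples = [x for j, x in enumerate(batch) if j != i]
    pairs.append((anchor, transfer_samples))

# pairs[0] == ("x1", ["x2", "x3", "x4"])
# pairs[1] == ("x2", ["x1", "x3", "x4"])
```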
在本公开实施例的一实施方式中,在训练样本集中,每个训练样本都存在与其具有视图相似性的至少一个其他训练样本。In an implementation of the embodiment of the present disclosure, in the training sample set, each training sample has at least one other training sample with view similarity with it.
S102、基于神经网络,确定当前锚定样本与每一知识传递样本之间的相似度、以及当前锚定样本的预测概率和每一知识传递样本的预测概率。S102. Based on the neural network, determine the similarity between the current anchor sample and each knowledge transfer sample, as well as the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
应理解,在本公开实施例中,为了实现基于样本的视图相似性在样本之间传递和整合知识,在从训练样本集中确定出当前锚定样本和该锚定样本对应的至少一个知识传递样本之后,可以确定锚定样本与知识传递样本之间的样本相似度,以及基于锚定样本与知识传递样本之间的相似度用来进行传递和整合的“知识”。It should be understood that, in this embodiment of the present disclosure, in order to transfer and integrate knowledge between samples based on sample-based view similarity, at least one knowledge transfer sample corresponding to the current anchor sample and the anchor sample is determined from the training sample set. Afterwards, the sample similarity between the anchor sample and the knowledge transfer sample, and the "knowledge" used for transfer and integration based on the similarity between the anchor sample and the knowledge transfer sample can be determined.
在一些实施例中,可以通过神经网络确定当前锚定样本与每一知识传递样本之间的相似度。这里,将当前锚定样本和至少一个知识传递样本输入神经网络之后,神经网络可以分别确定当前锚定样本的样本特征和每一知识传递样本的样本特征,进而基于当前锚定样本的样本特征和每一知识传递样本的样本特征确定出当前锚定样本与每一知识传递样本之间的样本相似度。In some embodiments, the similarity between the current anchor sample and each knowledge transfer sample can be determined by a neural network. Here, after inputting the current anchor sample and at least one knowledge transfer sample into the neural network, the neural network can determine the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample respectively, and then based on the sample feature of the current anchor sample and The sample feature of each knowledge transfer sample determines the sample similarity between the current anchor sample and each knowledge transfer sample.
在一些实施例中,锚定样本与知识传递样本之间用来进行传递和整合的“知识”可以是在图像分类、目标检测、图像分割等任务上的样本的预测概率。例如,在分类任务上样本属于某个类别的概率。In some embodiments, the "knowledge" used for transfer and integration between anchor samples and knowledge transfer samples may be predicted probabilities of samples on tasks such as image classification, object detection, and image segmentation. For example, the probability that a sample belongs to a class on a classification task.
其中,可以通过神经网络确定锚定样本的预测概率与每一知识传递样本的预测概率。Among them, the predicted probability of the anchor sample and the predicted probability of each knowledge transfer sample can be determined through the neural network.
S103、基于当前锚定样本与每一知识传递样本之间的相似度,当前锚定样本的预测概率和每一知识传递样本的预测概率,确定当前锚定样本的软标签。S103. Determine the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
在本公开实施例中,通过神经网络确定出锚定样本与每一知识传递样本之间的相似度,以及锚定样本与每一知识传递样本各自的预测概率之后,便可以进一步基于当前锚定样本与每一知识传递样本之间的相似度、当前锚定样本的预测概率以及每一知识传递样本的预测概率执行知识的传递和整合操作,以作用于当前锚定样本软标签的生成。In the embodiment of the present disclosure, after the similarity between the anchor sample and each knowledge transfer sample and the respective prediction probability of the anchor sample and each knowledge transfer sample are determined through the neural network, the current anchor sample can be further based on the current anchor sample. The similarity between the sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample perform knowledge transfer and integration operations to act on the generation of the current anchor sample soft label.
可以理解的是,在本公开实施例中,可以基于当前锚定样本与每一知识传递样本之间的相似度,表征每一知识传递样本对锚定样本的“知识”影响程度,从而对每一知识传递样本的预测概率按其“知识”影响程度执行知识的加权传递操作,针对当前锚定样本对应的至少一个知识传递样本,对以不同影响程度传递的每一知识传递样本的知识进行整合,共同作用于当前锚定样本软标签的生成。It can be understood that, in this embodiment of the present disclosure, the degree of “knowledge” influence of each knowledge transfer sample on the anchor sample can be represented based on the similarity between the current anchor sample and each knowledge transfer sample, so that each The predicted probability of a knowledge transfer sample is a weighted transfer operation of knowledge according to its "knowledge" influence degree, and for at least one knowledge transfer sample corresponding to the current anchor sample, the knowledge of each knowledge transfer sample transferred with different degrees of influence is integrated. , which work together to generate the soft label of the current anchor sample.
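A minimal single-step sketch of this similarity-weighted transfer and integration is shown below. It assumes the predicted probabilities and the already-normalized transfer weights are given, and the fusion coefficient alpha is an illustrative assumption; the disclosure also describes an iterative fusion variant that is not shown here.

```python
# Single-step sketch of cross-sample knowledge transfer and integration.
# `probs_transfer` holds the knowledge transfer samples' predicted probabilities,
# `weights` holds their normalized transfer weights with respect to the anchor,
# and `alpha` is an illustrative fusion coefficient, not a value from the disclosure.
import torch

def fuse_soft_label(prob_anchor, probs_transfer, weights, alpha=0.5):
    # knowledge transferred from the other samples, weighted by their influence
    transferred = (weights.unsqueeze(1) * probs_transfer).sum(dim=0)
    # integrate the transferred knowledge with the anchor's own prediction
    soft_label = alpha * prob_anchor + (1.0 - alpha) * transferred
    return soft_label / soft_label.sum()  # keep it a valid probability vector

prob_anchor = torch.tensor([0.7, 0.2, 0.1])
probs_transfer = torch.tensor([[0.6, 0.3, 0.1],
                               [0.1, 0.8, 0.1]])
weights = torch.tensor([0.75, 0.25])  # transfer weights summing to 1
print(fuse_soft_label(prob_anchor, probs_transfer, weights))
```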
可见,生成训练样本软标签的不再知识蒸馏算法下不同网络模型间的知识整合,而是基于自蒸馏算法的单一网络模型下,不同样本间的知识整合。这样,不再需要多个其他网络模型作用软标签的生成,而是在单一网络模型下,针对每个训练样本,基于该训练样本与其他样本间的相似度对其他样本的“暗知识”进行传递和整合,以作用于该训练样本软标签的生成。It can be seen that the knowledge integration between different network models under the knowledge distillation algorithm to generate the soft labels of the training samples is not the knowledge integration between different samples under a single network model based on the self-distillation algorithm. In this way, multiple other network models are no longer required to generate soft labels, but under a single network model, for each training sample, the "dark knowledge" of other samples is carried out based on the similarity between the training sample and other samples. Pass and integrate to act on the generation of soft labels for this training sample.
S110、至少基于预设数量的锚定样本的软标签和预设数量的锚定样本,对神经网络进行训练。S110. Train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
在本公开实施例中,生成满足预设数量的锚定样本的软标签之后,便可以至少基于预设数量的锚定样本以及该锚定样本鲁棒的软标签更新神经网络的目标损失函数,以实现对神经网络的更新优化。In the embodiment of the present disclosure, after the soft labels satisfying the preset number of anchor samples are generated, the objective loss function of the neural network can be updated based on at least the preset number of anchor samples and the robust soft labels of the anchor samples, In order to realize the update optimization of the neural network.
可见,为了提高模型的泛化能力以及训练监督的准确性,模型监督训练过程中的不再是基于训练样本对应的硬标签进行模型训练,而是结合上述样本间的知识整合的自蒸馏方法获得的鲁棒的软标签,以进行模型训练。It can be seen that in order to improve the generalization ability of the model and the accuracy of training supervision, the model training process is no longer based on the hard labels corresponding to the training samples, but the self-distillation method combining the knowledge integration between the above samples. robust soft labels for model training.
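A hedged sketch of such a soft-label training objective follows; pairing a KL-divergence distillation term with an optional hard-label cross-entropy term, and the weighting between them, are common-practice assumptions rather than the exact loss of this disclosure.

```python
# Sketch of a soft-label training objective; combining a KL-divergence term with
# an optional hard-label cross-entropy term is a common-practice assumption here,
# not necessarily the exact objective of this disclosure.
import torch
import torch.nn.functional as F

def soft_label_loss(logits, soft_labels, hard_labels=None, T=4.0, beta=1.0):
    # distillation term: match the network's tempered prediction to the soft label
    log_p = F.log_softmax(logits / T, dim=1)
    loss = F.kl_div(log_p, soft_labels, reduction="batchmean") * (T * T)
    if hard_labels is not None:
        # optional supervision from the original hard labels
        loss = loss + beta * F.cross_entropy(logits, hard_labels)
    return loss

logits = torch.randn(8, 10)                  # 8 anchor samples, 10 classes
soft = F.softmax(torch.randn(8, 10), dim=1)  # their soft labels
hard = torch.randint(0, 10, (8,))            # their hard labels
print(soft_label_loss(logits, soft, hard))
```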
本公开实施例提供一种神经网络训练方法,通过针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。可见,本公开以同一神经网络下跨样本的知识整合代替传统的跨网络的知识整合,在仅利用单个网络的基础上实现了基于样本间相似度的知识整合并获得有效的软标签。The embodiments of the present disclosure provide a neural network training method. For each training sample under the neural network, the similarity between other samples and the sample and the prediction probability of the other samples can be used to assist the generation of the soft label of the training sample, In turn, efficient training supervision is performed for the neural network based on soft labels satisfying a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, realizes knowledge integration based on similarity between samples and obtains effective soft labels on the basis of only using a single network.
在本公开实施例的一实施方式中,图5为本公开实施例提出的神经网络训练方法的实现流程示意图三,如图5所示,计算机设备基于神经网络,确定当前锚定样本与每一知识传递样本之间的相似度、以及当前锚定样本的预测概率和每一知识传递样本的预测概率的方法还包括以下步骤:In an implementation of the embodiment of the present disclosure, FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure. As shown in FIG. 5 , the computer device determines the current anchor sample and each anchor sample based on the neural network. The method for the similarity between knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample further includes the following steps:
S102a、基于神经网络的编码器,确定当前锚定样本的样本特征和每一知识传递样本的样本特征。S102a, the neural network-based encoder determines the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample.
S102b、基于当前锚定样本的样本特征和每一知识传递样本的样本特征,确定当前锚定样本与每一知识传递样本之间的相似度。S102b, determining the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample.
在本公开实施例中,神经网络设置有编码器,编码器用于对每个训练样本进行特征提取并进行特征编码,得到以向量形式表征的样本特征。In the embodiment of the present disclosure, the neural network is provided with an encoder, and the encoder is configured to perform feature extraction and feature encoding on each training sample to obtain sample features represented in the form of vectors.
在一些实施例中,可以通过神经网络的编码器对当前锚定样本和每一知识传递样本分别进行特征提取,分别获得当前锚定样本的样本特征和每一知识传递样本的样本特征,并分别对其样本特征进行特征编码,将样本特征表示为向量的形式。即通过神经网络的编码器确定出当前锚定样本以向量形式表征的样本特征与每一知识传递样本以向量形式表征的样本特征。In some embodiments, feature extraction can be performed on the current anchor sample and each knowledge transfer sample respectively by an encoder of a neural network, and the sample features of the current anchor sample and the sample features of each knowledge transfer sample are obtained respectively, and Feature encoding is performed on its sample features, and the sample features are represented in the form of vectors. That is, the sample feature represented by the current anchor sample in the form of a vector and the sample feature represented by the vector form of each knowledge transfer sample are determined by the encoder of the neural network.
这里,可以基于样本特征确定当前锚定样本与每一知识传递样本之间的视图相似性,即样本相似度。Here, the view similarity between the current anchor sample and each knowledge transfer sample, that is, the sample similarity, may be determined based on the sample features.
在本公开实施例中,图6为本公开实施例提出的神经网络训练方法的实现流程示意图四,如图6所示,基于当前锚定样本的样本特征和每一知识传递样本的样本特征,确定当前锚定样本与每一知识传递样本之间的相似度的方法包括以下步骤:In the embodiment of the present disclosure, FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 6 , based on the sample characteristics of the current anchor sample and the sample characteristics of each knowledge transfer sample, The method for determining the similarity between the current anchor sample and each knowledge transfer sample includes the following steps:
S102b1、对当前锚定样本的样本特征进行归一化处理,得到当前锚定样本的归一化样本特征。S102b1. Perform normalization processing on the sample features of the current anchor sample to obtain the normalized sample features of the current anchor sample.
S102b2、对每一知识传递样本的样本特征进行归一化处理,得到每一知识传递样本的归一化特征。S102b2: Perform normalization processing on the sample features of each knowledge transfer sample to obtain the normalized features of each knowledge transfer sample.
S102b3、对当前锚定样本的归一化样本特征和每一知识传递样本的归一化特征进行点积运算处理,得到当前锚定样本与每一知识传递样本之间的相似度。S102b3: Perform dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
可以理解的是,在基于样本特征确定样本之间的相似度之前,需要先将使用向量形式的样本特征转换至同一量纲下,使得计算严格本相似度拥有统一的标准。It can be understood that, before determining the similarity between samples based on the sample features, it is necessary to convert the sample features in the form of vectors to the same dimension, so that the calculation of the strict similarity has a unified standard.
这里,计算机设备可以分别对当前锚定样本的样本特征和每一知识传递样本的样本特征先进行归一化处理,获得当前锚定样本的归一化样本特征和每一知识传递样本的归一化特征,以实现将使用向量形式表征的样本特征转换至同一量纲下。Here, the computer device may firstly normalize the sample features of the current anchor sample and the sample features of each knowledge transfer sample to obtain the normalized sample features of the current anchor sample and the normalized sample features of each knowledge transfer sample. To convert the sample features represented in vector form to the same dimension.
之后,通过对当前锚定样本的归一化特征和每一知识传递样本的归一化特征进行点积运算处理,进而得到当前锚定样本与每一知识传递样本之间的相似度。Afterwards, by performing dot product operation on the normalized feature of the current anchor sample and the normalized feature of each knowledge transfer sample, the similarity between the current anchor sample and each knowledge transfer sample is obtained.
其中,样本之间相似度的计算公式如下所示:Among them, the calculation formula of similarity between samples is as follows:
A(i,j) = \sigma(F(x_i))^{\top}\,\sigma(F(x_j))    (1)

where, in formula (1), F(x_i) is the sample feature, represented as a vector, of the i-th anchor sample determined by the encoder of the neural network, F(x_j) is the sample feature of the j-th knowledge transfer sample corresponding to the i-th anchor sample, and \sigma(\cdot) is the l_2 normalization function.
可见,在本公开实施例中,可以对当前锚定样本与每一知识传递样本的经归一化转换至同一量纲下的样本特征进行点积运算,进而确定出两两样本之间的成对相似度。It can be seen that, in the embodiment of the present disclosure, a dot product operation can be performed on the sample features of the current anchor sample and each knowledge transfer sample that are normalized and converted to the same dimension, and then the composition between the two samples can be determined. to similarity.
S102c: Determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In the embodiment of the present disclosure, the neural network is further provided with a classifier, and the classifier is used to determine the predicted probability corresponding to each training sample.
For example, under a classification task, for the current anchor sample and each knowledge transfer sample, the classifier of the neural network can calculate their predicted probabilities through a softmax function. The predicted probability of a sample is calculated as follows:
p_i^T(k) = exp(z_i(k) / T) / Σ_(k'=1..K) exp(z_i(k') / T)    (2)

In formula (2), p_i^T(k) represents the probability that the i-th training sample belongs to class k, T is the temperature hyperparameter, and K is the total number of classes; z_i(k) is the k-th entry of the logit vector of the training sample, and the denominator is the sum of the exponentiated logits of that sample over all K classes.
For example, under a classification task, the predicted probability of a training sample in the training sample set can be expressed as p_i^T = [p_i^T(1), ..., p_i^T(K)], and the i-th anchor sample satisfies Σ_(k=1..K) p_i^T(k) = 1, that is, the probabilities that the i-th anchor sample belongs to the first class, ..., the K-th class sum to 1.
It can be seen that, in the embodiment of the present disclosure, the similarity between the current anchor sample and each knowledge transfer sample can be obtained based on the encoder of the neural network, and the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample can be obtained based on the classifier of the neural network, so that the generation of the soft label of the anchor sample can further be driven by the sample similarities, the predicted probability of the current anchor sample and the predicted probabilities of the knowledge transfer samples.
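A minimal sketch of formula (2) follows, assuming the classifier outputs are available as `logits`; the temperature value used here is illustrative:

```python
import torch

def predicted_probabilities(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    """logits: (N, K) classifier outputs; returns (N, K) probabilities, each row summing to 1."""
    return torch.softmax(logits / temperature, dim=1)
```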
In the embodiment of the present disclosure, FIG. 7 is a fifth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 7, the method for determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample includes the following steps:
S103a: Determine, based on the similarity between the current anchor sample and each knowledge transfer sample, a knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample.
In the embodiment of the present disclosure, the predicted probability of each knowledge transfer sample is transferred to the current anchor sample with a different weight value for the generation of the soft label of the current anchor sample. The computer device can normalize the sample similarities through a softmax function and calculate the "knowledge" transfer weight value of each knowledge transfer sample with respect to the current anchor sample, that is, the knowledge transfer parameter.
FIG. 8 is a sixth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 8, the method for determining the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, based on the similarity between the current anchor sample and each knowledge transfer sample, includes the following steps:
S103a1: Accumulate the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value.
S103a2: Determine, based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value, the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample.
In the embodiment of the present disclosure, the computer device may first accumulate the similarities between the current anchor sample and all of the knowledge transfer samples to obtain the summation result, that is, the accumulated similarity value; then, for each knowledge transfer sample, it calculates that sample's knowledge transfer parameter with respect to the current anchor sample through a softmax function, combining the similarity between the current anchor sample and that knowledge transfer sample with the accumulated similarity value.
The knowledge transfer parameter (written here as w(i, j)) is calculated as follows:

w(i, j) = exp(A(i, j)) / Σ_(j'≠i) exp(A(i, j'))    (3)

In formula (3), exp(A(i, j)) characterizes the sample similarity between the i-th anchor sample and the j-th knowledge transfer sample, and Σ_(j'≠i) exp(A(i, j')) characterizes the accumulated similarity over all knowledge transfer samples corresponding to the i-th anchor sample.

After the sample similarities are normalized by formula (3), the "knowledge" transfer weight values of the knowledge transfer samples with respect to the current anchor sample accumulate to 1, that is, Σ_(j≠i) w(i, j) = 1.
It can be seen that, in the embodiment of the present disclosure, the "knowledge" transfer weight value of each knowledge transfer sample with respect to the anchor sample can be obtained by normalizing the sample similarities, and the predicted probability of each knowledge transfer sample can then be transferred in a weighted manner according to that weight value.
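A minimal sketch of formula (3) follows, assuming PyTorch: a row-wise softmax of the affinity matrix with the diagonal excluded, so that each anchor's transfer weights over the other samples sum to 1; names are illustrative:

```python
import torch

def transfer_weights(A: torch.Tensor) -> torch.Tensor:
    """A: (N, N) affinity matrix; returns (N, N) knowledge transfer parameters."""
    mask = torch.eye(A.size(0), dtype=torch.bool, device=A.device)
    A = A.masked_fill(mask, float('-inf'))   # drop self-similarity before normalizing
    return torch.softmax(A, dim=1)           # each row sums to 1 over j != i
```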
S103b: Determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
In the embodiment of the present disclosure, after the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample is obtained, the soft label of the current anchor sample can be determined based on these "knowledge" transfer weight values, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample. Once the "dark knowledge" transferred by each knowledge transfer sample has been fully fused into the current anchor sample, an accurate and robust soft label of the current anchor sample is obtained.
FIG. 9 is a seventh schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 9, the specific method for determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample includes the following steps:
S103b1: Perform one knowledge transfer process based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain the initial knowledge transfer probability of the current anchor sample.
In the embodiment of the present disclosure, the "dark knowledge" of the at least one knowledge transfer sample can first be transferred through one weighted integration, based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, so as to obtain the initial knowledge transfer probability of the current anchor sample.
In some embodiments, the predicted probability of each knowledge transfer sample may be transferred in a weighted manner based on its "knowledge" transfer weight value with respect to the current anchor sample. Here, the knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample may first be determined based on the knowledge transfer parameter of that sample and its predicted probability.
In some embodiments, the "knowledge" to be transferred from the knowledge transfer samples to the current anchor sample in a weighted manner may be integrated before being transferred. Here, the at least one knowledge transfer probability with respect to the current anchor sample may first be accumulated to obtain an accumulated knowledge transfer probability, and one knowledge transfer process is then performed based on this accumulated value, yielding the "dark knowledge" transferred from the at least one knowledge transfer sample to the current anchor sample for the first time, that is, the initial knowledge transfer probability.
Here, the knowledge transfer probability is calculated as follows:

p'_i^T = Σ_(j≠i) w(i, j) · p_j^T    (4)

In formula (4), p'_i^T is the knowledge transfer probability corresponding to the i-th anchor sample, w(i, j) is the knowledge transfer parameter (the normalized sample similarity) between the i-th anchor sample and the j-th knowledge transfer sample, and p_j^T is the initial predicted probability of the j-th knowledge transfer sample.
Based on the above, the predicted probabilities of the training samples in the training sample set can be collected into a matrix P^T ∈ R^(N×K) whose i-th row is p_i^T. In order to avoid the current anchor sample reinforcing its own knowledge, the diagonal entries of the above affinity matrix A are discarded by A = A ⊙ (1 − I), where I is the identity matrix and ⊙ denotes element-wise multiplication, that is, the diagonal of the affinity matrix A is kept at 0. Accordingly, w(i, i) = 0, so that when the knowledge transfer probability of the current anchor sample is calculated, the "dark knowledge" transferred in p'_i^T retains only the weighted and integrated predicted probabilities of the at least one knowledge transfer sample, not the anchor sample's own prediction.
Intuitively, if the i-th anchor sample and the j-th knowledge transfer sample have a high similarity, that is, w(i, j) is large, then the predicted probability p_j^T of the j-th knowledge transfer sample is transferred into p'_i^T with a larger transfer weight.
In the embodiment of the present disclosure, after the knowledge transfer probability of every training sample in the same batch, each taken in turn as the anchor sample, has been calculated based on formula (4), the "dark knowledge" within the same batch can be transferred in parallel. That is, the knowledge transfer probabilities of all samples in the batch are computed at once as P'^T = W · P^T, where W ∈ R^(N×N) collects the knowledge transfer parameters of each training sample when it serves as the anchor sample, and P^T ∈ R^(N×K) collects the predicted probabilities of all samples in the batch.
It can be seen that, in the embodiment of the present disclosure, the predicted probabilities of the other samples can be weighted and integrated and then transferred to the current anchor sample.
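A minimal sketch of the batch-parallel knowledge transfer step as reconstructed above follows: the weighted predictions of the other samples are aggregated for every anchor at once. `W` is the (N, N) transfer-weight matrix from the earlier sketch and `P` the (N, K) predicted probabilities; both names are illustrative:

```python
import torch

def transfer_step(W: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Returns the (N, K) knowledge transfer probabilities, one row per anchor sample."""
    return W @ P
```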
S103b2: Perform one knowledge fusion process based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain the initial soft label of the current anchor sample.
In the embodiment of the present disclosure, after the "dark knowledge" of the at least one knowledge transfer sample has been transferred through weighted integration, one knowledge fusion process can be performed based on the transferred "dark knowledge" and the "knowledge" the current anchor sample already holds, so as to obtain the initial soft label of the current anchor sample.
Here, after one weighted integration transfer of the predicted probabilities of the at least one knowledge transfer sample, the knowledge fusion process is performed on the initial knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, yielding the initial soft label.
The generation function of the initial soft label of the current anchor sample is as follows:

q_i^T = ω · p'_i^T + (1 − ω) · p_i^T    (5)

In formula (5), q_i^T is the initial soft label of the i-th training sample, and ω is a weighting factor, a hyperparameter with ω ∈ [0, 1].
It should be understood that, when the "dark knowledge" of the samples in the same batch is transferred in parallel, the initial soft label generation function for all training samples in the batch is as follows:

Q^T = ω · W · P^T + (1 − ω) · P^T    (6)

In formula (6), Q^T ∈ R^(N×K) contains the initial soft label of every training sample in the same batch.
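A minimal sketch of the knowledge fusion step (formulas (5)/(6) as reconstructed above) follows: the transferred probabilities are blended with the anchor samples' own predictions using the weighting factor omega in [0, 1]; names and the default value of omega are illustrative:

```python
import torch

def fuse(transferred: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """transferred, P: (N, K); returns the (N, K) initial soft labels Q for the whole batch."""
    return omega * transferred + (1.0 - omega) * P
```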
S103b3: Based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, so as to obtain the soft label of the current anchor sample.
It should be ensured that, in order for the "dark knowledge" transferred by the at least one knowledge transfer sample to fully act on the generation of the soft label of the current anchor sample, the transfer and integration of knowledge can be performed multiple times, until the knowledge transferred by the at least one knowledge transfer sample is completely fused into the current anchor sample.
FIG. 10 is an eighth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 10, the loop process includes the following steps:
S103b31: In each cycle of the loop process, perform the knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain the knowledge transfer probability of the current anchor sample.
S103b32: Perform the knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
It should be understood that, in order to improve the accuracy of the soft labels and thus better improve the performance of the student model, the above process of weighted knowledge transfer and integration can be performed multiple times until convergence, achieving full fusion of the knowledge.
The knowledge transfer and fusion process is as follows:

q_i^T(t) = ω · Σ_(j≠i) w(i, j) · q_j^T(t−1) + (1 − ω) · p_i^T    (7)

In formula (7), t denotes the t-th transfer-and-integration iteration, and q_j^T(t−1) is the soft label of the j-th sample obtained in the previous cycle. Taking q_j^T(0) as the initial predicted probability p_j^T, the first iteration q_i^T(1) recovers the initial soft label of formula (5).
It can be seen that, in each cycle of the loop process, one knowledge transfer process is first performed based on the soft labels obtained in the previous cycle and each knowledge transfer parameter, yielding the knowledge transfer probability Σ_(j≠i) w(i, j) · q_j^T(t−1) of the current anchor sample; one knowledge fusion process is then performed based on this knowledge transfer probability and the predicted probability p_i^T of the current anchor sample, yielding the soft label q_i^T(t) of the current anchor sample for the next cycle.
Here, in the case of parallel transfer within the same batch of training samples, the knowledge transfer and fusion process is as follows:

Q^T(t) = ω · W · Q^T(t−1) + (1 − ω) · P^T    (8)

In formula (8), t denotes the t-th transfer-and-integration iteration, and Q^T(t−1) contains the soft labels of the training samples in the same batch obtained in the previous cycle.
It can be seen that, in each cycle of the loop process, the soft labels of the training samples in the batch obtained in the previous cycle are first propagated, and knowledge fusion is then performed with the predicted probability of each training sample, yielding the soft label of each training sample for the next cycle.
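A minimal sketch of the iterative transfer-and-fusion loop (formula (8) as reconstructed above) follows, run until the soft labels stop changing; the change-based tolerance and iteration cap are illustrative assumptions rather than the source's stopping criterion:

```python
import torch

def propagate(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5,
              tol: float = 1e-6, max_iter: int = 100) -> torch.Tensor:
    """W: (N, N) transfer weights, P: (N, K) predictions; returns the (N, K) soft labels."""
    Q = P.clone()
    for _ in range(max_iter):
        Q_next = omega * (W @ Q) + (1.0 - omega) * P   # transfer, then fuse
        if torch.max(torch.abs(Q_next - Q)) < tol:     # converged
            return Q_next
        Q = Q_next
    return Q
```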
When the loop process is executed many times, that is, when the knowledge transfer and knowledge fusion processes are iterated an unbounded number of times, the residual contribution of the knowledge transfer samples' predicted probabilities for each training sample falls below the preset probability threshold: unrolling formula (8) gives Q^T(t) = (ω · W)^t · Q^T(0) + (1 − ω) · Σ_(t'=0..t−1) (ω · W)^(t') · P^T, and for ω < 1 the first term, which carries the initial predicted probabilities, becomes vanishingly small and approaches zero, so that the soft label of each training sample is obtained from the accumulated fusion terms.
Based on the above multiple rounds of knowledge transfer and fusion, the generation function of the soft label of each training sample can be estimated as:

Q^T ≈ (1 − ω) · (I − ω · W)^(−1) · P^T    (9)
In this way, the "dark knowledge" transferred by the at least one knowledge transfer sample of each training sample has been completely fused into that training sample, and the soft label corresponding to each training sample is highly accurate, with accuracy approaching 100%.
For all training samples in the same batch, since the quantities involved are on the same scale, each row of Q^T naturally satisfies Σ_(k=1..K) q_i^T(k) = 1, and no additional normalization is required.
It can be seen that, in the embodiment of the present disclosure, for each training sample in the same batch, the "dark knowledge" of every other sample in the batch can be weighted and integrated, according to the similarity between that other sample and the training sample, and transferred to the current training sample until the knowledge is fully fused, so that an accurate and robust soft label is obtained for every training sample.
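A minimal sketch of the closed-form estimate (formula (9) as reconstructed above) follows: the infinite transfer-and-fusion process collapses to a single linear solve; names are illustrative:

```python
import torch

def soft_labels_closed_form(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """W: (N, N) transfer weights, P: (N, K) predictions; returns the (N, K) soft labels."""
    N = W.size(0)
    I = torch.eye(N, device=W.device, dtype=W.dtype)
    return (1.0 - omega) * torch.linalg.solve(I - omega * W, P)
```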
In the embodiment of the present disclosure, FIG. 11 is a ninth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 11, the neural network training method further includes the following steps:
S120: Acquire a training data set, where the training data set includes at least one batch of training data subsets.
S130: Select, from the training data set, one batch of training data subsets that has not previously been selected as a training sample set, as the current training sample set.
In the embodiment of the present disclosure, a training data set such as ImageNet may be acquired. However, considering that such a training data set is too large to be loaded into the computing device at once in practical applications, the training data set can be divided into at least one training data subset with a smaller footprint, so as to perform at least one batch of neural network training.
In an implementation of the embodiments of the present disclosure, the neural network may be trained in multiple batches, that is, in mini-batch form. Any one batch of the multiple training data subsets can then be determined as the above training sample set, and the soft labels of the anchor samples are obtained by performing the knowledge-integration self-distillation algorithm of S101-S103.
When the above training sample set is one batch of training data subsets, the knowledge-integration self-distillation algorithm of S101-S103 can be performed for each training sample in that training sample set, obtaining the soft label corresponding to each training sample in the set.
Here, after a certain batch of training data subsets has been used as the current training sample set and S100-S110 have been performed, another batch of training data subsets in the training data set that has not previously been determined as a training sample set can be determined as the training sample set of the next round, and the knowledge-integration self-distillation algorithm of S101-S103 and the neural network training method of S100-S110 are performed again, improving network performance.
In the embodiment of the present disclosure, FIG. 12 is a tenth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 12, the neural network training method further includes the following steps:
S140: Perform random sampling on the training data set to obtain at least one piece of first training data.
S150: Determine the hard label of each piece of first training data, and continue to perform similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as first training data, to obtain at least one piece of second training data corresponding to each piece of first training data.
S160: Use one batch of training data subsets constructed from the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data as the current training sample set.
It can be understood that, in order to implement the weighted transfer and integration of knowledge between samples according to the similarity between them, it must first be ensured that, among the training samples, every sample has at least one other sample with view similarity to it.
In an implementation of the embodiments of the present disclosure, a type of data sampler can be provided, that is, sampling of training samples based on view similarity is implemented on top of a general random sampling mechanism.
In the sampling process, this data sampler may first randomly sample the above training data set ImageNet to obtain at least one piece of first training data, and then determine the manually annotated hard label corresponding to each piece of first training data.
Afterwards, similarity sampling is performed on the remaining data of the training data set based on this hard label, that is, at least one piece of second training data that has view similarity with each piece of first training data, i.e., the same hard label, is sampled; one batch of training data subsets is then constructed from the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data, and this batch of training data subsets is used as the training sample set.
In this way, based on the above method, multiple batches of training data subsets in which view similarity exists between samples can be selected from the training data set; when the knowledge-integration self-distillation algorithm of S101-S103 and the neural network training method of S100-S110 are applied, the training data subset of each batch can be used as the current training sample set.
For example, if the number of pieces of first training data obtained by random sampling is N and M pieces of second training data with view similarity are selected for each piece of first training data, the final batch of training data subsets contains N × (M + 1) samples.
It can be seen that, through the data sampling method based on sample similarity, it can be ensured that every training sample in the current training sample set has at least one other sample visually similar to it, so that the weighted transfer of knowledge between samples according to their similarity can be implemented.
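A minimal sketch of the class-aware sampling described above follows: N instances are drawn at random, then M extra instances sharing each drawn instance's hard label are added, giving roughly N × (M + 1) samples per batch. The flat-list dataset interface and function names are assumptions for illustration:

```python
import random
from collections import defaultdict

def build_batch(labels, n_random: int, m_similar: int):
    """labels: hard labels indexed by sample id; returns the sample indices of one batch."""
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    chosen = set(random.sample(range(len(labels)), n_random))   # N randomly drawn instances
    batch = list(chosen)
    for idx in list(chosen):
        pool = [j for j in by_label[labels[idx]] if j not in chosen]
        extra = random.sample(pool, min(m_similar, len(pool)))  # up to M same-label instances
        chosen.update(extra)
        batch.extend(extra)
    return batch
```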
In the embodiment of the present disclosure, FIG. 13 is an eleventh schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 13, the method for training the neural network based at least on the soft labels of the preset number of anchor samples and the preset number of anchor samples further includes the following steps:
S111: Determine the relative entropy of each anchor sample based on the soft label of that anchor sample and the predicted probability corresponding to that anchor sample.
S112: Determine the cross entropy of each anchor sample based on the hard label of that anchor sample and the predicted probability corresponding to that anchor sample.
S113: Train the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
In the embodiment of the present disclosure, the relative entropy is the KL divergence (Kullback-Leibler divergence, KLD); both the cross entropy and the relative entropy describe the difference between the distribution of a sample's true result and the distribution of its predicted result.
In the embodiment of the present disclosure, when the soft labels of the preset number of anchor samples have been obtained and the neural network is trained based at least on the preset number of anchor samples and their soft labels, the training can be carried out by computing two kinds of differences for the training samples.
One kind takes the manually annotated hard label as the sample's true result distribution and determines the cross entropy based on the difference between the hard label and the predicted probability; the other kind takes the robust soft label as the sample's "true" result distribution and determines the relative entropy based on the difference between the soft label and the predicted probability.
It should be understood that training the neural network means minimizing the cross entropy and the relative entropy, that is, making the distribution of the sample prediction results determined by the neural network approximate the distribution of the samples' true results. Here, since the robust soft labels of the preset number of anchor samples obtained through the above loop process are used for neural network training and the label accuracy is high, the performance of the correspondingly trained neural network is also improved.
Here, the loss function is as follows:

L = H(p(y), p) + λ · D_KL(q^T ‖ p^T)    (10)

In formula (10), y ∈ {1, ..., K}, p(y) = [p(1), ..., p(K)] represents the hard label, H(·, ·) denotes the cross entropy, p^T is the predicted probability, λ is the weight coefficient, T is the temperature, and D_KL(q^T ‖ p^T) is the KL divergence between the soft label q^T and the predicted probability p^T.
It can be seen that the former part is the cross entropy determined from the hard label and the initial predicted probability, and the latter part is the KL divergence determined from the soft label and the predicted probability.
During model training, the cross entropy and KL divergence values are calculated based on formula (10) and are further minimized, so as to train the neural network and improve the performance of the network model.
It can be seen that, in the embodiment of the present disclosure, the accurate and robust soft labels make it possible to train the model with less data and a higher learning rate.
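A minimal sketch of the training objective (formula (10) as reconstructed above) follows: cross entropy against the hard labels plus a KL-divergence term that pulls the temperature-scaled predictions toward the soft labels. The values of lambda and T, and the use of class-index hard labels, are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def training_loss(logits, hard_labels, soft_labels, lam: float = 1.0, T: float = 4.0):
    """logits: (N, K); hard_labels: (N,) class indices; soft_labels: (N, K) probabilities."""
    ce = F.cross_entropy(logits, hard_labels)                    # hard-label term
    log_p_T = F.log_softmax(logits / T, dim=1)                   # tempered log-probabilities
    kl = F.kl_div(log_p_T, soft_labels, reduction='batchmean')   # D_KL(q^T || p^T)
    return ce + lam * kl
```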
FIG. 14 is a first schematic diagram of the principle of the knowledge-integration self-distillation algorithm proposed by the embodiment of the present disclosure. As shown in FIG. 14, {x_1, ..., x_N} are the samples in the same mini-batch other than the anchor sample; the predicted probability of the student model for the anchor sample is p_anchor, and its predicted probabilities for {x_1, ..., x_N} are {p_1, ..., p_N}.
Further, knowledge integration can be performed on the predicted probabilities {p_1, ..., p_N} corresponding to the other samples, and the obtained result is used as the soft label and migrated to the anchor sample by way of distillation.
It can be seen that, compared with FIG. 1 and FIG. 2, the knowledge integration proposed by the present application uses only one network: by assembling the samples other than the anchor sample in the same batch and dynamically aggregating the "dark knowledge" from the different samples in that batch, the knowledge of {x_1, ..., x_N} is collected to generate robust soft labels, integrating the knowledge into a single network and thereby saving memory and time costs to a large extent.
FIG. 15 is a second schematic diagram of the principle of the knowledge-integration self-distillation algorithm proposed by the embodiment of the present disclosure. As shown in FIG. 15, the samples in the same batch of the training sample set include an anchor sample and at least one knowledge transfer sample {x_1, ..., x_N}. The encoder is applied to the anchor sample and to each knowledge transfer sample to perform feature encoding, obtaining the sample feature f_anchor of the anchor sample and the sample features {f_1, f_2, f_3, ...} of the at least one knowledge transfer sample, and the similarity between the anchor sample and each knowledge transfer sample is estimated based on these sample features.
Further, the predicted probabilities corresponding to the samples are determined by the classifier of the current student model, including the predicted probability p_anchor of the anchor sample and the predicted probabilities {p_1, ..., p_N} of the at least one knowledge transfer sample. For the anchor sample, {p_1, ..., p_N} of the at least one knowledge transfer sample are transferred and integrated in a weighted manner based on the sample similarities to form the soft label of the anchor sample, which is migrated to the anchor sample by way of distillation.
Table 1 compares the effectiveness of the knowledge-integration distillation algorithms based on multiple teacher/student models, namely MEAL and KDCL, with the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure:

Table 1

Method | Training rounds | Label accuracy (%) | Extra network
MEAL | 180 | 78.2 | ResNet-101 & 152
KDCL | 200 | 77.8 | ResNet-18
Present scheme | 100 | 78.0 | None
Based on Table 1, the MEAL and KDCL methods shown in FIG. 1 and FIG. 2 both require additional network assistance, whereas the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure requires no additional network. Moreover, with less training under a single network, e.g., 100 rounds, the present scheme obtains results similar to the MEAL and KDCL methods of the related art, and even with half the training rounds of the KDCL method it achieves comparable label performance.
Table 2 shows the effectiveness of the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure on a variety of network architectures:

Table 2
[Table 2: accuracy and training overhead of the proposed method versus conventional cross-entropy training on ResNet-50, ResNet-152, ResNeXt-152 and MobileNet-V2]
As can be seen from Table 2, where the related art refers to training with the conventional cross-entropy loss, the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure improves network performance with minimal computational overhead on widely used architectures such as ResNet-50, on deeper/wider architectures such as ResNet-152 and ResNeXt-152, and on lighter architectures such as MobileNet-V2, while requiring only a small amount of additional graphics processing unit (GPU) time.
For example, on the ResNet-50 architecture the accuracy is improved from 76.8 to 78.0, at a cost of only 3.7% additional time.
Table 3 compares the effectiveness of the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure with the self-distillation methods in the related art:

Table 3
[Table 3: ImageNet accuracy of the proposed method versus related-art self-distillation and label regularization methods (Label smoothing, Tf-KD_reg, BAN, CS-KD, Tf-KD_self)]
As can be seen from Table 3, although the traditional self-distillation methods and a series of label regularization algorithms such as Label smoothing, Tf-KD_reg, BAN, CS-KD and Tf-KD_self are also based on a single network, the training results on the ImageNet data set of the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure surpass those of the above traditional self-distillation and label regularization algorithms. For example, the teacher-free Tf-KD_reg regularization algorithm on the ResNet-50 architecture achieves a label accuracy of 77.5%, which is still 0.5% lower than the present scheme.
It can be seen that the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure can not only save memory and time by realizing knowledge integration within a single network, but can also generate equally powerful soft labels by aggregating knowledge from a set of samples within the same mini-batch.
Based on the above embodiments, in an embodiment of the present disclosure, FIG. 16 is a schematic diagram of the composition and structure of the neural network training apparatus proposed by the embodiment of the present disclosure. As shown in FIG. 16, the neural network training apparatus 10 includes an acquisition part 11, a training part 12, a selection part 13, a sampling part 14 and a determination part 15.
The acquisition part 11 is configured to perform a loop process until soft labels of a preset number of anchor samples are obtained, wherein the loop process includes the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, as well as the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
The training part 12 is configured to train the neural network based at least on the soft labels of the preset number of anchor samples and the preset number of anchor samples.
In some embodiments, the training part 12 is configured to: determine the relative entropy of each anchor sample based on the soft label of that anchor sample and the predicted probability corresponding to that anchor sample; determine the cross entropy of each anchor sample based on the hard label of that anchor sample and the predicted probability corresponding to that anchor sample; and train the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
In some embodiments, the neural network includes an encoder and a classifier, and the acquisition part 11 is configured to: determine, based on the encoder of the neural network, the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample; determine the similarity between the current anchor sample and each knowledge transfer sample based on these sample features; and determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In some embodiments, the acquisition part 11 is further configured to: normalize the sample feature of the current anchor sample to obtain the normalized sample feature of the current anchor sample; normalize the sample feature of each knowledge transfer sample to obtain the normalized feature of each knowledge transfer sample; and perform a dot-product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
In some embodiments, the acquisition part 11 is further configured to: determine, based on the similarity between the current anchor sample and each knowledge transfer sample, the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample; and determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In some embodiments, the acquisition part 11 is further configured to: accumulate the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value; and determine, based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value, the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample.
In some embodiments, the acquisition part 11 is further configured to: perform one knowledge transfer process based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain the initial knowledge transfer probability of the current anchor sample; perform one knowledge fusion process based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain the initial soft label of the current anchor sample; and, based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample. The loop process includes: in each cycle of the loop process, performing the knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain the knowledge transfer probability of the current anchor sample; and performing the knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
In some embodiments, the acquisition part 11 is further configured to: determine the knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample; accumulate the at least one knowledge transfer probability with respect to the current anchor sample to obtain an accumulated knowledge transfer probability; and perform one knowledge transfer process based on the accumulated knowledge transfer probability to obtain the initial knowledge transfer probability of the current anchor sample.
In some embodiments, the acquisition part 11 is further configured to acquire a training data set, where the training data set includes at least one batch of training data subsets.
In some embodiments, the selection part 13 is configured to select, from the training data set, one batch of training data subsets that has not previously been selected as a training sample set, as the current training sample set.
In some embodiments, the sampling part 14 is configured to perform random sampling on the training data set to obtain at least one piece of first training data.
In some embodiments, the determination part 15 is further configured to determine the hard label corresponding to each piece of first training data.
In some embodiments, the sampling part 14 is further configured to perform similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as first training data, to obtain at least one piece of second training data corresponding to each piece of first training data.
In some embodiments, the determination part 15 is further configured to use one batch of training data subsets constructed from the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data as the current training sample set.
In an embodiment of the present disclosure, FIG. 17 is a schematic diagram of the composition and structure of the computer device proposed by the embodiment of the present disclosure. As shown in FIG. 17, the computer device 20 proposed by the embodiment of the present disclosure may include a processor 21 and a memory 22 storing instructions executable by the processor 21; further, the computer device 20 may include a communication interface 23 and a bus 24 for connecting the processor 21, the memory 22 and the communication interface 23.
In the embodiment of the present disclosure, the processor 21 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller and a microprocessor. It can be understood that, for different devices, the electronic component used to implement the above processor function may also be another component, which is not specifically limited in the embodiments of the present disclosure. The computer device 20 may further include a memory 22 connected to the processor 21, where the memory 22 is used to store executable program code including computer operation instructions; the memory 22 may include a high-speed RAM memory and may further include a non-volatile memory, for example, at least two disk memories.
In the embodiment of the present disclosure, the bus 24 is used to connect the communication interface 23, the processor 21 and the memory 22 and to enable mutual communication among these components.
In the embodiment of the present disclosure, the memory 22 is used to store instructions and data.
Further, in the embodiment of the present disclosure, the above processor 21 is configured to: perform a loop process until soft labels of a preset number of anchor samples are obtained; and train the neural network based at least on the soft labels of the preset number of anchor samples and the preset number of anchor samples. The loop process includes the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, as well as the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In practical applications, the above-mentioned memory 22 may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor 21.
In addition, the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present disclosure provides a computer device. The computer device can perform a loop process until soft labels of a preset number of anchor samples are obtained, and train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples. The loop process includes the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample from the current training sample set, where the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and determining a soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
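By way of illustration only, the following Python (PyTorch-style) sketch shows how the sample features, similarities, and predicted probabilities referred to above could be computed with the encoder and classifier described in claims 3 and 4 below. The function and parameter names, and the plain softmax used for the predicted probabilities, are assumptions made for this sketch and are not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def similarities_and_probs(encoder, classifier, anchor, transfer_samples):
    """Hedged sketch: encoder -> sample features; normalization plus dot
    product -> similarities; classifier -> predicted probabilities for the
    current anchor sample and each knowledge transfer sample."""
    feat_a = encoder(anchor.unsqueeze(0))          # [1, D] anchor sample feature
    feat_k = encoder(transfer_samples)             # [M, D] knowledge transfer sample features
    sim = F.normalize(feat_k, dim=1) @ F.normalize(feat_a, dim=1).t()  # [M, 1] dot products of normalized features
    prob_a = F.softmax(classifier(feat_a), dim=1)  # [1, C] predicted probability of the anchor sample
    prob_k = F.softmax(classifier(feat_k), dim=1)  # [M, C] predicted probabilities of the transfer samples
    return sim.squeeze(1), prob_a.squeeze(0), prob_k
```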
In this way, for each training sample of the neural network, the similarities between that sample and the other samples, together with the predicted probabilities of those other samples, can be used to assist in generating the soft label of that training sample, and efficient training supervision can then be performed for the neural network based on the soft labels of the preset number of training samples. It can be seen that the present disclosure replaces the traditional cross-network knowledge integration with cross-sample knowledge integration within a single neural network, thereby realizing knowledge integration based on inter-sample similarity and obtaining effective soft labels while using only one network.
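Continuing the sketch above, the similarities can be converted into knowledge transfer parameters and combined with the predicted probabilities to form a soft label, roughly in the manner of claims 5 and 6 below. Exponentiating the similarities before dividing by their accumulated value, and the fusion weight `omega`, are assumptions made for this illustration rather than the formulas of the disclosure.

```python
import torch

def soft_label_from_transfer_samples(sim, prob_a, prob_k, omega=0.5):
    """Hedged sketch: similarities -> knowledge transfer parameters (each
    similarity over the accumulated total), weighted sum of the transfer
    samples' probabilities -> transferred knowledge, then fusion with the
    anchor sample's own prediction to obtain its soft label."""
    weights = sim.exp()                    # assumption: exponentiate so the weights are positive
    weights = weights / weights.sum()      # knowledge transfer parameters (similarity / accumulated similarity)
    transferred = weights @ prob_k         # [C] knowledge transferred from the other samples
    return omega * prob_a + (1.0 - omega) * transferred  # soft label of the current anchor sample
```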
An embodiment of the present disclosure provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the neural network training method described above is implemented.
Specifically, the program instructions corresponding to the neural network training method of this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When the program instructions corresponding to the neural network training method in the storage medium are read or executed by an electronic device, the following steps are performed:
performing a loop process until soft labels of a preset number of anchor samples are obtained;
training the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples;
where the loop process includes the following steps:
acquiring a current training sample set, and determining, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample from the current training sample set, where the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and determining a soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
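The soft labels can also be refined iteratively, as elaborated in claims 7 and 8 below, by repeatedly transferring knowledge through the similarity-derived parameters and fusing the result with each sample's own prediction. The batch-level formulation below, the fusion weight `omega`, and the fixed iteration count used in place of the probability-threshold stopping condition are assumptions made for this sketch.

```python
import torch

def propagate_soft_labels(probs, transfer_params, omega=0.5, num_iters=10):
    """Hedged batch-level sketch of iterative knowledge transfer and fusion.
    probs:           [N, C] predicted probabilities of the samples in the batch
    transfer_params: [N, N] knowledge transfer parameters, row-normalized,
                     with zeros on the diagonal (no self-transfer)."""
    soft = probs.clone()                                     # initial soft labels
    for _ in range(num_iters):
        transferred = transfer_params @ soft                 # knowledge transfer step (weighted accumulation)
        soft = (1.0 - omega) * transferred + omega * probs   # knowledge fusion with each sample's own prediction
    return soft
```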
Correspondingly, an embodiment of the present disclosure further provides a computer program product, where the computer program product includes computer-executable instructions for implementing the steps of the neural network training method proposed in the embodiments of the present disclosure.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to schematic flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the schematic flowcharts and/or block diagrams, and combinations of flows and/or blocks in the schematic flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable computing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable computing device produce an apparatus for implementing the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable computing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable computing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
The above descriptions are merely preferred embodiments of the present disclosure and are not intended to limit the protection scope of the present disclosure.
Industrial Applicability
In the embodiments of the present disclosure, a loop process is performed until soft labels of a preset number of anchor samples are obtained, and the neural network is trained based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples. In each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample are determined from the current training sample set; based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample are determined; and a soft label of the current anchor sample is determined based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample. Knowledge integration under a self-distillation algorithm is thereby realized.
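Claim 2 below combines, for each anchor sample, a cross entropy against its hard label with a relative entropy (KL divergence) against its soft label. A minimal Python sketch of such an objective is given here; the weighting factor `alpha` is an assumption.

```python
import torch
import torch.nn.functional as F

def training_loss(logits, hard_labels, soft_labels, alpha=1.0):
    """Hedged sketch of the objective in claim 2: hard-label cross entropy
    plus relative entropy between the soft labels and the predictions."""
    ce = F.cross_entropy(logits, hard_labels)                  # cross entropy with the hard labels
    kl = F.kl_div(F.log_softmax(logits, dim=1), soft_labels,   # relative entropy with the soft labels
                  reduction='batchmean')
    return ce + alpha * kl
```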
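The current training sample set itself can be assembled batch by batch as described in claims 9 and 10 below: first training data are drawn at random, and for each of them similar samples are added as second training data. Using feature cosine similarity for the similarity-sampling step is an assumption made here; the claims tie that step to the hard labels of the first training data.

```python
import random
import torch
import torch.nn.functional as F

def build_training_sample_set(features, num_first, num_second):
    """Hedged sketch of claims 9-10: random sampling -> first training data,
    similarity sampling -> second training data, combined into one batch."""
    z = F.normalize(features, dim=1)                      # [N, D] normalized sample features
    first = random.sample(range(features.size(0)), num_first)
    batch = list(first)
    for i in first:
        sims = z @ z[i]                                   # similarity of every sample to this first training data
        sims[i] = float('-inf')                           # exclude the sample itself
        batch += sims.topk(num_second).indices.tolist()   # most similar samples become second training data
    return batch
```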

Claims (24)

  1. A neural network training method, the method comprising:
    performing a loop process until soft labels of a preset number of anchor samples are obtained;
    training a neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples;
    wherein the loop process comprises the following steps:
    acquiring a current training sample set, and determining, in each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample;
    determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and
    determining a soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  2. The method according to claim 1, wherein training the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples comprises:
    determining a relative entropy of each anchor sample based on the soft label of the anchor sample and the predicted probability corresponding to the anchor sample;
    determining a cross entropy of each anchor sample based on a hard label of the anchor sample and the predicted probability corresponding to the anchor sample; and
    training the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
  3. The method according to claim 1 or 2, wherein the neural network comprises an encoder and a classifier, and determining, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample comprises:
    determining, based on the encoder of the neural network, a sample feature of the current anchor sample and a sample feature of each knowledge transfer sample;
    determining the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample; and
    determining, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  4. The method according to claim 3, wherein determining the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample comprises:
    normalizing the sample feature of the current anchor sample to obtain a normalized sample feature of the current anchor sample;
    normalizing the sample feature of each knowledge transfer sample to obtain a normalized feature of each knowledge transfer sample; and
    performing a dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
  5. The method according to any one of claims 1 to 4, wherein determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample comprises:
    determining a knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample; and
    determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of each knowledge transfer sample, and the predicted probability of the current anchor sample.
  6. The method according to claim 5, wherein determining the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample comprises:
    accumulating the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value; and
    determining the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value.
  7. The method according to claim 5 or 6, wherein determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of each knowledge transfer sample, and the predicted probability of the current anchor sample comprises:
    performing a knowledge transfer process once on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain an initial knowledge transfer probability of the current anchor sample;
    performing a knowledge fusion process once based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain an initial soft label of the current anchor sample; and
    based on the initial soft label of the current anchor sample, performing a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample;
    wherein the loop process comprises:
    in each cycle of the loop process, performing a knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain a knowledge transfer probability of the current anchor sample; and
    performing a knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
  8. The method according to claim 7, wherein performing the knowledge transfer process once on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain the initial knowledge transfer probability of the current anchor sample, comprises:
    determining a knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample;
    accumulating the at least one knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample to obtain an accumulated knowledge transfer probability value; and
    performing a knowledge transfer process once based on the accumulated knowledge transfer probability value, to obtain the initial knowledge transfer probability of the current anchor sample.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    acquiring a training data set, the training data set comprising at least one batch of training data subsets; and
    selecting, from the training data set, one batch of the training data subsets that has not previously been selected as a training sample set, as the current training sample set.
  10. The method according to claim 9, wherein the method further comprises:
    performing random sampling on the training data set to obtain at least one piece of first training data;
    determining a hard label of each piece of first training data, and performing similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as the first training data, to obtain at least one piece of second training data corresponding to each piece of first training data; and
    using one batch of the training data subsets, constructed based on the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data, as the current training sample set.
  11. A neural network training apparatus, the apparatus comprising:
    an acquisition part, configured to perform a loop process until soft labels of a preset number of anchor samples are obtained, wherein the loop process comprises the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; and determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and
    a training part, configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
  12. The neural network training apparatus according to claim 11, wherein
    the training part is further configured to: determine a relative entropy of each anchor sample based on the soft label of the anchor sample and the predicted probability corresponding to the anchor sample; determine a cross entropy of each anchor sample based on a hard label of the anchor sample and the predicted probability corresponding to the anchor sample; and train the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
  13. The neural network training apparatus according to claim 11 or 12, wherein the neural network comprises an encoder and a classifier, and
    the acquisition part is further configured to: determine, based on the encoder of the neural network, a sample feature of the current anchor sample and a sample feature of each knowledge transfer sample; determine the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample; and determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  14. The neural network training apparatus according to claim 13, wherein
    the acquisition part is further configured to: normalize the sample feature of the current anchor sample to obtain a normalized sample feature of the current anchor sample; normalize the sample feature of each knowledge transfer sample to obtain a normalized feature of each knowledge transfer sample; and perform a dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
  15. The neural network training apparatus according to any one of claims 11 to 14, wherein
    the acquisition part is further configured to: determine a knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample; and determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  16. The neural network training apparatus according to claim 15, wherein
    the acquisition part is further configured to: accumulate the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value; and determine the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value.
  17. The neural network training apparatus according to claim 15 or 16, wherein
    the acquisition part is further configured to: perform a knowledge transfer process once on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain an initial knowledge transfer probability of the current anchor sample; perform a knowledge fusion process once based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain an initial soft label of the current anchor sample; and, based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample; wherein the loop process comprises: in each cycle of the loop process, performing a knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain a knowledge transfer probability of the current anchor sample; and performing a knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
  18. The neural network training apparatus according to claim 17, wherein
    the acquisition part is further configured to: determine a knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample; accumulate the at least one knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample to obtain an accumulated knowledge transfer probability value; and perform a knowledge transfer process once based on the accumulated knowledge transfer probability value, to obtain the initial knowledge transfer probability of the current anchor sample.
  19. The neural network training apparatus according to any one of claims 11 to 18, wherein the neural network training apparatus further comprises a selection part,
    the acquisition part being further configured to acquire a training data set, the training data set comprising at least one batch of training data subsets; and
    the selection part being configured to select, from the training data set, one batch of the training data subsets that has not previously been selected as a training sample set, as the current training sample set.
  20. The neural network training apparatus according to claim 19, wherein the neural network training apparatus further comprises a sampling part and a determination part,
    the sampling part being configured to perform random sampling on the training data set to obtain at least one piece of first training data;
    the determination part being configured to determine the hard label corresponding to each piece of first training data;
    the sampling part being further configured to perform similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as the first training data, to obtain at least one piece of second training data corresponding to each piece of first training data; and
    the determination part being further configured to use one batch of the training data subsets, constructed based on the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data, as the current training sample set.
  21. A computer device, comprising a processor and a memory storing instructions executable by the processor, wherein when the instructions are executed by the processor, the method according to any one of claims 1 to 10 is implemented.
  22. A computer-readable storage medium having a program stored thereon, for use in a computer device, wherein when the program is executed by a processor, the method according to any one of claims 1 to 10 is implemented.
  23. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device and is executed by a processor in the electronic device, the neural network training method according to any one of claims 1 to 10 is implemented.
  24. A computer program product which, when run on a computer, causes the computer to perform the neural network training method according to any one of claims 1 to 10.
PCT/CN2021/121379 2021-04-27 2021-09-28 Neural network training method and apparatus, device, and computer storage medium WO2022227400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110462397.6A CN113222139B (en) 2021-04-27 2021-04-27 Neural network training method, device, equipment and computer storage medium
CN202110462397.6 2021-04-27

Publications (1)

Publication Number Publication Date
WO2022227400A1 true WO2022227400A1 (en) 2022-11-03

Family

ID=77089304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121379 WO2022227400A1 (en) 2021-04-27 2021-09-28 Neural network training method and apparatus, device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN113222139B (en)
WO (1) WO2022227400A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361658A (en) * 2023-04-07 2023-06-30 北京百度网讯科技有限公司 Model training method, task processing method, device, electronic equipment and medium
CN117743858A (en) * 2024-02-19 2024-03-22 中国科学院自动化研究所 Knowledge enhancement-based continuous learning soft tag construction method
WO2024104241A1 (en) * 2022-11-14 2024-05-23 上海淇玥信息技术有限公司 Message-pushing method and apparatus based on implicit multi-target fusion of models

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222139B (en) * 2021-04-27 2024-06-14 商汤集团有限公司 Neural network training method, device, equipment and computer storage medium
CN113487614B (en) * 2021-09-08 2021-11-30 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN115171731A (en) * 2022-07-11 2022-10-11 腾讯科技(深圳)有限公司 Emotion category determination method, device and equipment and readable storage medium
CN115936091B (en) * 2022-11-24 2024-03-08 北京百度网讯科技有限公司 Training method and device for deep learning model, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN111368997A (en) * 2020-03-04 2020-07-03 支付宝(杭州)信息技术有限公司 Training method and device of neural network model
CN111507378A (en) * 2020-03-24 2020-08-07 华为技术有限公司 Method and apparatus for training image processing model
CN111753092A (en) * 2020-06-30 2020-10-09 深圳创新奇智科技有限公司 Data processing method, model training device and electronic equipment
US20210034985A1 (en) * 2019-03-22 2021-02-04 International Business Machines Corporation Unification of models having respective target classes with distillation
CN113222139A (en) * 2021-04-27 2021-08-06 商汤集团有限公司 Neural network training method, device and equipment and computer storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635668B (en) * 2018-11-16 2021-04-06 华中师范大学 Facial expression recognition method and system based on soft label integrated convolutional neural network
CN110647938B (en) * 2019-09-24 2022-07-15 北京市商汤科技开发有限公司 Image processing method and related device
CN111681059B (en) * 2020-08-14 2020-11-13 支付宝(杭州)信息技术有限公司 Training method and device of behavior prediction model

Also Published As

Publication number Publication date
CN113222139B (en) 2024-06-14
CN113222139A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
WO2022227400A1 (en) Neural network training method and apparatus, device, and computer storage medium
Yang et al. Heterogeneous graph attention network for unsupervised multiple-target domain adaptation
He et al. Neural factorization machines for sparse predictive analytics
WO2022042002A1 (en) Training method for semi-supervised learning model, image processing method, and device
Kaymak et al. A brief survey and an application of semantic image segmentation for autonomous driving
Mishra et al. Deep machine learning and neural networks: An overview
Springenberg et al. Improving deep neural networks with probabilistic maxout units
JP2023060820A (en) Deep Neural Network Optimization System for Machine Learning Model Scaling
WO2021159714A1 (en) Data processing method and related device
EP4167130A1 (en) Neural network training method and related device
CN111105008A (en) Model training method, data recognition method and data recognition device
WO2019218748A1 (en) Insurance service risk prediction processing method, device and processing equipment
US11775770B2 (en) Adversarial bootstrapping for multi-turn dialogue model training
WO2023221928A1 (en) Recommendation method and apparatus, and training method and apparatus
WO2022206498A1 (en) Federated transfer learning-based model training method and computing nodes
WO2021208799A1 (en) Transfer model training method and apparatus and fault detection method and apparatus
Liu et al. Resource-constrained federated edge learning with heterogeneous data: Formulation and analysis
He et al. A hybrid data-driven method for rapid prediction of lithium-ion battery capacity
Ouyang Feature learning for stacked ELM via low-rank matrix factorization
Shi et al. Mobile edge artificial intelligence: Opportunities and challenges
Miao et al. Evolving convolutional neural networks by symbiotic organisms search algorithm for image classification
CN114266897A (en) Method and device for predicting pox types, electronic equipment and storage medium
WO2022222854A1 (en) Data processing method and related device
Moro et al. Anomaly detection speed-up by quantum restricted Boltzmann machines
Yang et al. A new mc-lstm network structure designed for regression prediction of time series

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938863

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 230124)