WO2022227400A1 - Neural network training method and apparatus, device, and computer storage medium - Google Patents

Neural network training method and apparatus, device, and computer storage medium Download PDF

Info

Publication number
WO2022227400A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
knowledge transfer
samples
current anchor
training
Prior art date
Application number
PCT/CN2021/121379
Other languages
French (fr)
Chinese (zh)
Inventor
葛艺潇
蔡青琳
张潇
朱烽
赵瑞
李鸿升
Original Assignee
商汤集团有限公司
博智感知交互研究中心有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 商汤集团有限公司, 博智感知交互研究中心有限公司 filed Critical 商汤集团有限公司
Publication of WO2022227400A1 publication Critical patent/WO2022227400A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks

Definitions

  • the present disclosure relates to the field of deep learning, and in particular, to a neural network training method, device, device, and computer storage medium.
  • edge devices such as mobile phones and wearable devices need to process deep learning-related tasks locally.
  • edge devices are generally limited by limited resources and power consumption, as well as latency and cost.
  • KD Knowledge Distillation
  • the model compression method based on knowledge distillation transfers the reasoning and prediction ability of a trained, more complex "teacher" model to a simpler "student" model; that is, the "soft label" predicted by the "teacher" model is used as training supervision to guide the training of the "student" model, thereby reducing the computing resources required by the "student" model on the edge device and improving its computing speed.
  • Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium.
  • Embodiments of the present disclosure provide a neural network training method, the method comprising:
  • the loop process includes the following steps: obtaining a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples.
  • for each training sample, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample; efficient training supervision of the neural network is then performed based on the soft labels of the preset number of training samples.
  • Embodiments of the present disclosure provide a neural network training device, including:
  • the acquisition part is configured to perform a loop process until soft labels of a preset number of anchor samples are obtained; wherein the loop process includes the following steps: acquiring a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples;
  • the training part is configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
  • An embodiment of the present disclosure provides a computer device, the computer device includes a processor and a memory storing instructions executable by the processor, and when the instructions are executed by the processor, the above-mentioned neural network training method is implemented.
  • An embodiment of the present disclosure provides a computer-readable storage medium, on which a program is stored and applied to a computer device, and when the program is executed by a processor, the above-mentioned neural network training method is implemented.
  • Embodiments of the present disclosure provide a computer program, including computer-readable codes, which, when the computer-readable codes run in an electronic device and are executed by a processor in the electronic device, implement the above-mentioned neural network training method.
  • Embodiments of the present disclosure provide a computer program product that, when executed on a computer, enables the computer to execute the neural network training method as described above.
  • the computer device may perform a loop process until soft labels of a preset number of anchor samples are obtained, and train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein the loop process includes the following steps: obtaining a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  • for each training sample, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample; efficient training supervision of the neural network is then performed based on the soft labels of the preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples, obtaining effective soft labels while using only a single network.
  • Fig. 1 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-teacher model in the related art
  • Fig. 2 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-student model in the related art
  • FIG. 3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure
  • FIG. 4 is a second implementation flowchart of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram five of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 8 is a sixth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 9 is a seventh schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram eight of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram 9 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram ten of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram eleventh of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • FIG. 14 is a schematic diagram 1 of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure
  • FIG. 15 is a second schematic diagram of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure
  • FIG. 16 is a schematic diagram of the composition and structure of a neural network training apparatus proposed by an embodiment of the present disclosure.
  • FIG. 17 is a schematic diagram of the composition and structure of a computer device according to an embodiment of the present disclosure.
  • "first/second/third" is only used to distinguish similar objects and does not represent a specific ordering of objects. It is understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein.
  • a large model is often a single complex network or a collection of several networks, which has good performance and generalization ability, while a small model has limited expressive ability due to the small network size. Therefore, the knowledge learned by the large model can be used to guide the training of the small model, so that the small model has the same performance as the large model, but the number of parameters is greatly reduced, thereby achieving model compression and acceleration.
  • Self-Distillation is a special case of knowledge distillation in which a model distills knowledge into itself. It can be seen as first performing ensemble learning on two separate models F1 and F2, and then distilling the ensemble into F2; that is, the teacher model is an ensemble version of the student model, called a Self-Ensemble.
  • soft labels can be enhanced by integrating the knowledge of multiple pre-trained models, for example the multi-teacher version of knowledge distillation (Multi-Model Ensemble via Adversarial Learning, MEAL) and the multi-student version of knowledge distillation (KDCL).
  • MEAL Multi-Model Ensemble via Adversarial Learning
  • KDCL multi-student version of knowledge distillation
  • the proposal of the knowledge distillation algorithm provides a better solution to this problem, namely using the soft probability vector predicted by a pre-trained teacher model, i.e. the "soft label", as training supervision to guide the training of the student model.
  • edge devices such as mobile phones, wearable devices, etc.
  • edge devices that need to process deep learning-related tasks locally are generally limited by limited resources and power consumption, as well as latency and cost.
  • the proposal of knowledge distillation algorithm can promote the wide application of deep learning-based products on edge devices.
  • FIG. 1 is a schematic diagram of the knowledge integration distillation algorithm of the multi-teacher model in the related art.
  • the predicted probabilities of teacher model #1 to teacher model #N for the anchor sample are {p_1, ..., p_N}
  • the predicted probability of the student model for the anchor sample is p_anchor
  • knowledge integration is performed on {p_1, ..., p_N}, and the obtained result is then used as a soft label and transferred by distillation to the student model.
  • FIG. 2 is a schematic diagram of the knowledge integration distillation algorithm of the multi-student model in the related art.
  • the predicted probabilities of student model #1 to student model #N for the anchor sample are {p_1, ..., p_N}; knowledge integration is performed on {p_1, ..., p_N}, and the obtained result is then used as a soft label and transferred to each student model by distillation.
  • Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium.
  • for each training sample, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample, and efficient training supervision is performed for the neural network based on the soft labels of a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples, obtaining effective soft labels while using only a single network.
  • the neural network training method proposed in the embodiments of the present disclosure is applied to computer equipment. Exemplary applications of the computer equipment proposed by the embodiments of the present disclosure will be described below.
  • the computer equipment proposed by the embodiments of the present disclosure may be implemented as a mobile phone terminal, a notebook computer, a tablet computer, a desktop computer, a smart TV, vehicle-mounted equipment, a wearable device, industrial equipment, and the like.
  • FIG. 3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure.
  • the neural network training method executed by a computer device can include the following steps:
  • the knowledge distillation algorithm can be used to generate more robust soft labels for the training samples, so as to improve the performance of the model through efficient training supervision with soft labels.
  • the knowledge distillation algorithm can be used to generate a corresponding soft label for each training sample in the entire training sample set, or to generate a corresponding soft label for each training sample in any batch of training samples.
  • FIG. 4 is a second schematic diagram of the implementation process of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 4 , the cyclic process provided by the embodiment of the present disclosure includes the following steps:
  • the current training sample set may be all data sets used for neural network training, or may be any batch of training data sets in multiple batches of training data sets used for neural network training.
  • the anchor sample refers to a training sample in the current training data set that needs to generate soft labels; the knowledge transfer sample refers to at least one other training sample in the training data set for acting on the soft label generation of the anchor sample.
  • in each cycle, the computer device may determine, as the current anchor sample, any training sample in the training sample set that has not previously been determined as an anchor sample, and determine at least one other sample in the current training sample set, other than the current anchor sample, as a knowledge transfer sample.
  • for different anchor samples, the corresponding knowledge transfer samples are different.
  • for example, if the same batch of samples is {x_1, x_2, x_3, ..., x_R} and x_1 is determined as the current anchor sample, then the remaining {x_2, x_3, ..., x_R} are used as the knowledge transfer samples corresponding to x_1; if x_2 is determined as the current anchor sample, then the remaining {x_1, x_3, ..., x_R} are used as the knowledge transfer samples corresponding to x_2.
  • each training sample in the training sample set, has at least one other training sample with view similarity with it.
  • after the current anchor sample and the at least one knowledge transfer sample corresponding to the anchor sample are determined from the training sample set, the sample similarity between the anchor sample and each knowledge transfer sample, and the "knowledge" to be transferred and integrated based on that similarity, can be determined.
  • the similarity between the current anchor sample and each knowledge transfer sample can be determined by a neural network.
  • the neural network can determine the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample respectively, and then based on the sample feature of the current anchor sample and The sample feature of each knowledge transfer sample determines the sample similarity between the current anchor sample and each knowledge transfer sample.
  • the "knowledge" used for transfer and integration between anchor samples and knowledge transfer samples may be predicted probabilities of samples on tasks such as image classification, object detection, and image segmentation. For example, the probability that a sample belongs to a class on a classification task.
  • the predicted probability of the anchor sample and the predicted probability of each knowledge transfer sample can be determined through the neural network.
  • further, a knowledge transfer and integration operation can be performed based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample, so as to act on the generation of the soft label of the current anchor sample.
  • the degree of "knowledge" influence of each knowledge transfer sample on the anchor sample can be represented based on the similarity between the current anchor sample and that knowledge transfer sample, so that the predicted probability of each knowledge transfer sample is transferred in a weighted manner according to its "knowledge" influence degree; for the at least one knowledge transfer sample corresponding to the current anchor sample, the knowledge transferred from each knowledge transfer sample with its respective degree of influence is integrated, and the integrated knowledge acts jointly on the generation of the soft label of the current anchor sample.
  • in other words, the soft labels of the training samples are no longer generated through knowledge integration between different network models under the knowledge distillation algorithm, but through knowledge integration between different samples under a single network model based on the self-distillation algorithm. In this way, multiple other network models are no longer required to generate soft labels; instead, under a single network model, for each training sample the "dark knowledge" of the other samples is transferred and integrated based on the similarity between the training sample and the other samples, so as to act on the generation of the soft label of that training sample.
  • the objective loss function of the neural network can be updated based on at least the preset number of anchor samples and the robust soft labels of the anchor samples, in order to realize the update and optimization of the neural network.
  • the model training process is no longer based only on the hard labels corresponding to the training samples; instead, the robust soft labels produced by the above self-distillation method, which combines knowledge integration between samples, are used for model training.
  • the embodiments of the present disclosure provide a neural network training method. For each training sample under the neural network, the similarity between the other samples and that sample, together with the predicted probabilities of the other samples, can be used to assist the generation of the soft label of the training sample; efficient training supervision is then performed for the neural network based on the soft labels of a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples, obtaining effective soft labels while using only a single network.
  • FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure.
  • the method by which the computer device determines, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample further includes the following steps:
  • the neural network-based encoder determines the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample.
  • the neural network is provided with an encoder, and the encoder is configured to perform feature extraction and feature encoding on each training sample to obtain sample features represented in the form of vectors.
  • feature extraction can be performed on the current anchor sample and each knowledge transfer sample respectively by an encoder of a neural network, and the sample features of the current anchor sample and the sample features of each knowledge transfer sample are obtained respectively, and Feature encoding is performed on its sample features, and the sample features are represented in the form of vectors. That is, the sample feature represented by the current anchor sample in the form of a vector and the sample feature represented by the vector form of each knowledge transfer sample are determined by the encoder of the neural network.
  • the view similarity between the current anchor sample and each knowledge transfer sample may be determined based on the sample features.
  • FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • the method for determining the similarity between the current anchor sample and each knowledge transfer sample includes the following steps:
  • S102b1 Perform normalization processing on the sample features of the current anchor sample to obtain the normalized sample features of the current anchor sample.
  • S102b2 Perform normalization processing on the sample features of each knowledge transfer sample to obtain the normalized features of each knowledge transfer sample.
  • S102b3 Perform dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
  • the computer device may firstly normalize the sample features of the current anchor sample and the sample features of each knowledge transfer sample to obtain the normalized sample features of the current anchor sample and the normalized sample features of each knowledge transfer sample, so as to convert the sample features represented in vector form to the same dimension.
  • where F(x_i) is the vector-form sample feature of the i-th anchor sample determined by the encoder of the neural network, F(x_j) is the sample feature of the j-th knowledge transfer sample corresponding to the i-th anchor sample, and ‖·‖_2 denotes l_2 normalization; that is, the similarity can be computed as A(i, j) = ⟨F(x_i)/‖F(x_i)‖_2, F(x_j)/‖F(x_j)‖_2⟩.
  • the similarity between the current anchor sample and each knowledge transfer sample is, in other words, the pairwise sample similarity between each pair of samples.
  • the similarity results are stored in an affinity matrix A; assuming the number of samples in the training sample set is N, the affinity matrix representing the similarities between all samples satisfies A ∈ R^(N×N).
  • a dot product operation can be performed on the normalized sample features of the current anchor sample and each knowledge transfer sample (converted to the same dimension), and the similarity between the two samples can then be determined.
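As an illustration of the normalization and dot-product steps above, a minimal sketch follows; the use of PyTorch, the tensor shapes, and the variable names are assumptions for illustration, not part of the disclosure:

```python
import torch
import torch.nn.functional as F

def affinity_matrix(features: torch.Tensor) -> torch.Tensor:
    """Compute the pairwise similarity (affinity) matrix A in R^{N x N}.

    `features` holds the encoder outputs F(x_i) for one batch, shape (N, d).
    Each feature is l2-normalized, so the dot product of two rows is the
    similarity described above (normalization followed by a dot product).
    """
    normed = F.normalize(features, p=2, dim=1)   # l2-normalize each sample feature
    A = normed @ normed.t()                      # A(i, j) = <F(x_i)/||.||_2, F(x_j)/||.||_2>
    return A

# Minimal usage example with random features (batch of N=4, feature dim d=8).
feats = torch.randn(4, 8)
A = affinity_matrix(feats)
print(A.shape)  # torch.Size([4, 4])
```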
  • the neural network-based classifier determines the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  • the neural network is further provided with a classifier, and the classifier is used to determine the prediction probability corresponding to each training sample.
  • the classifier of the neural network can calculate the predicted probability of the current anchor sample and each knowledge transfer sample through the softmax function.
  • the calculation formula of the sample prediction probability is as follows:
  • formula (2) represents the probability that a training sample belongs to class k, where T is the temperature hyperparameter and K is the total number of classes; z_i denotes the logit vector of the i-th training sample in the training sample set, and the denominator sums the exponentiated logits over all K classes. That is, p_i^k = exp(z_i^k / T) / Σ_{k'=1..K} exp(z_i^{k'} / T).
  • the predicted probability of a training sample in the training sample set can be expressed as p_i = [p_i^1, ..., p_i^K], where the i-th anchor sample satisfies Σ_{k=1..K} p_i^k = 1; that is, the probabilities of the i-th anchor sample belonging to the first category through the K-th category sum to 1.
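A small sketch of the temperature-scaled softmax in formula (2); the logit shapes and the value of T are assumptions chosen only for illustration:

```python
import torch

def predicted_probability(logits: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """Temperature-scaled softmax over classes.

    `logits` has shape (N, K): one logit vector z_i per sample, K classes.
    Returns p with p[i, k] = exp(z_i^k / T) / sum_k' exp(z_i^k' / T),
    so each row sums to 1 as stated above; T is the temperature hyperparameter.
    """
    return torch.softmax(logits / T, dim=1)

# Example: 4 samples, 10 classes; each row of `p` sums to 1 (up to float error).
logits = torch.randn(4, 10)
p = predicted_probability(logits, T=4.0)
print(p.sum(dim=1))
```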
  • the similarity between the current anchor sample and each knowledge transfer sample can be obtained based on the neural network encoder, and the predicted probability of the current anchor sample and each knowledge transfer sample can be obtained by the neural network based classifier.
  • in this way, the generation of the anchor sample's soft label can be further realized based on the sample similarity, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  • FIG. 7 is a schematic diagram 5 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure.
  • the method for determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample includes the following steps:
  • the predicted probability of each knowledge transfer sample will be transferred to the current anchor sample with different weight values, so as to be used for the generation of the soft label of the current anchor sample.
  • the method for determining, based on the similarity between the current anchor sample and each knowledge transfer sample, the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample includes the following steps:
  • S103a1 Accumulate at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value.
  • the computer device may first perform an accumulation and summation process on the similarity between the current anchor sample and each knowledge transfer sample to obtain the summation result, that is, the accumulated similarity value, and then for each knowledge transfer sample , combining the similarity between the current anchor sample and each knowledge transfer sample, and the above-mentioned cumulative value of the similarity, calculate the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample through the softmax function.
  • where exp(A(i, j)) is the exponential of the sample similarity A(i, j) between the i-th anchor sample and the j-th knowledge transfer sample, and Σ_{j≠i} exp(A(i, j)) is the accumulated similarity value over all knowledge transfer samples corresponding to the i-th anchor sample; that is, the knowledge transfer parameter can be written as w(i, j) = exp(A(i, j)) / Σ_{j'≠i} exp(A(i, j')).
  • for the current anchor sample, the "knowledge" transfer weight values of all knowledge transfer samples accumulate to 1, that is, Σ_{j≠i} w(i, j) = 1.
  • in this way, the "knowledge" transfer weight value of each knowledge transfer sample to the anchor sample can be obtained by normalizing the sample similarities, and the predicted probability of each knowledge transfer sample is then transferred in a weighted manner according to that weight value.
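A sketch of how these knowledge transfer parameters (the softmax-normalized, off-diagonal affinities) might be computed; the masking strategy and variable names are assumptions made for illustration:

```python
import torch

def knowledge_transfer_parameters(A: torch.Tensor) -> torch.Tensor:
    """Row-wise softmax over the affinity matrix, excluding each sample itself.

    For anchor i, w(i, j) = exp(A(i, j)) / sum_{j' != i} exp(A(i, j')),
    so each row of the returned matrix sums to 1 and the diagonal is 0,
    matching the accumulation-to-1 property stated above.
    """
    A = A.clone()
    A.fill_diagonal_(float("-inf"))      # a sample transfers no knowledge to itself
    return torch.softmax(A, dim=1)       # exp / row-sum of exp over j != i

# Example with a random affinity matrix for a batch of 4 samples.
A = torch.randn(4, 4)
W = knowledge_transfer_parameters(A)
print(W.sum(dim=1))   # each row sums to 1
print(torch.diag(W))  # diagonal entries are 0
```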
  • S103b Determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  • based on the "knowledge" transfer weight value of each knowledge transfer sample to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample, the soft label of the current anchor sample is determined; once the "dark knowledge" transferred by each knowledge transfer sample has been completely integrated into the current anchor sample, an accurate and robust soft label of the current anchor sample can be obtained.
  • the specific method for determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample includes the following steps:
  • a weighted integration transfer may be performed on the "dark knowledge" of the at least one knowledge transfer sample, so as to obtain the initial knowledge transfer probability of the current anchor sample.
  • the predicted probability of each knowledge transfer sample may be weighted based on the "knowledge" transfer weight value of each knowledge transfer sample to the current anchor sample.
  • the knowledge transfer probability of each knowledge transfer sample to the current anchor sample may be first determined based on the knowledge transfer parameter of each knowledge transfer sample to the current anchor sample and the predicted probability of each knowledge transfer sample.
  • the "knowledge" that needs to be weighted and transferred to the current anchor sample for each knowledge transfer sample may be integrated and then transferred.
  • the at least one knowledge transfer probability of the current anchor sample (one for each knowledge transfer sample) can be accumulated to obtain an accumulated value of the knowledge transfer probability, and a knowledge transfer process is then performed based on this accumulated value to obtain the "dark knowledge" transferred from the at least one knowledge transfer sample to the current anchor sample for the first time, that is, the initial knowledge transfer probability.
  • the predicted probabilities of the training samples in the training sample set can be expressed in matrix form as P ∈ R^(N×K), where the i-th row is the predicted probability of the i-th training sample.
  • A ← A ⊙ (1 − I) is used to discard the diagonal entries of the above affinity matrix A, where I is the identity matrix and ⊙ denotes element-wise multiplication; that is, the diagonal of the affinity matrix A is kept all 0. In this way, when calculating the knowledge transfer probability of the current anchor sample, the transferred "dark knowledge" retains only the weighted, integrated predicted probabilities of the at least one knowledge transfer sample.
  • the "dark knowledge" within the same batch can be transferred in parallel; that is, the knowledge transfer probabilities of all samples in the same batch are computed in one matrix operation, in which the rows of the knowledge transfer parameter matrix represent the knowledge transfer parameters when each training sample is used as an anchor sample.
  • in this way, the weighted integration of the predicted probabilities of the other samples can be transferred to the current anchor sample.
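The batched form of this first transfer step might look as follows; treating the transfer parameters and probabilities as matrices is an assumption consistent with the element-wise masking described above:

```python
import torch

def initial_knowledge_transfer(W: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Initial knowledge transfer probability for every anchor in the batch.

    W: (N, N) knowledge transfer parameters with zero diagonal (from the
       previous sketch), so row i weights the other samples j != i.
    P: (N, K) predicted probabilities of all samples in the batch.
    Returns (N, K): row i is the weighted integration of the predicted
    probabilities of anchor i's knowledge transfer samples, i.e. (W @ P)[i].
    """
    return W @ P

# Example: a random transfer-parameter matrix and probability matrix.
W = torch.softmax(torch.randn(4, 4).fill_diagonal_(float("-inf")), dim=1)
P = torch.softmax(torch.randn(4, 10), dim=1)
Q0 = initial_knowledge_transfer(W, P)
print(Q0.shape, Q0.sum(dim=1))  # (4, 10); rows sum to ~1
```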
  • S103b2 Perform a knowledge fusion process based on the initial knowledge transfer probability and the prediction probability of the current anchor sample to obtain the initial soft label of the current anchor sample.
  • a knowledge fusion process can be performed based on the transferred “dark knowledge” and the current “knowledge” of the current anchor sample , to get the initial soft label of the current anchor sample.
  • after a weighted integration transfer is performed on the predicted probabilities of the at least one knowledge transfer sample, the initial soft label of the current anchor sample can be obtained by performing knowledge fusion on the initial knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample.
  • in the above formula, the left-hand side is the initial soft label of the i-th training sample, and ω is a weighting factor, a hyperparameter with ω ∈ [0, 1].
  • Q^T is the matrix containing the initial soft labels of all training samples in the same batch.
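A sketch of the fusion step described above. The exact combination in the disclosure is given by a formula not reproduced here, so the expression below — blending the transferred probabilities and the anchor's own predictions with the weighting factor ω — is an assumption based on the surrounding description, as is the value ω = 0.5:

```python
import torch

def initial_soft_label(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """Fuse transferred "dark knowledge" with each anchor's own prediction.

    Assumed fusion: Q1 = omega * (W @ P) + (1 - omega) * P, i.e. the initial
    knowledge transfer probability weighted by omega plus the anchor's own
    predicted probability weighted by (1 - omega), with omega in [0, 1].
    """
    return omega * (W @ P) + (1.0 - omega) * P

# Example: one fusion step on a batch of 4 samples, 10 classes.
W = torch.softmax(torch.randn(4, 4).fill_diagonal_(float("-inf")), dim=1)
P = torch.softmax(torch.randn(4, 10), dim=1)
Q1 = initial_soft_label(W, P, omega=0.5)
print(Q1.sum(dim=1))  # rows still sum to ~1
```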
  • S103b3 Based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of at least one knowledge transfer sample is less than a preset probability threshold, and obtain the soft label of the current anchor sample.
  • FIG. 10 is a schematic diagram 8 of the implementation flow of the neural network training method proposed by the embodiment of the disclosure. As shown in FIG. 10 , the cyclic process includes the following steps:
  • S103b31 Perform knowledge fusion processing based on the knowledge transfer probability of the current anchor sample and the prediction probability of the current anchor sample, and obtain the soft label of the current anchor sample in the next cycle.
  • where t denotes the t-th transfer-and-integration iteration, and the quantity on the right-hand side of the formula is the soft label of the i-th anchor sample from the previous cycle.
  • in each cycle, a knowledge transfer process is first performed based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain the knowledge transfer probability of the current anchor sample; then a knowledge fusion process is performed based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
  • where t denotes the t-th transfer-and-integration iteration, and the matrix on the right-hand side contains the soft labels of the training samples in the same batch from the previous cycle.
  • the loop process is executed multiple times; that is, the knowledge transfer process and the knowledge fusion process are iterated an unlimited number of times, until the predicted probability contribution of the at least one knowledge transfer sample of each training sample is less than the preset probability threshold.
  • when the remaining predicted probability contribution of the knowledge transfer samples becomes infinitely small and approaches zero, the soft label of each training sample is obtained.
  • the generation function of the soft label of each training sample can be estimated as:
  • the "dark knowledge" transmitted by at least one knowledge transfer sample of each training sample has been completely integrated into each training sample, and the soft label corresponding to each training sample has high accuracy, and the accuracy is close to 100%
  • in other words, the similarity between a training sample and each other sample in the same batch of training samples can be used to weight and integrate the "dark knowledge" of each other sample into the current training sample until the knowledge is fully integrated, and an accurate and robust soft label for each training sample is obtained.
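One way to realize the iterated transfer-and-fusion loop and its limit is sketched below. Both the iterative update and the closed-form expression assume the fusion form used in the previous sketch (Q ← ω·W·Q + (1−ω)·P); the actual generation function in the disclosure (the formula following "can be estimated as") may differ:

```python
import torch

def soft_labels(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5,
                steps: int = 50) -> torch.Tensor:
    """Iterate knowledge transfer + fusion until the update becomes negligible."""
    Q = P.clone()
    for _ in range(steps):
        Q_next = omega * (W @ Q) + (1.0 - omega) * P   # transfer, then fuse
        if torch.max(torch.abs(Q_next - Q)) < 1e-6:     # stand-in for the probability threshold
            return Q_next
        Q = Q_next
    return Q

def soft_labels_closed_form(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """Fixed point of the iteration above: Q = (1 - omega) * (I - omega * W)^{-1} @ P."""
    N = W.shape[0]
    I = torch.eye(N, dtype=W.dtype)
    return (1.0 - omega) * torch.linalg.solve(I - omega * W, P)

# The iterative and closed-form results agree up to numerical precision.
W = torch.softmax(torch.randn(4, 4).fill_diagonal_(float("-inf")), dim=1)
P = torch.softmax(torch.randn(4, 10), dim=1)
print(torch.allclose(soft_labels(W, P), soft_labels_closed_form(W, P), atol=1e-4))
```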
  • FIG. 11 is a schematic diagram 9 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 11 , the neural network training method further includes the following steps:
  • S130 Select a batch of training data subsets that have not been previously selected as the training sample set from the training data set, as the current training sample set.
  • a training data set such as ImageNet (data set)
  • ImageNet data set
  • the training data set is divided into at least one training data subset occupying a smaller capacity, for at least one batch of neural network training.
  • the neural network training may be performed in the form of multiple batches, that is, mini-batches. Furthermore, any batch of training subsets in the multiple batches of training data subsets may be determined as the above-mentioned training sample set, and the soft label of the anchor sample is obtained by performing the self-distillation algorithm of knowledge integration in S101-S103.
  • the self-distillation algorithm of the above-mentioned knowledge integration of S101-S103 may be performed for each training sample in the training sample set, to obtain each training sample in the training sample set The soft label corresponding to the sample.
  • FIG. 12 is a schematic diagram tenth of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure. As shown in FIG. 12 , the neural network training method further includes the following steps:
  • S140 Perform random sampling processing on the training data set to obtain at least one piece of first training data.
  • S160 Use a batch of training data subsets constructed based on at least one first training data and at least one second training data corresponding to each first training data as a current training sample set.
  • each sample has at least one other sample with view similarity with it.
  • a data sampler of this type may be provided, that is, one that implements view-similarity-based sampling of training samples on top of a general random sampling mechanism.
  • the above-mentioned training data set ImageNet can be randomly sampled by this type of data sampler to obtain at least one first training data; and then the artificial hard label corresponding to each first training data is determined.
  • a similarity sampling process is then performed on the remaining data of the training data set; that is, at least one second training data having view similarity with each first training data (i.e., the same hard label) is sampled, a batch of training data subsets is formed from the at least one first training data and the at least one second training data corresponding to each first training data, and this batch of training data subsets is used as the training sample set.
  • multiple batches of training data subsets with view similarity between samples can be selected from the training data set.
  • the training data subset of each batch can be used as the current training sample set.
  • if the number of first training data obtained by random sampling is N, and M second training data with view similarity are selected for each first training data, then the number of samples in the finally formed batch of training data subsets is N × (M + 1).
  • each training sample in the current training sample set therefore has at least one other sample that is visually similar to it, so that knowledge can be transferred between samples in a weighted manner according to the similarity between samples.
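A sketch of such a similarity-based batch sampler. Here "view similarity" is approximated by sharing the same hard label, as the description above suggests; the data representation and helper names are illustrative assumptions:

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

def build_batch(samples: Sequence[Tuple[str, int]], n: int, m: int) -> List[Tuple[str, int]]:
    """Sample a batch of N*(M+1) items: N random anchors plus, for each anchor,
    M other items with the same hard label (assumed to share view similarity).

    `samples` is a list of (sample_id, hard_label) pairs.
    """
    by_label = defaultdict(list)
    for sid, label in samples:
        by_label[label].append(sid)

    firsts = random.sample(list(samples), n)                   # random sampling -> first training data
    batch: List[Tuple[str, int]] = []
    for sid, label in firsts:
        others = [o for o in by_label[label] if o != sid]      # remaining data with the same hard label
        seconds = random.sample(others, min(m, len(others)))   # similarity sampling -> second training data
        batch.append((sid, label))
        batch.extend((o, label) for o in seconds)
    return batch

# Example: a toy data set of 12 items over 3 classes; batch of N=2, M=3.
data = [(f"img_{i}", i % 3) for i in range(12)]
print(len(build_batch(data, n=2, m=3)))  # 8 = N * (M + 1)
```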
  • FIG. 13 is a schematic diagram eleventh of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure.
  • the method for training the neural network also includes the following steps:
  • the relative entropy is the KL divergence (Kullback-Leibler divergence, KLD); the cross entropy and the relative entropy are both used to describe the difference between the distribution of a sample's actual results and the distribution of its predicted results.
  • the training of the neural network can be achieved by calculating two types of differences for the training samples.
  • one takes the artificial hard labels as the distribution of the samples' real results, and the cross entropy is determined based on the difference between the artificial hard labels and the predicted probabilities; the other takes the robust soft labels as the distribution of the samples' "true" results, and the relative entropy is determined based on the difference between the soft labels and the predicted probabilities.
  • the training of the neural network is to minimize the cross entropy and the relative entropy, that is, the distribution of the predicted results of the samples determined by the neural network approximates the distribution of the real results of the samples.
  • since the robust soft labels of the preset number of anchor samples obtained through the above-mentioned cyclic process are used for neural network training and their label accuracy is good, the performance of the correspondingly trained neural network is also improved.
  • the loss function can be written as a cross entropy term plus a weighted KL divergence term, that is, formula (10): L = H(y, p) + λ · KL(q^T ‖ p^T)
  • where p^T is the temperature-scaled prediction probability
  • λ is the weight coefficient
  • T is the temperature
  • KL(q^T ‖ p^T) is the KL divergence
  • the former part is the cross entropy determined according to the hard label and the initial prediction probability
  • the latter part is the KL divergence determined according to the soft label and the initial prediction probability
  • the cross entropy and KL divergence values are calculated based on formula (10), and further minimized to realize the training of the neural network and improve the performance of the network model.
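A sketch of this training objective. The weight λ, the temperature T, and the exact scaling of the KL term are assumptions following common knowledge-distillation practice, not values taken from the disclosure:

```python
import torch
import torch.nn.functional as F

def training_loss(logits: torch.Tensor, hard_labels: torch.Tensor,
                  soft_labels: torch.Tensor, T: float = 4.0,
                  lam: float = 1.0) -> torch.Tensor:
    """Cross entropy w.r.t. hard labels + weighted KL divergence w.r.t. soft labels.

    logits:      (N, K) raw network outputs for the batch.
    hard_labels: (N,) integer class labels.
    soft_labels: (N, K) soft labels generated by the knowledge-integration loop
                 (treated as fixed targets, hence .detach()).
    """
    ce = F.cross_entropy(logits, hard_labels)
    log_p_t = F.log_softmax(logits / T, dim=1)              # temperature-scaled predictions
    kl = F.kl_div(log_p_t, soft_labels.detach(), reduction="batchmean")
    return ce + lam * kl

# Example with random tensors: 4 samples, 10 classes.
logits = torch.randn(4, 10, requires_grad=True)
hard = torch.randint(0, 10, (4,))
soft = torch.softmax(torch.randn(4, 10), dim=1)
loss = training_loss(logits, hard, soft)
loss.backward()
print(float(loss))
```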
  • through highly accurate and robust soft labels, a model can be trained with less data and a higher learning rate.
  • FIG. 14 is a schematic diagram 1 of the principle of the knowledge integration self-distillation algorithm proposed by the embodiment of the present disclosure. As shown in FIG. 14, {x_1, ..., x_N} are the samples other than the anchor sample in the same batch.
  • the predicted probability of the student model for the anchor sample is p_anchor, and the predicted probabilities for {x_1, ..., x_N} are {p_1, ..., p_N}.
  • knowledge integration can be performed on the predicted probabilities {p_1, ..., p_N} to generate a robust soft label for the anchor sample.
  • the knowledge integration proposed in this application adopts only one network: by collecting the samples other than the anchor sample in the same batch, and by dynamically aggregating the "dark knowledge" of the different samples in the same batch, i.e. the knowledge of {x_1, ..., x_N}, robust soft labels are generated and the knowledge is integrated within a single network, which saves memory and time costs to a large extent.
  • FIG. 15 is a second schematic diagram of the principle of the knowledge integration self-distillation algorithm proposed by the embodiment of the present disclosure.
  • the samples in the same batch of training samples include the anchor sample and at least one knowledge transfer sample {x_1, ..., x_N}; the encoder is applied to the anchor sample and each knowledge transfer sample to perform feature encoding respectively, obtaining the sample feature f_anchor of the anchor sample and the sample features {f_1, f_2, f_3, ...} of the at least one knowledge transfer sample, and the similarity between the anchor sample and each knowledge transfer sample is estimated based on these sample features.
  • the predicted probability corresponding to each sample is determined by the classifier of the current student model, including the predicted probability p_anchor of the anchor sample and the predicted probabilities {p_1, ..., p_N} of the at least one knowledge transfer sample.
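Putting the pieces together, here is a minimal sketch of one training step under this scheme. The encoder/classifier architectures, the hyperparameter values (T, ω, λ), and the closed-form fusion used to generate the soft labels are illustrative assumptions, not the disclosure's exact procedure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentModel(nn.Module):
    """Toy encoder + classifier standing in for the single student network."""
    def __init__(self, in_dim: int = 32, feat_dim: int = 16, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Linear(in_dim, feat_dim)        # feature extraction / encoding
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.encoder(x)
        return feats, self.classifier(feats)

def train_step(model, optimizer, x, hard_labels, T=4.0, omega=0.5, lam=1.0):
    feats, logits = model(x)

    # Soft labels are generated without gradient, as fixed targets.
    with torch.no_grad():
        normed = F.normalize(feats, dim=1)
        A = (normed @ normed.t()).fill_diagonal_(float("-inf"))  # pairwise similarities, self excluded
        W = torch.softmax(A, dim=1)                              # knowledge transfer parameters
        P = torch.softmax(logits / T, dim=1)                     # temperature-scaled predictions
        N = P.shape[0]
        Q = (1.0 - omega) * torch.linalg.solve(torch.eye(N) - omega * W, P)  # assumed fusion limit

    # Hard-label cross entropy + soft-label KL divergence (cf. formula (10)).
    loss = F.cross_entropy(logits, hard_labels) + lam * F.kl_div(
        F.log_softmax(logits / T, dim=1), Q, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

model = StudentModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
print(train_step(model, opt, x, y))
```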
  • Table 1 is a comparison of the effectiveness of the knowledge integration distillation algorithms of the multi-teacher/student model, namely MEAL and KDCL, with the knowledge weighted transfer and integrated self-distillation algorithms proposed in the embodiments of the present disclosure:
  • Table 2 is the effectiveness of the self-distillation algorithm of knowledge weighted transfer and integration proposed in the embodiment of the present disclosure on various network architectures:
  • the ResNet-50 architecture is improved from 76.8 to 78.0, and it only takes 3.7% of the time.
  • Table 3 compares the effectiveness of the self-distillation algorithm of knowledge weighted transfer and integration proposed in the embodiment of the present disclosure and the self-distillation method in the related art:
  • the traditional self-distillation methods and a series of label regularization algorithms, such as label smoothing, Tf-KD_reg, BAN, CS-KD and Tf-KD_self, are all based on a single network.
  • the training results of the self-distillation algorithm with weighted knowledge transfer and integration proposed in the embodiments of the present disclosure on the dataset ImageNet all surpass the above-mentioned traditional self-distillation algorithm and label regularization algorithm.
  • for the teacher-free Tf-KD_reg regularization algorithm on the ResNet-50 architecture, the label accuracy is 77.5%, which is still 0.5% lower than that of this scheme.
  • the self-distillation algorithm for weighted transfer and integration of knowledge proposed in the embodiments of the present disclosure can not only save memory and time by realizing knowledge integration in a single network, but can also aggregate knowledge from a group of samples in the same mini-batch to generate equally powerful soft labels.
  • FIG. 16 is a schematic diagram of the composition and structure of a neural network training apparatus proposed in an embodiment of the present disclosure.
  • the neural network training apparatus 10 includes an acquisition part 11, a training part 12, a selection part 13, a sampling part 14 and a determination part 15.
  • the acquisition part 11 is configured to perform a loop process until soft labels of a preset number of anchor samples are obtained; wherein the loop process includes the following steps: acquiring a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples.
  • the training part 12 is configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
  • the training portion 12 is configured to determine the relative entropy of each of the anchor samples based on the soft label of each of the anchor samples and the predicted probability corresponding to the anchor sample; based on each of the anchor samples The hard label of the anchor sample and the predicted probability corresponding to the anchor sample determine the cross entropy of each anchor sample; based on the cross entropy of the preset number of anchor samples and the preset number of The relative entropy of the anchored samples trains the neural network.
  • the neural network includes an encoder and a classifier
  • the acquisition part 11 is configured to determine, based on the encoder of the neural network, the sample feature of the current anchor sample and the sample feature of each of the knowledge transfer samples; determine the similarity between the current anchor sample and each of the knowledge transfer samples based on the sample feature of the current anchor sample and the sample feature of each of the knowledge transfer samples; and determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples.
  • the acquiring part 11 is further configured to perform normalization processing on the sample features of the current anchor sample to obtain the normalized sample features of the current anchor sample;
  • perform normalization processing on the sample features of each of the knowledge transfer samples to obtain the normalized features of each of the knowledge transfer samples; and perform a dot product operation on the normalized sample features of the current anchor sample and the normalized features of each of the knowledge transfer samples to obtain the similarity between the current anchor sample and each of the knowledge transfer samples.
  • the obtaining part 11 is further configured to determine, based on the similarity between the current anchor sample and each of the knowledge transfer samples, the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample; and determine the soft label of the current anchor sample based on the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each of the knowledge transfer samples.
  • the acquiring part 11 is further configured to perform an accumulation process on the at least one similarity between the current anchor sample and each of the knowledge transfer samples to obtain an accumulated similarity value; and determine the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples and the accumulated similarity value.
  • the acquiring part 11 is further configured to perform a knowledge transfer process based on the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample and the predicted probability of each of the knowledge transfer samples, to obtain the initial knowledge transfer probability of the current anchor sample; perform a knowledge fusion process based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain the initial soft label of the current anchor sample; and, based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample; wherein the loop process includes: in each cycle of the loop process, performing knowledge transfer processing based on the soft label of the current anchor sample obtained in the previous cycle and each of the knowledge transfer parameters to obtain the knowledge transfer probability of the current anchor sample; and performing knowledge fusion processing based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample to obtain the soft label of the current anchor sample for the next cycle.
  • the acquiring part 11 is further configured to determine, based on the knowledge transfer parameter of each of the knowledge transfer samples to the current anchor sample and the predicted probability of each of the knowledge transfer samples, the knowledge transfer probability of each of the knowledge transfer samples to the current anchor sample; accumulate the at least one knowledge transfer probability of the current anchor sample to obtain an accumulated value of the knowledge transfer probability; and perform a knowledge transfer process based on the accumulated value of the knowledge transfer probability to obtain the initial knowledge transfer probability of the current anchor sample.
  • the obtaining part 11 is further configured to obtain a training data set, the training data set including at least one batch of training data subsets.
  • the selection part 13 is configured to select a batch of the training data subsets that were not previously selected as the training sample set from the training data set, as the current training sample set.
  • the sampling part 14 is configured to perform random sampling processing on the training data set to obtain at least one first training data.
  • the determining part 15 is further configured to determine the hard label corresponding to each of the first training data.
  • the sampling part 14 is further configured to perform similarity sampling processing, based on the hard label of each of the first training data, on the remaining data in the training data set that are not selected as the first training data, to obtain at least one second training data corresponding to each of the first training data.
  • the determining part 15 is further configured to use a batch of the training data subsets constructed from the at least one first training data and the at least one second training data corresponding to each of the first training data as the current training sample set.
  • FIG. 17 is a schematic diagram of the composition structure of the computer device proposed by the embodiment of the present disclosure.
  • the computer device 20 proposed by the embodiment of the present disclosure may further include a processor 21 and a memory 22 storing instructions executable by the processor 21.
  • the computer device 20 may further include a communication interface 23 and a bus 24 for connecting the processor 21, the memory 22 and the communication interface 23.
  • the above-mentioned processor 21 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, or a microprocessor.
  • the computer device 20 may also include a memory 22, which may be connected to the processor 21, wherein the memory 22 is used to store executable program codes including computer operating instructions; the memory 22 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least two disk drives.
  • the bus 24 is used to connect the communication interface 23, the processor 21 and the memory 22, and to enable mutual communication among these components.
  • the memory 22 is used to store instructions and data.
  • the above-mentioned processor 21 is configured to perform a loop process until soft labels satisfying a preset number of anchor samples are obtained, and to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein the loop process includes the following steps: acquiring a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples.
  • the above-mentioned memory 22 may be a volatile memory (volatile memory), such as a Random-Access Memory (RAM); or a non-volatile memory (non-volatile memory), such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories, and provides instructions and data to the processor 21.
  • each functional module in this embodiment may be integrated into one unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.
  • if the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes.
  • An embodiment of the present disclosure provides a computer device that can perform a loop process until soft labels satisfying a preset number of anchor samples are obtained, and train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein the loop process includes the following steps: obtaining a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the loop process; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  • In this way, for each training sample under the neural network, the similarity between other samples and that sample and the predicted probabilities of the other samples can be used to assist the generation of the soft label of the training sample, and efficient training supervision can then be performed for the neural network based on the soft labels of the preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, and realizes knowledge integration based on the similarity between samples and obtains effective soft labels while using only a single network.
  • Embodiments of the present disclosure provide a computer-readable storage medium on which a program is stored, and when the program is executed by a processor, implements the above-described neural network training method.
  • the program instructions corresponding to a neural network training method in this embodiment may be stored on a storage medium such as an optical disk, a hard disk, a U disk, etc.
  • wherein the loop process includes the following steps: obtaining a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle; wherein the current anchor sample is any one sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; based on the neural network, determining the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples.
  • the embodiments of the present disclosure further provide a computer program product, where the computer program product includes computer-executable instructions for implementing the steps in the neural network training method proposed by the embodiments of the present disclosure.
  • embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable computer device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the implementation flow diagram and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable computing device, such that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the implementation flow diagram and/or one or more blocks of the block diagram.
  • a loop process is performed until soft labels satisfying a preset number of anchor samples are obtained; the neural network is trained based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample are determined from the current training sample set; based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample are determined; and the soft label of the current anchor sample is determined based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  • the knowledge integration under the self-distillation algorithm is realized.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A neural network training method and apparatus, a device, and a computer storage medium. The method comprises: performing a cyclic process until a soft label satisfying a preset number of anchor samples is obtained (S100); training a neural network based at least on the soft label of the preset number of anchor samples, and the preset number of anchor samples (S110), wherein the current anchor sample and at least one knowledge transfer sample are determined from the current training sample set in each period when the cyclic process is executed; on the basis of the neural network, determining the similarity between the current anchor sample and each knowledge transfer sample, a prediction probability of the current anchor sample, and a prediction probability of each knowledge transfer sample (S102); and determining a soft label of the current anchor sample on the basis of the similarity between the current anchor sample and each knowledge transfer sample, the prediction probability of the current anchor sample, and the prediction probability of each knowledge transfer sample (S103). Knowledge integration under a self-distillation algorithm is achieved.

Description

神经网络训练方法和装置、设备,及计算机存储介质Neural network training method and device, equipment, and computer storage medium
相关申请的交叉引用CROSS-REFERENCE TO RELATED APPLICATIONS
本公开基于申请号为202110462397.6、申请日为2021年04月27日、申请名称为“神经网络训练方法和装置、设备,及计算机存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。The present disclosure is based on the Chinese patent application with the application number of 202110462397.6, the application date of April 27, 2021, and the application name of "neural network training method and device, equipment, and computer storage medium", and requires the priority of the Chinese patent application The entire content of this Chinese patent application is incorporated herein by reference.
技术领域technical field
本公开涉及深度学习领域,尤其涉及一种神经网络训练方法及装置、设备、计算机存储介质。The present disclosure relates to the field of deep learning, and in particular, to a neural network training method, device, device, and computer storage medium.
背景技术Background technique
近年来,边缘设备例如移动电话、可穿戴设备等,都需要在本地处理深度学习相关的任务,然而边缘设备却普遍受限于有限的资源和功耗以及时延和成本。为了推荐基于深度学习的产品在边缘设备上的应用,相关技术提出了知识蒸馏(Knowledge Distillation,KD)这一模型压缩方法。In recent years, edge devices such as mobile phones and wearable devices need to process deep learning-related tasks locally. However, edge devices are generally limited by limited resources and power consumption, as well as latency and cost. In order to recommend the application of products based on deep learning on edge devices, related technologies propose a model compression method called Knowledge Distillation (KD).
其中,基于知识蒸馏的模型压缩方法是将训练好的较复杂的“教师”模型的推理预测能力迁移到较简单的“学生”模型,即将“教师”模型预测的“软标签”作为训练监督,来指导“学生”模型的训练,进而减少“学生”模型在边缘设备端所需的计算资源以及提高其运算速度。Among them, the model compression method based on knowledge distillation is to transfer the reasoning and prediction ability of the trained more complex "teacher" model to the simpler "student" model, that is, the "soft label" predicted by the "teacher" model is used as training supervision. to guide the training of the "student" model, thereby reducing the computing resources required by the "student" model on the edge device and improving its computing speed.
然而,为了进一步获得更为准确的“软标签”以提升“学生”模型的网络性能,相关技术常通过多网络模型的知识整合蒸馏算法以为“学生”模型提供有效的训练监督,复杂性较高,使得训练时间和空间成本较大。However, in order to further obtain a more accurate "soft label" to improve the network performance of the "student" model, related technologies often provide effective training supervision for the "student" model through knowledge integration distillation algorithms of multiple network models, which is highly complex. , which makes the training time and space cost large.
发明内容SUMMARY OF THE INVENTION
本公开实施例提供一种神经网络训练方法和装置、设备,及计算机存储介质。Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium.
本公开的技术方案是这样实现的:The technical solution of the present disclosure is realized as follows:
本公开实施例提供一种神经网络训练方法,所述方法包括:Embodiments of the present disclosure provide a neural network training method, the method comprising:
执行循环过程直至获得满足预设数量的锚定样本的软标签;至少基于所述预设数量的锚定样本的软标签和所述预设数量的锚定样本,对神经网络进行训练;其中,所述循环过程包括以下步骤:获取当前训练样本集,在执行循环过程的每个周期中从所述当前训练样本集中确定所述当前锚定样本和至少一个知识传递样本;其中,所述当前锚定样本为所述当前训练样本集中的任意一个,所述至少一个知识传递样本为所述当前训练样本集中、所述当前锚定样本以外的至少一个其他样本;基于所述神经网络,确定所述当前锚定样本与每一所述知识传递样本之间的相似度、以及所述当前锚定样本的预测概率和每一所述知识传递样本的预测概率;基于所述当前锚定样本与每一所述知识传递样本之间的相似度,所述当前锚定样本的预测概率和每一所述知识传递样本的预测概率,确定所述当前锚定样本的软标签。这样,针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。Performing a looping process until soft labels satisfying a preset number of anchor samples are obtained; training the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples; wherein, The loop process includes the following steps: obtaining a current training sample set, and determining the current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the loop process; wherein the current anchor sample The fixed sample is any one of the current training sample set, and the at least one knowledge transfer sample is at least one other sample other than the current anchor sample in the current training sample set; based on the neural network, determine the The similarity between the current anchor sample and each of the knowledge transfer samples, and the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples; based on the current anchor sample and each The similarity between the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each of the knowledge transfer samples determine the soft label of the current anchor sample. In this way, for each training sample under the neural network, the similarity between other samples and the sample and the predicted probability of other samples can be used to assist the generation of the soft label of the training sample, and then based on the soft label of the training sample satisfying the preset number Labels perform efficient training supervision for neural networks.
本公开实施例提供一种神经网络训练装置,包括:Embodiments of the present disclosure provide a neural network training device, including:
获取部分,配置为执行循环过程直至获得满足预设数量的锚定样本的软标签;其中,所述循环过程包括以下步骤:获取当前训练样本集,在执行循环过程的每个周期中从所述当前训练样本集中确定所述当前锚定样本和至少一个知识传递样本;其中,所述当前锚定 样本为所述当前训练样本集中的任意一个,所述至少一个知识传递样本为所述当前训练样本集中、所述当前锚定样本以外的至少一个其他样本;基于所述神经网络,确定所述当前锚定样本与每一所述知识传递样本之间的相似度、以及所述当前锚定样本的预测概率和每一所述知识传递样本的预测概率;The acquisition part is configured to perform a loop process until soft labels satisfying a preset number of anchor samples are obtained; wherein, the loop process includes the following steps: acquiring a current training sample set, and performing a loop process from the The current anchor sample and at least one knowledge transfer sample are determined in the current training sample set; wherein, the current anchor sample is any one of the current training sample set, and the at least one knowledge transfer sample is the current training sample At least one other sample other than the current anchor sample in the set; based on the neural network, determine the similarity between the current anchor sample and each of the knowledge transfer samples, and the similarity of the current anchor sample a predicted probability and a predicted probability for each of said knowledge transfer samples;
训练部分,配置为至少基于所述预设数量的锚定样本的软标签和所述预设数量的锚定样本,对神经网络进行训练。The training part is configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
本公开实施例提供一种计算机设备,所述计算机设备包括处理器、存储有所述处理器可执行指令的存储器,当所述指令被所述处理器执行时,实现如上所述的神经网络训练方法。An embodiment of the present disclosure provides a computer device, the computer device includes a processor and a memory storing instructions executable by the processor, when the instructions are executed by the processor, the above-mentioned neural network training is implemented method.
本公开实施例提供一种计算机可读存储介质,其上存储有程序,应用于计算机设备中,所述程序被处理器执行时,实现如上所述的神经网络训练方法。An embodiment of the present disclosure provides a computer-readable storage medium, on which a program is stored and applied to a computer device, and when the program is executed by a processor, the above-mentioned neural network training method is implemented.
本公开实施例提供一种计算机程序,包括计算机可读代码,在所述计算机可读代码在电子设备中运行,被所述电子设备中的处理器执行的情况下,实现如上所述的神经网络训练方法。Embodiments of the present disclosure provide a computer program, including computer-readable codes, which, when the computer-readable codes run in an electronic device and are executed by a processor in the electronic device, implement the above-mentioned neural network training method.
本公开实施例提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如上所述的神经网络训练方法。Embodiments of the present disclosure provide a computer program product that, when executed on a computer, enables the computer to execute the neural network training method as described above.
本公开实施例提出的技术方案,计算机设备可以执行循环过程直至获得满足预设数量的锚定样本的软标签;至少基于预设数量的锚定样本的软标签和预设数量的锚定样本,对神经网络进行训练;其中,循环过程包括以下步骤:获取当前训练样本集,在执行循环过程的每个周期中从当前训练样本集中确定当前锚定样本和至少一个知识传递样本;其中,当前锚定样本为当前训练样本集中的任意一个,至少一个知识传递样本为当前训练样本集中、当前锚定样本以外的至少一个其他样本;基于神经网络,确定当前锚定样本与每一知识传递样本之间的相似度、以及当前锚定样本的预测概率和每一知识传递样本的预测概率;基于当前锚定样本与每一知识传递样本之间的相似度,当前锚定样本的预测概率和每一知识传递样本的预测概率,确定当前锚定样本的软标签。如此,针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。可见,本公开以同一神经网络下跨样本的知识整合代替传统的跨网络的知识整合,在仅利用单个网络的基础上实现了基于样本间相似度的知识整合并获得有效的软标签。In the technical solutions proposed by the embodiments of the present disclosure, the computer device may perform a loop process until soft labels satisfying a preset number of anchor samples are obtained; at least based on the soft labels of the preset number of anchor samples and the preset number of anchor samples, training the neural network; wherein, the cyclic process includes the following steps: obtaining a current training sample set, and determining a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of executing the cyclic process; wherein, the current anchor sample The fixed sample is any one of the current training sample set, and at least one knowledge transfer sample is at least one other sample in the current training sample set and the current anchor sample; based on the neural network, determine the relationship between the current anchor sample and each knowledge transfer sample and the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and each knowledge transfer sample Pass the predicted probability of the sample to determine the soft label of the current anchor sample. In this way, for each training sample under the neural network, the similarity between other samples and the sample and the predicted probability of other samples can be used to assist the generation of the soft label of the training sample, and then based on the soft label of the training sample that meets the preset number. Labels perform efficient training supervision for neural networks. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, realizes knowledge integration based on similarity between samples and obtains effective soft labels on the basis of only using a single network.
附图说明Description of drawings
图1为相关技术中多教师模型的知识整合蒸馏算法的原理示意图;Fig. 1 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-teacher model in the related art;
图2为相关技术中多学生模型的知识整合蒸馏算法的原理示意图;Fig. 2 is the principle schematic diagram of the knowledge integration distillation algorithm of multi-student model in the related art;
图3为本公开实施例提出的神经网络训练方法的实现流程示意图一;3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图4为本公开实施例提出的神经网络训练方法的实现流程示意图二;FIG. 4 is a second implementation flowchart of the neural network training method proposed by the embodiment of the present disclosure;
图5为本公开实施例提出的神经网络训练方法的实现流程示意图三;FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图6为本公开实施例提出的神经网络训练方法的实现流程示意图四;FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图7为本公开实施例提出的神经网络训练方法的实现流程示意图五;FIG. 7 is a schematic diagram five of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图8为本公开实施例提出的神经网络训练方法的实现流程示意图六;FIG. 8 is a sixth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图9为本公开实施例提出的神经网络训练方法的实现流程示意图七;FIG. 9 is a seventh schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图10为本公开实施例提出的神经网络训练方法的实现流程示意图八;FIG. 10 is a schematic diagram eight of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图11为本公开实施例提出的神经网络训练方法的实现流程示意图九;FIG. 11 is a schematic diagram 9 of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图12为本公开实施例提出的神经网络训练方法的实现流程示意图十;FIG. 12 is a schematic diagram ten of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图13为本公开实施例提出的神经网络训练方法的实现流程示意图十一;FIG. 13 is a schematic diagram eleventh of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure;
图14为本公开实施例提出的知识整合自蒸馏算法的原理示意图一;FIG. 14 is a schematic diagram 1 of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure;
图15为本公开实施例提出的知识整合自蒸馏算法的原理示意图二;FIG. 15 is a second schematic diagram of the principle of a knowledge integration self-distillation algorithm proposed by an embodiment of the present disclosure;
图16为本公开实施例提出的神经网络训练装置的组成结构示意图;FIG. 16 is a schematic diagram of the composition and structure of a neural network training apparatus proposed by an embodiment of the present disclosure;
图17为本公开实施例提出的计算机设备的组成结构示意图。FIG. 17 is a schematic diagram of the composition and structure of a computer device according to an embodiment of the present disclosure.
具体实施方式Detailed ways
为了使本公开的目的、技术方案和优点更加清楚,下面将结合附图对本公开作进一步地详细描述,所描述的实施例不应视为对本公开的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本公开保护的范围。In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limitations of the present disclosure, and those skilled in the art will not All other embodiments obtained under the premise of creative work fall within the protection scope of the present disclosure.
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" can be the same or a different subset of all possible embodiments, and Can be combined with each other without conflict.
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本公开实施例能够以除了在这里图示或描述的以外的顺序实施。In the following description, the term "first\second\third" is only used to distinguish similar objects, and does not represent a specific ordering of objects. It is understood that "first\second\third" Where permitted, the specific order or sequence may be interchanged to enable the embodiments of the disclosure described herein to be practiced in sequences other than those illustrated or described herein.
除非另有定义,本文所使用的所有的技术和科学术语与属于本公开的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本公开实施例的目的,不是旨在限制本公开。Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing the embodiments of the present disclosure only and is not intended to limit the present disclosure.
对本公开实施例进行进一步详细说明之前,对本公开实施例中涉及的名词和术语进行说明,本发明实施例中涉及的名词和术语适用于如下的解释。Before further describing the embodiments of the present disclosure in detail, the terms and terms involved in the embodiments of the present disclosure are described, and the terms and terms involved in the embodiments of the present disclosure are suitable for the following explanations.
1)知识蒸馏:采取教师-学生(Teacher-Student)模式,将复杂且大的模型作为教师模型即Teacher,学生模型Student结构较为简单,用Teacher来辅助Student模型的训练。旨在通过软标签(soft-target)将从高容量教师模型中学到的“暗”知识转移到学生模型,软标签可以是班级概率,也可以是老师输出的特征表示,其中包含的信息比单标签更完整。1) Knowledge distillation: adopt the teacher-student (Teacher-Student) model, take the complex and large model as the teacher model, namely the teacher, the student model student has a relatively simple structure, and use the teacher to assist the training of the student model. It aims to transfer the "dark" knowledge learned from the high-capacity teacher model to the student model through soft-targets, which can be class probabilities or feature representations output by the teacher, which contain more information than a single model. Labels are more complete.
一般地,大模型往往是单个复杂网络或者是若干网络的集合,拥有良好的性能和泛化能力,而小模型因为网络规模较小,表达能力有限。因此,可以利用大模型学习到的知识去指导小模型训练,使得小模型具有与大模型相当的性能,但是参数数量大幅降低,从而实现模型压缩与加速。Generally, a large model is often a single complex network or a collection of several networks, which has good performance and generalization ability, while a small model has limited expressive ability due to the small network size. Therefore, the knowledge learned by the large model can be used to guide the training of the small model, so that the small model has the same performance as the large model, but the number of parameters is greatly reduced, thereby achieving model compression and acceleration.
2)自蒸馏:知识蒸馏的其中一种特殊情况,自蒸馏(Self-Distillation)则指的是自己蒸馏到自己,可以看作是:首先对两个单独的模型F1,F2进行集成学习,然后蒸馏成F2。即Teacher Model就是Student Model的集成版本,称为自集成(Self-Ensemble)。2) Self-distillation: One of the special cases of knowledge distillation, Self-Distillation refers to self-distillation to itself, which can be seen as: first perform integrated learning on two separate models F1 and F2, and then Distilled to F2. That is, the Teacher Model is an integrated version of the Student Model, called Self-Ensemble.
3)知识整合:通过整合多个预先训练的教师模型的知识来增强软标签,如引入的多老师版的知识蒸馏(Multi-Model Ensemble via Adversarial Learning,MEAL)、多学生版的知识蒸馏KDCL。3) Knowledge integration: Soft labels are enhanced by integrating the knowledge of multiple pre-trained teacher models, such as the introduced multi-teacher version of knowledge distillation (Multi-Model Ensemble via Adversarial Learning, MEAL), multi-student version of knowledge distillation KDCL.
近年来,深度神经网络推动了计算机视觉的快速发展,其中图像分类任务被视为最基本也是最重要的任务之一。目前有大量的工作旨在攻克图像分类任务性能提升的瓶颈,尤其是在大规模的数据集上。In recent years, deep neural networks have promoted the rapid development of computer vision, in which the task of image classification is regarded as one of the most basic and important tasks. There is a lot of work currently aimed at overcoming the bottleneck of performance improvement in image classification tasks, especially on large-scale datasets.
最近研究表明,有监督的图像分类训练的瓶颈来源于不够准确的“硬标签”,即人为标注的单标签(一张图片一个类别,one-hot标签),这一问题导致的学习目标不完善是阻碍分类准确性进一步提高,监督学习存在极大局限性的关键因素。Recent studies have shown that the bottleneck of supervised image classification training comes from inaccurate "hard labels", that is, human-labeled single labels (one image, one category, one-hot label). This problem leads to imperfect learning goals. It is a key factor that hinders the further improvement of classification accuracy, and supervised learning has great limitations.
知识蒸馏算法的提出为该问题提供了一个较好的解决方案,即利用一个预训练的教师模型所预测的软概率矢量,即“软标签”作为训练监督,来指导学生模型的训练。The proposal of the knowledge distillation algorithm provides a better solution to this problem, that is to use the soft probability vector predicted by a pre-trained teacher model, that is, the "soft label" as a training supervision, to guide the training of the student model.
另一方面,需要在本地处理深度学习相关的任务边缘设备,例如移动电话、可穿戴设备等,都普遍受限于有限的资源和功耗以及时延和成本。知识蒸馏算法的提出可以促使基于深度学习的产品在边缘设备上的广泛应用。On the other hand, edge devices, such as mobile phones, wearable devices, etc., that need to process deep learning-related tasks locally are generally limited by limited resources and power consumption, as well as latency and cost. The proposal of knowledge distillation algorithm can promote the wide application of deep learning-based products on edge devices.
众所周知,多个网络的集合通常比集合中的单个网络产生更好的预测。因此,在最先进的方法中,采用多个老师或学生对互补知识进行编码,如通过整合多个预先训练的教师模型的知识来增强软标签,而他们的“综合”软标签是更可靠的学习目标,我们将此类算法称之为知识整合蒸馏算法。It is known that ensembles of multiple networks often yield better predictions than a single network in the ensemble. Therefore, in state-of-the-art methods, multiple teachers or students are employed to encode complementary knowledge, such as by integrating the knowledge of multiple pre-trained teacher models to enhance soft labels, while their "synthetic" soft labels are more reliable Learning objectives, we call such algorithms as knowledge integration distillation algorithms.
示例性的,图1为相关技术中多教师模型的知识整合蒸馏算法的原理示意图,如图1所示,教师模型#1至教师模型#N针对锚定样本的预测概率为{1 1,…,p N},学生模型针对锚定样本的预测概率为p anchor,对{p 1,…,p N}进行知识整合,如加权平均,进而将获得的结果作为软标签,并通过蒸馏的方式迁移至学生模型。 Exemplarily, FIG. 1 is a schematic diagram of the knowledge integration distillation algorithm of the multi-teacher model in the related art. As shown in FIG. 1, the prediction probability of the teacher model #1 to the teacher model #N for the anchor sample is {1 1 ,  … , p N }, the predicted probability of the student model for the anchored sample is p anchor , and knowledge integration is performed on {p 1 , . Migrate to student model.
示例性的,图2为相关技术中多学生模型的知识整合蒸馏算法的原理示意图,如图2所示,学生模型#1至学生模型#N针对锚定样本的预测概率为{p 1,…,p N},对{p 1,…,p N}进行知识整合,进而将获得的结果作为软标签,并通过蒸馏的方式迁移至每个学生模型。 Exemplarily, FIG. 2 is a schematic diagram of the knowledge integration distillation algorithm of the multi-student model in the related art. As shown in FIG. 2 , the predicted probability of the student model #1 to the student model #N for the anchor samples is {p 1 ,  … , p N }, perform knowledge integration on {p 1 , ..., p N }, and then use the obtained result as a soft label and transfer it to each student model by distillation.
然而,虽然知识整合蒸馏算法可以提供有效的训练监督,但是其不得不依赖额外的网络或分支,复杂性较高,大大增大了训练时间和空间成本。鉴于此,如何通过较少的训练时间和空间成本获得有效的软标签,以进行更为准确的网络训练监督是亟待解决的问题,是本公开实施例所要讨论的内容,下面将结合以下具体实施例进行阐述。However, although the knowledge integration distillation algorithm can provide effective training supervision, it has to rely on additional networks or branches, which has high complexity and greatly increases the training time and space cost. In view of this, how to obtain effective soft labels with less training time and space cost for more accurate network training supervision is an urgent problem to be solved, which is the content to be discussed in the embodiments of the present disclosure, and the following specific implementation will be combined example to illustrate.
本公开实施例提供一种神经网络训练方法和装置、设备,及计算机存储介质,通过针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。可见,本公开以同一神经网络下跨样本的知识整合代替传统的跨网络的知识整合,在仅利用单个网络的基础上实现了基于样本间相似度的知识整合并获得有效的软标签。Embodiments of the present disclosure provide a neural network training method, apparatus, device, and computer storage medium. For each training sample under the neural network, the similarity between other samples and the sample and the predicted probability of other samples can be used to generate The generation of soft labels for the training samples is assisted, and efficient training supervision is performed for the neural network based on soft labels satisfying a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, realizes knowledge integration based on similarity between samples and obtains effective soft labels on the basis of only using a single network.
本公开实施例提出的神经网络训练方法应用于计算机设备中。下面说明本公开实施例提出的计算机设备的示例性应用，本公开实施例提出的计算机设备可以实施为手机终端、笔记本电脑，平板电脑，台式计算机，智能电视、车载设备、可穿戴设备、工业设备等。The neural network training method proposed in the embodiments of the present disclosure is applied to computer equipment. Exemplary applications of the computer equipment proposed by the embodiments of the present disclosure will be described below. The computer equipment proposed by the embodiments of the present disclosure may be implemented as mobile phone terminals, notebook computers, tablet computers, desktop computers, smart TVs, vehicle-mounted equipment, wearable devices, industrial equipment, and so on.
下面,将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述。Below, the technical solutions in the embodiments of the present disclosure will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present disclosure.
本公开一实施例提供了一种神经网络训练方法,图3为本公开实施例提出的神经网络训练方法的实现流程示意图一,如图3所示,在本公开的实施例中,计算机设备执行神经网络训练的方法可以包括以下步骤:An embodiment of the present disclosure provides a neural network training method, and FIG. 3 is a schematic diagram 1 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure. As shown in FIG. 3 , in the embodiment of the present disclosure, a computer device executes The method of neural network training can include the following steps:
S100、执行循环过程直至获得满足预设数量的锚定样本的软标签。S100, a loop process is performed until soft labels satisfying a preset number of anchor samples are obtained.
可以理解的是,训练样本的标签越准确,模型训练效果越好,为了克服人为硬标签在模型训练上带来的瓶颈和缺陷,可以通过知识蒸馏算法为训练样本生成更为鲁棒的软标签,以通过软标签进行高效的训练监督来提升模型的性能。It is understandable that the more accurate the labels of the training samples, the better the model training effect. In order to overcome the bottlenecks and defects caused by artificial hard labels in model training, the knowledge distillation algorithm can be used to generate more robust soft labels for the training samples. , to improve the performance of the model with efficient training supervision through soft labels.
在一些实施例中,可以通过知识蒸馏算法为全部训练样本集中的每个训练样本生成其对应的软标签;或者通过知识蒸馏算法为全部训练样本集中的部分训练样本如某一批次训练样本,生成其对应的软标签。In some embodiments, the knowledge distillation algorithm can be used to generate its corresponding soft label for each training sample in the entire training sample set; Generate its corresponding soft label.
其中,图4为本公开实施例提出的神经网络训练方法的实现流程示意图二,如图4所示,本公开实施例提供的循环过程包括以下步骤:4 is a second schematic diagram of the implementation process of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 4 , the cyclic process provided by the embodiment of the present disclosure includes the following steps:
S101、获取当前训练样本集,在执行循环过程的每个周期中从当前训练样本集中确定当前锚定样本和至少一个知识传递样本;其中,当前锚定样本为当前训练样本集中的任意一个,至少一个知识传递样本为当前训练样本集中、当前锚定样本以外的至少一个其他样本。S101. Obtain a current training sample set, and determine a current anchor sample and at least one knowledge transfer sample from the current training sample set in each cycle of the execution cycle; wherein, the current anchor sample is any one of the current training sample set, at least A knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample.
在一些实施例中,当前训练样本集可以是用于进行神经网络训练的全部数据集,也可以是用于进行神经网络训练的多批次训练数据集中的任意一个批次的训练数据集。In some embodiments, the current training sample set may be all data sets used for neural network training, or may be any batch of training data sets in multiple batches of training data sets used for neural network training.
在一些实施例中,锚定样本指当前训练数据集中需要进行软标签生成的训练样本;知识传递样本指训练数据集中、用于作用于锚定样本软标签生成的至少一个其他训练样本。In some embodiments, the anchor sample refers to a training sample in the current training data set that needs to generate soft labels; the knowledge transfer sample refers to at least one other training sample in the training data set for acting on the soft label generation of the anchor sample.
在本公开实施例中,计算机设备在每个周期可以从训练样本集中确定任意一个感兴趣的、之前未被确定为锚定样本的训练样本作为当前锚定样本,并将当前训练样本集中、除当前锚定样本之外的至少一个其他样本确定为知识传递样本。In this embodiment of the present disclosure, the computer device may determine any interesting training sample from the training sample set that has not been previously determined as the anchor sample as the current anchor sample in each cycle, and use the current training sample set, except At least one other sample other than the current anchor sample is determined as a knowledge transfer sample.
其中,在将训练样本集中不同的训练样本作为当前锚定样本时,其对应的知识传递样本是不相同的。例如,同一批次样本为{x 1,x 2,x 3,…,x R},如果将x 1确定为当前锚定样本,那么剩余的{x 2,x 3,…,x R}作为x 1对应的知识传递样本;如果将x 2确定为当前锚定样本,那么剩余的{x 1,x 3,…,x R}作为x 2对应的知识传递样本。 Among them, when different training samples in the training sample set are used as the current anchor samples, the corresponding knowledge transfer samples are different. For example, the same batch of samples is {x 1 , x 2 , x 3 , ..., x R }, if x 1 is determined as the current anchor sample, then the remaining {x 2 , x 3 , ..., x R } are used as The knowledge transfer sample corresponding to x 1 ; if x 2 is determined as the current anchor sample, then the remaining {x 1 , x 3 , . . . , x R } are used as the knowledge transfer sample corresponding to x 2 .
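To make this pairing concrete, a minimal sketch is given below, assuming the batch is simply a Python list; the variable names are illustrative and not taken from the disclosure.

```python
# Minimal sketch: pair each sample in a batch with its knowledge transfer samples.
# The batch is represented as a plain Python list; all names are illustrative assumptions.
batch = ["x1", "x2", "x3", "x4"]  # one batch of training samples

pairs = []
for i, anchor in enumerate(batch):
    # every other sample in the same batch serves as a knowledge transfer sample
    transfer_samples = [x for j, x in enumerate(batch) if j != i]
    pairs.append((anchor, transfer_samples))

# pairs[0] == ("x1", ["x2", "x3", "x4"])
# pairs[1] == ("x2", ["x1", "x3", "x4"])
```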
在本公开实施例的一实施方式中,在训练样本集中,每个训练样本都存在与其具有视图相似性的至少一个其他训练样本。In an implementation of the embodiment of the present disclosure, in the training sample set, each training sample has at least one other training sample with view similarity with it.
S102、基于神经网络,确定当前锚定样本与每一知识传递样本之间的相似度、以及当前锚定样本的预测概率和每一知识传递样本的预测概率。S102. Based on the neural network, determine the similarity between the current anchor sample and each knowledge transfer sample, as well as the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
应理解,在本公开实施例中,为了实现基于样本的视图相似性在样本之间传递和整合知识,在从训练样本集中确定出当前锚定样本和该锚定样本对应的至少一个知识传递样本之后,可以确定锚定样本与知识传递样本之间的样本相似度,以及基于锚定样本与知识传递样本之间的相似度用来进行传递和整合的“知识”。It should be understood that, in this embodiment of the present disclosure, in order to transfer and integrate knowledge between samples based on sample-based view similarity, at least one knowledge transfer sample corresponding to the current anchor sample and the anchor sample is determined from the training sample set. Afterwards, the sample similarity between the anchor sample and the knowledge transfer sample, and the "knowledge" used for transfer and integration based on the similarity between the anchor sample and the knowledge transfer sample can be determined.
在一些实施例中,可以通过神经网络确定当前锚定样本与每一知识传递样本之间的相似度。这里,将当前锚定样本和至少一个知识传递样本输入神经网络之后,神经网络可以分别确定当前锚定样本的样本特征和每一知识传递样本的样本特征,进而基于当前锚定样本的样本特征和每一知识传递样本的样本特征确定出当前锚定样本与每一知识传递样本之间的样本相似度。In some embodiments, the similarity between the current anchor sample and each knowledge transfer sample can be determined by a neural network. Here, after inputting the current anchor sample and at least one knowledge transfer sample into the neural network, the neural network can determine the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample respectively, and then based on the sample feature of the current anchor sample and The sample feature of each knowledge transfer sample determines the sample similarity between the current anchor sample and each knowledge transfer sample.
在一些实施例中,锚定样本与知识传递样本之间用来进行传递和整合的“知识”可以是在图像分类、目标检测、图像分割等任务上的样本的预测概率。例如,在分类任务上样本属于某个类别的概率。In some embodiments, the "knowledge" used for transfer and integration between anchor samples and knowledge transfer samples may be predicted probabilities of samples on tasks such as image classification, object detection, and image segmentation. For example, the probability that a sample belongs to a class on a classification task.
其中,可以通过神经网络确定锚定样本的预测概率与每一知识传递样本的预测概率。Among them, the predicted probability of the anchor sample and the predicted probability of each knowledge transfer sample can be determined through the neural network.
S103、基于当前锚定样本与每一知识传递样本之间的相似度,当前锚定样本的预测概率和每一知识传递样本的预测概率,确定当前锚定样本的软标签。S103. Determine the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
在本公开实施例中,通过神经网络确定出锚定样本与每一知识传递样本之间的相似度,以及锚定样本与每一知识传递样本各自的预测概率之后,便可以进一步基于当前锚定样本与每一知识传递样本之间的相似度、当前锚定样本的预测概率以及每一知识传递样本的预测概率执行知识的传递和整合操作,以作用于当前锚定样本软标签的生成。In the embodiment of the present disclosure, after the similarity between the anchor sample and each knowledge transfer sample and the respective prediction probability of the anchor sample and each knowledge transfer sample are determined through the neural network, the current anchor sample can be further based on the current anchor sample. The similarity between the sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample perform knowledge transfer and integration operations to act on the generation of the current anchor sample soft label.
可以理解的是,在本公开实施例中,可以基于当前锚定样本与每一知识传递样本之间的相似度,表征每一知识传递样本对锚定样本的“知识”影响程度,从而对每一知识传递样本的预测概率按其“知识”影响程度执行知识的加权传递操作,针对当前锚定样本对应的至少一个知识传递样本,对以不同影响程度传递的每一知识传递样本的知识进行整合,共同作用于当前锚定样本软标签的生成。It can be understood that, in this embodiment of the present disclosure, the degree of “knowledge” influence of each knowledge transfer sample on the anchor sample can be represented based on the similarity between the current anchor sample and each knowledge transfer sample, so that each The predicted probability of a knowledge transfer sample is a weighted transfer operation of knowledge according to its "knowledge" influence degree, and for at least one knowledge transfer sample corresponding to the current anchor sample, the knowledge of each knowledge transfer sample transferred with different degrees of influence is integrated. , which work together to generate the soft label of the current anchor sample.
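A minimal single-step sketch of this similarity-weighted transfer and integration is shown below. It assumes the predicted probabilities and the already-normalized transfer weights are given, and the fusion coefficient alpha is an illustrative assumption; the disclosure also describes an iterative fusion variant that is not shown here.

```python
# Single-step sketch of cross-sample knowledge transfer and integration.
# `probs_transfer` holds the knowledge transfer samples' predicted probabilities,
# `weights` holds their normalized transfer weights with respect to the anchor,
# and `alpha` is an illustrative fusion coefficient, not a value from the disclosure.
import torch

def fuse_soft_label(prob_anchor, probs_transfer, weights, alpha=0.5):
    # knowledge transferred from the other samples, weighted by their influence
    transferred = (weights.unsqueeze(1) * probs_transfer).sum(dim=0)
    # integrate the transferred knowledge with the anchor's own prediction
    soft_label = alpha * prob_anchor + (1.0 - alpha) * transferred
    return soft_label / soft_label.sum()  # keep it a valid probability vector

prob_anchor = torch.tensor([0.7, 0.2, 0.1])
probs_transfer = torch.tensor([[0.6, 0.3, 0.1],
                               [0.1, 0.8, 0.1]])
weights = torch.tensor([0.75, 0.25])  # transfer weights summing to 1
print(fuse_soft_label(prob_anchor, probs_transfer, weights))
```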
可见,生成训练样本软标签的不再知识蒸馏算法下不同网络模型间的知识整合,而是基于自蒸馏算法的单一网络模型下,不同样本间的知识整合。这样,不再需要多个其他网络模型作用软标签的生成,而是在单一网络模型下,针对每个训练样本,基于该训练样本与其他样本间的相似度对其他样本的“暗知识”进行传递和整合,以作用于该训练样本软标签的生成。It can be seen that the knowledge integration between different network models under the knowledge distillation algorithm to generate the soft labels of the training samples is not the knowledge integration between different samples under a single network model based on the self-distillation algorithm. In this way, multiple other network models are no longer required to generate soft labels, but under a single network model, for each training sample, the "dark knowledge" of other samples is carried out based on the similarity between the training sample and other samples. Pass and integrate to act on the generation of soft labels for this training sample.
S110、至少基于预设数量的锚定样本的软标签和预设数量的锚定样本,对神经网络进行训练。S110. Train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
在本公开实施例中,生成满足预设数量的锚定样本的软标签之后,便可以至少基于预设数量的锚定样本以及该锚定样本鲁棒的软标签更新神经网络的目标损失函数,以实现对神经网络的更新优化。In the embodiment of the present disclosure, after the soft labels satisfying the preset number of anchor samples are generated, the objective loss function of the neural network can be updated based on at least the preset number of anchor samples and the robust soft labels of the anchor samples, In order to realize the update optimization of the neural network.
可见,为了提高模型的泛化能力以及训练监督的准确性,模型监督训练过程中的不再是基于训练样本对应的硬标签进行模型训练,而是结合上述样本间的知识整合的自蒸馏方法获得的鲁棒的软标签,以进行模型训练。It can be seen that in order to improve the generalization ability of the model and the accuracy of training supervision, the model training process is no longer based on the hard labels corresponding to the training samples, but the self-distillation method combining the knowledge integration between the above samples. robust soft labels for model training.
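A hedged sketch of such a soft-label training objective follows; pairing a KL-divergence distillation term with an optional hard-label cross-entropy term, and the weighting between them, are common-practice assumptions rather than the exact loss of this disclosure.

```python
# Sketch of a soft-label training objective; combining a KL-divergence term with
# an optional hard-label cross-entropy term is a common-practice assumption here,
# not necessarily the exact objective of this disclosure.
import torch
import torch.nn.functional as F

def soft_label_loss(logits, soft_labels, hard_labels=None, T=4.0, beta=1.0):
    # distillation term: match the network's tempered prediction to the soft label
    log_p = F.log_softmax(logits / T, dim=1)
    loss = F.kl_div(log_p, soft_labels, reduction="batchmean") * (T * T)
    if hard_labels is not None:
        # optional supervision from the original hard labels
        loss = loss + beta * F.cross_entropy(logits, hard_labels)
    return loss

logits = torch.randn(8, 10)                  # 8 anchor samples, 10 classes
soft = F.softmax(torch.randn(8, 10), dim=1)  # their soft labels
hard = torch.randint(0, 10, (8,))            # their hard labels
print(soft_label_loss(logits, soft, hard))
```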
本公开实施例提供一种神经网络训练方法,通过针对神经网络下的每个训练样本,可利用其他样本与该样本间的相似度以及其他样本的预测概率来辅助该训练样本软标签的生成,进而基于满足预设数量的训练样本的软标签为神经网络执行高效的训练监督。可见,本公开以同一神经网络下跨样本的知识整合代替传统的跨网络的知识整合,在仅利用单个网络的基础上实现了基于样本间相似度的知识整合并获得有效的软标签。The embodiments of the present disclosure provide a neural network training method. For each training sample under the neural network, the similarity between other samples and the sample and the prediction probability of the other samples can be used to assist the generation of the soft label of the training sample, In turn, efficient training supervision is performed for the neural network based on soft labels satisfying a preset number of training samples. It can be seen that the present disclosure replaces traditional cross-network knowledge integration with cross-sample knowledge integration under the same neural network, realizes knowledge integration based on similarity between samples and obtains effective soft labels on the basis of only using a single network.
在本公开实施例的一实施方式中,图5为本公开实施例提出的神经网络训练方法的实现流程示意图三,如图5所示,计算机设备基于神经网络,确定当前锚定样本与每一知识传递样本之间的相似度、以及当前锚定样本的预测概率和每一知识传递样本的预测概率的方法还包括以下步骤:In an implementation of the embodiment of the present disclosure, FIG. 5 is a schematic diagram 3 of the implementation flow of the neural network training method proposed in the embodiment of the present disclosure. As shown in FIG. 5 , the computer device determines the current anchor sample and each anchor sample based on the neural network. The method for the similarity between knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample further includes the following steps:
S102a、基于神经网络的编码器,确定当前锚定样本的样本特征和每一知识传递样本的样本特征。S102a, the neural network-based encoder determines the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample.
S102b、基于当前锚定样本的样本特征和每一知识传递样本的样本特征,确定当前锚定样本与每一知识传递样本之间的相似度。S102b, determining the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample.
在本公开实施例中,神经网络设置有编码器,编码器用于对每个训练样本进行特征提取并进行特征编码,得到以向量形式表征的样本特征。In the embodiment of the present disclosure, the neural network is provided with an encoder, and the encoder is configured to perform feature extraction and feature encoding on each training sample to obtain sample features represented in the form of vectors.
在一些实施例中,可以通过神经网络的编码器对当前锚定样本和每一知识传递样本分别进行特征提取,分别获得当前锚定样本的样本特征和每一知识传递样本的样本特征,并分别对其样本特征进行特征编码,将样本特征表示为向量的形式。即通过神经网络的编码器确定出当前锚定样本以向量形式表征的样本特征与每一知识传递样本以向量形式表征的样本特征。In some embodiments, feature extraction can be performed on the current anchor sample and each knowledge transfer sample respectively by an encoder of a neural network, and the sample features of the current anchor sample and the sample features of each knowledge transfer sample are obtained respectively, and Feature encoding is performed on its sample features, and the sample features are represented in the form of vectors. That is, the sample feature represented by the current anchor sample in the form of a vector and the sample feature represented by the vector form of each knowledge transfer sample are determined by the encoder of the neural network.
这里,可以基于样本特征确定当前锚定样本与每一知识传递样本之间的视图相似性,即样本相似度。Here, the view similarity between the current anchor sample and each knowledge transfer sample, that is, the sample similarity, may be determined based on the sample features.
在本公开实施例中,图6为本公开实施例提出的神经网络训练方法的实现流程示意图四,如图6所示,基于当前锚定样本的样本特征和每一知识传递样本的样本特征,确定当前锚定样本与每一知识传递样本之间的相似度的方法包括以下步骤:In the embodiment of the present disclosure, FIG. 6 is a fourth schematic diagram of the implementation flow of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 6 , based on the sample characteristics of the current anchor sample and the sample characteristics of each knowledge transfer sample, The method for determining the similarity between the current anchor sample and each knowledge transfer sample includes the following steps:
S102b1、对当前锚定样本的样本特征进行归一化处理,得到当前锚定样本的归一化样本特征。S102b1. Perform normalization processing on the sample features of the current anchor sample to obtain the normalized sample features of the current anchor sample.
S102b2、对每一知识传递样本的样本特征进行归一化处理,得到每一知识传递样本的归一化特征。S102b2: Perform normalization processing on the sample features of each knowledge transfer sample to obtain the normalized features of each knowledge transfer sample.
S102b3、对当前锚定样本的归一化样本特征和每一知识传递样本的归一化特征进行点积运算处理,得到当前锚定样本与每一知识传递样本之间的相似度。S102b3: Perform dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
可以理解的是,在基于样本特征确定样本之间的相似度之前,需要先将使用向量形式的样本特征转换至同一量纲下,使得计算严格本相似度拥有统一的标准。It can be understood that, before determining the similarity between samples based on the sample features, it is necessary to convert the sample features in the form of vectors to the same dimension, so that the calculation of the strict similarity has a unified standard.
这里,计算机设备可以分别对当前锚定样本的样本特征和每一知识传递样本的样本特征先进行归一化处理,获得当前锚定样本的归一化样本特征和每一知识传递样本的归一化特征,以实现将使用向量形式表征的样本特征转换至同一量纲下。Here, the computer device may firstly normalize the sample features of the current anchor sample and the sample features of each knowledge transfer sample to obtain the normalized sample features of the current anchor sample and the normalized sample features of each knowledge transfer sample. To convert the sample features represented in vector form to the same dimension.
之后,通过对当前锚定样本的归一化特征和每一知识传递样本的归一化特征进行点积运算处理,进而得到当前锚定样本与每一知识传递样本之间的相似度。Afterwards, by performing dot product operation on the normalized feature of the current anchor sample and the normalized feature of each knowledge transfer sample, the similarity between the current anchor sample and each knowledge transfer sample is obtained.
其中,样本之间相似度的计算公式如下所示:Among them, the calculation formula of similarity between samples is as follows:
A(i,j) = \sigma(F(x_i))^{\top}\,\sigma(F(x_j))    (1)

where, in formula (1), F(x_i) is the sample feature, represented as a vector, of the i-th anchor sample determined by the encoder of the neural network, F(x_j) is the sample feature of the j-th knowledge transfer sample corresponding to the i-th anchor sample, and \sigma(\cdot) is the l_2 normalization function.
可见,在本公开实施例中,可以对当前锚定样本与每一知识传递样本的经归一化转换至同一量纲下的样本特征进行点积运算,进而确定出两两样本之间的成对相似度。It can be seen that, in the embodiment of the present disclosure, a dot product operation can be performed on the sample features of the current anchor sample and each knowledge transfer sample that are normalized and converted to the same dimension, and then the composition between the two samples can be determined. to similarity.
S102c: Determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In the embodiment of the present disclosure, the neural network is further provided with a classifier, and the classifier is used to determine the predicted probability corresponding to each training sample.
For example, under a classification task, for the current anchor sample and each knowledge transfer sample, the classifier of the neural network can calculate their predicted probabilities through a softmax function. The predicted probability of a sample is calculated as follows:
p_i^T(k) = exp(z_i(k) / T) / Σ_(k'=1..K) exp(z_i(k') / T)    (2)

In formula (2), p_i^T(k) represents the probability that the i-th training sample belongs to class k, T is the temperature hyperparameter, and K is the total number of classes; z_i(k) is the k-th entry of the logit vector of the training sample, and the denominator is the sum of the exponentiated logits of that sample over all K classes.
For example, under a classification task, the predicted probability of a training sample in the training sample set can be expressed as p_i^T = [p_i^T(1), ..., p_i^T(K)], and the i-th anchor sample satisfies Σ_(k=1..K) p_i^T(k) = 1, that is, the probabilities that the i-th anchor sample belongs to the first class, ..., the K-th class sum to 1.
It can be seen that, in the embodiment of the present disclosure, the similarity between the current anchor sample and each knowledge transfer sample can be obtained based on the encoder of the neural network, and the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample can be obtained based on the classifier of the neural network, so that the generation of the soft label of the anchor sample can further be driven by the sample similarities, the predicted probability of the current anchor sample and the predicted probabilities of the knowledge transfer samples.
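A minimal sketch of formula (2) follows, assuming the classifier outputs are available as `logits`; the temperature value used here is illustrative:

```python
import torch

def predicted_probabilities(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    """logits: (N, K) classifier outputs; returns (N, K) probabilities, each row summing to 1."""
    return torch.softmax(logits / temperature, dim=1)
```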
In the embodiment of the present disclosure, FIG. 7 is a fifth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 7, the method for determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each of the knowledge transfer samples, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample includes the following steps:
S103a: Determine, based on the similarity between the current anchor sample and each knowledge transfer sample, a knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample.
In the embodiment of the present disclosure, the predicted probability of each knowledge transfer sample is transferred to the current anchor sample with a different weight value for the generation of the soft label of the current anchor sample. The computer device can normalize the sample similarities through a softmax function and calculate the "knowledge" transfer weight value of each knowledge transfer sample with respect to the current anchor sample, that is, the knowledge transfer parameter.
FIG. 8 is a sixth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 8, the method for determining the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, based on the similarity between the current anchor sample and each knowledge transfer sample, includes the following steps:
S103a1: Accumulate the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value.
S103a2: Determine, based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value, the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample.
In the embodiment of the present disclosure, the computer device may first accumulate the similarities between the current anchor sample and all of the knowledge transfer samples to obtain the summation result, that is, the accumulated similarity value; then, for each knowledge transfer sample, it calculates that sample's knowledge transfer parameter with respect to the current anchor sample through a softmax function, combining the similarity between the current anchor sample and that knowledge transfer sample with the accumulated similarity value.
The knowledge transfer parameter (written here as w(i, j)) is calculated as follows:

w(i, j) = exp(A(i, j)) / Σ_(j'≠i) exp(A(i, j'))    (3)

In formula (3), exp(A(i, j)) characterizes the sample similarity between the i-th anchor sample and the j-th knowledge transfer sample, and Σ_(j'≠i) exp(A(i, j')) characterizes the accumulated similarity over all knowledge transfer samples corresponding to the i-th anchor sample.

After the sample similarities are normalized by formula (3), the "knowledge" transfer weight values of the knowledge transfer samples with respect to the current anchor sample accumulate to 1, that is, Σ_(j≠i) w(i, j) = 1.
It can be seen that, in the embodiment of the present disclosure, the "knowledge" transfer weight value of each knowledge transfer sample with respect to the anchor sample can be obtained by normalizing the sample similarities, and the predicted probability of each knowledge transfer sample can then be transferred in a weighted manner according to that weight value.
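A minimal sketch of formula (3) follows, assuming PyTorch: a row-wise softmax of the affinity matrix with the diagonal excluded, so that each anchor's transfer weights over the other samples sum to 1; names are illustrative:

```python
import torch

def transfer_weights(A: torch.Tensor) -> torch.Tensor:
    """A: (N, N) affinity matrix; returns (N, N) knowledge transfer parameters."""
    mask = torch.eye(A.size(0), dtype=torch.bool, device=A.device)
    A = A.masked_fill(mask, float('-inf'))   # drop self-similarity before normalizing
    return torch.softmax(A, dim=1)           # each row sums to 1 over j != i
```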
S103b: Determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
In the embodiment of the present disclosure, after the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample is obtained, the soft label of the current anchor sample can be determined based on these "knowledge" transfer weight values, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample. Once the "dark knowledge" transferred by each knowledge transfer sample has been fully fused into the current anchor sample, an accurate and robust soft label of the current anchor sample is obtained.
FIG. 9 is a seventh schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 9, the specific method for determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample includes the following steps:
S103b1: Perform one knowledge transfer process based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain the initial knowledge transfer probability of the current anchor sample.
In the embodiment of the present disclosure, the "dark knowledge" of the at least one knowledge transfer sample can first be transferred through one weighted integration, based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, so as to obtain the initial knowledge transfer probability of the current anchor sample.
In some embodiments, the predicted probability of each knowledge transfer sample may be transferred in a weighted manner based on its "knowledge" transfer weight value with respect to the current anchor sample. Here, the knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample may first be determined based on the knowledge transfer parameter of that sample and its predicted probability.
In some embodiments, the "knowledge" to be transferred from the knowledge transfer samples to the current anchor sample in a weighted manner may be integrated before being transferred. Here, the at least one knowledge transfer probability with respect to the current anchor sample may first be accumulated to obtain an accumulated knowledge transfer probability, and one knowledge transfer process is then performed based on this accumulated value, yielding the "dark knowledge" transferred from the at least one knowledge transfer sample to the current anchor sample for the first time, that is, the initial knowledge transfer probability.
Here, the knowledge transfer probability is calculated as follows:

p'_i^T = Σ_(j≠i) w(i, j) · p_j^T    (4)

In formula (4), p'_i^T is the knowledge transfer probability corresponding to the i-th anchor sample, w(i, j) is the knowledge transfer parameter (the normalized sample similarity) between the i-th anchor sample and the j-th knowledge transfer sample, and p_j^T is the initial predicted probability of the j-th knowledge transfer sample.
Based on the above, the predicted probabilities of the training samples in the training sample set can be collected into a matrix P^T ∈ R^(N×K) whose i-th row is p_i^T. In order to avoid the current anchor sample reinforcing its own knowledge, the diagonal entries of the above affinity matrix A are discarded by A = A ⊙ (1 − I), where I is the identity matrix and ⊙ denotes element-wise multiplication, that is, the diagonal of the affinity matrix A is kept at 0. Accordingly, w(i, i) = 0, so that when the knowledge transfer probability of the current anchor sample is calculated, the "dark knowledge" transferred in p'_i^T retains only the weighted and integrated predicted probabilities of the at least one knowledge transfer sample, not the anchor sample's own prediction.
Intuitively, if the i-th anchor sample and the j-th knowledge transfer sample have a high similarity, that is, w(i, j) is large, then the predicted probability p_j^T of the j-th knowledge transfer sample is transferred into p'_i^T with a larger transfer weight.
In the embodiment of the present disclosure, after the knowledge transfer probability of every training sample in the same batch, each taken in turn as the anchor sample, has been calculated based on formula (4), the "dark knowledge" within the same batch can be transferred in parallel. That is, the knowledge transfer probabilities of all samples in the batch are computed at once as P'^T = W · P^T, where W ∈ R^(N×N) collects the knowledge transfer parameters of each training sample when it serves as the anchor sample, and P^T ∈ R^(N×K) collects the predicted probabilities of all samples in the batch.
It can be seen that, in the embodiment of the present disclosure, the predicted probabilities of the other samples can be weighted and integrated and then transferred to the current anchor sample.
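A minimal sketch of the batch-parallel knowledge transfer step as reconstructed above follows: the weighted predictions of the other samples are aggregated for every anchor at once. `W` is the (N, N) transfer-weight matrix from the earlier sketch and `P` the (N, K) predicted probabilities; both names are illustrative:

```python
import torch

def transfer_step(W: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Returns the (N, K) knowledge transfer probabilities, one row per anchor sample."""
    return W @ P
```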
S103b2: Perform one knowledge fusion process based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain the initial soft label of the current anchor sample.
In the embodiment of the present disclosure, after the "dark knowledge" of the at least one knowledge transfer sample has been transferred through weighted integration, one knowledge fusion process can be performed based on the transferred "dark knowledge" and the "knowledge" the current anchor sample already holds, so as to obtain the initial soft label of the current anchor sample.
Here, after one weighted integration transfer of the predicted probabilities of the at least one knowledge transfer sample, the knowledge fusion process is performed on the initial knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, yielding the initial soft label.
The generation function of the initial soft label of the current anchor sample is as follows:

q_i^T = ω · p'_i^T + (1 − ω) · p_i^T    (5)

In formula (5), q_i^T is the initial soft label of the i-th training sample, and ω is a weighting factor, a hyperparameter with ω ∈ [0, 1].
It should be understood that, when the "dark knowledge" of the samples in the same batch is transferred in parallel, the initial soft label generation function for all training samples in the batch is as follows:

Q^T = ω · W · P^T + (1 − ω) · P^T    (6)

In formula (6), Q^T ∈ R^(N×K) contains the initial soft label of every training sample in the same batch.
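A minimal sketch of the knowledge fusion step (formulas (5)/(6) as reconstructed above) follows: the transferred probabilities are blended with the anchor samples' own predictions using the weighting factor omega in [0, 1]; names and the default value of omega are illustrative:

```python
import torch

def fuse(transferred: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """transferred, P: (N, K); returns the (N, K) initial soft labels Q for the whole batch."""
    return omega * transferred + (1.0 - omega) * P
```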
S103b3: Based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, so as to obtain the soft label of the current anchor sample.
It should be ensured that, in order for the "dark knowledge" transferred by the at least one knowledge transfer sample to fully act on the generation of the soft label of the current anchor sample, the transfer and integration of knowledge can be performed multiple times, until the knowledge transferred by the at least one knowledge transfer sample is completely fused into the current anchor sample.
FIG. 10 is an eighth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 10, the loop process includes the following steps:
S103b31: In each cycle of the loop process, perform the knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain the knowledge transfer probability of the current anchor sample.
S103b32: Perform the knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
It should be understood that, in order to improve the accuracy of the soft labels and thus better improve the performance of the student model, the above process of weighted knowledge transfer and integration can be performed multiple times until convergence, achieving full fusion of the knowledge.
The knowledge transfer and fusion process is as follows:

q_i^T(t) = ω · Σ_(j≠i) w(i, j) · q_j^T(t−1) + (1 − ω) · p_i^T    (7)

In formula (7), t denotes the t-th transfer-and-integration iteration, and q_j^T(t−1) is the soft label of the j-th sample obtained in the previous cycle. Taking q_j^T(0) as the initial predicted probability p_j^T, the first iteration q_i^T(1) recovers the initial soft label of formula (5).
It can be seen that, in each cycle of the loop process, one knowledge transfer process is first performed based on the soft labels obtained in the previous cycle and each knowledge transfer parameter, yielding the knowledge transfer probability Σ_(j≠i) w(i, j) · q_j^T(t−1) of the current anchor sample; one knowledge fusion process is then performed based on this knowledge transfer probability and the predicted probability p_i^T of the current anchor sample, yielding the soft label q_i^T(t) of the current anchor sample for the next cycle.
Here, in the case of parallel transfer within the same batch of training samples, the knowledge transfer and fusion process is as follows:

Q^T(t) = ω · W · Q^T(t−1) + (1 − ω) · P^T    (8)

In formula (8), t denotes the t-th transfer-and-integration iteration, and Q^T(t−1) contains the soft labels of the training samples in the same batch obtained in the previous cycle.
It can be seen that, in each cycle of the loop process, the soft labels of the training samples in the batch obtained in the previous cycle are first propagated, and knowledge fusion is then performed with the predicted probability of each training sample, yielding the soft label of each training sample for the next cycle.
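A minimal sketch of the iterative transfer-and-fusion loop (formula (8) as reconstructed above) follows, run until the soft labels stop changing; the change-based tolerance and iteration cap are illustrative assumptions rather than the source's stopping criterion:

```python
import torch

def propagate(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5,
              tol: float = 1e-6, max_iter: int = 100) -> torch.Tensor:
    """W: (N, N) transfer weights, P: (N, K) predictions; returns the (N, K) soft labels."""
    Q = P.clone()
    for _ in range(max_iter):
        Q_next = omega * (W @ Q) + (1.0 - omega) * P   # transfer, then fuse
        if torch.max(torch.abs(Q_next - Q)) < tol:     # converged
            return Q_next
        Q = Q_next
    return Q
```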
When the loop process is executed many times, that is, when the knowledge transfer and knowledge fusion processes are iterated an unbounded number of times, the residual contribution of the knowledge transfer samples' predicted probabilities for each training sample falls below the preset probability threshold: unrolling formula (8) gives Q^T(t) = (ω · W)^t · Q^T(0) + (1 − ω) · Σ_(t'=0..t−1) (ω · W)^(t') · P^T, and for ω < 1 the first term, which carries the initial predicted probabilities, becomes vanishingly small and approaches zero, so that the soft label of each training sample is obtained from the accumulated fusion terms.
Based on the above multiple rounds of knowledge transfer and fusion, the generation function of the soft label of each training sample can be estimated as:

Q^T ≈ (1 − ω) · (I − ω · W)^(−1) · P^T    (9)
In this way, the "dark knowledge" transferred by the at least one knowledge transfer sample of each training sample has been completely fused into that training sample, and the soft label corresponding to each training sample is highly accurate, with accuracy approaching 100%.
For all training samples in the same batch, since the quantities involved are on the same scale, each row of Q^T naturally satisfies Σ_(k=1..K) q_i^T(k) = 1, and no additional normalization is required.
It can be seen that, in the embodiment of the present disclosure, for each training sample in the same batch, the "dark knowledge" of every other sample in the batch can be weighted and integrated, according to the similarity between that other sample and the training sample, and transferred to the current training sample until the knowledge is fully fused, so that an accurate and robust soft label is obtained for every training sample.
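A minimal sketch of the closed-form estimate (formula (9) as reconstructed above) follows: the infinite transfer-and-fusion process collapses to a single linear solve; names are illustrative:

```python
import torch

def soft_labels_closed_form(W: torch.Tensor, P: torch.Tensor, omega: float = 0.5) -> torch.Tensor:
    """W: (N, N) transfer weights, P: (N, K) predictions; returns the (N, K) soft labels."""
    N = W.size(0)
    I = torch.eye(N, device=W.device, dtype=W.dtype)
    return (1.0 - omega) * torch.linalg.solve(I - omega * W, P)
```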
In the embodiment of the present disclosure, FIG. 11 is a ninth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 11, the neural network training method further includes the following steps:
S120: Acquire a training data set, where the training data set includes at least one batch of training data subsets.
S130: Select, from the training data set, one batch of training data subsets that has not previously been selected as a training sample set, as the current training sample set.
In the embodiment of the present disclosure, a training data set such as ImageNet may be acquired. However, considering that such a training data set is too large to be loaded into the computing device at once in practical applications, the training data set can be divided into at least one training data subset with a smaller footprint, so as to perform at least one batch of neural network training.
In an implementation of the embodiments of the present disclosure, the neural network may be trained in multiple batches, that is, in mini-batch form. Any one batch of the multiple training data subsets can then be determined as the above training sample set, and the soft labels of the anchor samples are obtained by performing the knowledge-integration self-distillation algorithm of S101-S103.
When the above training sample set is one batch of training data subsets, the knowledge-integration self-distillation algorithm of S101-S103 can be performed for each training sample in that training sample set, obtaining the soft label corresponding to each training sample in the set.
Here, after a certain batch of training data subsets has been used as the current training sample set and S100-S110 have been performed, another batch of training data subsets in the training data set that has not previously been determined as a training sample set can be determined as the training sample set of the next round, and the knowledge-integration self-distillation algorithm of S101-S103 and the neural network training method of S100-S110 are performed again, improving network performance.
In the embodiment of the present disclosure, FIG. 12 is a tenth schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 12, the neural network training method further includes the following steps:
S140: Perform random sampling on the training data set to obtain at least one piece of first training data.
S150: Determine the hard label of each piece of first training data, and continue to perform similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as first training data, to obtain at least one piece of second training data corresponding to each piece of first training data.
S160: Use one batch of training data subsets constructed from the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data as the current training sample set.
It can be understood that, in order to implement the weighted transfer and integration of knowledge between samples according to the similarity between them, it must first be ensured that, among the training samples, every sample has at least one other sample with view similarity to it.
In an implementation of the embodiments of the present disclosure, a type of data sampler can be provided, that is, sampling of training samples based on view similarity is implemented on top of a general random sampling mechanism.
In the sampling process, this data sampler may first randomly sample the above training data set ImageNet to obtain at least one piece of first training data, and then determine the manually annotated hard label corresponding to each piece of first training data.
Afterwards, similarity sampling is performed on the remaining data of the training data set based on this hard label, that is, at least one piece of second training data that has view similarity with each piece of first training data, i.e., the same hard label, is sampled; one batch of training data subsets is then constructed from the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data, and this batch of training data subsets is used as the training sample set.
In this way, based on the above method, multiple batches of training data subsets in which view similarity exists between samples can be selected from the training data set; when the knowledge-integration self-distillation algorithm of S101-S103 and the neural network training method of S100-S110 are applied, the training data subset of each batch can be used as the current training sample set.
For example, if the number of pieces of first training data obtained by random sampling is N and M pieces of second training data with view similarity are selected for each piece of first training data, the final batch of training data subsets contains N × (M + 1) samples.
It can be seen that, through the data sampling method based on sample similarity, it can be ensured that every training sample in the current training sample set has at least one other sample visually similar to it, so that the weighted transfer of knowledge between samples according to their similarity can be implemented.
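A minimal sketch of the class-aware sampling described above follows: N instances are drawn at random, then M extra instances sharing each drawn instance's hard label are added, giving roughly N × (M + 1) samples per batch. The flat-list dataset interface and function names are assumptions for illustration:

```python
import random
from collections import defaultdict

def build_batch(labels, n_random: int, m_similar: int):
    """labels: hard labels indexed by sample id; returns the sample indices of one batch."""
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    chosen = set(random.sample(range(len(labels)), n_random))   # N randomly drawn instances
    batch = list(chosen)
    for idx in list(chosen):
        pool = [j for j in by_label[labels[idx]] if j not in chosen]
        extra = random.sample(pool, min(m_similar, len(pool)))  # up to M same-label instances
        chosen.update(extra)
        batch.extend(extra)
    return batch
```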
In the embodiment of the present disclosure, FIG. 13 is an eleventh schematic flowchart of the neural network training method proposed by the embodiment of the present disclosure. As shown in FIG. 13, the method for training the neural network based at least on the soft labels of the preset number of anchor samples and the preset number of anchor samples further includes the following steps:
S111: Determine the relative entropy of each anchor sample based on the soft label of that anchor sample and the predicted probability corresponding to that anchor sample.
S112: Determine the cross entropy of each anchor sample based on the hard label of that anchor sample and the predicted probability corresponding to that anchor sample.
S113: Train the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
In the embodiment of the present disclosure, the relative entropy is the KL divergence (Kullback-Leibler divergence, KLD); both the cross entropy and the relative entropy describe the difference between the distribution of a sample's true result and the distribution of its predicted result.
In the embodiment of the present disclosure, when the soft labels of the preset number of anchor samples have been obtained and the neural network is trained based at least on the preset number of anchor samples and their soft labels, the training can be carried out by computing two kinds of differences for the training samples.
One kind takes the manually annotated hard label as the sample's true result distribution and determines the cross entropy based on the difference between the hard label and the predicted probability; the other kind takes the robust soft label as the sample's "true" result distribution and determines the relative entropy based on the difference between the soft label and the predicted probability.
It should be understood that training the neural network means minimizing the cross entropy and the relative entropy, that is, making the distribution of the sample prediction results determined by the neural network approximate the distribution of the samples' true results. Here, since the robust soft labels of the preset number of anchor samples obtained through the above loop process are used for neural network training and the label accuracy is high, the performance of the correspondingly trained neural network is also improved.
Here, the loss function is as follows:

L = H(p(y), p) + λ · D_KL(q^T ‖ p^T)    (10)

In formula (10), y ∈ {1, ..., K}, p(y) = [p(1), ..., p(K)] represents the hard label, H(·, ·) denotes the cross entropy, p^T is the predicted probability, λ is the weight coefficient, T is the temperature, and D_KL(q^T ‖ p^T) is the KL divergence between the soft label q^T and the predicted probability p^T.
It can be seen that the former part is the cross entropy determined from the hard label and the initial predicted probability, and the latter part is the KL divergence determined from the soft label and the predicted probability.
During model training, the cross entropy and KL divergence values are calculated based on formula (10) and are further minimized, so as to train the neural network and improve the performance of the network model.
It can be seen that, in the embodiment of the present disclosure, the accurate and robust soft labels make it possible to train the model with less data and a higher learning rate.
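A minimal sketch of the training objective (formula (10) as reconstructed above) follows: cross entropy against the hard labels plus a KL-divergence term that pulls the temperature-scaled predictions toward the soft labels. The values of lambda and T, and the use of class-index hard labels, are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def training_loss(logits, hard_labels, soft_labels, lam: float = 1.0, T: float = 4.0):
    """logits: (N, K); hard_labels: (N,) class indices; soft_labels: (N, K) probabilities."""
    ce = F.cross_entropy(logits, hard_labels)                    # hard-label term
    log_p_T = F.log_softmax(logits / T, dim=1)                   # tempered log-probabilities
    kl = F.kl_div(log_p_T, soft_labels, reduction='batchmean')   # D_KL(q^T || p^T)
    return ce + lam * kl
```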
FIG. 14 is a first schematic diagram of the principle of the knowledge-integration self-distillation algorithm proposed by the embodiment of the present disclosure. As shown in FIG. 14, {x_1, ..., x_N} are the samples in the same mini-batch other than the anchor sample; the predicted probability of the student model for the anchor sample is p_anchor, and its predicted probabilities for {x_1, ..., x_N} are {p_1, ..., p_N}.
Further, knowledge integration can be performed on the predicted probabilities {p_1, ..., p_N} corresponding to the other samples, and the obtained result is used as the soft label and migrated to the anchor sample by way of distillation.
It can be seen that, compared with FIG. 1 and FIG. 2, the knowledge integration proposed by the present application uses only one network: by assembling the samples other than the anchor sample in the same batch and dynamically aggregating the "dark knowledge" from the different samples in that batch, the knowledge of {x_1, ..., x_N} is collected to generate robust soft labels, integrating the knowledge into a single network and thereby saving memory and time costs to a large extent.
FIG. 15 is a second schematic diagram of the principle of the knowledge-integration self-distillation algorithm proposed by the embodiment of the present disclosure. As shown in FIG. 15, the samples in the same batch of the training sample set include an anchor sample and at least one knowledge transfer sample {x_1, ..., x_N}. The encoder is applied to the anchor sample and to each knowledge transfer sample to perform feature encoding, obtaining the sample feature f_anchor of the anchor sample and the sample features {f_1, f_2, f_3, ...} of the at least one knowledge transfer sample, and the similarity between the anchor sample and each knowledge transfer sample is estimated based on these sample features.
Further, the predicted probabilities corresponding to the samples are determined by the classifier of the current student model, including the predicted probability p_anchor of the anchor sample and the predicted probabilities {p_1, ..., p_N} of the at least one knowledge transfer sample. For the anchor sample, {p_1, ..., p_N} of the at least one knowledge transfer sample are transferred and integrated in a weighted manner based on the sample similarities to form the soft label of the anchor sample, which is migrated to the anchor sample by way of distillation.
Table 1 compares the effectiveness of the knowledge-integration distillation algorithms based on multiple teacher/student models, namely MEAL and KDCL, with the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure:

Table 1

Method | Training rounds | Label accuracy (%) | Extra network
MEAL | 180 | 78.2 | ResNet-101 & 152
KDCL | 200 | 77.8 | ResNet-18
Present scheme | 100 | 78.0 | None
Based on Table 1, the MEAL and KDCL methods shown in FIG. 1 and FIG. 2 both require additional network assistance, whereas the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure requires no additional network. Moreover, with less training under a single network, e.g., 100 rounds, the present scheme obtains results similar to the MEAL and KDCL methods of the related art, and even with half the training rounds of the KDCL method it achieves comparable label performance.
Table 2 shows the effectiveness of the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure on a variety of network architectures:

Table 2
[Table 2: accuracy and training overhead of the proposed method versus conventional cross-entropy training on ResNet-50, ResNet-152, ResNeXt-152 and MobileNet-V2]
As can be seen from Table 2, where the related art refers to training with the conventional cross-entropy loss, the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure improves network performance with minimal computational overhead on widely used architectures such as ResNet-50, on deeper/wider architectures such as ResNet-152 and ResNeXt-152, and on lighter architectures such as MobileNet-V2, while requiring only a small amount of additional graphics processing unit (GPU) time.
For example, on the ResNet-50 architecture the accuracy is improved from 76.8 to 78.0, at a cost of only 3.7% additional time.
Table 3 compares the effectiveness of the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure with the self-distillation methods in the related art:

Table 3
[Table 3: ImageNet accuracy of the proposed method versus related-art self-distillation and label regularization methods (Label smoothing, Tf-KD_reg, BAN, CS-KD, Tf-KD_self)]
As can be seen from Table 3, although the traditional self-distillation methods and a series of label regularization algorithms such as Label smoothing, Tf-KD_reg, BAN, CS-KD and Tf-KD_self are also based on a single network, the training results on the ImageNet data set of the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure surpass those of the above traditional self-distillation and label regularization algorithms. For example, the teacher-free Tf-KD_reg regularization algorithm on the ResNet-50 architecture achieves a label accuracy of 77.5%, which is still 0.5% lower than the present scheme.
It can be seen that the self-distillation algorithm of weighted knowledge transfer and integration proposed by the embodiments of the present disclosure can not only save memory and time by realizing knowledge integration within a single network, but can also generate equally powerful soft labels by aggregating knowledge from a set of samples within the same mini-batch.
Based on the above embodiments, in an embodiment of the present disclosure, FIG. 16 is a schematic diagram of the composition and structure of the neural network training apparatus proposed by the embodiment of the present disclosure. As shown in FIG. 16, the neural network training apparatus 10 includes an acquisition part 11, a training part 12, a selection part 13, a sampling part 14 and a determination part 15.
The acquisition part 11 is configured to perform a loop process until soft labels of a preset number of anchor samples are obtained, wherein the loop process includes the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, as well as the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
The training part 12 is configured to train the neural network based at least on the soft labels of the preset number of anchor samples and the preset number of anchor samples.
In some embodiments, the training part 12 is configured to: determine the relative entropy of each anchor sample based on the soft label of that anchor sample and the predicted probability corresponding to that anchor sample; determine the cross entropy of each anchor sample based on the hard label of that anchor sample and the predicted probability corresponding to that anchor sample; and train the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
In some embodiments, the neural network includes an encoder and a classifier, and the acquisition part 11 is configured to: determine, based on the encoder of the neural network, the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample; determine the similarity between the current anchor sample and each knowledge transfer sample based on these sample features; and determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In some embodiments, the acquisition part 11 is further configured to: normalize the sample feature of the current anchor sample to obtain the normalized sample feature of the current anchor sample; normalize the sample feature of each knowledge transfer sample to obtain the normalized feature of each knowledge transfer sample; and perform a dot-product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
In some embodiments, the acquisition part 11 is further configured to: determine, based on the similarity between the current anchor sample and each knowledge transfer sample, the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample; and determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In some embodiments, the acquisition part 11 is further configured to: accumulate the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value; and determine, based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value, the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample.
In some embodiments, the acquisition part 11 is further configured to: perform one knowledge transfer process based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain the initial knowledge transfer probability of the current anchor sample; perform one knowledge fusion process based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain the initial soft label of the current anchor sample; and, based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample. The loop process includes: in each cycle of the loop process, performing the knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain the knowledge transfer probability of the current anchor sample; and performing the knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
In some embodiments, the acquisition part 11 is further configured to: determine the knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample; accumulate the at least one knowledge transfer probability with respect to the current anchor sample to obtain an accumulated knowledge transfer probability; and perform one knowledge transfer process based on the accumulated knowledge transfer probability to obtain the initial knowledge transfer probability of the current anchor sample.
In some embodiments, the acquisition part 11 is further configured to acquire a training data set, where the training data set includes at least one batch of training data subsets.
In some embodiments, the selection part 13 is configured to select, from the training data set, one batch of training data subsets that has not previously been selected as a training sample set, as the current training sample set.
In some embodiments, the sampling part 14 is configured to perform random sampling on the training data set to obtain at least one piece of first training data.
In some embodiments, the determination part 15 is further configured to determine the hard label corresponding to each piece of first training data.
In some embodiments, the sampling part 14 is further configured to perform similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as first training data, to obtain at least one piece of second training data corresponding to each piece of first training data.
In some embodiments, the determination part 15 is further configured to use one batch of training data subsets constructed from the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data as the current training sample set.
In an embodiment of the present disclosure, FIG. 17 is a schematic diagram of the composition and structure of the computer device proposed by the embodiment of the present disclosure. As shown in FIG. 17, the computer device 20 proposed by the embodiment of the present disclosure may include a processor 21 and a memory 22 storing instructions executable by the processor 21; further, the computer device 20 may include a communication interface 23 and a bus 24 for connecting the processor 21, the memory 22 and the communication interface 23.
In the embodiment of the present disclosure, the processor 21 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller and a microprocessor. It can be understood that, for different devices, the electronic component used to implement the above processor function may also be another component, which is not specifically limited in the embodiments of the present disclosure. The computer device 20 may further include a memory 22 connected to the processor 21, where the memory 22 is used to store executable program code including computer operation instructions; the memory 22 may include a high-speed RAM memory and may further include a non-volatile memory, for example, at least two disk memories.
In the embodiment of the present disclosure, the bus 24 is used to connect the communication interface 23, the processor 21 and the memory 22 and to enable mutual communication among these components.
In the embodiment of the present disclosure, the memory 22 is used to store instructions and data.
Further, in the embodiment of the present disclosure, the above processor 21 is configured to: perform a loop process until soft labels of a preset number of anchor samples are obtained; and train the neural network based at least on the soft labels of the preset number of anchor samples and the preset number of anchor samples. The loop process includes the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, as well as the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample; and determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
In practical applications, the above-mentioned memory 22 may be a volatile memory, such as a Random-Access Memory (RAM); or a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor 21.
In addition, the functional modules in this embodiment may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present disclosure provides a computer device. The computer device can perform a loop process until soft labels of a preset number of anchor samples are obtained, and train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples. The loop process includes the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample from the current training sample set, where the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and determining a soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
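By way of illustration only, the following Python (PyTorch-style) sketch shows how the sample features, similarities, and predicted probabilities referred to above could be computed with the encoder and classifier described in claims 3 and 4 below. The function and parameter names, and the plain softmax used for the predicted probabilities, are assumptions made for this sketch and are not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def similarities_and_probs(encoder, classifier, anchor, transfer_samples):
    """Hedged sketch: encoder -> sample features; normalization plus dot
    product -> similarities; classifier -> predicted probabilities for the
    current anchor sample and each knowledge transfer sample."""
    feat_a = encoder(anchor.unsqueeze(0))          # [1, D] anchor sample feature
    feat_k = encoder(transfer_samples)             # [M, D] knowledge transfer sample features
    sim = F.normalize(feat_k, dim=1) @ F.normalize(feat_a, dim=1).t()  # [M, 1] dot products of normalized features
    prob_a = F.softmax(classifier(feat_a), dim=1)  # [1, C] predicted probability of the anchor sample
    prob_k = F.softmax(classifier(feat_k), dim=1)  # [M, C] predicted probabilities of the transfer samples
    return sim.squeeze(1), prob_a.squeeze(0), prob_k
```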
In this way, for each training sample of the neural network, the similarities between that sample and the other samples, together with the predicted probabilities of those other samples, can be used to assist in generating the soft label of that training sample, and efficient training supervision can then be performed for the neural network based on the soft labels of the preset number of training samples. It can be seen that the present disclosure replaces the traditional cross-network knowledge integration with cross-sample knowledge integration within a single neural network, thereby realizing knowledge integration based on inter-sample similarity and obtaining effective soft labels while using only one network.
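Continuing the sketch above, the similarities can be converted into knowledge transfer parameters and combined with the predicted probabilities to form a soft label, roughly in the manner of claims 5 and 6 below. Exponentiating the similarities before dividing by their accumulated value, and the fusion weight `omega`, are assumptions made for this illustration rather than the formulas of the disclosure.

```python
import torch

def soft_label_from_transfer_samples(sim, prob_a, prob_k, omega=0.5):
    """Hedged sketch: similarities -> knowledge transfer parameters (each
    similarity over the accumulated total), weighted sum of the transfer
    samples' probabilities -> transferred knowledge, then fusion with the
    anchor sample's own prediction to obtain its soft label."""
    weights = sim.exp()                    # assumption: exponentiate so the weights are positive
    weights = weights / weights.sum()      # knowledge transfer parameters (similarity / accumulated similarity)
    transferred = weights @ prob_k         # [C] knowledge transferred from the other samples
    return omega * prob_a + (1.0 - omega) * transferred  # soft label of the current anchor sample
```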
An embodiment of the present disclosure provides a computer-readable storage medium on which a program is stored; when the program is executed by a processor, the neural network training method described above is implemented.
Specifically, the program instructions corresponding to the neural network training method of this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When the program instructions corresponding to the neural network training method in the storage medium are read or executed by an electronic device, the following steps are performed:
performing a loop process until soft labels of a preset number of anchor samples are obtained;
training the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples;
where the loop process includes the following steps:
acquiring a current training sample set, and determining, in each cycle of the loop process, the current anchor sample and at least one knowledge transfer sample from the current training sample set, where the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and determining a soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
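The soft labels can also be refined iteratively, as elaborated in claims 7 and 8 below, by repeatedly transferring knowledge through the similarity-derived parameters and fusing the result with each sample's own prediction. The batch-level formulation below, the fusion weight `omega`, and the fixed iteration count used in place of the probability-threshold stopping condition are assumptions made for this sketch.

```python
import torch

def propagate_soft_labels(probs, transfer_params, omega=0.5, num_iters=10):
    """Hedged batch-level sketch of iterative knowledge transfer and fusion.
    probs:           [N, C] predicted probabilities of the samples in the batch
    transfer_params: [N, N] knowledge transfer parameters, row-normalized,
                     with zeros on the diagonal (no self-transfer)."""
    soft = probs.clone()                                     # initial soft labels
    for _ in range(num_iters):
        transferred = transfer_params @ soft                 # knowledge transfer step (weighted accumulation)
        soft = (1.0 - omega) * transferred + omega * probs   # knowledge fusion with each sample's own prediction
    return soft
```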
Correspondingly, an embodiment of the present disclosure further provides a computer program product, where the computer program product includes computer-executable instructions for implementing the steps of the neural network training method proposed in the embodiments of the present disclosure.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) containing computer-usable program code.
The present disclosure is described with reference to schematic flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the schematic flowcharts and/or block diagrams, and combinations of flows and/or blocks in the schematic flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable computing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable computing device produce an apparatus for implementing the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable computing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable computing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the schematic flowcharts and/or one or more blocks of the block diagrams.
The above descriptions are merely preferred embodiments of the present disclosure and are not intended to limit the protection scope of the present disclosure.
Industrial Applicability
In the embodiments of the present disclosure, a loop process is performed until soft labels of a preset number of anchor samples are obtained, and the neural network is trained based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples. In each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample are determined from the current training sample set; based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample are determined; and a soft label of the current anchor sample is determined based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample. Knowledge integration under a self-distillation algorithm is thereby realized.
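Claim 2 below combines, for each anchor sample, a cross entropy against its hard label with a relative entropy (KL divergence) against its soft label. A minimal Python sketch of such an objective is given here; the weighting factor `alpha` is an assumption.

```python
import torch
import torch.nn.functional as F

def training_loss(logits, hard_labels, soft_labels, alpha=1.0):
    """Hedged sketch of the objective in claim 2: hard-label cross entropy
    plus relative entropy between the soft labels and the predictions."""
    ce = F.cross_entropy(logits, hard_labels)                  # cross entropy with the hard labels
    kl = F.kl_div(F.log_softmax(logits, dim=1), soft_labels,   # relative entropy with the soft labels
                  reduction='batchmean')
    return ce + alpha * kl
```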
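The current training sample set itself can be assembled batch by batch as described in claims 9 and 10 below: first training data are drawn at random, and for each of them similar samples are added as second training data. Using feature cosine similarity for the similarity-sampling step is an assumption made here; the claims tie that step to the hard labels of the first training data.

```python
import random
import torch
import torch.nn.functional as F

def build_training_sample_set(features, num_first, num_second):
    """Hedged sketch of claims 9-10: random sampling -> first training data,
    similarity sampling -> second training data, combined into one batch."""
    z = F.normalize(features, dim=1)                      # [N, D] normalized sample features
    first = random.sample(range(features.size(0)), num_first)
    batch = list(first)
    for i in first:
        sims = z @ z[i]                                   # similarity of every sample to this first training data
        sims[i] = float('-inf')                           # exclude the sample itself
        batch += sims.topk(num_second).indices.tolist()   # most similar samples become second training data
    return batch
```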

Claims (24)

  1. A neural network training method, the method comprising:
    performing a loop process until soft labels of a preset number of anchor samples are obtained;
    training a neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples;
    wherein the loop process comprises the following steps:
    acquiring a current training sample set, and determining, in each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample;
    determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and
    determining a soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  2. The method according to claim 1, wherein training the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples comprises:
    determining a relative entropy of each anchor sample based on the soft label of the anchor sample and the predicted probability corresponding to the anchor sample;
    determining a cross entropy of each anchor sample based on a hard label of the anchor sample and the predicted probability corresponding to the anchor sample; and
    training the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
  3. The method according to claim 1 or 2, wherein the neural network comprises an encoder and a classifier, and determining, based on the neural network, the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample comprises:
    determining, based on the encoder of the neural network, a sample feature of the current anchor sample and a sample feature of each knowledge transfer sample;
    determining the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample; and
    determining, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  4. The method according to claim 3, wherein determining the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample comprises:
    normalizing the sample feature of the current anchor sample to obtain a normalized sample feature of the current anchor sample;
    normalizing the sample feature of each knowledge transfer sample to obtain a normalized feature of each knowledge transfer sample; and
    performing a dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
  5. The method according to any one of claims 1 to 4, wherein determining the soft label of the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample comprises:
    determining a knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample; and
    determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of each knowledge transfer sample, and the predicted probability of the current anchor sample.
  6. The method according to claim 5, wherein determining the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample comprises:
    accumulating the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value; and
    determining the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value.
  7. The method according to claim 5 or 6, wherein determining the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of each knowledge transfer sample, and the predicted probability of the current anchor sample comprises:
    performing a knowledge transfer process once on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain an initial knowledge transfer probability of the current anchor sample;
    performing a knowledge fusion process once based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain an initial soft label of the current anchor sample; and
    based on the initial soft label of the current anchor sample, performing a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample;
    wherein the loop process comprises:
    in each cycle of the loop process, performing a knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain a knowledge transfer probability of the current anchor sample; and
    performing a knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
  8. The method according to claim 7, wherein performing the knowledge transfer process once on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain the initial knowledge transfer probability of the current anchor sample, comprises:
    determining a knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample;
    accumulating the at least one knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample to obtain an accumulated knowledge transfer probability value; and
    performing a knowledge transfer process once based on the accumulated knowledge transfer probability value, to obtain the initial knowledge transfer probability of the current anchor sample.
  9. The method according to any one of claims 1 to 8, wherein the method further comprises:
    acquiring a training data set, the training data set comprising at least one batch of training data subsets; and
    selecting, from the training data set, one batch of the training data subsets that has not previously been selected as a training sample set, as the current training sample set.
  10. The method according to claim 9, wherein the method further comprises:
    performing random sampling on the training data set to obtain at least one piece of first training data;
    determining a hard label of each piece of first training data, and performing similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as the first training data, to obtain at least one piece of second training data corresponding to each piece of first training data; and
    using one batch of the training data subsets, constructed based on the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data, as the current training sample set.
  11. A neural network training apparatus, the apparatus comprising:
    an acquisition part, configured to perform a loop process until soft labels of a preset number of anchor samples are obtained, wherein the loop process comprises the following steps: acquiring a current training sample set, and determining, in each cycle of the loop process, a current anchor sample and at least one knowledge transfer sample from the current training sample set, wherein the current anchor sample is any sample in the current training sample set, and the at least one knowledge transfer sample is at least one other sample in the current training sample set other than the current anchor sample; and determining, based on the neural network, a similarity between the current anchor sample and each knowledge transfer sample, a predicted probability of the current anchor sample, and a predicted probability of each knowledge transfer sample; and
    a training part, configured to train the neural network based on at least the soft labels of the preset number of anchor samples and the preset number of anchor samples.
  12. The neural network training apparatus according to claim 11, wherein
    the training part is further configured to: determine a relative entropy of each anchor sample based on the soft label of the anchor sample and the predicted probability corresponding to the anchor sample; determine a cross entropy of each anchor sample based on a hard label of the anchor sample and the predicted probability corresponding to the anchor sample; and train the neural network based on the cross entropies of the preset number of anchor samples and the relative entropies of the preset number of anchor samples.
  13. The neural network training apparatus according to claim 11 or 12, wherein the neural network comprises an encoder and a classifier, and
    the acquisition part is further configured to: determine, based on the encoder of the neural network, a sample feature of the current anchor sample and a sample feature of each knowledge transfer sample; determine the similarity between the current anchor sample and each knowledge transfer sample based on the sample feature of the current anchor sample and the sample feature of each knowledge transfer sample; and determine, based on the classifier of the neural network, the predicted probability of the current anchor sample and the predicted probability of each knowledge transfer sample.
  14. The neural network training apparatus according to claim 13, wherein
    the acquisition part is further configured to: normalize the sample feature of the current anchor sample to obtain a normalized sample feature of the current anchor sample; normalize the sample feature of each knowledge transfer sample to obtain a normalized feature of each knowledge transfer sample; and perform a dot product operation on the normalized sample feature of the current anchor sample and the normalized feature of each knowledge transfer sample to obtain the similarity between the current anchor sample and each knowledge transfer sample.
  15. The neural network training apparatus according to any one of claims 11 to 14, wherein
    the acquisition part is further configured to: determine a knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the current anchor sample and each knowledge transfer sample; and determine the soft label of the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample, the predicted probability of the current anchor sample, and the predicted probability of each knowledge transfer sample.
  16. The neural network training apparatus according to claim 15, wherein
    the acquisition part is further configured to: accumulate the at least one similarity between the current anchor sample and each knowledge transfer sample to obtain an accumulated similarity value; and determine the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample based on the similarity between the anchor sample and each knowledge transfer sample and the accumulated similarity value.
  17. The neural network training apparatus according to claim 15 or 16, wherein
    the acquisition part is further configured to: perform a knowledge transfer process once on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample, to obtain an initial knowledge transfer probability of the current anchor sample; perform a knowledge fusion process once based on the initial knowledge transfer probability and the predicted probability of the current anchor sample, to obtain an initial soft label of the current anchor sample; and, based on the initial soft label of the current anchor sample, perform a loop process until the predicted probability of the at least one knowledge transfer sample is less than a preset probability threshold, to obtain the soft label of the current anchor sample; wherein the loop process comprises: in each cycle of the loop process, performing a knowledge transfer process based on the soft label of the current anchor sample obtained in the previous cycle and each knowledge transfer parameter, to obtain a knowledge transfer probability of the current anchor sample; and performing a knowledge fusion process based on the knowledge transfer probability of the current anchor sample and the predicted probability of the current anchor sample, to obtain the soft label of the current anchor sample for the next cycle.
  18. The neural network training apparatus according to claim 17, wherein
    the acquisition part is further configured to: determine a knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample based on the knowledge transfer parameter of each knowledge transfer sample with respect to the current anchor sample and the predicted probability of each knowledge transfer sample; accumulate the at least one knowledge transfer probability of each knowledge transfer sample with respect to the current anchor sample to obtain an accumulated knowledge transfer probability value; and perform a knowledge transfer process once based on the accumulated knowledge transfer probability value, to obtain the initial knowledge transfer probability of the current anchor sample.
  19. The neural network training apparatus according to any one of claims 11 to 18, wherein the neural network training apparatus further comprises a selection part,
    the acquisition part being further configured to acquire a training data set, the training data set comprising at least one batch of training data subsets; and
    the selection part being configured to select, from the training data set, one batch of the training data subsets that has not previously been selected as a training sample set, as the current training sample set.
  20. The neural network training apparatus according to claim 19, wherein the neural network training apparatus further comprises a sampling part and a determination part,
    the sampling part being configured to perform random sampling on the training data set to obtain at least one piece of first training data;
    the determination part being configured to determine the hard label corresponding to each piece of first training data;
    the sampling part being further configured to perform similarity sampling, based on the hard label of each piece of first training data, on the remaining data in the training data set that has not been selected as the first training data, to obtain at least one piece of second training data corresponding to each piece of first training data; and
    the determination part being further configured to use one batch of the training data subsets, constructed based on the at least one piece of first training data and the at least one piece of second training data corresponding to each piece of first training data, as the current training sample set.
  21. A computer device, comprising a processor and a memory storing instructions executable by the processor, wherein when the instructions are executed by the processor, the method according to any one of claims 1 to 10 is implemented.
  22. A computer-readable storage medium having a program stored thereon, for use in a computer device, wherein when the program is executed by a processor, the method according to any one of claims 1 to 10 is implemented.
  23. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device and is executed by a processor in the electronic device, the neural network training method according to any one of claims 1 to 10 is implemented.
  24. A computer program product which, when run on a computer, causes the computer to perform the neural network training method according to any one of claims 1 to 10.
PCT/CN2021/121379 2021-04-27 2021-09-28 Neural network training method and apparatus, device, and computer storage medium WO2022227400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110462397.6A CN113222139B (en) 2021-04-27 2021-04-27 Neural network training method, device, equipment and computer storage medium
CN202110462397.6 2021-04-27

Publications (1)

Publication Number Publication Date
WO2022227400A1 true WO2022227400A1 (en) 2022-11-03

Family

ID=77089304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121379 WO2022227400A1 (en) 2021-04-27 2021-09-28 Neural network training method and apparatus, device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN113222139B (en)
WO (1) WO2022227400A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361658A (en) * 2023-04-07 2023-06-30 北京百度网讯科技有限公司 Model training method, task processing method, device, electronic equipment and medium
CN117743858A (en) * 2024-02-19 2024-03-22 中国科学院自动化研究所 Knowledge enhancement-based continuous learning soft tag construction method
WO2024104241A1 (en) * 2022-11-14 2024-05-23 上海淇玥信息技术有限公司 Message-pushing method and apparatus based on implicit multi-target fusion of models

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222139B (en) * 2021-04-27 2024-06-14 商汤集团有限公司 Neural network training method, device, equipment and computer storage medium
CN113487614B (en) * 2021-09-08 2021-11-30 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN115171731A (en) * 2022-07-11 2022-10-11 腾讯科技(深圳)有限公司 Emotion category determination method, device and equipment and readable storage medium
CN115936091B (en) * 2022-11-24 2024-03-08 北京百度网讯科技有限公司 Training method and device for deep learning model, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674880A (en) * 2019-09-27 2020-01-10 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation
CN111368997A (en) * 2020-03-04 2020-07-03 支付宝(杭州)信息技术有限公司 Training method and device of neural network model
CN111507378A (en) * 2020-03-24 2020-08-07 华为技术有限公司 Method and apparatus for training image processing model
CN111753092A (en) * 2020-06-30 2020-10-09 深圳创新奇智科技有限公司 Data processing method, model training device and electronic equipment
US20210034985A1 (en) * 2019-03-22 2021-02-04 International Business Machines Corporation Unification of models having respective target classes with distillation
CN113222139A (en) * 2021-04-27 2021-08-06 商汤集团有限公司 Neural network training method, device and equipment and computer storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635668B (en) * 2018-11-16 2021-04-06 华中师范大学 Facial expression recognition method and system based on soft label integrated convolutional neural network
CN110647938B (en) * 2019-09-24 2022-07-15 北京市商汤科技开发有限公司 Image processing method and related device
CN111681059B (en) * 2020-08-14 2020-11-13 支付宝(杭州)信息技术有限公司 Training method and device of behavior prediction model

Also Published As

Publication number Publication date
CN113222139B (en) 2024-06-14
CN113222139A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
WO2022227400A1 (en) Neural network training method and apparatus, device, and computer storage medium
Yang et al. Heterogeneous graph attention network for unsupervised multiple-target domain adaptation
He et al. Neural factorization machines for sparse predictive analytics
WO2022042002A1 (en) Training method for semi-supervised learning model, image processing method, and device
Kaymak et al. A brief survey and an application of semantic image segmentation for autonomous driving
Mishra et al. Deep machine learning and neural networks: An overview
Springenberg et al. Improving deep neural networks with probabilistic maxout units
JP2023060820A (en) Deep Neural Network Optimization System for Machine Learning Model Scaling
WO2021159714A1 (en) Data processing method and related device
EP4167130A1 (en) Neural network training method and related device
CN111105008A (en) Model training method, data recognition method and data recognition device
WO2019218748A1 (en) Insurance service risk prediction processing method, device and processing equipment
US11775770B2 (en) Adversarial bootstrapping for multi-turn dialogue model training
WO2023221928A1 (en) Recommendation method and apparatus, and training method and apparatus
WO2022206498A1 (en) Federated transfer learning-based model training method and computing nodes
WO2021208799A1 (en) Transfer model training method and apparatus and fault detection method and apparatus
Liu et al. Resource-constrained federated edge learning with heterogeneous data: Formulation and analysis
He et al. A hybrid data-driven method for rapid prediction of lithium-ion battery capacity
Ouyang Feature learning for stacked ELM via low-rank matrix factorization
Shi et al. Mobile edge artificial intelligence: Opportunities and challenges
Miao et al. Evolving convolutional neural networks by symbiotic organisms search algorithm for image classification
CN114266897A (en) Method and device for predicting pox types, electronic equipment and storage medium
WO2022222854A1 (en) Data processing method and related device
Moro et al. Anomaly detection speed-up by quantum restricted Boltzmann machines
Yang et al. A new mc-lstm network structure designed for regression prediction of time series

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938863

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 230124)