CN112232506A - Network model training method, image target recognition method, device and electronic equipment - Google Patents

Network model training method, image target recognition method, device and electronic equipment

Info

Publication number
CN112232506A
CN112232506A (application number CN202010950541.6A)
Authority
CN
China
Prior art keywords
network model
sample
prediction result
sample pair
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010950541.6A
Other languages
Chinese (zh)
Inventor
李泽民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010950541.6A priority Critical patent/CN112232506A/en
Publication of CN112232506A publication Critical patent/CN112232506A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the application provide a network model training method, an image target recognition method, an apparatus, and an electronic device. The training method includes: training a second network model on the same sample pair set as a first network model; when the second network model completes the current round of iterative training, obtaining a first prediction result of the first network model for a first sample pair and a second prediction result of the second network model for the first sample pair; if the second prediction result is better than the first prediction result, deleting the first sample pair from the sample pair set to obtain a sample pair update set; determining a total loss function value of the second network model based on the sample pair update set; and if the total loss function value is greater than a preset value, updating the parameters of the second network model with the total loss function value and continuing to the next round of iterative training, until the total loss function value converges to the preset value, yielding the trained second network model. The method and apparatus can, to a certain extent, improve the accuracy with which the second network model recognizes targets.

Description

Network model training method, image target recognition method, device and electronic equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a network model training method, an image target recognition method, an apparatus, and an electronic device.
Background
When models are deployed on edge computing devices, the edge devices cannot match the computing power and memory resources of server-class hardware, which places higher demands on model miniaturization: a mature technique is needed to derive a small model from a large one, and the model distillation method was developed for this purpose. Model distillation uses a large, highly accurate model as a teacher to guide a small model, the student model, transferring the teacher model's knowledge to the student so that the student model can achieve high accuracy on edge computing devices.
During model distillation, the teacher model's knowledge is passed directly to the student model, so some of the teacher model's negative knowledge can also be transferred to the student network, which affects the student model's accuracy to a certain extent.
Disclosure of Invention
In view of the above, the present invention provides a network model training method, an image target recognition method, an apparatus, and an electronic device, so as to improve the accuracy of the student model.
In a first aspect, an embodiment of the present invention provides a network model training method, where the method is applied to a server, and includes: training a second network model on the same sample pair set as a first network model, where the computational cost of the first network model is greater than that of the second network model; when the second network model completes the current round of iterative training, obtaining a first prediction result of the first network model for a first sample pair and a second prediction result of the second network model for the first sample pair; if the second prediction result is better than the first prediction result, deleting the first sample pair from the sample pair set to obtain a sample pair update set; determining a total loss function value of the second network model based on the sample pair update set; and if the total loss function value is greater than a preset value, updating the parameters of the second network model with the total loss function value and continuing the next round of iterative training on the updated second network model with the sample pair set, until the total loss function value converges to the preset value, obtaining the trained second network model.
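The round of filtering and loss computation described above can be sketched on toy data as follows. This is a minimal illustration only; the function name, tuple layout, and convergence check are assumptions, not identifiers from the patent.

```python
def training_round(pairs, preset_value):
    """One round of the first-aspect method on toy data.

    Each pair is (student_dist, teacher_dist, same_class), where the
    distances are normalized feature distances (the second and first
    prediction results) and same_class marks homogeneous pairs.
    Returns (total_loss, converged).
    """
    # Knowledge filtering: drop pairs the student already predicts better.
    update_set = []
    for s, t, same in pairs:
        # Homogeneous pairs: a smaller distance wins; heterogeneous: larger wins.
        student_better = (t - s > 0) if same else (t - s < 0)
        if not student_better:
            update_set.append((s, t, same))
    # Total loss over the sample pair update set: mean squared gap between
    # student and teacher distances, per class, then summed.
    intra = [(s - t) ** 2 for s, t, same in update_set if same]
    inter = [(s - t) ** 2 for s, t, same in update_set if not same]
    loss = (sum(intra) / len(intra) if intra else 0.0) \
         + (sum(inter) / len(inter) if inter else 0.0)
    return loss, loss <= preset_value
```

If `converged` is false, the second network model's parameters would be updated with `loss` and another round run on the full sample pair set.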
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the step of obtaining a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair includes: calculating a first feature distance of the first network model for the first sample pair, and calculating a second feature distance of the second network model for the first sample pair; calculating a first feature distance mean of the sample class to which the first sample pair belongs based on the first network model, and calculating a second feature distance mean of the sample class to which the first sample pair belongs based on the second network model, where the sample class comprises homogeneous samples or heterogeneous samples; normalizing the first feature distance with the first feature distance mean, and taking the normalized first feature distance as the first prediction result of the first network model for the first sample pair; and normalizing the second feature distance with the second feature distance mean, and taking the normalized second feature distance as the second prediction result of the second network model for the first sample pair.
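A minimal sketch of this normalization step, assuming (N, D) NumPy feature arrays; the function name and array layout are illustrative assumptions:

```python
import numpy as np

def normalized_pair_distances(feats_a, feats_b, same_class):
    """Euclidean distance per pair, normalized by the mean distance of
    the pair's sample class (homogeneous vs. heterogeneous).

    feats_a, feats_b : (N, D) embeddings of the two samples in each pair
    same_class       : (N,) boolean, True for homogeneous pairs
    """
    d = np.linalg.norm(feats_a - feats_b, axis=1)   # feature distances
    out = np.empty_like(d)
    for mask in (same_class, ~same_class):
        if mask.any():
            out[mask] = d[mask] / d[mask].mean()    # divide by class mean
    return out
```

The same routine would be applied once to the first network model's features and once to the second network model's features, giving the first and second prediction results.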
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the feature distance is a Euclidean distance or a cosine similarity computed from the features of the first sample pair.
With reference to the first possible implementation manner of the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the method further includes: and judging whether the second prediction result is superior to the first prediction result or not by comparing the first prediction result with the second prediction result on the basis of the sample class to which the first sample pair belongs.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the feature distance is a euclidean distance corresponding to a feature of the first sample pair; the step of judging whether the second prediction result is superior to the first prediction result by comparing the first prediction result with the second prediction result based on the sample class to which the first sample pair belongs includes: if the sample class of the first sample pair is the same type sample, subtracting the second prediction result from the first prediction result to obtain a first difference value; and if the first difference is greater than 0 or greater than a preset first positive value, determining that the second prediction result is better than the first prediction result.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the feature distance is a euclidean distance corresponding to a feature of the first sample pair; the step of judging whether the second prediction result is superior to the first prediction result by comparing the first prediction result with the second prediction result based on the sample class to which the first sample pair belongs includes: if the sample class of the first sample pair is a heterogeneous sample, calculating a first prediction result minus a second prediction result to obtain a second difference value; and if the second difference is less than 0 or less than a preset second negative value, determining that the second prediction result is better than the first prediction result.
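The comparison rules of the fourth and fifth implementations can be sketched as a single predicate. The margin parameters stand in for the preset first positive value and second negative value; defaults compare against zero.

```python
def student_is_better(teacher_pred, student_pred, same_class,
                      pos_margin=0.0, neg_margin=0.0):
    """Decide whether the second (student) prediction beats the first
    (teacher) prediction for one sample pair. Predictions are the
    normalized Euclidean feature distances described above."""
    diff = teacher_pred - student_pred   # first result minus second result
    if same_class:
        # Homogeneous pair: a smaller distance is better, so a positive
        # difference (beyond the preset first positive value) wins.
        return diff > pos_margin
    # Heterogeneous pair: a larger distance is better, so a negative
    # difference (below the preset second negative value) wins.
    return diff < neg_margin
```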
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where the step of determining a total loss function value of the second network model based on the updated set of sample pairs includes: determining homogeneous sample distillation loss values and heterogeneous sample distillation loss values for the second network model based on the sample pair update sets; and determining a total loss function value of the second network model based on the distillation loss values of the homogeneous samples and the distillation loss values of the heterogeneous samples.
With reference to the sixth possible implementation manner of the first aspect, an embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where the step of determining the homogeneous sample distillation loss value and the heterogeneous sample distillation loss value of the second network model based on the sample pair update set includes: calculating the sum of squares of the effective homogeneous distance differences of the first network model and the second network model based on the homogeneous sample pairs in the sample pair update set; dividing this sum of squares by the number of homogeneous sample pairs to obtain the homogeneous sample distillation loss value; calculating the sum of squares of the effective heterogeneous distance differences of the first network model and the second network model based on the heterogeneous sample pairs in the sample pair update set; and dividing this sum of squares by the number of heterogeneous sample pairs to obtain the heterogeneous sample distillation loss value.
With reference to the seventh possible implementation manner of the first aspect, an embodiment of the present invention provides an eighth possible implementation manner of the first aspect, where the step of calculating the sum of squares of the effective homogeneous distance differences of the first network model and the second network model includes: calculating the sum of squares of the effective homogeneous distance differences of the first network model and the second network model as W_intra = Σ_i ((D_intra_student[i] − D_intra_teacher[i]) × mask[i])², where D_intra_student[i] is the vector of sample distances for the homogeneous sample pairs of the second network model, D_intra_teacher[i] is the vector of sample distances for the homogeneous sample pairs of the first network model, 1 ≤ i ≤ N_intra, N_intra is the number of homogeneous sample pairs, and mask[i] is the mask of the homogeneous sample pairs. [The definition of mask[i] appears only as an image (Figure BDA0002676565880000041) in the source.]
With reference to the seventh possible implementation manner of the first aspect, an embodiment of the present invention provides a ninth possible implementation manner of the first aspect, where the step of calculating the sum of squares of the effective heterogeneous distance differences of the first network model and the second network model includes: calculating the sum of squares of the effective heterogeneous distance differences of the first network model and the second network model as W_inter = Σ_j ((D_inter_student[j] − D_inter_teacher[j]) × mask[j])², where D_inter_student[j] is the vector of sample distances for the heterogeneous sample pairs of the second network model, D_inter_teacher[j] is the vector of sample distances for the heterogeneous sample pairs of the first network model, 1 ≤ j ≤ N_inter, N_inter is the number of heterogeneous sample pairs, and mask[j] is the mask of the heterogeneous sample pairs. [The definition of mask[j] appears only as an image (Figure BDA0002676565880000042) in the source.]
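Assuming mask[i] (and mask[j]) is 1 for pairs retained in the sample pair update set and 0 for pairs the student already predicts better (the mask definitions appear only as images in the source, so this semantics is inferred), the two losses can be sketched with NumPy as:

```python
import numpy as np

def distillation_losses(d_student_intra, d_teacher_intra, mask_intra,
                        d_student_inter, d_teacher_inter, mask_inter):
    """Masked distillation losses per the eighth and ninth implementations.

    Each d_* argument is a vector of normalized pair distances; each mask
    zeroes out pairs deleted from the sample pair update set.
    Returns the total loss function value.
    """
    w_intra = np.sum(((d_student_intra - d_teacher_intra) * mask_intra) ** 2)
    w_inter = np.sum(((d_student_inter - d_teacher_inter) * mask_inter) ** 2)
    loss_intra = w_intra / len(d_student_intra)   # divide by N_intra
    loss_inter = w_inter / len(d_student_inter)   # divide by N_inter
    return loss_intra + loss_inter
```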
In a second aspect, an embodiment of the present invention further provides an image target identification method, where the method is applied to an electronic device, and the method includes: receiving an image to be identified; processing the image to be recognized by using a second network model, and outputting a target recognition result; the second network model is obtained by training in advance through the network model training method.
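A toy sketch of the second-aspect flow, with the trained second network model stood in by an `embed` callable and recognition done by nearest feature distance against a gallery of known targets; all names here are illustrative assumptions, not part of the claimed method:

```python
import numpy as np

def recognize(image_vec, gallery, embed):
    """Embed the image to be recognized with the trained second network
    model (`embed`) and return the gallery identity with the smallest
    feature distance (the target recognition result)."""
    query = embed(image_vec)
    best_id, best_d = None, float("inf")
    for identity, feat in gallery.items():
        d = np.linalg.norm(query - feat)
        if d < best_d:
            best_id, best_d = identity, d
    return best_id
```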
In a third aspect, an embodiment of the present invention further provides a network model training apparatus, where the apparatus is applied to a server, and includes: the training module is used for training a second network model by applying the same sample pair set as the first network model, wherein the calculated amount of the first network model is larger than that of the second network model; the acquisition module is used for acquiring a first prediction result of the first network model on the first sample pair and a second prediction result of the second network model on the first sample pair when the second network model completes the iterative training of the current round; a deleting module, configured to delete the first sample pair from the sample pair set to obtain a sample pair update set if the second prediction result is better than the first prediction result; a determination module to determine a total loss function value for the second network model based on the sample pair update set; and the iteration module is used for updating the parameters of the second network model by applying the total loss function value if the total loss function value is larger than the preset value, and continuing the next round of iterative training on the updated second network model by applying the sample pair set until the total loss function value is converged to the preset value to obtain the trained second network model.
In a fourth aspect, an embodiment of the present invention further provides an image object recognition apparatus, where the apparatus is applied to an electronic device, and the apparatus includes: the image receiving module is used for receiving an image to be identified; the image processing module is used for processing the image to be recognized by utilizing a second network model and outputting a target recognition result; the second network model is obtained by training in advance through the network model training method.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the network model training method or to implement the image object recognition method.
In a sixth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the network model training method described above or to implement the image target recognition method described above.
The embodiment of the invention has the following beneficial effects:
Embodiments of the application provide a network model training method, an image target recognition method, an apparatus, and an electronic device. A second network model is trained on the same sample pair set as a first network model; when the second network model completes the current round of iterative training, a first prediction result of the first network model for a first sample pair and a second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; a total loss function value of the second network model is determined based on the sample pair update set; and if the total loss function value is greater than a preset value, the parameters of the second network model are updated with the total loss function value and the next round of iterative training continues on the updated second network model with the sample pair set, until the total loss function value converges to the preset value, yielding the trained second network model. By deleting the samples the second network model has already learned well, the method lets the second network model concentrate on the samples it predicts worse than the first network model does, without being biased toward samples already learned well; this improves training efficiency, preserves the performance of the trained second network model, and makes target recognition and detection with the second network model more accurate.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a network model training method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another network model training method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another network model training method according to an embodiment of the present invention;
FIG. 5 is a flowchart of another network model training method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a network model training apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of another network model training apparatus according to an embodiment of the present invention;
FIG. 8 is a flowchart of an image target recognition method according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an image target recognition device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For example, for some samples the prediction of the student network may already be better than that of the teacher network; pulling the student's predictions toward the teacher's at that point reduces the student network's prediction accuracy on those samples. Current research lacks a distillation method based on knowledge filtering, in which the teacher network passes only "positive knowledge" to the student network during distillation. The invention mainly addresses the problem that the teacher network may transfer negative knowledge to the student network during model distillation.
The goal of metric learning is to pull same-class samples closer together and push different-class samples farther apart. Accordingly, during model distillation, for the same input sample pair, the teacher network's prediction can be transferred to the student network as knowledge, i.e., the student's and teacher's predictions for the same pair of samples are pulled together. This distillation method does not constrain the feature dimensions the teacher and student networks assign to samples, so it applies broadly. However, during distillation the teacher network may output some "negative knowledge": for some samples the student network's predictions may already be more accurate, and forcing them toward the teacher's reduces the student's prediction accuracy on those samples. Based on this, the network model training method, apparatus, and electronic device provided by the embodiments of the invention delete the samples the second network model (student network) has already learned well, so that the second network model can concentrate on the samples it handles worse than the first network model (teacher network), rather than being biased toward samples already learned; this improves the second network model's accuracy to a certain extent. Examples are described below.
The second network model in the embodiment of the present invention may be applied to various application scenarios such as target detection and target recognition, for example, the second network model is applied to recognize pedestrians or vehicles, the second network model is applied to track pedestrians or vehicles, and the second network model is applied to recognize human body parts or vehicle components (such as license plates or vehicle logos).
As shown in FIG. 1, an electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108, and one or more image capture devices 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and that the electronic device may have other components and structures as desired.
Processor 102 may be a server, a smart terminal, or a device containing a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, may process data for other components in electronic device 100, and may control other components in electronic device 100 to perform network model training functions.
Memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 102 to implement the functions of the embodiments of the invention described below (as implemented by a processing device) and/or other desired functions. The computer-readable storage medium may also store various applications and various data, such as visible-light and infrared video sequences, as well as data used and/or generated by the applications.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
Image capture device 110 may acquire a set of sample pairs and store captured video sequences in memory 104 for use by other components.
For example, the devices in the electronic device for implementing the network model training method, the image target recognition method and the apparatus according to the embodiment of the present invention may be integrally disposed, or may be disposed in a decentralized manner, such as integrally disposing the processor 102, the memory 104, the input device 106 and the output device 108 into a whole, and disposing the image capturing device 110 at a designated position where a video frame can be captured. When the above-described devices in the electronic apparatus are integrally provided, the electronic apparatus may be implemented as a smart terminal such as a camera, a smart phone, a tablet computer, a vehicle-mounted terminal, and the like.
The embodiment provides a network model training method, wherein the method is applied to a server, and referring to a flowchart of the network model training method shown in fig. 2, the method specifically includes the following steps:
step S202, training a second network model by applying a sample pair set which is the same as the first network model, wherein the calculated amount of the first network model is greater than that of the second network model; the first network model is the teacher network, and the second network model is the student network.
The samples in the sample pair set all come in pairs; for example, (a1, b1) is a sample pair and (a2, b2) is a sample pair. This embodiment does not limit the number of sample pairs contained in the sample pair set.
Step S204, when the second network model completes the iterative training of the current round, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained;
when the training of the second network model reaches the preset time or the preset times, the iteration training of the current round is completed, and the preset time or the preset times can be set according to actual needs and are not limited here.
In general, the samples in the sample pair set can be divided into two classes: homogeneous samples and heterogeneous samples. For example, a pair of pictures that both contain men is a homogeneous sample, while a pair containing one man and one woman is a heterogeneous sample. To train the network model with more picture features, the homogeneous or heterogeneous samples may be further divided into subclasses; for example, among the homogeneous samples, pictures of men wearing hats form one subclass and pictures of men wearing suits form another. The division into homogeneous and heterogeneous samples can be set according to the actual application scenario and is not limited here.
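Constructing the two sample classes from labeled data can be sketched as follows; the helper name and the (data, label) layout are illustrative assumptions:

```python
from itertools import combinations

def build_sample_pairs(samples):
    """Split all pairs of labeled samples into homogeneous pairs
    (same label) and heterogeneous pairs (different labels).

    samples : list of (data, label) tuples
    """
    homogeneous, heterogeneous = [], []
    for (xa, la), (xb, lb) in combinations(samples, 2):
        (homogeneous if la == lb else heterogeneous).append((xa, xb))
    return homogeneous, heterogeneous
```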
The first sample pair may be at least one homogeneous sample pair or at least one heterogeneous sample pair. The first sample pair is predicted with the second network model trained in the current round and with the first network model pre-trained on the sample pair set, obtaining the first prediction result from the first network model and the second prediction result from the second network model.
Step S206, if the second prediction result is better than the first prediction result, deleting the first sample pair from the sample pair set to obtain a sample pair updating set;
if the obtained second prediction result is better than the first prediction result, which indicates that the first sample pair is a good sample that has been learned for the second network model, in the knowledge distillation process, in order to avoid the problem of poor training effect caused by training the second network model by repeatedly using the first sample, in this embodiment, the first sample pair or the similar sample to which the first sample pair belongs may be deleted from the sample set, so as to obtain an updated set of sample pairs. For example, the first sample pair is 10 sample pairs including a hat man, and if the 10 sample pairs are better predicted by the second network model than the 10 sample pairs of the first network model, the 10 sample pairs, or all the sample pairs (20 sample pairs) of the hat man can be deleted from the sample pair set.
If the obtained second prediction result is worse than the first prediction result, the second network model's predictive ability on the first sample pair is inferior to the first network model's; the knowledge in the first sample pair is exactly what the second network model needs, so the first sample pair is retained for training the second network model.
Step S208, determining a total loss function value of the second network model based on the sample pair updating set;
the determined total loss function value is a convergence parameter of the second network model to achieve the final training purpose.
Step S210: if the total loss function value is greater than the preset value, update the parameters of the second network model by applying the total loss function value, and continue the next round of iterative training on the updated second network model by applying the sample pair set, until the total loss function value converges to the preset value, thereby obtaining the trained second network model.
When the determined total loss function value is greater than the preset value, the second network model has not yet reached the preset convergence. After the parameters of the second network model are updated by applying the total loss function value, a first sample pair can be selected again from the sample pair set and steps S204 to S210 are executed again, until the determined total loss function value is not greater than the preset value and the updated second network model reaches the preset convergence effect, so that the output of the trained second network model approaches the output of the first network model.
In the next round of training, the original sample pair set can still be used, so that the sample pair set used by the second network model is always the same as the one used by the first network model. For controlling the number of training rounds of the second network model, the total loss function value of the second network model is determined from the sample pair update set of the current round, which keeps the number of training rounds reasonable and makes the performance of the trained second network model essentially consistent with that of the first network model.
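As a toy illustration of the round-by-round loop in steps S202 to S210 (not the patent's implementation — the callbacks, the one-parameter "student", and the multiplicative update are all assumptions):

```python
def train_student(pairs, teacher_predict, student_predict, student_update,
                  compute_loss, is_better, preset=1e-3, max_rounds=100):
    """Sketch of steps S202-S210: every round starts from the FULL sample
    pair set, drops the pairs the second (student) model already predicts
    better than the first (teacher) model, computes the total loss on the
    remaining update set, and updates the student until the loss converges
    to the preset value."""
    loss = float("inf")
    for _ in range(max_rounds):
        update_set = [p for p in pairs
                      if not is_better(student_predict(p), teacher_predict(p))]
        loss = compute_loss(update_set)
        if loss <= preset:              # converged to the preset value
            break
        student_update(loss)            # apply the total loss to update params
    return loss

# Toy run: the "student" is one bias parameter that shrinks each update.
state = {"bias": 1.0}
teacher = lambda p: p                           # teacher predicts the target
student = lambda p: p + state["bias"]
mse = lambda us: (sum((student(p) - teacher(p)) ** 2 for p in us) / len(us)
                  if us else 0.0)
final_loss = train_student([1.0, 2.0], teacher, student,
                           lambda loss: state.update(bias=state["bias"] * 0.1),
                           mse, is_better=lambda s, t: s < t)
```

The loop deliberately rebuilds the update set from the full pair set each round, matching the text's point that the second network model always trains on the same sample pair set as the first.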
The embodiment of the present application provides a network model training method in which a second network model is trained by applying the same sample pair set as the first network model. When the second network model completes the current round of iterative training, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; a total loss function value of the second network model is determined based on the sample pair update set; if the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying the total loss function value, so that the prediction results of the second network model approach those of the first network model; and the next round of iterative training is continued on the updated second network model by applying the sample pair set, until the total loss function value converges to the preset value, yielding the trained second network model. By deleting the samples that the second network model has already learned well, the method lets the second network model concentrate on the samples it predicts worse than the first network model, without being biased by the well-learned samples. This improves training efficiency, guarantees the performance of the trained second network model, and makes target recognition and detection with the second network model more accurate.
This embodiment provides another network model training method implemented on the basis of the above embodiment; it focuses on a specific implementation of obtaining the first prediction result of the first network model for the first sample pair and the second prediction result of the second network model for the first sample pair. Fig. 3 shows another flowchart of a network model training method; the method in this embodiment includes the following steps:
step S302, applying the sample pair set same as the first network model to train a second network model;
step S304, when the second network model completes the iterative training of the current round, calculating a first characteristic distance of the first network model to the first sample pair, and calculating a second characteristic distance of the second network model to the first sample pair;
the first feature distance and the second feature distance are euclidean distances or cosine similarities corresponding to features of the first sample pair, or other metrics, and are not limited herein.
In this embodiment, take the feature distance to be the Euclidean distance on the features of the first sample pair. If the first sample pair belongs to the homogeneous samples, the first feature distance of the first sample pair calculated with the first network model is:

D_intra1[i] = Euclidean_Distance(a[i], b[i]), where (a[i], b[i]) denotes the i-th first sample pair and Euclidean_Distance(a[i], b[i]) denotes the Euclidean distance of the i-th first sample pair.

If the first sample pair belongs to the heterogeneous samples, the first feature distance of the first sample pair calculated with the first network model is:

D_inter1[j] = Euclidean_Distance(a[j], b[j]), where Euclidean_Distance(a[j], b[j]) denotes the Euclidean distance of the j-th first sample pair.

If the first sample pair belongs to the homogeneous samples, the second feature distance of the first sample pair calculated with the second network model is: D_intra2[i] = Euclidean_Distance(a[i], b[i]).

If the first sample pair belongs to the heterogeneous samples, the second feature distance of the first sample pair calculated with the second network model is: D_inter2[j] = Euclidean_Distance(a[j], b[j]).
In the embodiment of the present invention, the Euclidean distance is computed in the conventional way, and the details are not repeated here.
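For completeness, the conventional Euclidean distance between the two feature vectors of a sample pair can be sketched in plain Python (illustrative only; the function name is an assumption):

```python
import math

def euclidean_distance(feat_a, feat_b):
    """Euclidean distance between the features of the two samples in a
    pair, i.e. Euclidean_Distance(a[i], b[i]) in the notation above."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)))

d = euclidean_distance([1.0, 2.0, 2.0], [1.0, 0.0, 0.0])  # sqrt(8) ≈ 2.828
```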
Step S306, calculating a first characteristic distance mean value of the sample class to which the first sample pair belongs based on the first network model, and calculating a second characteristic distance mean value of the sample class to which the first sample pair belongs based on the second network model; wherein, the sample class comprises homogeneous samples or heterogeneous samples;
In actual use, to make it easy to distinguish the sample class to which a sample pair belongs, each sample pair in the sample pair set may carry a class label: a homogeneous sample pair may be labeled 1 and a heterogeneous sample pair labeled 0, so that when the first sample pair is obtained, its sample class can be determined from the label.
The feature distance mean is the mean of the feature distances of all sample pairs in the sample class to which the first sample pair belongs; the first and second feature distance means of that class can be calculated with the first network model and the second network model, respectively.
The first feature distance mean is calculated as follows: input all sample pairs of the sample class to which the first sample pair belongs into the first network model one by one to obtain the feature distance of each pair; sum the obtained feature distances; and divide the sum by the total number of sample pairs in that class. The result is the first feature distance mean, mean1(D_intra) or mean1(D_inter), where mean1(D_intra) denotes the first feature distance mean of the homogeneous samples to which the first sample pair belongs and mean1(D_inter) denotes the first feature distance mean of the heterogeneous samples to which the first sample pair belongs.
The second feature distance mean calculated based on the second network model is mean2(D_intra) or mean2(D_inter), where mean2(D_intra) denotes the second feature distance mean of the homogeneous samples to which the first sample pair belongs and mean2(D_inter) denotes the second feature distance mean of the heterogeneous samples to which the first sample pair belongs. Since the process of calculating the second feature distance mean based on the second network model is the same as that of calculating the first feature distance mean based on the first network model, it is not repeated here.
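The mean computation of step S306 can be sketched under the class-label convention described above (1 for homogeneous pairs, 0 for heterogeneous pairs); the function name is illustrative:

```python
def class_distance_means(distances, labels):
    """mean(D_intra) and mean(D_inter): average the per-pair feature
    distances separately over homogeneous (label 1) and heterogeneous
    (label 0) sample pairs."""
    intra = [d for d, lab in zip(distances, labels) if lab == 1]
    inter = [d for d, lab in zip(distances, labels) if lab == 0]
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Two homogeneous pairs (distances 1.0, 3.0) and two heterogeneous pairs
# (distances 2.0, 6.0) give class means 2.0 and 4.0.
mean_intra, mean_inter = class_distance_means([1.0, 3.0, 2.0, 6.0],
                                              [1, 1, 0, 0])
```

The same function covers both network models: feed it the distances produced by the first model for mean1, and those produced by the second model for mean2.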
Step S308, applying the first characteristic distance average value to normalize the first characteristic distance, and taking the normalized result of the first characteristic distance as a first prediction result of the first network model for the first sample pair;
The purpose of normalizing the first feature distance by the first feature distance mean is to keep the training of the network model stable. In this embodiment, the normalization divides the first feature distance by the first feature distance mean, and the result is taken as the first prediction result of the first network model for the first sample pair.
If the first sample pair belongs to the homogeneous samples, the normalization result of the first feature distance is:

D_intra1[i] / mean1(D_intra)

If the first sample pair belongs to the heterogeneous samples, the normalization result of the first feature distance is:

D_inter1[j] / mean1(D_inter)
step S310, applying the second characteristic distance average value to normalize the second characteristic distance, and taking the second characteristic distance normalization result as a second prediction result of the second network model for the first sample pair;
The purpose and method of normalizing the second feature distance are the same as in step S308 and are not repeated here.
If the first sample pair belongs to the homogeneous samples, the normalization result of the second feature distance is:

D_intra2[i] / mean2(D_intra)

If the first sample pair belongs to the heterogeneous samples, the normalization result of the second feature distance is:

D_inter2[j] / mean2(D_inter)
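The normalization of steps S308 and S310 is then a plain division by the class mean; a minimal sketch with illustrative names:

```python
def normalize_distances(distances, class_mean):
    """Divide each pair's feature distance by its class mean; the result
    serves as the prediction result compared between the two models."""
    return [d / class_mean for d in distances]

normalized = normalize_distances([1.0, 3.0], 2.0)  # -> [0.5, 1.5]
```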
step S312, based on the sample class to which the first sample pair belongs, comparing the first prediction result with the second prediction result to judge whether the second prediction result is superior to the first prediction result;
the process of step S312 can be realized by steps a 1-a 2:
Step A1: if the sample class of the first sample pair is the homogeneous samples, subtract the second prediction result from the first prediction result to obtain a first difference; if the first difference is greater than 0 (or greater than a preset first positive value), determine that the second prediction result is better than the first prediction result.
the first difference is greater than 0 or greater than a preset first positive value, which indicates that the first prediction result is greater than the second prediction result. Continuing to explain by taking the characteristic distance as the Euclidean distance corresponding to the characteristics of the first sample pair as an example, in the similar samples, if the second prediction result of the second network model is smaller than the first prediction result of the first network model, it is shown that the prediction capability of the second network model is better than that of the first network model in the first sample pair, so that the knowledge of the samples is filtered in the distillation process.
If the second prediction result of the second network model is greater than or equal to the first prediction result of the first network model, the second network model predicts these samples worse than the first network model does, so the knowledge of these samples still needs to be learned and the second network model is trained with them.
Step A2: if the sample class of the first sample pair is the heterogeneous samples, subtract the second prediction result from the first prediction result to obtain a second difference; if the second difference is less than 0 (or less than a preset second negative value), determine that the second prediction result is better than the first prediction result.
A second difference less than 0 (or less than the preset second negative value) means the first prediction result is smaller than the second. Taking the Euclidean distance as the feature distance: for heterogeneous samples, if the second prediction result of the second network model is greater than the first prediction result of the first network model, the second network model already predicts these samples better than the first network model does; in the distillation process, the knowledge of these samples would be negative knowledge for the student network and should not be learned.
If the second prediction result of the second network model is less than or equal to the first prediction result of the first network model, the second network model still has room for optimization on these samples compared with the first network model, so their learning is strengthened in the distillation process, i.e., the second network model is trained with the first sample pair.
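Steps A1 and A2 together amount to one decision function; a hedged sketch in which the margin arguments stand in for the preset first positive value and preset second negative value (zero by default) and the function name is an assumption:

```python
def second_is_better(first_pred, second_pred, same_class,
                     first_positive=0.0, second_negative=0.0):
    """True when the second (student) model's prediction beats the first.

    Step A1 (homogeneous): first_pred - second_pred > first_positive,
    i.e. the student's normalized distance is smaller.
    Step A2 (heterogeneous): first_pred - second_pred < second_negative,
    i.e. the student's normalized distance is larger.
    """
    diff = first_pred - second_pred
    if same_class:
        return diff > first_positive
    return diff < second_negative

assert second_is_better(1.2, 0.9, same_class=True)    # closer pair: better
assert second_is_better(0.8, 1.1, same_class=False)   # farther pair: better
```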
Step S314, if the second prediction result is better than the first prediction result, deleting the first sample pair from the sample pair set to obtain a sample pair update set;
Whether the samples are homogeneous or heterogeneous, if the second prediction result of the second network model is better than the first prediction result of the first network model, the knowledge of these samples no longer needs to be learned, and they can be deleted from the sample pair set.
Step S316, determining a total loss function value of the second network model based on the sample pair update set;
step S318, if the total loss function value is greater than the preset value, the total loss function value is applied to update the parameters of the second network model, and the sample pair set is applied to continue the next round of iterative training on the updated second network model until the total loss function value converges to the preset value, so as to obtain the trained second network model.
The network model training method provided by this embodiment of the invention obtains the first prediction result of the first network model from the first feature distance mean and the first feature distance calculated for the first sample pair, and the second prediction result of the second network model from the second feature distance mean and the second feature distance calculated for the first sample pair. When the second prediction result is judged to be better than the first, the first sample pair is deleted from the sample pair set, and the total loss function value of the second network model is determined with the sample pair update set. If the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying it, so that the prediction results of the second network model approach those of the first network model; the next round of iterative training then continues on the updated second network model with the sample pair set until the total loss function value converges to the preset value, yielding the trained second network model. By deleting the samples the second network model has already learned well, the method lets it concentrate on the samples it predicts worse than the first network model; training on the poorly learned samples improves the accuracy of the second network model to some extent.
The embodiment provides another network model training method, which is implemented on the basis of the embodiment; this embodiment focuses on a specific implementation of determining the total loss function value of the second network model based on the sample pair update set. As shown in fig. 4, another flow chart of a network model training method, the network model training method in this embodiment includes the following steps:
step S402, applying the sample pair set same as the first network model to train a second network model;
step S404, when the second network model completes the iterative training of the current round, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained;
step S406, if the second prediction result is better than the first prediction result, deleting the first sample pair from the sample pair set to obtain a sample pair update set;
step S408, determining a distillation loss value of the homogeneous sample and a distillation loss value of the heterogeneous sample of the second network model based on the sample pair updating set;
the process of step S408 can be realized by steps B1-B4:
step B1, calculating the square sum of the difference of the effective homogeneous distances of the first network model and the second network model based on the homogeneous sample pairs in the sample pair updating set;
In this embodiment, the sum of squares of the effective homogeneous distance differences can be calculated with the following formula:

W_intra = Σ((D_intra_student[i] − D_intra_teacher[i]) × mask[i])²

where D_intra_student[i] denotes the vector of sample distances of the homogeneous sample pairs under the second network model and D_intra_teacher[i] denotes the vector of sample distances of the homogeneous sample pairs under the first network model;

mask[i] = 1 if the i-th homogeneous sample pair remains in the sample pair update set, and mask[i] = 0 if it has been filtered out;

1 ≤ i ≤ N_intra, where N_intra is the number of homogeneous sample pairs and mask[i] is the mask of the homogeneous sample pairs.
Step B2, dividing the square sum of the effective similar distance difference by the number of similar sample pairs to obtain the distillation loss value of the similar sample;
The distillation loss value of the homogeneous samples is:

L_intra = W_intra / N_intra
step B3, calculating the square sum of the effective heterogeneous distance differences of the first network model and the second network model based on the heterogeneous sample pairs in the sample pair updating set;
In this embodiment, the sum of squares of the effective heterogeneous distance differences can be calculated with the following formula:

W_inter = Σ((D_inter_student[j] − D_inter_teacher[j]) × mask[j])²

where D_inter_student[j] denotes the vector of sample distances of the heterogeneous sample pairs under the second network model and D_inter_teacher[j] denotes the vector of sample distances of the heterogeneous sample pairs under the first network model;

mask[j] = 1 if the j-th heterogeneous sample pair remains in the sample pair update set, and mask[j] = 0 if it has been filtered out;

1 ≤ j ≤ N_inter, where N_inter is the number of heterogeneous sample pairs and mask[j] is the mask of the heterogeneous sample pairs.
And step B4, dividing the square sum of the effective heterogeneous distance difference by the number of the heterogeneous sample pairs to obtain the distillation loss value of the heterogeneous samples.
The distillation loss value of the heterogeneous samples is:

L_inter = W_inter / N_inter
step S410, determining a total loss function value of the second network model based on the distillation loss value of the same-class sample and the distillation loss value of the different-class sample;
The total loss function value can be expressed by the following formula:

L_distill = α·L_intra + β·L_inter

where α denotes the weight of the homogeneous-sample distillation loss value and β denotes the weight of the heterogeneous-sample distillation loss value. The two weights can be adjusted according to actual needs to reflect the relative importance of the two loss values in the total loss function value.
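The masked losses of steps B1 to B4 and the weighted total can be sketched directly from the formulas. This is an illustrative reading in which mask[k] is 1 for pairs kept in the update set and 0 for filtered pairs, and N is taken as the total number of pairs of that class, per the wording of steps B2 and B4:

```python
def distillation_loss(d_student, d_teacher, mask):
    """One class's loss: L = W / N, with
    W = sum(((D_student[k] - D_teacher[k]) * mask[k]) ** 2)."""
    w = sum(((s - t) * m) ** 2 for s, t, m in zip(d_student, d_teacher, mask))
    return w / len(d_student)

def total_distill_loss(l_intra, l_inter, alpha=1.0, beta=1.0):
    """L_distill = alpha * L_intra + beta * L_inter."""
    return alpha * l_intra + beta * l_inter

# Two homogeneous pairs, the second one filtered out of the update set:
# W = ((1.0 - 0.5) * 1)^2 + ((2.0 - 2.0) * 0)^2 = 0.25, so L = 0.25 / 2.
l_intra = distillation_loss([1.0, 2.0], [0.5, 2.0], [1, 0])    # -> 0.125
loss = total_distill_loss(l_intra, 0.25, alpha=1.0, beta=2.0)  # -> 0.625
```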
Step S412, if the total loss function value is greater than the preset value, the total loss function value is applied to update the parameters of the second network model, and the sample pair set is applied to continue the next round of iterative training on the updated second network model until the total loss function value converges to the preset value, so as to obtain the trained second network model.
The network model training method provided by this embodiment of the invention obtains the first prediction result of the first network model for the first sample pair and the second prediction result of the second network model for the first sample pair; when the second prediction result is better than the first, the first sample pair is deleted from the sample pair set to obtain the sample pair update set. Based on the homogeneous and heterogeneous sample pairs in the update set, the sums of squares of the effective homogeneous and heterogeneous distance differences between the first and second network models are calculated; the homogeneous-sample and heterogeneous-sample distillation loss values are then obtained from the numbers of homogeneous and heterogeneous sample pairs, from which the total loss function value of the second network model is determined. By filtering out the sample pairs the second network model has already learned well, the method lets the second network model concentrate on the pairs it has not yet learned well, so that the output of the trained second network model approaches that of the first network model and a good training effect is achieved.
Further, in order to fully understand the network model training method, fig. 5 shows a flowchart of another network model training method, and as shown in fig. 5, the network model training method includes the following steps:
Step S500: extract a first sample feature of the first homogeneous sample pair by using the first network model, and extract a second sample feature of the first homogeneous sample pair by using the second network model;
Step S501: calculate the Euclidean distance of the first homogeneous sample pair based on the first sample feature and normalize it to obtain a first prediction result;
Step S502: calculate the Euclidean distance of the first homogeneous sample pair based on the second sample feature and normalize it to obtain a second prediction result;
step S503, if the second prediction result is better than the first prediction result, filtering the same kind of samples in the sample pair set to obtain a sample pair update set;
step S504, determining a distillation loss value of the sample of the same type based on the sample pair updating set;
step S505, extracting a third sample characteristic of the first heterogeneous sample pair by using the first network model; extracting a fourth sample feature of the first heterogeneous sample pair by using the second network model;
step S506, calculating Euclidean distance of the first heterogeneous sample pair based on the third sample characteristics and carrying out normalization processing to obtain a first prediction result;
step S507, calculating Euclidean distance of the first heterogeneous sample pair based on the fourth sample characteristic and carrying out normalization processing to obtain a second prediction result;
step S508, if the second prediction result is better than the first prediction result, filtering heterogeneous samples in the sample pair set to obtain a sample pair update set;
step S509, determining a heterogeneous sample distillation loss value based on the sample pair update set;
step S510, determining a total loss function value of a second network model based on the distillation loss value of the same-class sample and the distillation loss value of the different-class sample;
and step S511, if the total loss function value is larger than the preset value, the total loss function value is applied to update the parameters of the second network model, and the sample pair set is applied to continue the next round of iterative training on the updated second network model until the total loss function value converges to the preset value, so that the trained second network model is obtained.
Steps S500 to S504 update the homogeneous sample set and calculate its loss value, while steps S505 to S509 update the heterogeneous sample set and calculate its loss value; the two processes may therefore be executed in the opposite order or in parallel, which is not limited herein.
In the network model training method provided by this embodiment of the invention, whether the first sample pair is a first homogeneous sample pair or a first heterogeneous sample pair, the pairs for which the second prediction result of the second network model is better than the first prediction result of the first network model are filtered out of the sample pair set; the remaining pairs, on which the second network model does not outperform the first, are used to continue the iterative training of the second network model until the total loss function value converges to the preset value.
The second network model can be used for processing of target recognition, target detection and the like in the image, and the accuracy of target recognition and target detection of the second network model is improved to a certain extent through the trained second network model. Based on this, an embodiment of the present invention further provides an image target identification method, which is applied to the electronic device, and referring to a flowchart of the image target identification method shown in fig. 8, the method includes the following steps:
step S802, receiving an image to be identified;
the image to be recognized in this embodiment may be an image acquired by an image acquisition device (such as a camera), and the image acquisition device may be a camera or other devices installed in a public place, or may be a camera or other devices installed in a specific place.
Besides the image acquisition device, the image to be recognized may also be obtained from a third-party device: the third-party device may provide the collected original image to the electronic device, or provide the image to the electronic device after filtering or screening.
Step S804, processing the image to be recognized by using a second network model, and outputting a target recognition result; the second network model is obtained by training in advance by the network model training method provided in the above embodiment.
In this image target recognition method, target recognition is performed with the second network model trained as in the above embodiments, and a target recognition result is obtained. During training, the second network model uses the same sample pair set as the first network model; when the second network model completes each round of iterative training, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained. If the second prediction result is better than the first, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; a total loss function value of the second network model is determined based on the update set; if the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying it, and the next round of iterative training continues on the updated second network model with the sample pair set, until the total loss function value converges to the preset value and the trained second network model is obtained. By deleting the samples the second network model has already learned well, the method lets the model concentrate on the samples it predicts worse than the first network model, without bias from the well-learned samples; this guarantees the performance of the trained second network model and makes target recognition and detection with it more accurate.
Corresponding to the above network model training method embodiment, an embodiment of the present invention provides a network model training apparatus, where the apparatus is applied to a server, fig. 6 shows a structural schematic diagram of network model training, and as shown in fig. 6, the apparatus includes:
a training module 602, configured to train a second network model by applying the same sample pair set as the first network model, where a computation amount of the first network model is greater than a computation amount of the second network model;
an obtaining module 604, configured to obtain a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair when the second network model completes the iterative training of the current round;
a deleting module 606, configured to delete the first sample pair from the sample pair set to obtain a sample pair update set if the second prediction result is better than the first prediction result;
a determining module 608 for determining a total loss function value for the second network model based on the sample pair update set;
and an iteration module 610, configured to, if the total loss function value is greater than the preset value, update parameters of the second network model by applying the total loss function value, and continue to perform the next round of iterative training on the updated second network model by applying the set of sample pairs until the total loss function value converges to the preset value, so as to obtain a trained second network model.
This embodiment of the application provides a network model training apparatus in which a second network model is trained by applying the same sample pair set as the first network model. When the second network model completes the current round of iterative training, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; a total loss function value of the second network model is determined based on the update set; if the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying it, and the next round of iterative training continues on the updated second network model with the sample pair set, until the total loss function value converges to the preset value and the trained second network model is obtained. By deleting the samples the second network model has already learned well, the apparatus lets the second network model concentrate on the samples it predicts worse than the first network model, without bias from the well-learned samples, which improves the accuracy of the second network model to some extent.
The obtaining module 604 is further configured to calculate a first feature distance of the first network model for the first sample pair, and calculate a second feature distance of the second network model for the first sample pair; calculate a first feature distance mean of the sample class to which the first sample pair belongs based on the first network model, and calculate a second feature distance mean of the sample class to which the first sample pair belongs based on the second network model, wherein the sample class comprises homogeneous samples or heterogeneous samples; normalize the first feature distance by applying the first feature distance mean, and take the first feature distance normalization result as the first prediction result of the first network model for the first sample pair; and normalize the second feature distance by applying the second feature distance mean, and take the second feature distance normalization result as the second prediction result of the second network model for the first sample pair.
The feature distance is a Euclidean distance or a cosine similarity corresponding to the features of the first sample pair.
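A minimal sketch of the two feature distances named here, together with the class-mean normalization from the preceding paragraph; the function names and the plain-tuple feature representation are illustrative assumptions, not from the patent.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def normalized_prediction(distance, class_mean):
    """Normalize a pair's feature distance by the mean distance of its
    sample class (homogeneous or heterogeneous); the normalized value
    serves as the model's prediction result for the pair."""
    return distance / class_mean
```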
Based on the above network model training apparatus, another network model training apparatus is provided in an embodiment of the present invention. Referring to the schematic structural diagram of the network model training apparatus shown in fig. 7, in addition to the structure shown in fig. 6, the apparatus includes a comparison module 702 connected to both the obtaining module 604 and the deleting module 606, configured to determine whether the second prediction result is better than the first prediction result by comparing the first prediction result with the second prediction result based on the sample class to which the first sample pair belongs.
The comparing module 702 is further configured to, if the sample class of the first sample pair is a similar sample, subtract the second prediction result from the first prediction result to obtain a first difference; and if the first difference is greater than 0 or greater than a preset first positive value, determining that the second prediction result is better than the first prediction result.
The comparing module 702 is further configured to, if the sample class to which the first sample pair belongs is a heterogeneous sample, calculate a second difference value by subtracting the second prediction result from the first prediction result; and if the second difference is less than 0 or less than a preset second negative value, determining that the second prediction result is better than the first prediction result.
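The two comparison rules above (homogeneous pairs: the student wins when the teacher-minus-student difference is positive or above a positive threshold; heterogeneous pairs: when it is negative or below a negative threshold) can be sketched as one function. The function name and the single symmetric `margin` parameter are illustrative simplifications, not the patent's wording.

```python
def student_better(teacher_pred, student_pred, same_class, margin=0.0):
    """Decide whether the student's normalized distance prediction beats
    the teacher's. The difference is always the first (teacher) prediction
    minus the second (student) prediction."""
    diff = teacher_pred - student_pred
    if same_class:
        # Homogeneous pair: a smaller distance is better, so the student
        # wins when the difference exceeds 0 (or a preset positive value).
        return diff > margin
    # Heterogeneous pair: a larger distance is better, so the student
    # wins when the difference is below 0 (or a preset negative value).
    return diff < -margin
```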
The determining module 608 is further configured to determine homogeneous sample distillation loss values and heterogeneous sample distillation loss values of the second network model based on the updated set of sample pairs; and determining a total loss function value of the second network model based on the distillation loss values of the homogeneous samples and the distillation loss values of the heterogeneous samples.
The determining module 608 is further configured to calculate the sum of squares of the effective homogeneous distance differences of the first network model and the second network model based on the homogeneous sample pairs in the sample pair update set; divide the sum of squares of the effective homogeneous distance differences by the number of homogeneous sample pairs to obtain a homogeneous sample distillation loss value; calculate the sum of squares of the effective heterogeneous distance differences of the first network model and the second network model based on the heterogeneous sample pairs in the sample pair update set; and divide the sum of squares of the effective heterogeneous distance differences by the number of heterogeneous sample pairs to obtain a heterogeneous sample distillation loss value.
The determining module 608 is further configured to calculate the sum of squares of the effective homogeneous distance differences of the first network model and the second network model as W_intra = Σ((D_intra_student[i] − D_intra_teacher[i]) × mask[i])²; wherein D_intra_student[i] is a vector composed of the sample distances corresponding to the homogeneous sample pairs of the second network model, and D_intra_teacher[i] is a vector composed of the sample distances corresponding to the homogeneous sample pairs of the first network model;
mask[i] = 1 if D_intra_student[i] > D_intra_teacher[i], otherwise mask[i] = 0;
1 ≤ i ≤ N_intra, where N_intra is the number of homogeneous sample pairs; mask[i] is the mask of the homogeneous sample pairs.
The determining module 608 is further configured to calculate the sum of squares of the effective heterogeneous distance differences of the first network model and the second network model as W_inter = Σ((D_inter_student[j] − D_inter_teacher[j]) × mask[j])²; wherein D_inter_student[j] is a vector composed of the sample distances corresponding to the heterogeneous sample pairs of the second network model, and D_inter_teacher[j] is a vector composed of the sample distances corresponding to the heterogeneous sample pairs of the first network model;
mask[j] = 1 if D_inter_student[j] < D_inter_teacher[j], otherwise mask[j] = 0;
1 ≤ j ≤ N_inter, where N_inter is the number of heterogeneous sample pairs; mask[j] is the mask of the heterogeneous sample pairs.
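The masked distillation losses just defined can be sketched in Python as follows; the function name and the plain-list representation of the distance vectors are illustrative assumptions. Only pairs on which the student is still worse than the teacher (larger homogeneous distance, smaller heterogeneous distance) contribute to the loss.

```python
def distillation_losses(d_student_intra, d_teacher_intra,
                        d_student_inter, d_teacher_inter):
    """Return the total loss = homogeneous + heterogeneous distillation
    loss, each a masked sum of squared distance differences divided by
    the number of pairs of that class."""
    # Homogeneous pairs count only where the student's distance is still
    # larger than the teacher's (mask = 1).
    mask_i = [1.0 if s > t else 0.0
              for s, t in zip(d_student_intra, d_teacher_intra)]
    w_intra = sum(((s - t) * m) ** 2 for s, t, m in
                  zip(d_student_intra, d_teacher_intra, mask_i))
    loss_intra = w_intra / len(d_student_intra)

    # Heterogeneous pairs count only where the student's distance is
    # still smaller than the teacher's.
    mask_j = [1.0 if s < t else 0.0
              for s, t in zip(d_student_inter, d_teacher_inter)]
    w_inter = sum(((s - t) * m) ** 2 for s, t, m in
                  zip(d_student_inter, d_teacher_inter, mask_j))
    loss_inter = w_inter / len(d_student_inter)

    return loss_intra + loss_inter
```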
The network model training device provided by the embodiment of the invention has the same technical characteristics as the network model training method provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Corresponding to the image target recognition method, an embodiment of the present invention further provides an image target recognition apparatus, which is applied to an electronic device, and referring to a schematic structural diagram of the image target recognition apparatus shown in fig. 9, the apparatus includes: an image receiving module 92, configured to receive an image to be identified; the image processing module 94 is configured to process the image to be recognized by using a second network model, and output a target recognition result; the second network model is obtained by training in advance through the network model training method.
In the image target recognition device, the second network model obtained through the training of the above embodiments is applied to perform target recognition and obtain a target recognition result. In the training process of the second network model, the same sample pair set as that of the first network model is used for training, and when the second network model completes each round of iterative training, a first prediction result of the first network model for the first sample pair and a second prediction result of the second network model for the first sample pair are obtained; if the second prediction result is better than the first prediction result, the first sample pair is deleted from the sample pair set to obtain a sample pair update set; a total loss function value of the second network model is determined based on the sample pair update set; and if the total loss function value is greater than the preset value, the parameters of the second network model are updated by applying the total loss function value, and the next round of iterative training is continued on the updated second network model by applying the sample pair set until the total loss function value converges to the preset value, so as to obtain the trained second network model. By deleting the sample pairs that the second network model has already learned well, the second network model can concentrate on learning the sample pairs on which it still performs worse than the first network model, without being biased toward the sample pairs it has already mastered; this ensures the performance of the trained second network model and makes target recognition and detection with the second network model more accurate.
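The inference path of the recognition apparatus (image receiving module 92 feeding the image processing module 94) can be sketched minimally; `preprocess` and `decode` are hypothetical hooks, since the patent does not specify the input pipeline or the output format.

```python
def recognize(image, student_model, preprocess, decode):
    """Run target recognition on a received image with the trained,
    lightweight second (student) network model."""
    features = student_model(preprocess(image))  # image processing module
    return decode(features)                      # target recognition result
```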
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processing device, the steps of the above network model training method or image target recognition method are performed.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the electronic devices, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The computer program products of the network model training method, the image target recognition method, the apparatuses and the electronic device provided by the embodiments of the present invention include a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments, and for specific implementations, reference may be made to the method embodiments, which are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions described in the foregoing embodiments or readily conceive of changes, or make equivalent substitutions for some of the technical features; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (15)

1. A network model training method is applied to a server and comprises the following steps:
training a second network model by applying the same sample pair set as the first network model, wherein the calculation amount of the first network model is larger than that of the second network model;
when the second network model completes the iteration training of the current round, a first prediction result of the first network model for a first sample pair and a second prediction result of the second network model for the first sample pair are obtained;
if the second prediction result is better than the first prediction result, deleting the first sample pair from the sample pair set to obtain a sample pair updating set;
determining a total loss function value for the second network model based on the set of sample pair updates;
and if the total loss function value is larger than a preset value, the total loss function value is applied to update the parameters of the second network model, and the sample pair set is applied to continue the next round of iterative training on the updated second network model until the total loss function value converges to the preset value, so that the trained second network model is obtained.
2. The method of claim 1, wherein the step of obtaining a first prediction of the first network model for a first sample pair and a second prediction of the second network model for the first sample pair comprises:
calculating a first characteristic distance of the first network model for the first sample pair, and calculating a second characteristic distance of the second network model for the first sample pair;
calculating a first characteristic distance mean value of the sample class to which the first sample pair belongs based on the first network model, and calculating a second characteristic distance mean value of the sample class to which the first sample pair belongs based on the second network model; wherein the sample class comprises homogeneous samples or heterogeneous samples;
applying the first feature distance mean value to carry out normalization processing on the first feature distance, and taking the first feature distance normalization result as a first prediction result of the first network model for the first sample pair;
and applying the second feature distance mean value to carry out normalization processing on the second feature distance, and taking the second feature distance normalization result as a second prediction result of the second network model on the first sample pair.
3. The method of claim 2, wherein the feature distance is a Euclidean distance or a cosine similarity corresponding to a feature of the first sample pair.
4. The method of claim 2, further comprising:
and judging whether the second prediction result is superior to the first prediction result or not by comparing the first prediction result with the second prediction result on the basis of the sample class to which the first sample pair belongs.
5. The method of claim 4, wherein the feature distance is a Euclidean distance corresponding to a feature of the first sample pair;
the step of judging whether the second prediction result is better than the first prediction result by comparing the first prediction result with the second prediction result based on the sample class to which the first sample pair belongs includes:
if the sample class of the first sample pair is the same type sample, subtracting the second prediction result from the first prediction result to obtain a first difference value;
and if the first difference is greater than 0 or greater than a preset first positive value, determining that the second prediction result is better than the first prediction result.
6. The method of claim 4, wherein the feature distance is a Euclidean distance corresponding to a feature of the first sample pair;
the step of judging whether the second prediction result is better than the first prediction result by comparing the first prediction result with the second prediction result based on the sample class to which the first sample pair belongs includes:
if the sample class of the first sample pair is a heterogeneous sample, calculating the first prediction result minus the second prediction result to obtain a second difference value;
and if the second difference is less than 0 or less than a preset second negative value, determining that the second prediction result is better than the first prediction result.
7. The method of claim 1, wherein the step of determining a total loss function value for the second network model based on the sample pair update set comprises:
determining homogeneous and heterogeneous sample distillation loss values for the second network model based on the updated set of sample pairs;
determining a total loss function value for the second network model based on the homogeneous sample distillation loss value and the heterogeneous sample distillation loss value.
8. The method of claim 7, wherein the step of determining homogeneous and heterogeneous sample distillation loss values for the second network model based on the updated set of sample pairs comprises:
calculating a sum of squares of the effective homogenous distance differences of the first network model and the second network model based on the homogenous sample pairs in the sample pair update set; dividing the sum of squares of the effective homogeneous distance differences by the number of the homogeneous sample pairs to obtain a homogeneous sample distillation loss value;
calculating a sum of squares of valid heterogeneous distance differences for the first network model and the second network model based on heterogeneous sample pairs in the sample pair update set; and dividing the square sum of the effective heterogeneous distance difference by the number of the heterogeneous sample pairs to obtain a heterogeneous sample distillation loss value.
9. The method of claim 8, wherein the step of calculating the sum of squares of the difference of the effective homogeneous distances of the first network model and the second network model comprises:
calculating the sum of squares of the effective homogeneous distance differences of the first network model and the second network model as W_intra = Σ((D_intra_student[i] − D_intra_teacher[i]) × mask[i])²; wherein D_intra_student[i] is a vector composed of the sample distances corresponding to the homogeneous sample pairs of the second network model, and D_intra_teacher[i] is a vector composed of the sample distances corresponding to the homogeneous sample pairs of the first network model;
mask[i] = 1 if D_intra_student[i] > D_intra_teacher[i], otherwise mask[i] = 0;
1 ≤ i ≤ N_intra, where N_intra is the number of the homogeneous sample pairs; mask[i] is the mask of the homogeneous sample pairs.
10. The method of claim 8, wherein the step of calculating the sum of squares of the difference between the valid heterogeneous distances of the first network model and the second network model comprises:
calculating the sum of squares of the effective heterogeneous distance differences of the first network model and the second network model as W_inter = Σ((D_inter_student[j] − D_inter_teacher[j]) × mask[j])²; wherein D_inter_student[j] is a vector composed of the sample distances corresponding to the heterogeneous sample pairs of the second network model, and D_inter_teacher[j] is a vector composed of the sample distances corresponding to the heterogeneous sample pairs of the first network model;
mask[j] = 1 if D_inter_student[j] < D_inter_teacher[j], otherwise mask[j] = 0;
1 ≤ j ≤ N_inter, where N_inter is the number of the heterogeneous sample pairs; mask[j] is the mask of the heterogeneous sample pairs.
11. An image target recognition method is applied to an electronic device, and comprises the following steps:
receiving an image to be identified;
processing the image to be recognized by using a second network model, and outputting a target recognition result; wherein the second network model is a second network model obtained by training in advance by the method of any one of claims 1 to 10.
12. A network model training device, which is applied to a server, comprises:
the training module is used for training a second network model by applying the same sample pair set as the first network model, wherein the calculated amount of the first network model is larger than that of the second network model;
an obtaining module, configured to obtain a first prediction result of a first sample pair by the first network model and a second prediction result of the first sample pair by the second network model when the second network model completes the iterative training of the current round;
a deleting module, configured to delete the first sample pair from the sample pair set to obtain a sample pair update set if the second prediction result is better than the first prediction result;
a determination module to determine a total loss function value for the second network model based on the set of sample pair updates;
and the iteration module is used for applying the total loss function value to update the parameters of the second network model if the total loss function value is larger than a preset value, and applying the sample pair set to continue the next round of iterative training on the updated second network model until the total loss function value converges to the preset value, so as to obtain the trained second network model.
13. An image object recognition device, which is applied to an electronic device, the device comprising:
the image receiving module is used for receiving an image to be identified;
the image processing module is used for processing the image to be recognized by utilizing a second network model and outputting a target recognition result; wherein the second network model is a second network model obtained by training in advance by the method of any one of claims 1 to 10.
14. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the network model training method of any one of claims 1 to 10 or to implement the image object recognition method of claim 11.
15. A computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the network model training method of any one of claims 1 to 10 or to implement the image object recognition method of claim 11.
CN202010950541.6A 2020-09-10 2020-09-10 Network model training method, image target recognition method, device and electronic equipment Pending CN112232506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010950541.6A CN112232506A (en) 2020-09-10 2020-09-10 Network model training method, image target recognition method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112232506A true CN112232506A (en) 2021-01-15

Family

ID=74116206


Country Status (1)

Country Link
CN (1) CN112232506A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598091A (en) * 2021-03-08 2021-04-02 北京三快在线科技有限公司 Training model and small sample classification method and device
CN113488178A (en) * 2021-07-20 2021-10-08 上海弗莱特智能医疗科技有限公司 Information generation method and device, storage medium and electronic equipment
CN115457573A (en) * 2022-11-09 2022-12-09 北京闪马智建科技有限公司 Character determination method and device, storage medium and electronic device
CN115457573B (en) * 2022-11-09 2023-04-28 北京闪马智建科技有限公司 Character determining method and device, storage medium and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination