CN110909815B - Neural network training method and device, image processing method and device, and electronic equipment - Google Patents


Info

Publication number
CN110909815B
Authority
CN
China
Prior art keywords
neural network
probability
attribute
probability result
loss
Prior art date
Legal status
Active
Application number
CN201911203384.6A
Other languages
Chinese (zh)
Other versions
CN110909815A (en)
Inventor
王华明
朱烽
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN201911203384.6A
Publication of CN110909815A
Application granted
Publication of CN110909815B

Classifications

    • G06F18/214: Physics > Computing > Electric digital data processing > Pattern recognition > Analysing > Design or setup of recognition systems or techniques > Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Physics > Computing > Electric digital data processing > Pattern recognition > Analysing > Classification techniques
    • G06V40/172: Physics > Computing > Image or video recognition or understanding > Recognition of biometric, human-related or animal-related patterns > Human faces > Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a neural network training method and device, an image processing method and device, and an electronic device. The method comprises the following steps: respectively inputting sample images in a training set into a first neural network and a second neural network to obtain a first probability result and a second probability result of at least one attribute of an object in the sample images; determining the loss of the first neural network according to the relative relationship between the first probability result and the second probability result and the labeling information of the sample images; and training the first neural network according to the loss of the first neural network. The disclosed embodiments may improve the accuracy of the first neural network.

Description

Neural network training method and device, image processing method and device, and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a neural network training method, an image processing method, an apparatus, and an electronic device.
Background
Face attribute classification is one of the basic problems in face recognition research. It aims to provide a rich and detailed description of a face: given a face picture, classification tasks that predict several or even dozens of face attributes, such as gender, whether a hat is worn and whether a mustache exists, are generally required. Face attribute classification is of great significance for tasks such as security monitoring, video structuring and picture retrieval.
Disclosure of Invention
The disclosure provides a neural network training method, a neural network training device, an image processing method, an image processing device and electronic equipment.
According to a first aspect of the present disclosure, there is provided a neural network training method, the method comprising:
respectively inputting sample images in a training set into a first neural network and a second neural network to obtain a first probability result and a second probability result of at least one attribute of an object in the sample images;
determining the loss of the first neural network according to the relative relation between the first probability result and the second probability result and the labeling information of the sample image;
training the first neural network according to the loss of the first neural network.
With reference to the first aspect, in a possible implementation manner, determining a loss of the first neural network according to a relative relationship between the first probability result and the second probability result and the annotation information of the sample image includes:
determining classification loss according to the labeling information of the sample image and the initial probability result of the attribute;
according to the labeling information of the sample image, respectively determining a first probability and a second probability of the real category of the attribute from the first probability result and the second probability result;
determining a difference loss from the first probability result and the second probability result if the first probability is less than the second probability;
determining a loss of the first neural network based on the difference loss and the classification loss.
Therefore, under the condition that the classification accuracy of the first neural network is lower than that of the second neural network, the first neural network simulates the output probability of the second neural network, and the classification accuracy of the first neural network is improved.
With reference to the first aspect, in a possible implementation manner, determining a loss of the first neural network according to a relative relationship between the first probability result and the second probability result further includes:
determining the classification loss as a loss of the first neural network if the first probability is greater than or equal to the second probability.
Therefore, under the condition that the classification accuracy of the first neural network is superior to that of the second neural network, the first neural network does not simulate the output probability of the second neural network any more, which is equivalent to strengthening the influence of the difference loss generated on the knowledge distillation module by the sample image of which the classification accuracy of the first neural network is inferior to that of the second neural network on the first neural network, so that the first neural network is more focused on the sample image of which the classification accuracy of the first neural network is lower, and the accuracy of the first neural network is improved.
With reference to the first aspect, in a possible implementation manner, the first probability result and the second probability result are used to represent probability results processed according to preset temperature parameters, and the obtaining a first probability result and a second probability result of at least one attribute of an object in the sample image by inputting sample images in a training set into a first neural network and a second neural network respectively includes:
respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images;
and respectively processing the first characteristic and the second characteristic based on preset temperature parameters, and respectively performing normalization processing on the processed first characteristic and the processed second characteristic to obtain the first probability result and the second probability result.
The first probability result and the second probability result are determined based on the temperature parameter, so that the first probability result and the second probability result contain more information, the speed of the first neural network for simulating the second neural network is accelerated, and the training efficiency of the first neural network is improved.
With reference to the first aspect, in a possible implementation manner, the method further includes:
carrying out normalization processing on the first characteristics to obtain an initial probability result of the attribute.
By determining the initial probability result, a classification loss of the first neural network may be determined.
With reference to the first aspect, in a possible implementation manner, before the sample images in the training set are respectively input to the first neural network and the second neural network, the method further includes:
training the second neural network according to the training set, comprising:
inputting the sample images in the training set into a second neural network to be trained to obtain a third feature of at least one attribute of the object in the sample images;
carrying out normalization processing on the third characteristics to obtain a third probability result;
determining the loss of the second neural network according to the labeling information of the sample image and the third probability result;
training the second neural network according to the loss of the second neural network.
By training the second neural network with higher precision, the learning process of the first neural network can be restrained by the relative relation of the probability results, so that the precision of the first neural network is further improved while the first neural network simulates the second neural network.
With reference to the first aspect, in a possible implementation manner, the sample image includes a face image, and the first neural network is configured to identify a category of at least one attribute of a face in the face image.
In this way, the trained first neural network can be used to identify a class of at least one attribute of the face in the face image.
According to a second aspect of the present disclosure, there is provided an image processing method, the method comprising:
inputting an image to be processed into a neural network to obtain the characteristics of each attribute of an object in the image to be processed;
respectively carrying out normalization processing on the characteristics of each attribute of the image to be processed to obtain the prediction probability of each category corresponding to each attribute;
for each attribute, predicting the category of the attribute according to the prediction probability of each category of the attribute;
wherein the neural network comprises a first neural network trained according to the method of the first aspect.
According to a third aspect of the present disclosure, there is provided a neural network training apparatus, the apparatus comprising:
the input module is used for respectively inputting the sample images in the training set into the first neural network and the second neural network to obtain a first probability result and a second probability result of at least one attribute of the object in the sample images;
a determining module, configured to determine a loss of the first neural network according to a relative relationship between the first probability result and the second probability result and the labeling information of the sample image;
a first training module to train the first neural network based on a loss of the first neural network.
With reference to the third aspect, in a possible implementation manner, determining a loss of the first neural network according to a relative relationship between the first probability result and the second probability result and the annotation information of the sample image includes:
determining classification loss according to the labeling information of the sample image and the initial probability result of the attribute;
according to the labeling information of the sample image, respectively determining a first probability and a second probability of the real category of the attribute from the first probability result and the second probability result;
determining a difference loss from the first probability result and the second probability result if the first probability is less than the second probability;
determining a loss of the first neural network based on the difference loss and the classification loss.
With reference to the third aspect, in a possible implementation manner, determining a loss of the first neural network according to a relative relationship between the first probability result and the second probability result further includes:
determining the classification loss as a loss of the first neural network if the first probability is greater than or equal to the second probability.
With reference to the third aspect, in a possible implementation manner, the first probability result and the second probability result are used to represent probability results processed according to preset temperature parameters, and the obtaining a first probability result and a second probability result of at least one attribute of an object in the sample image by inputting sample images in a training set into a first neural network and a second neural network respectively includes:
respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images;
and respectively processing the first characteristic and the second characteristic based on preset temperature parameters, and respectively performing normalization processing on the processed first characteristic and the processed second characteristic to obtain the first probability result and the second probability result.
With reference to the third aspect, in a possible implementation manner, the apparatus further includes:
and the processing module is used for carrying out normalization processing on the first characteristics to obtain an initial probability result of the attribute.
With reference to the third aspect, in a possible implementation manner, before the sample images in the training set are respectively input to the first neural network and the second neural network, the apparatus further includes:
a second training module for training the second neural network according to the training set, comprising:
inputting the sample images in the training set into a second neural network to be trained to obtain a third feature of at least one attribute of the object in the sample images;
carrying out normalization processing on the third characteristics to obtain a third probability result;
determining the loss of the second neural network according to the labeling information of the sample image and the third probability result;
training the second neural network according to the loss of the second neural network.
With reference to the third aspect, in a possible implementation manner, the sample image includes a face image, and the first neural network is configured to identify a category of at least one attribute of a face in the face image.
According to a fourth aspect of the present disclosure, there is provided an image processing apparatus, the apparatus comprising:
the input module is used for inputting the image to be processed into the neural network to obtain the characteristics of each attribute of the object in the image to be processed;
the processing module is used for respectively carrying out normalization processing on the characteristics of each attribute of the image to be processed to obtain the prediction probability of each category corresponding to each attribute;
a prediction module for predicting, for each attribute, a category of the attribute according to the prediction probability of each category of the attribute;
wherein the neural network comprises a first neural network trained by the apparatus of the third aspect.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to a sixth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, the probability results output by the first neural network (student network) and the second neural network (teacher network) can be obtained, and the loss of the first neural network is determined based on the relative relationship between the probability result output by the first neural network and the output probability result of the second neural network, so that the learning process of the first neural network is constrained by the relative relationship between the probability results, and the accuracy of the first neural network is further improved while the first neural network simulates the second neural network.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a neural network training method in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates an architectural schematic of a neural network implemented according to the present disclosure;
FIG. 3 shows a flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows a block diagram of a neural network training device, in accordance with an embodiment of the present disclosure;
fig. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure;
fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
When a classification task of predicting a plurality of face attributes is performed in the same neural network, the neural network is required to have relatively strong generalization capability. Increasing the number of parameters of the neural network can improve its generalization capability, but the calculation speed of the network then becomes slow, which is impractical. Knowledge distillation provides a training method in which a small network (a neural network with a relatively small parameter quantity, which may be called a student network) learns from a large network (a neural network with a relatively large parameter quantity, which may be called a teacher network); by simulating the output probability distribution of the large network, the small network can approach the precision of the large network.
The embodiment of the disclosure provides a neural network training method, which can train a small network with less parameters, high speed and high precision by adopting a knowledge distillation method. The embodiment of the disclosure also provides an image processing method, which can perform category prediction of multiple attributes on an object in an image to be processed by adopting the trained small network.
First, a neural network training method provided by the embodiment of the present disclosure is explained below. Fig. 1 shows a flow diagram of a neural network training method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include:
step S11, respectively inputting the sample images in the training set into the first neural network and the second neural network, and obtaining a first probability result and a second probability result of at least one attribute of the object in the sample images.
Step S12, determining a loss of the first neural network based on a relative relationship between the first probability result and the second probability result and the labeling information of the sample image.
Step S13, training the first neural network according to the loss of the first neural network.
According to the neural network training method disclosed by the embodiment of the disclosure, the probability results output by the first neural network (student network) and the second neural network (teacher network) can be obtained, and the loss of the first neural network is determined based on the relative relationship between the probability result output by the first neural network and the probability result output by the second neural network, so that the learning process of the first neural network is constrained by the relative relationship between the probability results, and the precision of the first neural network is further improved while the first neural network simulates the second neural network.
In one possible implementation, the neural network training method may be performed by an electronic device such as a terminal device or a server. The terminal device may be User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.
In an embodiment of the present disclosure, the first neural network may be a student network for processing an image to be processed, and the second neural network may be a teacher network for training the first neural network. In a possible implementation manner, the network structures and parameter quantities of the first neural network and the second neural network are different, for example, the first neural network can be applied to a mobile terminal, and the network structure is simpler and the parameter quantity is smaller; the second neural network can be applied to a server side, and is complex in network structure and large in parameter quantity. The performance of the trained student network is similar to that of the teacher network.
In a possible implementation manner, a training set may be preset, and the training set may include a large number of sample images conforming to a mathematical distribution for training the first neural network and the second neural network. In one example, the sample images and the images to be processed may be face images, and the first neural network may be used to identify the category of at least one face attribute of a face in the face images; for example, the first neural network may be used to identify the categories of face attributes such as gender (corresponding categories may include male and female), age (corresponding categories may include elderly, children, young and middle-aged), glasses style (corresponding categories may include no glasses, sunglasses and clear glasses) and beard style (corresponding categories may include, for example, no beard, full beard and goatee). In yet another example, the sample images and the images to be processed may be human body images, and the first neural network may be used to identify the categories of at least one human body attribute of a human body in the human body images; for example, the first neural network may be used to identify the categories of human body attributes such as hairstyle (corresponding categories may include bald, shoulder-length and short hair), hat style (corresponding categories may include no hat, fisherman's hat, pullover cap and peaked cap) and bag style (corresponding categories may include no bag, single-shoulder bag, backpack and handbag). It should be understood that the above is only an example; the image to be processed and the sample image may be other types of images, and the first neural network may be used in other types of scenes, which is not limited by the present disclosure.
The embodiment of the present disclosure trains the first neural network through steps S11 to S13. In one possible implementation manner, before training the first neural network, the neural network training method may further include: training the second neural network according to the training set.
For example, a training set X × Y may be given, where X = (x_1, x_2, …, x_m) represents the set of sample images, Y = (y_1, y_2, …, y_m) represents the annotation information of each sample image, and m represents the number of sample images. The labeling information may be used to identify the real categories of the various attributes of the object in a sample image; in one example, the labeling information may be a label. According to the training set, a second neural network with a large parameter quantity can be trained. The disclosed embodiments do not limit the specific manner of training the second neural network.
In one possible implementation, training the second neural network according to the training set may include: inputting the sample images in the training set into a second neural network to be trained to obtain a third feature of at least one attribute of the object in the sample images; carrying out normalization processing on the third characteristics to obtain a third probability result; determining the loss of the second neural network according to the labeling information of the sample image and the third probability result; training the second neural network according to the loss of the second neural network.
Wherein the third feature may be used to represent a feature of a certain attribute extracted by the second neural network to be trained. The third probability result may represent a result of a normalization process of the third feature, and the third probability result may include probabilities of respective classes of a certain attribute predicted by the second neural network to be trained. For example, the gender attribute includes a female category and a male category, and assuming that the third probability result of the gender attribute is [0.9,0.1], it means that the probability of the female category predicted by the second neural network to be trained is 0.9 and the probability of the male category is 0.1.
Fig. 2 illustrates an architectural schematic of a neural network implemented according to the present disclosure. As shown in fig. 2, the architecture includes a first neural network and a second neural network, both of which comprise convolutional layers, fully-connected layers, softmax layers and softmax layers with temperature parameters. Taking the second neural network as an example, the sample image may be used as the input of the convolutional layer, and the output of the convolutional layer as the input of the fully-connected layers. The second neural network may include a plurality of fully-connected layers, each corresponding to one attribute of the object in the sample image (e.g., fully-connected layer a corresponding to attribute a, fully-connected layer b corresponding to attribute b, …, and fully-connected layer n corresponding to attribute n), and each fully-connected layer connects one softmax layer and one softmax layer with temperature parameters (e.g., fully-connected layer n connects softmax layer n and softmax layer n with temperature parameters). The structure of the first neural network may refer to that of the second neural network and will not be described in detail here. As shown in fig. 2, for the same attribute, the corresponding softmax layers with temperature parameters in the first neural network and the second neural network use the same temperature parameter, and together they form a knowledge distillation module.
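To make the two-branch architecture of fig. 2 concrete, the following is a minimal sketch of a multi-attribute network with a shared convolutional trunk and one fully-connected head per attribute. PyTorch, the layer sizes and the attribute names are illustrative assumptions; the patent does not prescribe a framework or a specific layer configuration.

```python
import torch
import torch.nn as nn

class MultiAttributeNet(nn.Module):
    """Shared convolutional layers followed by one fully-connected layer per attribute."""
    def __init__(self, num_classes_per_attr, width=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One fully-connected layer per attribute (fully-connected layers a, b, ..., n in fig. 2).
        self.heads = nn.ModuleDict({
            attr: nn.Linear(width, n_cls)
            for attr, n_cls in num_classes_per_attr.items()
        })

    def forward(self, x):
        feat = self.conv(x)
        # Raw per-attribute features; the softmax layers (with or without the
        # temperature parameter T) of fig. 2 are applied outside this module.
        return {attr: head(feat) for attr, head in self.heads.items()}

# Student (first) and teacher (second) network share the sketch's structure but
# differ in capacity, mirroring the small-network / large-network split.
student = MultiAttributeNet({"gender": 2, "hat": 2, "beard": 3}, width=32)
teacher = MultiAttributeNet({"gender": 2, "hat": 2, "beard": 3}, width=128)
```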
In one example, as shown in fig. 2, for any attribute, a feature output by a fully-connected layer corresponding to the attribute in a second neural network to be trained may be determined as a third feature, the third feature is input into a softmax layer corresponding to the attribute to perform normalization processing of the third feature, and a probability of each category of the attribute output by the softmax layer is a third probability result of the attribute.
The loss of the second neural network is determined according to the labeling information of the sample image and the third probability result. In one example, a cross-entropy function or another loss function may be employed to determine the loss of the second neural network; the disclosed embodiments do not limit the loss function. The loss of the second neural network can then be propagated backward, and the parameters of the second neural network optimized using a gradient descent method, thereby completing one training iteration of the second neural network.
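Under the same assumptions (PyTorch, the MultiAttributeNet sketch above), one teacher-training iteration could look as follows; the summed per-attribute cross-entropy plays the role of the loss determined from the labeling information and the third probability results, and the optimizer is an illustrative choice.

```python
import torch.nn.functional as F

def teacher_training_step(teacher, optimizer, images, labels):
    """labels: dict mapping each attribute name to a tensor of true class indices."""
    features = teacher(images)  # third features, one tensor per attribute
    # F.cross_entropy normalizes each third feature (softmax) into a third
    # probability result internally before comparing it with the labels.
    loss = sum(F.cross_entropy(features[attr], labels[attr]) for attr in features)
    optimizer.zero_grad()
    loss.backward()   # back-propagate the loss of the second neural network
    optimizer.step()  # optimize the parameters by gradient descent
    return float(loss)
```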
Through the mode, the second neural network can be trained through multiple iterations, and therefore the high-precision teacher network is obtained. It should be noted that, in the embodiment of the present disclosure, the training process of the second neural network and the training process of the first neural network are two independent training processes, and the training process of the second neural network does not affect the parameters of the first neural network.
After the training of the second neural network (teacher network) is completed, the trained second neural network (teacher network) can be used for training the first neural network (student network) in a knowledge distillation mode. The training process of the first neural network is explained below through steps S11 to S13.
In step S11, the probability result of the attribute includes probabilities of the respective categories of the attribute, such as the probability of a female and the probability of a male of the gender attribute. After the sample image is input into the first neural network and the second neural network, a first probability result and a second probability result of each attribute of the object in the sample image can be obtained. For example, the face image is input into the first neural network and the second neural network, respectively, and the first probability result and the second probability result of the gender attribute of the face, the first probability result and the second probability result of the hat attribute, and the first probability result and the second probability result of the mustache attribute in the sample image can be obtained.
In one possible implementation, the first probability result may represent a probability result output by the first neural network processed according to a preset temperature parameter, and the second probability result may represent a probability result output by the second neural network processed according to a preset temperature parameter. The temperature parameter may be preset as required, and may be, for example, 1 to 10. The value of the temperature parameter is not limited by this disclosure. It should be noted that the larger the value of T is, the larger the entropy values of the first probability result and the second probability result are, and the more information the first probability result and the second probability result contain. Accordingly, step S11 may include: respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images; and respectively processing the first characteristic and the second characteristic based on preset temperature parameters, and respectively performing normalization processing on the processed first characteristic and the processed second characteristic to obtain the first probability result and the second probability result.
The first feature may represent a feature of a certain attribute extracted by the first neural network to be trained, and the second feature may represent a feature of a certain attribute extracted by the second neural network after training.
As shown in fig. 2, taking the attribute n as an example, after the sample image is input into the first neural network to be trained and the trained second neural network, the feature output by the fully-connected layer n in the first neural network to be trained may be determined as a first feature, and the feature output by the fully-connected layer n in the trained second neural network may be determined as a second feature.
In one possible implementation, after the first feature and the second feature are extracted, the first feature and the second feature may be respectively processed based on preset temperature parameters through formulas (1) and (2).
$$q_i = \frac{\exp(z_i / T)}{\sum_{i'=0}^{j-1} \exp(z_{i'} / T)} \tag{1}$$

$$p_i = \frac{\exp(v_i / T)}{\sum_{i'=0}^{j-1} \exp(v_{i'} / T)} \tag{2}$$

In formula (1), T may represent a preset temperature parameter; z may represent the first feature of any attribute; j may represent the number of categories the attribute has, j being a positive integer; z_i may represent the first feature value of the i-th category of the attribute (i.e. the feature value of the i-th category in the first feature z), where 0 ≤ i < j and i is an integer; the first feature values of the j categories constitute the first feature z. exp(z_i / T) represents the processing result of the first feature value z_i of the i-th category based on the preset temperature parameter T, and q_i represents the result of normalizing this processed value, i.e. the first probability of the i-th category of the attribute; the first probabilities of the j categories constitute the first probability result of the attribute.
In formula (2), T may represent the preset temperature parameter; v may represent the second feature of any attribute; j may represent the number of categories the attribute has, j being a positive integer; v_i may represent the second feature value of the i-th category of the attribute (i.e. the feature value of the i-th category in the second feature v), where 0 ≤ i < j and i is an integer; the second feature values of the j categories constitute the second feature v. exp(v_i / T) represents the processing result of the second feature value v_i based on the preset temperature parameter, and p_i represents the result of normalizing this processed value, i.e. the second probability of the i-th category of the attribute; the second probabilities of the j categories constitute the second probability result of the attribute.
For example, for a gender attribute having two categories, female and male, let the female category be the 0th category and the male category be the 1st category: z_0 may represent the first feature value of the female category and z_1 the first feature value of the male category; q_0 may represent the first probability of the female category and q_1 the first probability of the male category; v_0 may represent the second feature value of the female category and v_1 the second feature value of the male category; p_0 may represent the second probability of the female category and p_1 the second probability of the male category. Taking the first feature of the gender attribute as [-1, 2], the second feature as [-4, 5] and T = 1 as an example: with z_0 = -1 and z_1 = 2, formula (1) gives q_0 = 0.047 and q_1 = 0.953, and q_0 and q_1 constitute the first probability result; with v_0 = -4 and v_1 = 5, formula (2) gives p_0 = 0.0001 and p_1 = 0.9999, and p_0 and p_1 constitute the second probability result.
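The worked example can be checked with a few lines of standard-library Python; the helper below is simply formulas (1) and (2) with T = 1 and is not part of the patent.

```python
import math

def temperature_softmax(features, T=1.0):
    """Formulas (1)/(2): softmax of the features divided by the temperature T."""
    exps = [math.exp(f / T) for f in features]
    total = sum(exps)
    return [e / total for e in exps]

q = temperature_softmax([-1, 2])  # first feature z  -> first probability result
p = temperature_softmax([-4, 5])  # second feature v -> second probability result
print([round(x, 3) for x in q])   # [0.047, 0.953]
print([round(x, 4) for x in p])   # [0.0001, 0.9999]
```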
In an actual training process, as shown in fig. 2, taking an attribute n as an example, a first feature output by a fully-connected layer n in a first neural network may be input into a softmax layer n with a temperature parameter in the first neural network to perform normalization processing with the temperature parameter, where a probability of each category output by the softmax layer n with the temperature parameter is a first probability result of the attribute n; the second characteristic output by the fully-connected layer n in the second neural network can be input into the softmax layer n with the temperature parameter in the second neural network for normalization processing with the temperature parameter, and the probability of each category output by the softmax layer n with the temperature parameter is the second probability result of the attribute n.
In the embodiment of the disclosure, the first probability result and the second probability result are determined based on the temperature parameter, so that the first probability result and the second probability result contain more information, thereby accelerating the speed of the first neural network for simulating the second neural network, and improving the training efficiency of the first neural network.
In one possible implementation, step S11 may include: respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images; and respectively carrying out normalization processing on the first characteristic and the second characteristic to obtain the first probability result and the second probability result.
It can be understood that, when T in formula (1) and formula (2) takes the value 1, q_i obtained by formula (1) may represent the result of directly normalizing the first feature value z_i of the i-th category, i.e. the first probability of the i-th category of the attribute, and the first probabilities of the j categories may constitute the first probability result of the attribute; p_i may represent the result of directly normalizing the second feature value v_i of the i-th category, i.e. the second probability of the i-th category of the attribute, and the second probabilities of the j categories may constitute the second probability result of the attribute.
In the disclosed embodiment, the first feature and the second feature may also be normalized directly, without the processing based on the temperature parameter. In this case, in the actual training process, the first feature output by fully-connected layer n in the first neural network can be input into softmax layer n of the first neural network for normalization, and the probability of each category output by softmax layer n is the first probability result of attribute n; the second feature output by fully-connected layer n in the second neural network can be input into softmax layer n of the second neural network for normalization, and the probability of each category output by softmax layer n is the second probability result of attribute n. This simplifies the structures of the first neural network and the second neural network.
In step S12, a loss of the first neural network may be determined according to a relative relationship between the first probability result and the second probability result, and the annotation information of the sample image.
For any attribute, the relative relationship of the first probability result and the second probability result of the attribute may be used to compare the accuracy of the first neural network and the second neural network in classifying the attribute. In the related art, in the knowledge distillation process, all sample images in a training set adopted in the teacher network training process are used as learning targets of a student network. However, as the number of training times increases, the classification effect of the student network may be better than that of the teacher network on a part of the sample images, that is, the output probability of the student network on the real class of the attribute may be higher than that of the teacher network on the real class of the attribute, and at this time, if the student network continues to be forced to simulate the probability result output by the teacher network, the precision of the student network may be reduced. In the embodiment of the disclosure, the sample images in the training set are screened according to the relative relationship between the first probability result and the second probability result, and part of the sample images of which the teacher network (the second neural network) classification precision is higher than that of the student network (the first neural network) are used as the learning target of the student network (the first neural network), so that the classification precision of the student network is improved.
It is to be understood that, for any attribute, the sample image may be a learning target of the student network in a case where the accuracy of the attribute classification by the first neural network is less than the accuracy of the attribute classification by the second neural network, and the sample image may not be a learning target of the student network in a case where the accuracy of the attribute classification by the first neural network is greater than or equal to the accuracy of the attribute classification by the second neural network. That is, in the embodiment of the present disclosure, for any attribute, a sample image in which the accuracy of attribute classification by the first neural network in the training set is smaller than the accuracy of attribute classification by the second neural network may be taken as a learning target of the student network.
In one possible implementation, step S12 may include: determining classification loss according to the labeling information of the sample image and the initial probability result of the attribute; according to the labeling information of the sample image, respectively determining a first probability and a second probability of the real category of the attribute from the first probability result and the second probability result; determining a difference loss from the first probability result and the second probability result if the first probability is less than the second probability; determining a loss of the first neural network based on the difference loss and the classification loss.
In one possible implementation manner, step S12 may further include: determining the classification loss as a loss of the first neural network if the first probability is greater than or equal to the second probability.
Knowledge distillation involves a classification loss and a difference loss. The classification loss is used to make the probability result output by the first neural network approximate the real category of the object attribute in the sample image; the difference loss is used to make the probability result output by the first neural network approximate the probability result output by the second neural network.
In the embodiment of the present disclosure, for any attribute, the classification loss may be determined according to the labeling information of the sample image and the initial probability result of the attribute.
Wherein the initial probability result of the attribute may represent a probability result output by the first neural network that is not processed by the temperature parameter. In one example, the first feature may be normalized to obtain an initial probability result of the attribute. As shown in fig. 2, in the actual training process, taking the attribute n as an example, the first feature of the attribute n is input into the softmax layer n in the first neural network, and an initial probability result of the attribute n can be obtained. It can be understood that, when the value of T in the formula (1) is 1, the initial probability result of the attribute can be determined according to the calculation result of the formula (1).
For any attribute, the classification loss of the attribute may represent the difference between the probabilities of the categories of the attribute predicted by the first neural network without temperature-parameter processing (i.e. the initial probability result) and the true category of the attribute, which may be determined according to the labeling information of the sample image. Therefore, the classification loss of the attribute can be determined by a loss function according to the labeling information of the sample image and the initial probability result of the attribute; for example, the classification loss of the attribute can be determined by the cross-entropy function.
In the disclosed embodiments, for any attribute, a disparity penalty can be determined from the first and second probability outcomes for that attribute.
Wherein the first probability result and the second probability result of the attribute each include a probability of each possible category of the attribute. The loss of difference can be determined by methods in the related art. In one example, a KL divergence (Kullback-Leibler divergence) may be employed as the difference loss. Specifically, the KL divergence may be determined by equation (3).
$$\mathrm{KL}(p \,\|\, q) = \sum_{i=0}^{j-1} p_i \log \frac{p_i}{q_i} \tag{3}$$
In formula (3), j may represent the number of categories an attribute has, j being a positive integer; q_i may represent the first probability of the i-th category in the first probability result of the attribute, and p_i may represent the second probability of the i-th category in the second probability result of the attribute, where 0 ≤ i < j and i is an integer. It can be understood that, as shown in formula (3), when the first probability result and the second probability result completely coincide, the KL divergence takes the value 0.
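A plain-Python sketch of formula (3), for illustration only: applied to the gender example above (q = [0.047, 0.953], p = [0.0001, 0.9999]) it returns roughly 0.047, and it returns 0 when the two probability results coincide.

```python
import math

def kl_divergence(p, q):
    """Formula (3): KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)
```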
As noted above, in the embodiment of the present disclosure, for any attribute, a sample image for which the accuracy of the first neural network's classification of the attribute is lower than that of the second neural network is taken as a learning target of the student network; therefore, the relative relationship between the first probability result and the second probability result needs to be determined. In a possible implementation manner, a first probability and a second probability of the true category of the attribute may be determined from the first probability result and the second probability result, respectively, according to the labeling information of the sample image, and the relative relationship between the first probability result and the second probability result may be determined according to the magnitude relationship between the first probability and the second probability.
Wherein the first probability may represent a probability of a true class of the attribute in the first probability result and the second probability may represent a probability of a true class of the attribute in the second probability result.
In the case where the first probability is smaller than the second probability, indicating that the probability that the first neural network correctly classifies the attribute is smaller than the probability that the second neural network correctly classifies the attribute, that is, the accuracy of classification by the first neural network on the attribute is smaller than the accuracy of classification by the second neural network on the attribute, the sample image may be a learning target of the first neural network. Therefore, in the case where the first probability is smaller than the second probability, the disparity loss needs to be considered.
When the first probability is greater than or equal to the second probability, the probability that the first neural network correctly classifies the attribute is not smaller than the probability that the second neural network correctly classifies the attribute, that is, the accuracy of the first neural network's classification of the attribute is greater than or equal to that of the second neural network, and taking the sample image as a learning target of the first neural network would reduce the accuracy of the first neural network's classification of the attribute. Thus, in the case where the first probability is greater than or equal to the second probability, the difference loss may not be considered.
In one example, the difference loss can be determined using equation (4):
$$L_{\mathrm{diff}} = \begin{cases} \mathrm{KL}(p \,\|\, q), & q_k < p_k \\ 0, & q_k \ge p_k \end{cases} \tag{4}$$

As shown in formula (4), where the k-th category is the true category of the attribute: in the case where the first probability q_k of the true category of the attribute is less than the second probability p_k of the true category of the attribute, the value of the difference loss is the KL divergence KL(p‖q); in the case where the first probability q_k of the true category of the attribute is greater than or equal to the second probability p_k of the true category of the attribute, the value of the difference loss may be 0.
Since the difference loss needs to be considered in the case where the first probability is smaller than the second probability, in that case the loss of the first neural network can be determined from the difference loss and the classification loss.
In one example, weights may be set for the classification loss and the difference loss, respectively, and the loss of the first neural network may be determined from the weighted classification loss and the weighted difference loss.
It should be noted that, in the embodiment of the present disclosure, in the case where the first probability is smaller than the second probability, the difference loss may be determined according to the first probability result and the second probability result, and in the case where the first probability is greater than or equal to the second probability, the step of determining the difference loss may be omitted, thereby reducing the amount of calculation.
Since the difference loss does not need to be considered in case the first probability is greater than or equal to the second probability, the classification loss may be determined as the loss of the first neural network in case the first probability is greater than or equal to the second probability.
In the embodiment of the disclosure, in the knowledge distillation process, in the case that the classification accuracy of the first neural network is better than that of the second neural network, the corresponding sample image does not generate difference loss on the knowledge distillation module, which is equivalent to enhancing the influence of the difference loss generated on the knowledge distillation module by the sample image of which the classification accuracy of the first neural network is worse than that of the second neural network on the first neural network, so that the first neural network concentrates more on the sample image of which the classification accuracy of the first neural network is lower, thereby improving the accuracy of the first neural network.
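Putting the pieces of this section together, the following is a sketch of the per-attribute student loss under the assumption of PyTorch; the temperature T and the weights alpha/beta are illustrative values, and the gating by q_k < p_k implements formula (4).

```python
import torch
import torch.nn.functional as F

def student_loss(student_logits, teacher_logits, labels, T=4.0, alpha=1.0, beta=1.0):
    """Loss of the first neural network for one attribute over one batch.

    student_logits / teacher_logits: (batch, num_classes) first / second features;
    labels: (batch,) true class indices from the annotation information.
    """
    # Classification loss on the initial probability result (no temperature).
    cls_loss = F.cross_entropy(student_logits, labels)

    # First and second probability results: temperature softmax, formulas (1)-(2).
    q = F.softmax(student_logits / T, dim=1)
    p = F.softmax(teacher_logits.detach() / T, dim=1)  # teacher is not updated

    # q_k, p_k: probabilities of the true category k of the attribute.
    q_k = q.gather(1, labels.unsqueeze(1)).squeeze(1)
    p_k = p.gather(1, labels.unsqueeze(1)).squeeze(1)

    # Formula (3) per sample, gated by formula (4): KL(p || q) only where q_k < p_k.
    kl = (p * (p.clamp_min(1e-12).log() - q.clamp_min(1e-12).log())).sum(dim=1)
    diff_loss = torch.where(q_k < p_k, kl, torch.zeros_like(kl)).mean()

    # Weighted combination of the classification loss and the difference loss.
    return alpha * cls_loss + beta * diff_loss
```

Summing this quantity over all attributes (one pair of logit tensors per fully-connected head) would give the total loss used to train the first neural network.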
The following describes an image processing method provided in the embodiments of the present disclosure. Fig. 3 shows a flow chart of an image processing method according to an embodiment of the present disclosure. As shown in fig. 3, the method may include:
step S21, inputting the image to be processed into the neural network, and obtaining the characteristics of each attribute of the object in the image to be processed.
Step S22, normalizing the features of each attribute of the image to be processed, respectively, to obtain the prediction probability of each category corresponding to each attribute.
In step S23, for each attribute, the class of the attribute is predicted based on the prediction probability of each class of the attribute.
The neural network adopted in step S21 may include the first neural network provided in the embodiments of the present disclosure.
In the embodiment of the present disclosure, by inputting the image to be processed into the trained first neural network, the prediction probability of each class of each attribute of the object in the image to be processed can be predicted. For each attribute, the class of the attribute may be predicted based on the prediction probabilities of the various classes of the attribute. In one example, for each attribute, a category corresponding to a maximum value of the prediction probabilities of the respective categories of the attribute may be determined as the predicted category of the attribute.
As shown in fig. 2, the image to be processed is input to the convolution layer of the first neural network, and the output of the convolution layer is respectively input to the fully-connected layer of each attribute of the object in the image to be processed (for example, fully-connected layer a corresponding to attribute a, fully-connected layer b corresponding to attribute b, …, and fully-connected layer n corresponding to attribute n). Taking the prediction of the category of attribute n as an example: the output of fully-connected layer n is the feature of attribute n; the feature of attribute n is input into softmax layer n for normalization, yielding the prediction probability of each category of attribute n; the category of attribute n can then be predicted from these prediction probabilities. The prediction process for the categories of the other attributes is the same as that for attribute n and is not repeated here.
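For illustration, the branching structure of fig. 2 can be sketched as follows. The backbone layers, attribute names, input size, and class counts are illustrative assumptions; only the shared-convolution, per-attribute fully-connected layer, and softmax layout follows the description above.

```python
import torch
import torch.nn as nn

class MultiAttributeNet(nn.Module):
    """Shared convolutional backbone with one fully-connected head
    (plus softmax) per attribute, in the spirit of fig. 2."""

    def __init__(self, num_classes_per_attr):
        super().__init__()
        # Illustrative backbone; the patent does not specify layer sizes.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One fully-connected layer per attribute: fc_a, fc_b, ..., fc_n.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(16, n) for name, n in num_classes_per_attr.items()}
        )

    def forward(self, x):
        feat = self.backbone(x)
        # Feature of each attribute -> softmax -> prediction probabilities.
        return {name: torch.softmax(head(feat), dim=-1)
                for name, head in self.heads.items()}

# Usage: predict the category of each attribute (steps S21-S23).
net = MultiAttributeNet({"attr_a": 2, "attr_n": 5})
probs = net(torch.randn(1, 3, 64, 64))
pred = {name: int(p.argmax(dim=-1)) for name, p in probs.items()}
```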
Because the neural network training method provided by the embodiment of the disclosure improves the attribute classification precision of the first neural network, the image processing method provided by the embodiment of the disclosure improves the accuracy of predicting the attribute category.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic of the disclosure; details are omitted here for brevity. Those skilled in the art will appreciate that, in the methods of the specific embodiments above, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a neural network training apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any one of the neural network training methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the methods section, which is not repeated herein.
FIG. 4 illustrates a block diagram of a neural network training device, in accordance with an embodiment of the present disclosure. As shown in fig. 4, the apparatus 40 includes:
an input module 41, configured to input sample images in a training set into a first neural network and a second neural network, respectively, to obtain a first probability result and a second probability result of at least one attribute of an object in the sample images;
A determining module 42, configured to determine a loss of the first neural network according to a relative relationship between the first probability result and the second probability result and the labeling information of the sample image;
a first training module 43, configured to train the first neural network according to the loss of the first neural network.
In the embodiment of the disclosure, the probability results output by the first neural network (student network) and the second neural network (teacher network) can be obtained, and the loss of the first neural network is determined based on the relative relationship between the probability result output by the first neural network and the probability result output by the second neural network, so that the learning process of the first neural network is constrained by the relative relationship between the probability results, and the accuracy of the first neural network is further improved while the first neural network simulates the second neural network.
In one possible implementation, determining a loss of the first neural network according to a relative relationship between the first probability result and the second probability result and the labeling information of the sample image includes:
determining classification loss according to the labeling information of the sample image and the initial probability result of the attribute;
According to the labeling information of the sample image, respectively determining a first probability and a second probability of the real category of the attribute from the first probability result and the second probability result;
determining a difference loss from the first probability result and the second probability result if the first probability is less than the second probability;
determining a loss of the first neural network based on the difference loss and the classification loss.
In one possible implementation, determining the loss of the first neural network according to the relative relationship between the first probability result and the second probability result further comprises:
determining the classification loss as a loss of the first neural network if the first probability is greater than or equal to the second probability.
In a possible implementation manner, the first probability result and the second probability result are used to represent probability results processed according to preset temperature parameters, and the sample images in the training set are respectively input to the first neural network and the second neural network to obtain the first probability result and the second probability result of at least one attribute of the object in the sample images, which includes:
Respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images;
and respectively processing the first characteristic and the second characteristic based on preset temperature parameters, and respectively performing normalization processing on the processed first characteristic and the processed second characteristic to obtain the first probability result and the second probability result.
In a possible implementation manner, the apparatus 40 further includes:
and the processing module is used for carrying out normalization processing on the first characteristics to obtain an initial probability result of the attribute.
In a possible implementation manner, before the sample images in the training set are respectively input into the first neural network and the second neural network, the apparatus 40 further includes:
a second training module for training the second neural network according to the training set, comprising:
inputting the sample images in the training set into a second neural network to be trained to obtain a third feature of at least one attribute of the object in the sample images;
carrying out normalization processing on the third characteristics to obtain a third probability result;
Determining the loss of the second neural network according to the labeling information of the sample image and the third probability result;
training the second neural network according to the loss of the second neural network.
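As a minimal sketch of this pre-training of the second neural network (the teacher): the code below assumes the teacher returns one feature tensor per attribute and that the loader yields (image, {attribute: label}) pairs; all names are illustrative, not from the disclosure.

```python
import torch.nn.functional as F

def train_second_network(teacher, loader, optimizer, epochs=10):
    for _ in range(epochs):
        for images, labels in loader:
            features = teacher(images)  # third feature, one tensor per attribute
            # cross_entropy applies softmax internally, combining the
            # normalization (third probability result) with the loss.
            loss = sum(F.cross_entropy(features[a], labels[a]) for a in features)
            optimizer.zero_grad()
            loss.backward()  # loss of the second neural network
            optimizer.step()
```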
In one possible implementation, the sample image includes a face image, and the first neural network is configured to identify a class of at least one attribute of a face in the face image.
In addition, the present disclosure also provides an image processing apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any one of the image processing methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the methods section, which is not repeated herein for brevity.
Fig. 5 illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus 50 includes:
an input module 51, configured to input an image to be processed into a neural network, to obtain features of each attribute of an object in the image to be processed;
the processing module 52 is configured to perform normalization processing on the features of each attribute of the image to be processed, so as to obtain a prediction probability of each category corresponding to each attribute;
a prediction module 53, configured to predict, for each attribute, a class of the attribute according to the prediction probability of each class of the attribute;
Wherein the neural network comprises a first neural network trained according to the apparatus 40 shown in fig. 4.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code is executed on a device, a processor in the device executes instructions for implementing the neural network training method and/or the image processing method provided in any one of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed, cause a computer to perform the operations of the neural network training method and/or the image processing method provided in any one of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.
Referring to fig. 6, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a photosensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge-Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a Static Random-Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or an in-groove protrusion structure having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, custom electronic circuitry, such as programmable logic circuits, Field-Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A neural network training method, the method comprising:
respectively inputting sample images in a training set into a first neural network and a second neural network to obtain a first probability result and a second probability result of at least one attribute of an object in the sample images;
Determining the loss of the first neural network according to the relative relation between the first probability result and the second probability result and the labeling information of the sample image;
training the first neural network according to the loss of the first neural network;
wherein the determining the loss of the first neural network according to the relative relationship between the first probability result and the second probability result and the labeling information of the sample image comprises:
determining classification loss according to the labeling information of the sample image and the initial probability result of the attribute, wherein the initial probability result of the attribute represents a probability result which is output by the first neural network and is not processed by the temperature parameter;
according to the labeling information of the sample image, respectively determining a first probability and a second probability of the real category of the attribute from the first probability result and the second probability result;
determining a difference loss from the first probability result and the second probability result if the first probability is less than the second probability, and determining a loss of the first neural network from the difference loss and the classification loss;
Determining the classification loss as a loss of the first neural network if the first probability is greater than or equal to the second probability.
2. The method of claim 1, wherein the first probability result and the second probability result are used to represent probability results processed according to preset temperature parameters, and the first probability result and the second probability result of at least one attribute of an object in the sample image are obtained by inputting sample images in a training set into a first neural network and a second neural network, respectively, and comprise:
respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images;
and respectively processing the first characteristic and the second characteristic based on preset temperature parameters, and respectively performing normalization processing on the processed first characteristic and the processed second characteristic to obtain the first probability result and the second probability result.
3. The method of claim 2, further comprising:
and carrying out normalization processing on the first characteristics to obtain an initial probability result of the attribute.
4. The method of any of claims 1 to 3, wherein before inputting the sample images in the training set into the first and second neural networks, respectively, the method further comprises:
training the second neural network according to the training set, comprising:
inputting the sample images in the training set into a second neural network to be trained to obtain a third feature of at least one attribute of the object in the sample images;
carrying out normalization processing on the third characteristics to obtain a third probability result;
determining the loss of the second neural network according to the labeling information of the sample image and the third probability result;
training the second neural network according to the loss of the second neural network.
5. The method of claim 4, wherein the sample image comprises a face image, and wherein the first neural network is configured to identify a class of at least one attribute of a face in the face image.
6. An image processing method, characterized in that the method comprises:
inputting an image to be processed into a neural network to obtain the characteristics of each attribute of an object in the image to be processed;
Respectively carrying out normalization processing on the characteristics of each attribute of the image to be processed to obtain the prediction probability of each category corresponding to each attribute;
for each attribute, predicting the category of the attribute according to the prediction probability of each category of the attribute;
wherein the neural network comprises a first neural network trained according to the method of any one of claims 1 to 5.
7. An apparatus for neural network training, the apparatus comprising:
the input module is used for respectively inputting the sample images in the training set into the first neural network and the second neural network to obtain a first probability result and a second probability result of at least one attribute of the object in the sample images;
a determining module, configured to determine a loss of the first neural network according to a relative relationship between the first probability result and the second probability result and the labeling information of the sample image;
a first training module for training the first neural network based on a loss of the first neural network;
wherein the determining the loss of the first neural network according to the relative relationship between the first probability result and the second probability result and the labeling information of the sample image comprises:
Determining classification loss according to the labeling information of the sample image and the initial probability result of the attribute, wherein the initial probability result of the attribute represents a probability result which is output by the first neural network and is not processed by the temperature parameter;
according to the labeling information of the sample image, respectively determining a first probability and a second probability of the real category of the attribute from the first probability result and the second probability result;
determining a difference loss from the first probability result and the second probability result if the first probability is less than the second probability, and determining a loss of the first neural network from the difference loss and the classification loss;
determining the classification loss as a loss of the first neural network if the first probability is greater than or equal to the second probability.
8. The apparatus of claim 7, wherein the first probability result and the second probability result are used to represent probability results processed according to preset temperature parameters, and the first probability result and the second probability result of at least one attribute of an object in the sample image are obtained by inputting sample images in a training set into a first neural network and a second neural network, respectively, and comprise:
Respectively inputting sample images in a training set into a first neural network and a second neural network, and extracting a first feature and a second feature of at least one attribute of an object in the sample images;
and respectively processing the first characteristic and the second characteristic based on preset temperature parameters, and respectively performing normalization processing on the processed first characteristic and the processed second characteristic to obtain the first probability result and the second probability result.
9. The apparatus of claim 8, further comprising:
and the processing module is used for carrying out normalization processing on the first characteristics to obtain an initial probability result of the attribute.
10. The apparatus of any of claims 7 to 9, wherein before the sample images in the training set are input into the first and second neural networks, respectively, the apparatus further comprises:
a second training module for training the second neural network according to the training set, comprising:
inputting the sample images in the training set into a second neural network to be trained to obtain a third feature of at least one attribute of the object in the sample images;
carrying out normalization processing on the third characteristics to obtain a third probability result;
Determining the loss of the second neural network according to the labeling information of the sample image and the third probability result;
training the second neural network according to the loss of the second neural network.
11. The apparatus of claim 10, wherein the sample image comprises a face image, and wherein the first neural network is configured to identify a class of at least one attribute of a face in the face image.
12. An image processing apparatus, characterized in that the apparatus comprises:
the input module is used for inputting the image to be processed into the neural network to obtain the characteristics of each attribute of the object in the image to be processed;
the processing module is used for respectively carrying out normalization processing on the characteristics of each attribute of the image to be processed to obtain the prediction probability of each category corresponding to each attribute;
a prediction module for predicting, for each attribute, a category of the attribute according to the prediction probability of each category of the attribute;
wherein the neural network comprises a first neural network trained according to the apparatus of any one of claims 7 to 11.
13. An electronic device, comprising:
A processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the neural network training method of any one of claims 1-5 and/or the image processing method of claim 6.
14. A computer-readable storage medium, on which computer program instructions are stored, which, when executed by a processor, implement the neural network training method of any one of claims 1 to 5 and/or the image processing method of claim 6.
CN201911203384.6A 2019-11-29 2019-11-29 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment Active CN110909815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911203384.6A CN110909815B (en) 2019-11-29 2019-11-29 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment


Publications (2)

Publication Number Publication Date
CN110909815A CN110909815A (en) 2020-03-24
CN110909815B true CN110909815B (en) 2022-08-12

Family

ID=69821054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911203384.6A Active CN110909815B (en) 2019-11-29 2019-11-29 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110909815B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488476B (en) * 2020-04-03 2023-06-27 北京爱芯科技有限公司 Image pushing method, model training method and corresponding devices
CN111898735A (en) * 2020-07-14 2020-11-06 上海眼控科技股份有限公司 Distillation learning method, distillation learning device, computer equipment and storage medium
CN111950596A (en) * 2020-07-15 2020-11-17 华为技术有限公司 Training method for neural network and related equipment
CN112348167B (en) * 2020-10-20 2022-10-11 华东交通大学 Knowledge distillation-based ore sorting method and computer-readable storage medium
CN112465138A (en) * 2020-11-20 2021-03-09 平安科技(深圳)有限公司 Model distillation method, device, storage medium and equipment
CN112819044A (en) * 2021-01-20 2021-05-18 江苏天幕无人机科技有限公司 Method for training neural network for target operation task compensation of target object
CN113052144B (en) * 2021-04-30 2023-02-28 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113269307B (en) * 2021-05-26 2022-12-27 北京市商汤科技开发有限公司 Neural network training method and target re-identification method
CN113284142B (en) * 2021-07-16 2021-10-29 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer-readable storage medium and computer equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784372A (en) * 2016-08-24 2018-03-09 阿里巴巴集团控股有限公司 Forecasting Methodology, the device and system of destination object attribute
CN109800821A (en) * 2019-01-31 2019-05-24 北京市商汤科技开发有限公司 Method, image processing method, device, equipment and the medium of training neural network
CN110097178A (en) * 2019-05-15 2019-08-06 电科瑞达(成都)科技有限公司 It is a kind of paid attention to based on entropy neural network model compression and accelerated method
CN110163344A (en) * 2019-04-26 2019-08-23 北京迈格威科技有限公司 Neural network training method, device, equipment and storage medium
CN110427466A (en) * 2019-06-12 2019-11-08 阿里巴巴集团控股有限公司 Training method and device for the matched neural network model of question and answer
CN110472681A (en) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 The neural metwork training scheme and image procossing scheme of knowledge based distillation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102563752B1 (en) * 2017-09-29 2023-08-04 삼성전자주식회사 Training method for neural network, recognition method using neural network, and devices thereof




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant