CN114863181A - Gender classification method and system based on prediction probability knowledge distillation - Google Patents

Gender classification method and system based on prediction probability knowledge distillation

Info

Publication number
CN114863181A
CN114863181A (application CN202210556798.2A)
Authority
CN
China
Prior art keywords
gender
network
teacher
student
network model
Prior art date
Legal status
Pending
Application number
CN202210556798.2A
Other languages
Chinese (zh)
Inventor
黄陶冶
王麒
陈帅斌
蒋泽飞
Current Assignee
Hangzhou Denghong Technology Co ltd
Original Assignee
Hangzhou Denghong Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Denghong Technology Co., Ltd.
Priority to CN202210556798.2A
Publication of CN114863181A
Legal status: Pending


Classifications

    • G06F18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gender classification method and system based on prediction probability knowledge distillation. The method comprises the following steps: acquiring an image to be predicted, constructing a ResNet18 network as the teacher backbone network, and constructing a CNN as the student network; configuring a gender classifier for the teacher network, setting the classification categories, and constructing the weight matrix of the teacher network classifier; configuring a gender classifier for the student network, setting the same classification categories as the teacher network, and constructing the weight matrix of the student network classifier; calculating the gender classification loss and the gender distillation loss of the teacher network and the student network with a loss function; obtaining the gender prediction probability vectors that the teacher network and the student network each output for the input image; and calculating the total distillation loss from the gender prediction probability vectors, the gender classification loss, and the gender distillation loss, then updating the parameters of the teacher network and the student network through a gradient descent algorithm.

Description

Gender classification method and system based on prediction probability knowledge distillation
Technical Field
The invention relates to the technical field of image recognition, in particular to a gender classification method and system based on prediction probability knowledge distillation.
Background
In the security industry, the human face is one of the most important biometric features; a face image contains attribute information such as gender, race, age, and identity. Among these attributes, gender has received particular attention from researchers as a fundamental classification problem. In general, when training data are sufficient, a larger model yields better classification performance. However, as the model grows, so do its computational requirements: on low-compute devices a large model often needs long inference times and cannot meet practical production requirements. It is therefore valuable to design relatively small models for low-compute devices while maximizing classification accuracy.
Disclosure of Invention
One object of the invention is to provide a gender classification method and system based on prediction probability knowledge distillation, in which a larger gender recognition model serves as a teacher network, a smaller gender recognition model serves as a student network, and the teacher network guides the learning of the student network, so that the smaller gender recognition model can be deployed on low-compute security devices.
Another object of the invention is to provide a gender classification method and system based on prediction probability knowledge distillation in which, through the knowledge distillation method, the prediction accuracy of the small model approaches that of the large recognition model on the gender recognition task.
Another object of the present invention is to provide a gender classification method and system based on prediction probability knowledge distillation that requires no additional training data, no more elaborate gender recognition model, and no change to the structure of the small model on the low-cost device, thereby greatly reducing the cost of gender recognition.
To achieve at least one of the above objects, the present invention provides a gender classification method based on prediction probability knowledge distillation, the method comprising the steps of:
acquiring an image to be predicted, constructing a ResNet18 network as the teacher backbone network, and constructing a CNN as the student network;
configuring a gender classifier for the teacher network, setting the classification categories, and constructing the weight matrix of the teacher network classifier;
configuring a gender classifier for the student network, setting the same classification categories as the teacher network, and constructing the weight matrix of the student network classifier;
calculating the gender classification loss and the gender distillation loss of the teacher network and the student network with a loss function;
obtaining the gender prediction probability vectors that the teacher network and the student network each output for the input image;
and calculating the total distillation loss from the gender prediction probability vectors, the gender classification loss, and the gender distillation loss, and updating the parameters of the teacher network and the student network through a gradient descent algorithm to obtain the student network model that finally outputs the gender prediction.
According to a preferred embodiment of the present invention, the method comprises training the teacher network model as follows: acquiring images to be trained to construct a teacher network training sample set, inputting the sample set into the ResNet18 network, configuring the network parameters to obtain the teacher network model, and calculating the gender classification loss of the teacher network model:

L1 = -(1/m) Σ_{i=1}^{m} log( exp(W_t^{y_i} · f_t(x_i) + bt_{y_i}) / Σ_{j=1}^{n} exp(W_t^{j} · f_t(x_i) + bt_j) )

where the input sample set is S = {x_1, x_2, …, x_m}; m is the total number of training images and x_i is the i-th training image; y_i is the gender label of x_i; n is the number of classification categories, configured as 2 for gender; f_t denotes the backbone network of the teacher network; W_t is the teacher network classifier weight matrix and W_t^{y} its y-th column; bt_y is the bias of the y-th category. This loss function yields the gender classification loss L1 of the teacher network model.
According to another preferred embodiment of the invention, a stochastic gradient descent algorithm is adopted as the training optimizer: the teacher network model is trained according to its gender classification loss, and the classification weights of the teacher network model are thereby obtained and updated.
According to another preferred embodiment of the present invention, the method comprises constructing a student network model as follows: a CNN of depth 5 with 3×3 convolutions is adopted as the backbone network of the student network, and the student network model is constructed with

L2 = -(1/m) Σ_{i=1}^{m} log( exp(W_s^{y_i} · f_s(x_i) + bs_{y_i}) / Σ_{j=1}^{n} exp(W_s^{j} · f_s(x_i) + bs_j) )

where m is the total number of images in one training batch; x_i is the i-th training image; y_i is the gender label of x_i; f_s is the backbone network of the student network; n is the number of classification categories, configured as 2 for gender; W_s is the student network classifier weight matrix and W_s^{y} its y-th column; bs_y is the bias of the y-th category. The loss function yields the gender classification loss L2 of the student network.
According to another preferred embodiment of the invention, the teacher network model and the student network model respectively calculate the corresponding gender classification loss by adopting a Softmax loss function.
According to another preferred embodiment of the invention, after the gender prediction probability vectors that the teacher network and the student network each output for the input image are obtained, the distillation loss of the student network model with respect to the teacher network model is calculated:

L3 = -(1/m) Σ_{i=1}^{m} Σ_{j=1}^{n} σ_j(p_t(x_i)/T) · log σ_j(p_s(x_i)/T)

where σ_j(·) denotes the j-th element of the softmax function; T is a temperature coefficient set to 10; and p_s(x_i) and p_t(x_i) are the prediction vectors output by the student network classifier and the teacher network classifier, respectively, for the same input sample.
According to another preferred embodiment of the present invention, the prediction vectors output by the student network classifier and the teacher network classifier are respectively calculated as:

p_s(x_i) = W_s · f_s(x_i) + bs
p_t(x_i) = W_t · f_t(x_i) + bt

where p_s^j(x_i) is the j-th element of the student network prediction vector and p_t^j(x_i) is the j-th element of the teacher network prediction vector.
According to another preferred embodiment of the present invention, the total distillation loss is calculated from the gender prediction probability vectors, the gender classification loss, and the gender distillation loss as L4 = α·L2 + (1 - α)·L3, where α is a constant in the range (0, 1).
To achieve at least one of the above objects, the present invention further provides a gender classification system based on predictive probabilistic knowledge distillation, which performs the above gender classification method based on predictive probabilistic knowledge distillation.
The present invention further provides a computer-readable storage medium having stored thereon a computer program executable by a processor to perform the method for gender classification based on predictive probabilistic knowledge distillation.
Drawings
FIG. 1 is a schematic flow chart of a gender classification method based on predictive probability knowledge distillation according to the invention.
FIG. 2 is a schematic diagram showing the structure of a gender classification system based on predictive probability knowledge distillation according to the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that in the present disclosure, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for ease of description and simplicity of description, and do not indicate or imply that the referenced devices or components must be in a particular orientation, constructed and operated in a particular orientation, and thus the above terms are not to be construed as limiting the present invention.
It is to be understood that the terms "a" and "an" should be interpreted as "at least one": an element recited in the singular may appear once in one embodiment and multiple times in another, and these terms are not to be construed as limiting the number.
Referring to FIGS. 1-2, the invention discloses a gender classification method and system based on prediction probability knowledge distillation, which aims to solve the problem that existing gender recognition models in the security field are too large and unsuited to the storage and compute constraints of small devices such as cameras.
First, image samples to be trained are acquired, for example via crawling or from existing security equipment, and then preprocessed. Preprocessing includes, but is not limited to, data cleaning: existing face recognition or human body recognition technology is used to delete impurity images that contain neither facial nor body features. The cleaned samples are used to construct a training sample set, a validation set, and a test set. The training sample set is used to update and determine learnable parameters such as model weights and biases via gradient descent; the validation set is used to determine hyperparameters including, but not limited to, the number of network layers, the number of nodes, the number of iterations, and the learning rate; the test set is used to measure the predictive performance of the finalized model. It should be noted that the construction and roles of training, validation, and test sets are conventional in machine learning and are not described in detail here.
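As a concrete illustration of the dataset split described above, the following sketch partitions a cleaned list of image samples into training, validation, and test sets. The 80/10/10 ratio, the fixed seed, and the function name are illustrative assumptions; the text does not fix them.

```python
import random

def split_dataset(samples, train=0.8, val=0.1, seed=0):
    """Split a cleaned list of face-image paths into train/val/test sets.

    The 80/10/10 ratio and seed are illustrative assumptions, not from
    the text; shuffling before splitting avoids ordering bias.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

The three returned lists are disjoint and together cover the input, so no sample leaks between training and evaluation.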
Specifically, the teacher network model and the student network model are established separately. The teacher network model is generally a large network; a well-performing ResNet18 is preferably selected as its backbone, and the teacher network is trained with a Softmax loss. For the pre-constructed training sample set, define S = {x_1, x_2, …, x_m} as one batch of training data, and calculate the gender classification loss L1 of the teacher network:

L1 = -(1/m) Σ_{i=1}^{m} log( exp(W_t^{y_i} · f_t(x_i) + bt_{y_i}) / Σ_{j=1}^{n} exp(W_t^{j} · f_t(x_i) + bt_j) )

where m is the total number of training images and x_i is the i-th training image; y_i is the gender label of x_i; n is the number of classification categories, which is 2 for gender; f_t denotes the backbone network of the teacher network; W_t is the teacher network classifier weight matrix and W_t^{y} its y-th column; bt_y is the bias of the y-th category. A mini-batch stochastic gradient descent algorithm (mini-batch SGD) is then adopted as the training optimizer: the teacher network model is trained according to this classification loss, and the teacher classification weights are obtained to update and determine the learnable parameters of the teacher network model.
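The teacher loss L1 above is an ordinary softmax cross-entropy over the batch. A minimal pure-Python sketch follows; the function name and list-based interface are our own, since the text only specifies the loss formula:

```python
import math

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy, the form of the losses L1/L2 in the text.

    logits: per-sample score vectors W.f(x_i) + b (length n; n = 2 for gender)
    labels: ground-truth class indices y_i
    """
    total = 0.0
    for scores, y in zip(logits, labels):
        mx = max(scores)  # shift scores for numerical stability
        log_z = mx + math.log(sum(math.exp(s - mx) for s in scores))
        total += log_z - scores[y]  # equals -log softmax(scores)[y]
    return total / len(labels)
```

With completely uninformative scores ([0, 0]) the loss is log 2 ≈ 0.693, the expected baseline for a two-class problem, and it approaches 0 as the correct class dominates.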
After the teacher network model is built and trained, a student network model for gender recognition is constructed. The student model must adopt a small network structure so that it fits small security devices including, but not limited to, cameras. A CNN of depth 5 with 3×3 convolution windows is preferred as the backbone of the student network model. The student network model is trained with the same Softmax loss function, and its learnable parameters are updated with a gradient descent algorithm. The gender classification loss L2 of the student network is calculated as

L2 = -(1/m) Σ_{i=1}^{m} log( exp(W_s^{y_i} · f_s(x_i) + bs_{y_i}) / Σ_{j=1}^{n} exp(W_s^{j} · f_s(x_i) + bs_j) )

where m is the total number of images in one training batch; x_i is the i-th training image; y_i is the gender label of x_i; f_s is the backbone network of the student network; n is the number of classification categories, configured as 2 for gender; W_s is the student network classifier weight matrix and W_s^{y} its y-th column; bs_y is the bias of the y-th category. Mini-batch SGD is again used as the training optimizer: the student network model is trained according to this classification loss, and the student classification weights are obtained to update and determine its learnable parameters. It should be noted that the classifier weight matrix is a standard component of the CNN itself, set according to the classification categories and their number, and is not described in detail here.
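To make the size of the student backbone concrete, the sketch below traces feature-map shapes through a depth-5, 3×3-convolution CNN. Only the depth and kernel size come from the text; the stride, padding, channel widths, and the 112×112 input used in the example are illustrative assumptions.

```python
def student_backbone_shapes(h, w, depth=5, k=3, stride=2,
                            channels=(16, 32, 64, 128, 256)):
    """Trace (channels, height, width) after each conv layer of the
    student CNN: depth 5 and 3x3 kernels per the text; stride, padding,
    and channel widths are assumed for illustration."""
    shapes = []
    pad = k // 2  # 'same'-style padding before stride-2 downsampling
    for c_out in channels[:depth]:
        h = (h + 2 * pad - k) // stride + 1
        w = (w + 2 * pad - k) // stride + 1
        shapes.append((c_out, h, w))
    return shapes
```

For a 112×112 input this yields a 256×4×4 feature map after the fifth layer, small enough to feed a lightweight two-class gender classifier head.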
After the teacher network model and the student network model have been constructed and trained, the same batch of training data S is input into both trained models, and the gender classification loss L1 of the teacher network model and the gender classification loss L2 of the student network model are calculated. A knowledge distillation algorithm is then adopted so that the gender prediction quality of the student network model approaches that of the large ResNet18 network. With the student network model and the knowledge distillation method, the small model can approach the large ResNet18 network on the predicted probabilities without any change to its model structure.
Referring to FIG. 2, prediction probability distillation in the present invention means that, on top of the gender classification losses, the prediction probabilities of the teacher network guide the gender probabilities output by the student network, so that the student network learns the probability distribution of the teacher network. Specifically, the losses of the student network model comprise the gender classification loss L2 and the distillation loss L3:

L3 = -(1/m) Σ_{i=1}^{m} Σ_{j=1}^{n} σ_j(p_t(x_i)/T) · log σ_j(p_s(x_i)/T)

where σ_j(·) denotes the j-th element of the softmax function and T is a temperature coefficient set to 10. Here p_s(x_i) and p_t(x_i) are the prediction vectors output by the student network classifier and the teacher network classifier for the same input sample:

p_s(x_i) = W_s · f_s(x_i) + bs
p_t(x_i) = W_t · f_t(x_i) + bt

with p_s^j(x_i) the j-th element of the student network prediction vector and p_t^j(x_i) the j-th element of the teacher network prediction vector.
After the distillation loss of the student network model with respect to the teacher network has been computed, the total knowledge distillation loss L4 is calculated from the student classification loss L2 and the distillation loss L3:

L4 = α·L2 + (1 - α)·L3

where α is a constant in the range (0, 1); α = 0.5 is preferred in the present invention. The knowledge distillation method further comprises: fixing the parameters of the teacher network model and its classifier, and updating the parameters of the student network model and its classifier according to the Softmax loss, to obtain the final student network model after knowledge distillation. This final student network model, including but not limited to its weights and biases, serves as the model for predicting the gender of persons in images and is stored on the security device.
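The distillation and total losses described above can be sketched as follows. This is a standard temperature-scaled reading of L3 (soft cross-entropy between softened teacher and student outputs); the exact formula in the original filing is an unrecoverable image, so treat the formulation as an assumption apart from T = 10 and α = 0.5, which the text states.

```python
import math

def softened(logits, T):
    """Temperature-scaled softmax sigma(z / T), numerically stabilized."""
    mx = max(logits)
    exps = [math.exp((z - mx) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=10.0):
    """L3: mean soft cross-entropy between softened teacher and student
    outputs over a batch; the text sets the temperature T to 10."""
    total = 0.0
    for zs, zt in zip(student_logits, teacher_logits):
        p_t = softened(zt, T)
        p_s = softened(zs, T)
        total += -sum(t * math.log(s) for t, s in zip(p_t, p_s))
    return total / len(student_logits)

def total_loss(l2, l3, alpha=0.5):
    """L4 = alpha * L2 + (1 - alpha) * L3; the text prefers alpha = 0.5."""
    return alpha * l2 + (1 - alpha) * l3
```

When the student's logits coincide with the teacher's, L3 reduces to the entropy of the softened teacher distribution, which is its minimum over the student's parameters, so gradient descent pulls the student toward the teacher's probability distribution.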
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication section, and/or installed from a removable medium. The computer program, when executed by a central processing unit (CPU), performs the functions defined in the method of the present application. It should be noted that the computer readable medium mentioned in the present application may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In contrast, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that the embodiments of the present invention described above and illustrated in the drawings are given by way of example only and not by way of limitation, the objects of the invention having been fully and effectively achieved, the functional and structural principles of the present invention having been shown and described in the embodiments, and that various changes or modifications may be made in the embodiments of the present invention without departing from such principles.

Claims (10)

1. A gender classification method based on prediction probability knowledge distillation, the method comprising the steps of:
acquiring an image to be predicted, constructing a ResNet18 network as the teacher backbone network, and constructing a CNN as the student backbone network;
configuring a gender classifier for the teacher network model, setting the classification categories, and constructing the weight matrix of the teacher network classifier;
configuring a gender classifier for the student network model, setting the same classification categories as the teacher network, and constructing the weight matrix of the student network classifier;
calculating the gender classification loss and the gender distillation loss of the teacher network model and the student network model with a loss function;
obtaining the gender prediction probability vectors that the teacher network model and the student network model each output for the input image to be predicted;
and calculating the total distillation loss from the gender prediction probability vectors, the gender classification loss, and the gender distillation loss, and updating the parameters of the teacher network model and the student network model through a gradient descent algorithm to obtain the student network model that finally outputs the gender prediction.
2. The gender classification method based on prediction probability knowledge distillation as claimed in claim 1, wherein the method comprises training the teacher network model, the training comprising: acquiring images to be trained to construct a teacher network training sample set, inputting the sample set into the ResNet18 network, configuring the network parameters to obtain the teacher network model, and calculating the gender classification loss of the teacher network model:

L1 = -(1/m) Σ_{i=1}^{m} log( exp(W_t^{y_i} · f_t(x_i) + bt_{y_i}) / Σ_{j=1}^{n} exp(W_t^{j} · f_t(x_i) + bt_j) )

where the input sample set is S = {x_1, x_2, …, x_m}; m is the total number of training images and x_i is the i-th training image; y_i is the gender label of x_i; n is the number of classification categories, which is 2 for gender; f_t denotes the backbone network of the teacher network; W_t is the teacher network classifier weight matrix and W_t^{y} its y-th column; bt_y is the bias of the y-th category; the loss function yields the gender classification loss L1 of the teacher network model.
3. The gender classification method based on prediction probability knowledge distillation as claimed in claim 1, wherein a stochastic gradient descent algorithm is adopted as the training optimizer: the teacher network model is trained according to its gender classification loss, and the classification weights of the teacher network model are obtained and updated.
4. The gender classification method based on prediction probability knowledge distillation as claimed in claim 2, wherein the method comprises constructing a student network model, the construction comprising: adopting a CNN of depth 5 with 3×3 convolutions as the backbone network of the student network, and constructing the student network model with

L2 = -(1/m) Σ_{i=1}^{m} log( exp(W_s^{y_i} · f_s(x_i) + bs_{y_i}) / Σ_{j=1}^{n} exp(W_s^{j} · f_s(x_i) + bs_j) )

where m is the total number of images in one training batch; x_i is the i-th training image; y_i is the gender label of x_i; f_s is the backbone network of the student network; n is the number of classification categories, configured as 2 for gender; W_s is the student network classifier weight matrix and W_s^{y} its y-th column; bs_y is the bias of the y-th category; the loss function yields the gender classification loss L2 of the student network.
5. The gender classification method based on prediction probability knowledge distillation according to claim 4, wherein the teacher network model and the student network model each calculate their gender classification loss using a Softmax loss function.
6. The gender classification method based on prediction probability knowledge distillation according to claim 5, wherein, after the gender prediction probability vectors output by the teacher network and the student network for the same input image to be predicted are obtained, the distillation loss of the student network model with respect to the teacher network model is further calculated as

L_3 = -(1/m) Σ_{i=1}^{m} Σ_{j=1}^{n} softmax(p_t(x_i)/T)_j · log softmax(p_s(x_i)/T)_j;

wherein T is a temperature coefficient set to 10, and p_s(x_i) and p_t(x_i) are the prediction probability vectors output by the student network classifier and the teacher network classifier, respectively, after the same sample data is input.
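The temperature-scaled distillation loss of claim 6 can be sketched as follows, assuming the common softened-cross-entropy form of knowledge distillation (the softmax of each logit vector divided by T); the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=10.0):
    """Soften both outputs with temperature T, then take the cross-entropy
    of the student's soft predictions against the teacher's soft targets."""
    p_t = softmax(teacher_logits / T)   # teacher soft targets
    p_s = softmax(student_logits / T)   # student soft predictions
    return -(p_t * np.log(p_s + 1e-12)).sum(axis=1).mean()

# toy batch of m = 3 samples over n = 2 gender classes
s = np.array([[2.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])   # student logits
t = np.array([[2.2, -0.8], [0.4, 0.6], [-1.1, 2.1]])   # teacher logits
L3 = distillation_loss(s, t, T=10.0)
print(L3)
```

A high temperature such as T = 10 flattens both distributions, so the student is trained on the relative (soft) class preferences of the teacher rather than its hard decisions.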
7. The gender classification method based on prediction probability knowledge distillation according to claim 6, wherein the prediction probability vectors output by the student network classifier and the teacher network classifier are calculated respectively as:

p_s(x_i) = W_s f_s(x_i) + b_s;
p_t(x_i) = W_t f_t(x_i) + b_t;

wherein p_s^j(x_i) is the j-th element of the student network's prediction probability vector, and p_t^j(x_i) is the j-th element of the teacher network's prediction probability vector.
8. The gender classification method based on prediction probability knowledge distillation according to claim 7, wherein the total distillation loss is calculated from the gender prediction probability vectors, the gender classification loss, and the gender distillation loss as L_4 = αL_2 + (1-α)L_3, wherein α is a constant with value range (0, 1).
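The total loss of claim 8 is a convex combination of the student's classification loss L_2 and the distillation loss L_3; a one-line sketch (the value α = 0.7 is an arbitrary illustrative choice, not specified by the patent):

```python
def total_loss(L2, L3, alpha=0.7):
    """Claim 8: L4 = alpha * L2 + (1 - alpha) * L3, with 0 < alpha < 1."""
    assert 0.0 < alpha < 1.0, "alpha must lie in the open interval (0, 1)"
    return alpha * L2 + (1.0 - alpha) * L3

print(total_loss(0.40, 0.10, alpha=0.7))
```

Larger α weights the student's own hard-label supervision more heavily; smaller α leans more on the teacher's soft targets.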
9. A gender classification system based on prediction probability knowledge distillation, wherein the system performs the gender classification method based on prediction probability knowledge distillation according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that it stores a computer program executable by a processor to perform the gender classification method based on prediction probability knowledge distillation according to any one of claims 1-8.
CN202210556798.2A 2022-05-19 2022-05-19 Gender classification method and system based on prediction probability knowledge distillation Pending CN114863181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210556798.2A CN114863181A (en) 2022-05-19 2022-05-19 Gender classification method and system based on prediction probability knowledge distillation

Publications (1)

Publication Number Publication Date
CN114863181A true CN114863181A (en) 2022-08-05

Family

ID=82639837

Country Status (1)

Country Link
CN (1) CN114863181A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908955A (en) * 2023-03-06 2023-04-04 之江实验室 Bird classification system, method and device for small-sample learning based on gradient distillation
CN115908955B (en) * 2023-03-06 2023-06-20 之江实验室 Gradient distillation-based bird classification system, method and device with less sample learning

Similar Documents

Publication Publication Date Title
CN110674880B (en) Network training method, device, medium and electronic equipment for knowledge distillation
US11755911B2 (en) Method and apparatus for training neural network and computer server
CN111737476B (en) Text processing method and device, computer readable storage medium and electronic equipment
US11544524B2 (en) Electronic device and method of obtaining emotion information
WO2022007823A1 (en) Text data processing method and device
US10769484B2 (en) Character detection method and apparatus
US11488013B2 (en) Model training method and apparatus
US11244671B2 (en) Model training method and apparatus
WO2021103761A1 (en) Compound property analysis method and apparatus, compound property analysis model training method, and storage medium
US11803731B2 (en) Neural architecture search with weight sharing
EP3820369B1 (en) Electronic device and method of obtaining emotion information
US11449731B2 (en) Update of attenuation coefficient for a model corresponding to time-series input data
CN113344206A (en) Knowledge distillation method, device and equipment integrating channel and relation feature learning
CN112905801A (en) Event map-based travel prediction method, system, device and storage medium
WO2023280113A1 (en) Data processing method, training method for neural network model, and apparatus
WO2023231753A1 (en) Neural network training method, data processing method, and device
CN113707299A (en) Auxiliary diagnosis method and device based on inquiry session and computer equipment
CN114863181A (en) Gender classification method and system based on prediction probability knowledge distillation
CN113407820A (en) Model training method, related system and storage medium
WO2023174064A1 (en) Automatic search method, automatic-search performance prediction model training method and apparatus
US20240005129A1 (en) Neural architecture and hardware accelerator search
CN113591781B (en) Image processing method and system based on service robot cloud platform
US20240143696A1 (en) Generating differentiable order statistics using sorting networks
Fitra et al. Deep transformer model with pre-layer normalization for covid-19 growth prediction
CN116109449A (en) Data processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination