CN115082963A - Human body attribute recognition model training and human body attribute recognition method and related device

Info

Publication number
CN115082963A
CN115082963A (application CN202210745493.6A)
Authority
CN
China
Prior art keywords
human body
attribute
body attribute
training
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210745493.6A
Other languages
Chinese (zh)
Inventor
林志凯
余文杰
彭京
李建
邓彦杰
李海安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202210745493.6A
Publication of CN115082963A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Arrangements using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a human body attribute recognition model training method, a human body attribute recognition method and a related device. The method comprises the following steps: acquiring a training data set; for each human body attribute, determining a sub-training data set matched with that attribute from the training data set and training a first residual network on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute; obtaining a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model; and training a second residual network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain the human body attribute recognition model. Based on a training mode supervised by the dual signals of manual labels and soft labels, the invention reduces the adverse effect caused by inconsistent convergence speeds of different human body attributes during training and improves the recognition accuracy of the human body attribute recognition model.

Description

Human body attribute recognition model training and human body attribute recognition method and related device
Technical Field
The invention relates to the technical field of image recognition, and in particular to a human body attribute recognition model training method, a human body attribute recognition method and a related device.
Background
With the rapid development of artificial intelligence technology and the deployment of massive high-definition monitoring equipment, the real-time human body attribute identification technology based on videos has good application prospects in various fields such as security monitoring, intelligent retail, targeted advertisement delivery and the like.
The existing human body attribute recognition technology mainly performs a unified analysis on the extracted human body features to obtain different kinds of human body attribute information, including sex, age bracket, jacket color, sleeve length, whether a mobile phone is being used, and other human body attributes. As demand grows, the number of human body attributes to be recognized keeps increasing; with more and more attributes, a single model can no longer achieve a good recognition effect for all of them, the model becomes increasingly complex, and the requirements on storage space and computing cost rise continuously.
Therefore, how to provide a human body attribute recognition model that guarantees good recognition accuracy for all human body attributes at a low computing cost is a problem to be solved.
Disclosure of Invention
One of the objectives of the present invention is to provide a human body attribute recognition model training method, a human body attribute recognition method and a related apparatus, so as to obtain a human body attribute recognition model that guarantees good recognition accuracy for all human body attributes at a low computing cost.
In a first aspect, the present invention provides a human body attribute recognition model training method, including:
acquiring a training data set; the training data set comprises a human body image sample and human body attribute labeling information corresponding to the human body image sample;
for each human body attribute, determining a sub-training data set matched with the human body attribute from the training data set, and training a first residual network based on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute;
obtaining a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model;
and training a second residual network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain a human body attribute recognition model.
In a second aspect, the present invention provides a human body attribute identification method, including:
acquiring an image to be identified;
inputting the image to be recognized into a pre-trained human body attribute recognition model for recognition to obtain the confidence corresponding to each human body attribute, wherein the human body attribute recognition model is obtained by the human body attribute recognition model training method of the first aspect;
for a binary human body attribute, if the confidence of the human body attribute value of the binary human body attribute is greater than a preset threshold, outputting the human body attribute value and the confidence of the human body attribute value;
and for a multi-classification human body attribute, outputting the maximum confidence and the human body attribute value corresponding to the maximum confidence.
In a third aspect, the present invention provides a training apparatus for a human body attribute recognition model, including:
the acquisition module is used for acquiring a training data set; the training data set comprises a human body image sample and human body attribute labeling information corresponding to the human body image sample;
the training module is used for determining, for each human body attribute, a sub-training data set matched with the human body attribute from the training data set, and training a first residual network based on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute;
a determining module, configured to obtain a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model;
and the training module is further used for training a second residual network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain a human body attribute recognition model.
In a fourth aspect, the present invention provides a human body attribute recognition apparatus, comprising:
the acquisition module is used for acquiring an image to be identified;
the recognition module is used for inputting the image to be recognized into a pre-trained human body attribute recognition model for recognition to obtain the confidence corresponding to each human body attribute, the human body attribute recognition model being obtained according to the human body attribute recognition model training method of the first aspect;
the output module is used for, for a binary human body attribute, outputting the first predefined attribute value of the binary human body attribute and the confidence if the confidence of the human body attribute value of the binary human body attribute is greater than a preset threshold, and outputting the second predefined attribute value of the binary human body attribute and the confidence if the confidence is smaller than the preset threshold;
and the output module is further configured to output, for a multi-classification human body attribute, the maximum confidence and the predefined attribute value corresponding to the maximum confidence.
In a fifth aspect, the invention provides an electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being configured to execute the computer program to implement the method of the first or second aspect.
In a sixth aspect, the present invention provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first or second aspect.
The invention provides a human body attribute recognition model training method, a human body attribute recognition method and a related device. The method comprises: obtaining a training data set; for each human body attribute, determining a sub-training data set matched with that attribute from the training data set and training a first residual network on it to obtain a sub-recognition model corresponding to each human body attribute; obtaining a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model; and finally training a second residual network according to the confidence labels corresponding to each human body attribute and all the human body attribute labeling information to obtain the human body attribute recognition model. The embodiment of the invention thus has two training processes: the models obtained in the first process yield the confidences of the predefined human body attributes, and these confidences are then used as soft labels. With the training mode supervised by the dual signals of manual labels and soft labels, the good recognition performance of the sub-recognition models can be effectively transferred to the multi-attribute human body attribute recognition model, the adverse effect caused by inconsistent convergence speeds of different human body attributes during training is reduced, and the recognition effect of the human body attribute recognition model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope; those skilled in the art can also obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic view of a human body attribute identification scenario provided in an embodiment of the present invention;
fig. 2 is a schematic view of an application scenario of a human body attribute recognition model training method provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a human body attribute recognition model training method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of step S302 provided in the embodiment of the present invention;
FIG. 5 is a diagram illustrating classification of sub-recognition models provided by an embodiment of the present invention;
fig. 6 is a schematic flowchart of step S304 provided in the embodiment of the present invention;
fig. 7 is a schematic flowchart of a human body attribute identification method according to an embodiment of the present application;
FIG. 8 is a functional block diagram of a human body attribute recognition model training apparatus according to an embodiment of the present application;
fig. 9 is a functional block diagram of a human body attribute identification apparatus according to an embodiment of the present application;
fig. 10 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inside" and "outside", if used, indicate orientations or positional relationships based on those shown in the drawings or on the usual placement of the product of the invention. They are used only for convenience and simplification of the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
In order to facilitate understanding of the present invention, a description will be given first of all to some technical terms involved in the embodiments of the present invention.
1. Human body attributes are visual features of a person that can be perceived by both a computer and a human, such as sex, age, jacket texture, jacket color, sleeve type, shirt texture, shirt color, shirt length and shirt type.
2. Attribute values are predefined for each human body attribute; for example, the attribute values of the human body attribute "gender" are male and female, those of "age" include baby, child, young, middle-aged and old, those of "jacket color" include red, yellow, blue, black and mixed color, and those of "whether a hat is worn" are yes and no.
3. The human body attribute recognition model is a model capable of recognizing the human body attribute values of a plurality of human body attributes at the same time.
4. A sub-recognition model is a model capable of determining the confidence of a human body attribute.
With the rapid development of artificial intelligence technology and the deployment of massive high-definition monitoring equipment, real-time video-based human body attribute recognition technology has good application prospects in various fields such as security monitoring, intelligent retail and targeted advertisement delivery. For example, fig. 1 is a scene diagram of human body attribute recognition according to an embodiment of the present invention; human body attribute recognition refers to the process of determining, after a human body has been detected in a human body image, which human body attributes that body has. Human body attributes are among the most semantically meaningful features of an individual; the more common ones include sex, age, the color and style of clothes, and whether accessories such as a backpack or a hat are present.
As scene requirements increase, the number of human body attributes to be recognized also grows, and a single model cannot achieve a good recognition effect for all human body attributes.
In order to improve the accuracy of multi-attribute recognition, the related art combines a plurality of single human body attribute recognition models with a single multi-attribute recognition model, where each single-attribute model only recognizes the attributes of one segmented body region, such as the head or the hands. The first scheme relies on more and more models as the number of attributes grows, which increases the storage requirement; it also places high demands on the body-part segmentation algorithm because of large, uncontrolled variations in body posture, and therefore carries a risk of segmentation errors. In the second scheme, a single model analyses the attributes of the whole human body, which avoids the risk of segmentation errors and requires less storage space.
In order to solve the above problem, an embodiment of the present invention provides a multi-model fusion method, which fuses a plurality of well-performing single human body attribute recognition models into a single model and ensures good recognition accuracy for all human body attributes at a low computing cost.
Firstly, the embodiment of the invention provides a human body attribute recognition model training method, which ensures that a neural network model obtained by training can accurately recognize various human body attributes.
The following introduces an implementation manner of the human body attribute recognition model training method provided by the embodiment of the present application:
the human body attribute recognition model training method provided by the embodiment of the application can be applied to equipment with a model training function, such as terminal equipment, a server and the like. The terminal device may be a smart phone, a computer, a Personal Digital Assistant (PDA), a tablet computer, or the like; the server may specifically be an application server or a Web server, and when the server is deployed in actual application, the server may be an independent server or a cluster server.
In practical applications, the terminal device and the server may each train the neural network model independently, or they may train it interactively. When they train interactively, the terminal device may obtain the training data set from the server and then perform model training with it to obtain the human body attribute recognition model, or the server may obtain the training data set from the terminal and then perform the model training itself.
It should be understood that, after the terminal device or the server obtains the human body attribute recognition model by the training method provided in the embodiments of the present application, it may send the model to other terminal devices so that they run the model and implement the corresponding functions, or send it to other servers so that the model runs there and the corresponding functions are implemented through those servers.
In order to facilitate understanding of the technical solution provided by the embodiment of the present application, a training method provided by the embodiment of the present application is introduced below by taking a server training human attribute recognition model as an example and combining with an actual application scenario.
Referring to fig. 2, fig. 2 is a schematic view of an application scenario of the human body attribute recognition model training method provided in the embodiment of the present application. The scene comprises a terminal device 101 and a server 102 for model training, wherein the terminal device 101 and the server 102 are connected through a network. The terminal device 101 can provide the human body image sample and the human body attribute labeling information corresponding to the human body image sample for the server.
After the server 102 acquires the human body image samples and the corresponding human body attribute labeling information from the terminal device 101 over the network, they form a training data set, and the server can then execute a two-stage model training process. The first stage aims to obtain, for each human body attribute, a sub-recognition model with a single-attribute recognition function; the second stage aims to obtain a human body attribute recognition model that recognizes a plurality of human body attributes, and the resulting model can be applied to various scenarios.
In the first stage of model training, the server 102 extracts a sub-training data set matched with each human body attribute from the training data set to perform model training until the model meets the convergence condition, so as to obtain a plurality of sub-recognition models.
In the embodiment of the application, the obtained sub-recognition model is used for determining the confidence label of each human body attribute, which provides training information for the subsequent training of the human body attribute recognition model, so that the model training effect is better.
In the second stage of model training, the server 102 performs model training by using the training data set and the obtained confidence labels of the predefined human body attributes until the model meets the convergence condition, so as to obtain a human body attribute identification model.
After the server 102 generates the human body attribute recognition model, it may further send the model to the terminal device 101, so that the terminal device 101 runs the human body attribute recognition model and implements the corresponding functions with it.
It should be noted that the application scenario shown in fig. 2 is only an example, and in practical application, the neural network model training method provided in the embodiment of the present application may also be applied to other application scenarios, and no limitation is made to the application scenario of the neural network model training method here.
Referring to fig. 3, fig. 3 is a schematic flowchart of a human body attribute recognition model training method provided in the embodiment of the present application. For convenience of description, the following embodiments are described with a server as the execution subject; it should be understood that the execution subject of the human body attribute recognition model training method is not limited to the server, and the method may also be applied to other devices with a model training function, such as a terminal device. As shown in fig. 3, the human body attribute recognition model training method includes the following steps:
s301, acquiring a training data set; the training data set comprises human body image samples and human body attribute labeling information corresponding to the human body image samples;
s302, aiming at each human body attribute, determining a sub-training data set matched with the human body attribute from the training data set, and training the first residual error network based on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute;
s303, obtaining a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model;
s304, training the second residual error network according to the confidence label corresponding to each human body attribute and all human body attribute labeling information to obtain a human body attribute identification model.
According to the human body attribute recognition model training method provided by the embodiment of the application, a training data set is obtained; then, for each human body attribute, a sub-training data set matched with that attribute is determined from the training data set and a first residual network is trained on it to obtain a sub-recognition model corresponding to each human body attribute; a confidence label corresponding to each human body attribute is then obtained based on the training data set and each sub-recognition model; and finally a second residual network is trained according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain the human body attribute recognition model. The embodiment of the invention thus has two training processes: the models obtained in the first process yield the confidences of the predefined human body attributes, and these confidences are further used as soft labels. With the training mode supervised by the dual signals of manual labels and soft labels, the good recognition performance of the sub-recognition models can be effectively transferred to the multi-attribute human body attribute recognition model, the adverse effect caused by inconsistent convergence speeds of different human body attributes during training is reduced, and the recognition effect of the human body attribute recognition model is improved.
The following describes steps S301 to S304 in detail.
In step S301, a training data set is acquired.
In the embodiment of the application, the training data set comprises a human body image sample and human body attribute labeling information corresponding to the human body image sample.
In the embodiment of the application, the human body image samples may come from snapshot images captured in real time by a terminal device, or from historical human body images or historical snapshot images stored in advance on the server. In the embodiment of the invention, one human body image sample generally corresponds to one human body, so that the pedestrian in the sample can be recognized at fine granularity when human body attributes are identified.
Thus, in an alternative embodiment, step S301 may be implemented as follows (a sketch of this procedure is given after the steps):
a1, acquiring a plurality of snapshot images;
a2, performing human body detection on each snapshot image to obtain a human body detection frame for each detected human body;
a3, cropping each snapshot image according to the obtained human body detection frames, taking the image inside each human body detection frame as a human body image sample, and forming a training data set from all the human body image samples together with the labeling information of all the human body attributes corresponding to each human body image sample.
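The patent does not include source code; the sketch below only illustrates, under the assumption of a Python/PIL environment, how steps a1 to a3 could assemble body crops and their labels. The function detect_person_boxes and the annotations mapping are hypothetical placeholders for the person detector and the manual labels.

```python
# Illustrative sketch (not from the patent): assembling human-body crops from snapshot images.
from PIL import Image

def build_training_samples(snapshot_paths, detect_person_boxes, annotations):
    """Crop each detected person and pair the crop with its attribute labels."""
    samples = []
    for path in snapshot_paths:
        image = Image.open(path).convert("RGB")
        for box_id, (x1, y1, x2, y2) in enumerate(detect_person_boxes(image)):
            crop = image.crop((x1, y1, x2, y2))          # one sample per detected body
            labels = annotations.get((path, box_id), {})  # e.g. {"gender": 1, "hat": 0}
            samples.append((crop, labels))                # labels may cover only some attributes
    return samples
```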
In the embodiment of the present application, the human body attribute labeling information of the human body image samples is labelled manually. The labeling information of each sample may include the real attribute values of one or more human body attributes; a human body attribute may have two predefined attribute values (a binary attribute) or more than two predefined attribute values (a multi-classification attribute). For example, for one human body image sample the corresponding labeling information may be: sex male, jacket color black, hat worn, age group young.
It should be noted that, in the training data set of the embodiment of the present application, not every human body attribute of every human body image sample needs to be labelled; some attribute labels are allowed to be missing, but the number of labelled samples for each human body attribute must be sufficient, for example no fewer than 3000 manual labels per attribute. That is, even if the final goal of the embodiment of the present application is a human body attribute recognition model that recognizes 10 human body attributes, the labeling information of an individual human body image sample may cover fewer than 10 of them during training.
In an alternative embodiment, after the training data set is obtained, the human body image samples may be further preprocessed, for example normalized and scaled to a specified size; the preset size can be customized, for example 384 × 128. The training data set may also be augmented, for example (but not limited to) by random cropping, random scaling, random horizontal flipping, random color jittering (not used when training color-related attributes), random blurring and so on, thereby improving the effect of model training. Therefore, after step S301, the following steps may further be included (a possible pipeline is sketched after the steps):
b1, determining whether the size of each human body image sample matches the preset size;
b2, if not, adjusting the size of the human body image sample to the preset size;
b3, performing data enhancement on each human body image sample, where the data enhancement includes any one or a combination of: random cropping; random scaling; random flipping; random blurring.
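As an illustration only, a preprocessing and augmentation pipeline matching the 384 × 128 size and the augmentation types listed above might look as follows; torchvision is an assumed choice, and the normalization statistics are the usual ImageNet values, which the patent does not specify.

```python
# Hedged sketch of the preprocessing/augmentation described above (library choice assumed).
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop((384, 128), scale=(0.8, 1.0)),                # random cropping and scaling to the preset size
    transforms.RandomHorizontalFlip(p=0.5),                                    # random horizontal flipping
    transforms.RandomApply([transforms.ColorJitter(0.2, 0.2, 0.2)], p=0.5),    # random color jitter; skip for color attributes
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=3)], p=0.5),   # random blurring
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],                           # assumed ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```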
In step S302, for each human body attribute, a sub-training data set matched with the human body attribute is determined from the training data set, and the first residual network is trained based on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute.
In the embodiment of the present application, the sub-training data set matched with a human body attribute means that the labeling information of every human body image sample in that sub-training data set contains that attribute; for example, for the human body attribute "gender", the human body image samples carrying a gender label can be extracted from the training data set to form the sub-training data set.
In the embodiment of the present application, since the sub-recognition models are not themselves productized, the size of the first residual network is not limited; it may be, for example but not limited to, Resnet101. The embodiment of the invention improves the structure of the first residual network: the output of its last layer, the fully connected layer, is changed to 1, i.e. only the confidence of one human body attribute is output. The input of the first residual network is thus the sub-training data set, and the output is the confidence of that human body attribute.
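A minimal sketch of such a single-output first residual network, assuming PyTorch/torchvision (the patent names only Resnet101, not a framework):

```python
# Sketch under assumptions: ResNet-101 backbone whose final fully connected layer
# is replaced so that it emits a single confidence value.
import torch.nn as nn
from torchvision import models

def build_sub_recognition_model():
    backbone = models.resnet101()                        # first residual network
    backbone.fc = nn.Sequential(
        nn.Linear(backbone.fc.in_features, 1),           # last-layer output changed to 1
        nn.Sigmoid(),                                    # confidence in [0, 1]
    )
    return backbone
```

Wrapping the single logit in a sigmoid keeps the output interpretable as a confidence in [0, 1], which matches how the confidences are later reused as soft labels.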
In the embodiment of the present application, a sub-recognition model is used to predict the confidence corresponding to one human body attribute. It should be noted that the confidence can represent how confident the sub-recognition model is in its recognition of that attribute, and it can also represent the confidence of one predefined attribute value of that attribute. Taking the human body attribute "gender" as an example: if samples labelled male are treated as positive examples and samples labelled female as negative examples (i.e. the label of male is 1 and the label of female is 0), the resulting confidence can be used as the confidence of "male", and the confidence of "female" is obtained by subtracting it from 1.
During training, a multi-classification human body attribute is equivalently split into several binary classifications to obtain its sub-recognition models. Taking the human body attribute "age" as an example, the sub-model for the value "young" can be trained as a binary problem of young versus not-young, i.e. samples labelled young are positive examples and samples with any other age value are negative examples; the resulting sub-recognition model outputs the confidence of the age group "young". Similarly, training under the binary split child versus not-child yields the confidence of "child". For a multi-classification human body attribute, several sub-recognition models can therefore be trained, each producing the confidence of one predefined attribute value. If the confidence of a given predefined attribute value is then used in training the human body attribute recognition model, that model can finally output the confidence of this value, and for an image to be recognized the predefined attribute value with the maximum confidence can be output as the recognized value.
For example, continuing with the human body attribute "age": suppose sub-recognition models and confidences can be obtained for child, young, old and so on. A human body attribute recognition model trained with the confidence of "child" plus the manual labeling information in the training set, and another trained with the confidence of "young" plus the manual labeling information, will, for the same image to be recognized, produce the confidences of "child" and "young" respectively, and the attribute value with the highest confidence can then be output.
Therefore, in a possible implementation manner, the step S302 may be as shown in fig. 4, where fig. 4 is a schematic flowchart of the step S302 provided in the embodiment of the present invention:
s302-1, extracting target human body image samples with target human body attributes in the human body attribute labeling information from the training data set according to each target human body attribute to form a sub-training data set. The target human body attribute is any one of all human body attributes;
in the embodiment of the application, since the human body attribute labeling information includes the human body attribute type and the human body attribute value, all target human body image samples corresponding to each human body attribute can be determined according to the human body attribute type, for example, the human body attribute information corresponding to one human body image sample is sex male, the color of a jacket is black, a hat is worn, and young people are born. The human body image sample can be a human body image sample with the same four human body attributes of gender, jacket color, whether a hat is worn or not and age group.
For each human body attribute, a sub-training data set matched with that attribute can thus be extracted from the training data set. Taking the human body attribute "gender" as an example, whether a "gender" label exists in the labeling information of each human body image sample can be queried, and every sample containing a "gender" label is extracted into the sub-training data set matched with gender. It should further be understood that both positive and negative examples are required when constructing a sub-training data set; for instance, the sub-training data set for gender includes human body image samples of both genders. A sketch of this construction is given below.
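A possible sketch of this filtering and of the one-vs-rest label construction described above; all names are illustrative, not taken from the patent.

```python
# Hedged sketch: keep only samples labelled with the attribute and map the chosen
# predefined value to a positive/negative (1/0) target.
def build_sub_training_set(samples, attribute, positive_value):
    """samples: list of (image, labels) pairs; labels is a dict like {"age": "young"}."""
    sub_set = []
    for image, labels in samples:
        if attribute not in labels:          # attribute missing -> sample not used here
            continue
        target = 1.0 if labels[attribute] == positive_value else 0.0
        sub_set.append((image, target))      # e.g. young -> 1, all other age values -> 0
    return sub_set
```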
S302-2, for each sub-training data set, calculating the loss value of a first loss function from the predicted attribute value of the target human body attribute of each target human body image sample and the real attribute value of that attribute in the human body attribute labeling information.
S302-3, back-propagating the loss value of the first loss function to each layer of the first residual network to iteratively update the model parameters until a preset condition is reached, and taking the trained first residual network as the sub-recognition model.
In an embodiment of the present application, the first loss function may satisfy the following relation:
[first loss function, given only as an image in the original publication]
where n denotes the total number of target human body image samples, y_i' denotes the predicted attribute value of the target human body attribute for the i-th target human body image sample, and y_i denotes the real attribute value of the target human body attribute for the i-th target human body image sample.
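Because the formula is only available as an image in the source, the following LaTeX is a hedged reconstruction from the symbol definitions above, assuming the standard binary cross-entropy form for a single two-class attribute:

```latex
% Assumed reconstruction, not copied from the patent:
% binary cross-entropy over the n target samples of one human body attribute.
L_{1} = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_{i}\,\log y_{i}' + \bigl(1 - y_{i}\bigr)\,\log\bigl(1 - y_{i}'\bigr) \right]
```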
for example, assuming that the human body attribute-coat color corresponding to each target human body image sample is white, in the model training process, the predicted attribute value of the human body attribute-coat color corresponding to each target human body image sample may be white or other colors, and therefore, the loss value of the first loss function may be calculated according to the difference between white and the predicted color.
During model training, some training parameters can be preset to ensure the training effect; for example, the batch size can be set to 128, the learning rate starts from an initial value of 0.001 and decays stepwise by a factor of 0.1 at 50% and 75% of the total number of iterations, and the total number of training epochs is set to 85.
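A sketch of this schedule in PyTorch; the optimizer type (SGD with momentum) is an assumption, as the patent only fixes the batch size, learning rate, decay points and epoch count.

```python
# Hedged sketch of the stated training schedule: batch size 128, initial learning rate 0.001,
# step decay by 0.1 at 50% and 75% of the 85 total epochs.
import torch

def make_optimizer_and_scheduler(model, total_epochs=85, base_lr=0.001):
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)  # optimizer type assumed
    milestones = [int(total_epochs * 0.5), int(total_epochs * 0.75)]           # decay nodes
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    return optimizer, scheduler

BATCH_SIZE = 128
```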
S302-4, traversing all human body attributes to obtain the sub-recognition models corresponding to each human body attribute.
With the above training procedure, the sub-recognition models obtained in the embodiment of the invention are as shown in fig. 5, which is a classification schematic diagram of the sub-recognition models provided in the embodiment of the invention. As can be seen from fig. 5, a sub-model capable of recognizing each predefined human body attribute value can be obtained for every predefined value of every human body attribute, which improves the accuracy and efficiency of subsequently obtaining the confidence of each predefined attribute value.
In step S303, a confidence label corresponding to each human body attribute is obtained based on the training data set and each sub-recognition model.
In the embodiment of the present application, the sub-recognition model corresponding to each human body attribute may be determined, and then the confidence of each human body attribute may be determined based on the obtained sub-recognition model.
For example, for a binary human body attribute, the confidence determined by the sub-recognition model can be taken as the confidence P of the positive predefined attribute value of that attribute, and the confidence of the negative predefined attribute value is then 1 - P.
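One way the confidence (soft) labels could be produced is a single forward pass of the training images through each frozen sub-recognition model, as in the following sketch (names and tensor shapes are illustrative, and batching would be used in practice):

```python
# Hedged sketch: compute one confidence label per attribute with a single forward pass.
import torch

@torch.no_grad()
def compute_confidence_labels(sub_models, images):
    """sub_models: dict attribute -> trained sub-recognition model.
    images: tensor of shape (N, 3, 384, 128). Returns dict attribute -> (N,) confidences."""
    labels = {}
    for attribute, model in sub_models.items():
        model.eval()
        conf = model(images).squeeze(1)   # confidence P of the positive attribute value
        labels[attribute] = conf          # negative-value confidence would be 1 - P
    return labels
```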
In the embodiment of the present application, after the confidence label corresponding to each human body attribute is obtained, the confidence can be used as a soft label for training the human body attribute recognition model. The effects of training with the confidence as a soft label are as follows:
1. the method is different from the common distillation learning method that the online forward operation prediction samples of the teacher model occupy larger computing resources, particularly along with the increase of iteration times, and the embodiment of the invention only needs one forward operation.
2. Based on the soft label signal supervision training mode, the good performance of the sub-recognition model can be transferred to a multi-body attribute model, namely the body attribute recognition model obtained by subsequent training of the application.
3. Based on the soft label signal supervision training mode, adverse effects caused by mismatching of convergence speeds in different human body attribute training processes due to the fact that the number of different human body attribute samples is unbalanced can be reduced, and human body attribute over-fitting due to sufficient artificial labels and human body attribute under-fitting due to lack of artificial labels are effectively avoided.
4. The training mode based on the combination of the artificial label and the dual-signal supervision of the soft label can reduce the adverse effect of the result with larger prediction error in the soft label on the model training due to the supervision of the artificial signal in the training, and provides the effect of the model training.
In step S304, the second residual network is trained according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain the human body attribute recognition model.
In the embodiment of the present application, since the human body attribute recognition model obtained here is intended for productization, the size of the second residual network should be limited; for example, the second residual network may be, but is not limited to, Resnet50 or Resnet18. The network structure of the first residual network is therefore larger than that of the second residual network.
The embodiment of the invention improves the structure of the second residual network: the output of its last layer, the fully connected layer, is changed to the number of all the human body attributes to be recognized.
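A minimal sketch of such a multi-output second residual network, again assuming PyTorch/torchvision:

```python
# Sketch under assumptions: lighter ResNet-50 (or ResNet-18) backbone whose final fully
# connected layer outputs one confidence per predefined human body attribute.
import torch.nn as nn
from torchvision import models

def build_attribute_recognition_model(num_attributes):
    backbone = models.resnet50()                                  # second residual network
    backbone.fc = nn.Sequential(
        nn.Linear(backbone.fc.in_features, num_attributes),       # one output per attribute
        nn.Sigmoid(),
    )
    return backbone
```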
Therefore, in an alternative implementation, step S304 can be implemented as shown in fig. 6; fig. 6 is a schematic flowchart of step S304 provided in an embodiment of the present invention:
s304-1, determining a first difference between the predicted attribute value of each human body attribute corresponding to each human body image sample and the real attribute value of each human body attribute in the human body attribute labeling information.
S304-2, determining a second difference between the predicted attribute value of each human body attribute corresponding to each human body image sample and the confidence label of each human body attribute.
S304-3, calculating a loss value of a second loss function according to the first difference degree, the second difference degree and the respective weights of the first difference degree and the second difference degree;
and S304-4, reversely transmitting the total loss value to each layer of the second residual error network to carry out iterative updating on model parameters until a preset condition is reached, and taking the trained second residual error network as a human body attribute recognition model.
In an embodiment of the present invention, the second loss function may satisfy the following relation:
[second loss function, given only as an image in the original publication]
where n denotes the total number of human body image samples; m denotes the number of human body attribute categories; y_{i,j} denotes the real attribute value of the i-th human body attribute for the j-th human body image sample; y'_{i,j} denotes the predicted attribute value of the i-th human body attribute for the j-th human body image sample; z_{i,j} denotes the confidence label of the i-th human body attribute for the j-th human body image sample; w_{i,j} = 1 when the labeling information of the j-th human body image sample contains an attribute value for the i-th human body attribute, and w_{i,j} = 0 otherwise; and λ denotes the weight of the second difference.
It can be seen that the second loss function is a combination of a cross-entropy loss function (the term on the left of the plus sign) and a mean-squared-error loss function (the term on the right of the plus sign), where the cross-entropy loss is used to measure the first difference and the mean-squared-error loss is used to measure the second difference.
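The following Python sketch shows one way such a combined loss could be written, with the mask w ensuring that only labelled attributes contribute to the cross-entropy term; the exact normalization constants are assumptions, since the original formula is only available as an image.

```python
# Hedged sketch of the combined loss: masked binary cross-entropy against the manual labels
# plus a lambda-weighted mean-squared error against the soft (confidence) labels.
import torch
import torch.nn.functional as F

def combined_loss(pred, hard_labels, soft_labels, label_mask, lam=1.0):
    """pred, hard_labels, soft_labels, label_mask: tensors of shape (batch, num_attributes);
    label_mask is 1 where the manual label exists, 0 otherwise."""
    ce = F.binary_cross_entropy(pred, hard_labels, reduction="none")
    ce = (ce * label_mask).sum() / label_mask.sum().clamp(min=1)   # cross-entropy term, masked by w
    mse = F.mse_loss(pred, soft_labels)                            # soft-label (second difference) term
    return ce + lam * mse
```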
In the process of training the model, the model training parameters may be the same as the model training parameters set in step S302, and are not described herein again.
Based on the human body attribute recognition model obtained by the above training, an embodiment of the present invention further provides a human body attribute recognition method. Referring to fig. 7, fig. 7 is a schematic flowchart of the human body attribute recognition method provided by the embodiment of the present invention. For convenience of description, the following embodiment is described with a terminal device as the execution subject; it should be understood that the execution subject of the human body attribute recognition method is not limited to the terminal device. As shown in fig. 7, the human body attribute recognition method includes the following steps:
s401, obtaining an image to be identified.
In this embodiment, the implementation of obtaining the image to be recognized may be similar to the implementation of obtaining the human body image sample as described above, and details are not described here.
In an alternative embodiment, after the image to be recognized is acquired, the image to be recognized may also be processed in a data preprocessing manner as described above.
S402, inputting the image to be recognized into a pre-trained human body attribute recognition model for recognition, and obtaining confidence corresponding to each human body attribute.
The human body attribute recognition model is obtained according to the human body attribute recognition model training method provided by the embodiment of the invention, and is not repeated here;
s403, aiming at the binary human body attribute, if the confidence coefficient of the binary human body attribute is greater than a preset threshold value, outputting a first predefined attribute value and the confidence coefficient corresponding to the binary human body attribute value;
s404, if the confidence coefficient of the classified human body attribute is smaller than a preset threshold value, outputting a second predefined attribute value and the confidence coefficient corresponding to the classified human body attribute.
In the embodiment of the present invention, the confidence corresponding to a binary human body attribute is the confidence of one of its predefined attribute values. Taking gender as an example: if, during training, male samples are the positive examples and female samples are the negative examples, the confidence output by the sub-recognition model can be used as the confidence of "male"; if male samples are the negative examples and female samples the positive examples, the confidence can be used as the confidence of "female".
In an optional embodiment, the first and second predefined attribute values may be set from the labels of the binary attribute in the human body labeling information: the attribute value whose label is 1 is set as the first predefined attribute value, and the attribute value whose label is 0 is set as the second predefined attribute value.
For example, if the confidence obtained by the sub-recognition model is the confidence of "female", then the first predefined attribute value is female and the second predefined attribute value is male. Suppose the confidence obtained in step S402 is 0.9 and the preset threshold is 0.5; the output attribute value is then female with confidence 0.9. If the confidence is 0.4, the output attribute value is male with confidence 0.4.
S405, for a multi-classification human body attribute, outputting the maximum confidence and the human body attribute value corresponding to the maximum confidence.
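A small sketch of this output rule (steps S403 to S405); the grouping of attribute values into binary and multi-classification attributes, and the way confidences are keyed, are illustrative assumptions.

```python
# Hedged sketch: threshold rule for binary attributes, argmax rule for multi-classification attributes.
def format_outputs(confidences, binary_attrs, multiclass_groups, threshold=0.5):
    """confidences: dict keyed by attribute (binary) or attribute value (multi-classification)."""
    results = {}
    for attr, (first_value, second_value) in binary_attrs.items():
        p = confidences[attr]
        results[attr] = (first_value, p) if p > threshold else (second_value, p)
    for attr, values in multiclass_groups.items():
        best = max(values, key=lambda v: confidences[v])   # value with the maximum confidence
        results[attr] = (best, confidences[best])
    return results
```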
According to the human body attribute recognition method provided by the embodiment of the invention, the image to be recognized is input into the pre-trained human body attribute recognition model to obtain the attribute value of each human body attribute and its confidence; for a binary human body attribute the attribute value whose confidence exceeds the preset threshold is output, and for a multi-classification human body attribute the attribute value with the maximum confidence is output, so that the user can intuitively grasp the reliability of the recognition result.
Based on the same inventive concept, an embodiment of the invention can also obtain a vehicle attribute recognition model by a similar training method: in the training process, the human body image samples are replaced with vehicle images, and the human body attributes and their predefined attribute values are replaced with vehicle attributes and the corresponding predefined attribute values, so that a vehicle attribute recognition model can be obtained and vehicle attribute recognition can be performed in real scenarios.
The human body attribute recognition model training method provided in the embodiments of the present application may be implemented in a hardware device or in the form of software modules. When it is implemented as software modules, an embodiment of the present application further provides a human body attribute recognition model training apparatus; referring to fig. 8, fig. 8 is a functional block diagram of the human body attribute recognition model training apparatus provided in the embodiment of the present application, and the human body attribute recognition model training apparatus 500 may include:
a first obtaining module 510, configured to obtain a training data set, where the training data set comprises human body image samples and the human body attribute labeling information corresponding to the human body image samples;
a training module 520, configured to determine, for each human body attribute, a sub-training data set matched with that attribute from the training data set, and to train the first residual network on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute;
a determining module 530, configured to obtain a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model;
the training module 520 being further configured to train the second residual network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain the human body attribute recognition model.
It is appreciated that the first obtaining module 510, the training module 520, and the determining module 530 may cooperatively perform the various steps of fig. 3 to achieve the corresponding technical effect.
In alternative embodiments, training module 520 may also be used to perform the various steps of fig. 4 and 6 to achieve corresponding technical effects.
In an alternative embodiment, the first obtaining module 510 is specifically configured to perform the steps a1 to a3 to achieve corresponding technical effects.
In an alternative embodiment, the human body attribute recognition model training device 500 may further include a preprocessing module for performing the above steps b1 to b3 to achieve the corresponding technical effect.
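As a rough illustration of how the determining module 530 could derive confidence labels from the sub-recognition models, a short PyTorch sketch is given below. It assumes that each sub-recognition model ends in a single logit and that a sigmoid turns it into a confidence; both points are assumptions of this sketch, not statements of the embodiment.

```python
# Hypothetical computation of confidence labels (module 530); illustrative only.
import torch

@torch.no_grad()
def confidence_labels(images, sub_models):
    """images: float tensor (N, 3, H, W); sub_models: attribute -> trained sub-recognition model.
    Returns attribute -> tensor (N,) of confidence labels, one per training sample."""
    labels = {}
    for attr, model in sub_models.items():
        model.eval()
        logits = model(images)                             # each sub-model scores the whole training set
        labels[attr] = torch.sigmoid(logits).squeeze(-1)   # confidence label per sample
    return labels
```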
The human body attribute recognition method provided in the embodiment of the present application may be implemented in a hardware device or in the form of a software module. When it is implemented in the form of a software module, the embodiment of the present application further provides a human body attribute recognition apparatus. Please refer to fig. 9, which is a functional block diagram of the human body attribute recognition apparatus provided in the embodiment of the present application; the human body attribute recognition apparatus 600 may include:
a second obtaining module 610, configured to obtain an image to be identified;
the recognition module 620 is configured to input the image to be recognized into a pre-trained human body attribute recognition model for recognition, so as to obtain confidence levels corresponding to the human body attributes; the human body attribute recognition model is obtained according to the human body attribute recognition model training method provided by the embodiment of the invention.
An output module 630, configured to, for the binary human body attribute, output a first predefined attribute value and the confidence corresponding to the binary human body attribute if the confidence of the binary human body attribute is greater than a preset threshold; and output a second predefined attribute value and the confidence corresponding to the binary human body attribute if the confidence is smaller than the preset threshold.
The output module 630 is further configured to output, for the multi-classification human body attribute, the maximum confidence and the human body attribute value corresponding to the maximum confidence.
It is understood that the second obtaining module 610, the identifying module 620 and the outputting module 630 can cooperatively perform the steps of fig. 7 to achieve the corresponding technical effect.
Fig. 10 is a block diagram of an electronic device according to an embodiment of the present invention.
As shown in fig. 10, the electronic device 700 comprises a memory 701, a processor 702 and a communication interface 703, wherein the memory 701, the processor 702 and the communication interface 703 are electrically connected to each other directly or indirectly to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 701 may be used to store software programs and modules, such as the instructions/modules of the human body attribute recognition model training device 500 or the human body attribute recognition device 600 provided by the embodiment of the present invention, which may be stored in the memory 701 in the form of software or firmware, or built into the Operating System (OS) of the electronic device 700; the processor 702 executes the software programs and modules stored in the memory 701 so as to perform various functional applications and data processing. The communication interface 703 may be used for communicating signaling or data with other node devices.
The memory 701 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 702 may be an integrated circuit chip having signal processing capabilities. The processor 702 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in fig. 10 is merely illustrative and that electronic device 700 may include more or fewer components than shown in fig. 10 or have a different configuration than shown in fig. 10. The components shown in fig. 10 may be implemented in hardware, software, or a combination thereof.
An embodiment of the present application further provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the human body attribute recognition model training method or the human body attribute recognition method according to any one of the foregoing embodiments. The computer readable storage medium may be, but is not limited to, various media that can store program codes, such as a usb disk, a removable hard disk, a ROM, a RAM, a PROM, an EPROM, an EEPROM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A human body attribute recognition model training method is characterized by comprising the following steps:
acquiring a training data set; the training data set comprises a human body image sample and human body attribute labeling information corresponding to the human body image sample;
for each human body attribute, determining a sub-training data set matched with the human body attribute from the training data set, and training a first residual error network based on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute;
obtaining a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model;
and training the second residual error network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain a human body attribute identification model.
2. The method according to claim 1, wherein for each body attribute, determining a sub-training data set matching the body attribute from the training data set, and training a first residual network based on the sub-training data set to obtain a sub-recognition model corresponding to each body attribute, comprises:
extracting target human body image samples with the target human body attributes in the human body attribute labeling information from the training data set according to the target human body attributes to form a sub-training data set; the target human body attribute is any one of all the human body attributes;
calculating a loss value of a first loss function according to the predicted attribute value of the target human body attribute of the target human body image sample and the real attribute value of the target human body attribute in the human body attribute labeling information for each sub-training data set;
reversely transmitting the loss value of the first loss function to each layer of the first residual error network to perform iterative update of model parameters until a preset condition is reached, and taking the trained first residual error network as the sub-recognition model corresponding to the target human body attribute;
and traversing all the human body attributes to obtain the sub-recognition model corresponding to each human body attribute.
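The per-attribute procedure of claim 2 can be sketched in Python with PyTorch as follows. The choice of ResNet-18 as a stand-in for the first residual network, cross-entropy as a stand-in for the first loss function, the optimizer, batch size and epoch budget are all assumptions made purely for illustration.

```python
# Hypothetical sketch of the per-attribute sub-model training of claim 2 (not the patented implementation).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

def train_sub_model(images, labels, num_classes, epochs=5, lr=1e-3):
    """images: float tensor (N, 3, H, W); labels: long tensor (N,) for one target attribute."""
    net = models.resnet18(weights=None)                 # stand-in for the first residual network
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    criterion = nn.CrossEntropyLoss()                   # stand-in for the first loss function
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)
    for _ in range(epochs):                             # the "preset condition" is assumed to be an epoch budget
        for x, y in loader:
            loss = criterion(net(x), y)                 # loss between predicted and real attribute values
            optimizer.zero_grad()
            loss.backward()                             # propagate the loss value back through every layer
            optimizer.step()
    return net

def train_all_sub_models(images, labels_per_sample, num_classes_per_attr):
    """labels_per_sample: list of dicts, one per sample, mapping labeled attributes to class indices."""
    sub_models = {}
    for attr, num_classes in num_classes_per_attr.items():                     # traverse all attributes
        idx = [i for i, lab in enumerate(labels_per_sample) if attr in lab]    # sub-training data set
        attr_labels = torch.tensor([labels_per_sample[i][attr] for i in idx])
        sub_models[attr] = train_sub_model(images[idx], attr_labels, num_classes)
    return sub_models
```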
3. The method of claim 1, wherein training the second residual error network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain a human body attribute identification model comprises:
determining a predicted attribute value of each human body attribute corresponding to each human body image sample and a first difference degree of a real attribute value corresponding to each human body attribute in the human body attribute labeling information;
determining a second difference between the predicted attribute value of each human body attribute corresponding to each human body image sample and the confidence label corresponding to each human body attribute;
calculating a loss value of a second loss function according to the first difference degree, the second difference degree and the respective weights of the first difference degree and the second difference degree;
and reversely transmitting the loss value of the second loss function to each layer of the second residual error network to carry out iterative updating of model parameters until a preset condition is reached, and taking the trained second residual error network as the human body attribute recognition model.
4. The method of claim 1, wherein after acquiring the training data set, the method further comprises:
determining whether the size of each human body image sample is consistent with a preset size;
if not, adjusting the size of the human body image sample to the preset size;
performing data enhancement on each human body image sample; wherein the data enhancement comprises any one of the following and combinations thereof: cutting; zooming; turning over; blurring.
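One possible realization of the preprocessing in claim 4, using torchvision transforms, is sketched below; the preset size of 256x128 and the particular augmentation parameters are assumptions for illustration only.

```python
# Hypothetical preprocessing pipeline for claim 4; the preset size and parameters are illustrative.
from PIL import Image
from torchvision import transforms

PRESET_SIZE = (256, 128)  # (height, width) of the assumed preset size

preprocess = transforms.Compose([
    transforms.RandomResizedCrop(PRESET_SIZE, scale=(0.8, 1.0)),              # cutting and zooming to the preset size
    transforms.RandomHorizontalFlip(p=0.5),                                   # turning over
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=3)], p=0.3),  # blurring
    transforms.ToTensor(),
])

def prepare(sample_path):
    return preprocess(Image.open(sample_path).convert("RGB"))
```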
5. The method of claim 2, wherein the first loss function satisfies the following relationship:
[The first loss function is presented in the published claim only as a formula image (FDA0003716705210000021) and is not reproduced in the text.]
wherein n represents the total number of the target human body image samples; y'_i represents the predicted attribute value of the target human body attribute corresponding to the ith target human body image sample; y_i represents the real attribute value of the target human body attribute corresponding to the ith target human body image sample; wherein
[a further relationship, defining the terms above, is likewise given only as a formula image (FDA0003716705210000022)].
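Because the first loss function appears in the published claim only as an image, its exact form cannot be reproduced here. A common choice consistent with the variables defined above is a binary cross-entropy averaged over the n target samples; the sketch below is offered purely under that assumption.

```python
# Assumed form of the first loss function (binary cross-entropy over the n target samples); not the published formula.
import torch

def first_loss(y_pred, y_true):
    """y_pred: predicted attribute values y'_i in (0, 1); y_true: real attribute values y_i in {0, 1}."""
    eps = 1e-7
    y_pred = y_pred.clamp(eps, 1 - eps)
    return -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()
```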
6. the method of claim 3, wherein the second loss function satisfies the following relationship:
[The second loss function is given in the published claim only as a formula image (FDA0003716705210000023) and is not reproduced in the text.]
wherein n represents the total number of the human body image samples; m represents the number of the human body attributes; y_{i,j}, y'_{i,j} and z_{i,j} respectively represent the real attribute value, the predicted attribute value and the confidence label of the ith human body attribute corresponding to the jth human body image sample; when the attribute value of the ith human body attribute exists in the human body attribute labeling information of the jth human body image sample, w_{i,j} = 1, otherwise w_{i,j} = 0; and λ represents the weight of the second degree of difference;
[a further relationship, defining the terms above, is given only as a formula image (FDA0003716705210000024)].
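The second loss function is likewise published only as an image. The sketch below follows the verbal description of claim 3 and the variable definitions above — a first difference between prediction and real label, a second difference between prediction and confidence label, weighted by λ and masked by w_{i,j} — but the exact functional form (squared error is used here for both differences) is an assumption of this sketch.

```python
# Assumed form of the second loss function; squared error stands in for both degrees of difference.
import torch

def second_loss(y_true, y_pred, z_conf, w_mask, lam=0.5):
    """All tensors have shape (m, n): m human body attributes x n image samples.
    y_true: real attribute values; y_pred: predicted attribute values; z_conf: confidence labels;
    w_mask: 1 where the ith attribute is labeled for the jth sample, else 0; lam: weight of the second difference."""
    first_diff = (y_pred - y_true) ** 2      # first degree of difference
    second_diff = (y_pred - z_conf) ** 2     # second degree of difference
    per_term = w_mask * ((1.0 - lam) * first_diff + lam * second_diff)
    return per_term.sum() / w_mask.sum().clamp(min=1)
```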
7. the method of claim 1, wherein obtaining a training data set comprises:
acquiring a plurality of snap-shot images;
carrying out human body detection on each snapshot image to obtain a human body detection frame corresponding to each detected human body;
and based on the obtained human body detection frame, cutting each snap-shot image, taking the image corresponding to the human body detection frame as the human body image sample, and forming the training data set based on all the human body image samples and all the human body attribute label information corresponding to each human body image sample.
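A minimal sketch of the data-set construction in claim 7 is given below; the detector interface detect_bodies is hypothetical and would in practice be any pedestrian detector returning bounding boxes.

```python
# Hypothetical construction of the training data set from snap-shot images (claim 7).
from PIL import Image

def detect_bodies(image):
    """Hypothetical detector: returns a list of (left, top, right, bottom) human body detection frames."""
    raise NotImplementedError("plug in any pedestrian/human body detector here")

def build_training_set(snapshot_paths, annotations):
    """annotations: snapshot path -> list of human body attribute labeling dicts, one per detected body."""
    samples = []
    for path in snapshot_paths:
        image = Image.open(path).convert("RGB")
        for box, labels in zip(detect_bodies(image), annotations[path]):
            samples.append({"image": image.crop(box), "labels": labels})  # crop the human body detection frame
    return samples
```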
8. A human body attribute identification method is characterized by comprising the following steps:
acquiring an image to be identified;
inputting the image to be recognized into a pre-trained human body attribute recognition model for recognition to obtain confidence corresponding to each human body attribute; the human body attribute recognition model is obtained by the human body attribute recognition model training method according to any one of claims 1 to 7;
for the binary human body attribute, if the confidence of the binary human body attribute is greater than a preset threshold, outputting a first predefined attribute value and the confidence corresponding to the binary human body attribute; if the confidence is smaller than the preset threshold, outputting a second predefined attribute value and the confidence corresponding to the binary human body attribute;
and for the multi-classification human body attribute, outputting the maximum confidence and the predefined attribute value corresponding to the maximum confidence.
9. A human body attribute recognition model training device is characterized by comprising:
the acquisition module is used for acquiring a training data set; the training data set comprises a human body image sample and human body attribute labeling information corresponding to the human body image sample;
the training module is used for determining a sub-training data set matched with the human body attributes from the training data set aiming at each human body attribute, and training a first residual error network based on the sub-training data set to obtain a sub-recognition model corresponding to each human body attribute;
a determining module, configured to obtain a confidence label corresponding to each human body attribute based on the training data set and each sub-recognition model;
and the training module is further used for training the second residual error network according to the confidence label corresponding to each human body attribute and all the human body attribute labeling information to obtain a human body attribute identification model.
10. A human attribute recognition apparatus, comprising:
the acquisition module is used for acquiring an image to be identified;
the recognition module is used for inputting the image to be recognized into a pre-trained human body attribute recognition model for recognition to obtain confidence coefficients corresponding to the human body attributes; the human body attribute recognition model is obtained by the human body attribute recognition model training method according to any one of claims 1 to 7;
the output module is used for, for the binary human body attribute, outputting a first predefined attribute value and the confidence corresponding to the binary human body attribute if the confidence of the binary human body attribute is greater than a preset threshold, and outputting a second predefined attribute value and the confidence corresponding to the binary human body attribute if the confidence is smaller than the preset threshold;
and the output module is further used for outputting, for the multi-classification human body attribute, the maximum confidence and the predefined attribute value corresponding to the maximum confidence.
11. An electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being operable to execute the computer program to implement the method of any one of claims 1 to 7 or to implement the method of claim 8.
12. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202210745493.6A 2022-06-27 2022-06-27 Human body attribute recognition model training and human body attribute recognition method and related device Pending CN115082963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210745493.6A CN115082963A (en) 2022-06-27 2022-06-27 Human body attribute recognition model training and human body attribute recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210745493.6A CN115082963A (en) 2022-06-27 2022-06-27 Human body attribute recognition model training and human body attribute recognition method and related device

Publications (1)

Publication Number Publication Date
CN115082963A true CN115082963A (en) 2022-09-20

Family

ID=83255504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210745493.6A Pending CN115082963A (en) 2022-06-27 2022-06-27 Human body attribute recognition model training and human body attribute recognition method and related device

Country Status (1)

Country Link
CN (1) CN115082963A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294611A (en) * 2022-09-29 2022-11-04 深圳爱莫科技有限公司 Human body attribute recognition model training method, recognition method, device and equipment



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination