CN112200005A - Pedestrian gender identification method based on wearing characteristics and human body characteristics under community monitoring scene - Google Patents


Info

Publication number
CN112200005A
Authority
CN
China
Prior art keywords
pedestrian
gender
image
feature
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010966050.0A
Other languages
Chinese (zh)
Inventor
孙浩云
张卫山
尹广楹
张大千
徐亮
管洪清
Current Assignee
Qingdao Sui Zhi Information Technologies Co ltd
Original Assignee
Qingdao Sui Zhi Information Technologies Co ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Sui Zhi Information Technologies Co ltd filed Critical Qingdao Sui Zhi Information Technologies Co ltd
Priority to CN202010966050.0A priority Critical patent/CN112200005A/en
Publication of CN112200005A publication Critical patent/CN112200005A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical fields of image retrieval, artificial intelligence and deep learning, and in particular discloses a pedestrian gender identification method based on wearing features and human-body features in a community monitoring scene. The D-MFCNN model proposed by the invention breaks through the single-feature limitation of previous gender identification methods, achieves higher accuracy than earlier gender identification results, can still effectively identify the gender of a pedestrian in an image to be detected when features are missing, and shows high accuracy and precision in the community monitoring scene.

Description

Pedestrian gender identification method based on wearing characteristics and human body characteristics under community monitoring scene
Technical Field
The invention relates to the technical field of image retrieval, artificial intelligence and deep learning, in particular to a pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene.
Background
Pedestrian detection is a branch of computer vision research, and pedestrian gender identification is one of its key steps. Most early work on pedestrian gender identification improved the structure and computation of recognition algorithms (such as principal component analysis, support vector machines and convolutional neural networks) built on face recognition techniques; gender identification research targeting other morphological features is rare.
Gender identification is essentially a binary classification problem: deciding whether a pedestrian is male or female. The usual solution is to define a discriminant function over the training-set images, iterate over all training images for many epochs, and in each iteration correct the parameters of the discriminant function according to the difference between its prediction and the ground truth. In short, learning and correcting the discriminant function rely on the classification label of each training image, so this is a form of supervised learning; the accuracy and fit of the discriminant function depend on the complexity of its structure, and pursuing accuracy on a single feature too aggressively may discard other useful discriminative information.
Moreover, in pedestrian recognition under complex environments, analysis of a single feature is often limited. If a pedestrian gender identification algorithm is designed purely around face features, then when a pedestrian's face is occluded or blurred the algorithm cannot judge gender from the image, even when other cues (such as a skirt) are clearly visible, which is obviously impractical. To let traditional pedestrian gender identification break through this limitation, the method considers both the clothing features and the body features of the pedestrian, and designs a convolutional neural network that learns each feature and fuses the classification results according to the branch relations among the features and their importance in gender classification. Because of this structural particularity, this patent names the network the distributed multi-layer feature fusion convolutional neural network, hereafter D-MFCNN (Distributed Multi-layer Features fusion Convolutional Neural Network).
Disclosure of Invention
To address the feature-limitation problems of existing pedestrian gender identification methods, the invention provides a pedestrian gender identification method based on wearing features and human-body features in a community monitoring scene. The method considers the fine-grained features of the pedestrian image and designs a distributed multi-layer feature fusion convolutional neural network, D-MFCNN. The network analyses the wearing features and specific local body features of pedestrian images at multiple levels and from multiple angles, and can perform supervised learning on the training-set images. In addition, D-MFCNN judges the importance of each feature in the gender identification process, so the per-feature classification results can be fused according to the actual situation. Compared with existing pedestrian gender identification methods, D-MFCNN breaks through the single-feature limitation, achieves higher accuracy than earlier gender identification results, and can still effectively identify the gender of the pedestrian in an image to be detected when features are missing.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a pedestrian gender identification method based on wearing characteristics and human body characteristics under a community monitoring scene comprises the following steps:
Step 1: Perform primary extraction on a pedestrian image data source from a surveillance video, where each extracted image must contain only a single pedestrian;
Step 2: Calibrate the fine-grained features of each pedestrian image in the training set, correctly labelling each image with a gender label; following the idea of the AAM (Active Appearance Model), build a feature alignment model for the calibrated fine-grained features so that images to be detected can be labelled and cropped automatically;
Step 3: Preprocess the labelled training set of step 2 again, cutting it by feature into several small training sets, and re-label each feature image in the small training sets with a gender label, which may differ from the gender label of the original image;
Step 4: Build a small convolutional neural network for each fine-grained feature defined in step 3, and train each network with the corresponding small training set from step 3 to output a correct gender classification result; the result is represented by a two-dimensional vector whose dimensions give the probability that the pedestrian is male or female;
Step 5: Build a fully connected layer trained with the labels calibrated for the fine-grained features in each training set as input and the labels of the corresponding original images as output, finally obtaining a correct fully connected layer representation; this layer fuses the per-feature gender classification results into a final recognition result of the same form as in step 4;
Step 6: Integrate the small convolutional neural networks trained in step 4 with the fully connected layer trained in step 5 to form the distributed multi-layer feature fusion convolutional neural network, D-MFCNN;
Step 7: Test the trained D-MFCNN model: crop the image to be tested according to the specified fine-grained features and feed it into the model to obtain the final gender classification result.
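As an illustration of the fusion behind steps 4 to 7, the final stage can be sketched as a weighted combination of the per-feature branch outputs. This is a minimal sketch under stated assumptions: the branch outputs, branch names and importance weights below are invented for the example; in the real model they would come from trained convolutional networks and the trained fully connected layer.

```python
# Minimal sketch of the D-MFCNN fusion stage described in steps 4-7.
# Each feature branch yields a (p_male, p_female) two-dimensional vector;
# the fusion step combines them with per-feature importance weights and
# renormalises so the result is again a probability vector.

def fuse(branch_outputs, weights):
    """Weighted combination of per-feature (p_male, p_female) vectors."""
    total = sum(weights)
    male = sum(w * p[0] for w, p in zip(weights, branch_outputs)) / total
    female = sum(w * p[1] for w, p in zip(weights, branch_outputs)) / total
    return (male, female)

# Hypothetical outputs of three branches (hair, jacket, face) for one image.
branches = [(0.2, 0.8), (0.5, 0.5), (0.3, 0.7)]
# Hypothetical learned importance weights: face highest, neutral jacket lowest.
importance = [1.0, 0.2, 2.0]

prediction = fuse(branches, importance)  # -> (0.28125, 0.71875)
```

The renormalisation keeps the output in the same two-dimensional-probability form that step 4 prescribes for each branch.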
Preferably, in step 2, the pedestrian image is framed according to fine-grained features; the fine-grained features defined in this patent are mainly the top, lower-garment and shoe features among the wearing features, and the face, hair and chest features among the body features. The contour of each feature is calibrated with feature points in every image, and the AAM alignment model is built on this basis. For the gender identification problem, the label of each whole pedestrian image is naturally male or female, expressed as the two-dimensional vector (1, 0) or (0, 1).
Preferably, in step 3, the original data set is divided by the fine-grained features labelled in step 2 into several small data sets, each containing images of a single feature (such as a face data set or a hair data set). The gender labels of the cropped images are then recalibrated according to everyday common sense, so a label in a small data set is not necessarily the same as that of the original image: taking a long-haired male artist as an example, the global gender label is male, but long hair is generally labelled female. In addition, most garments have little gender tendency and are neutral, expressed as the two-dimensional vector (0.5, 0.5). Therefore, to make the gender judgment of the small convolutional neural networks in step 4 more fine-grained, the gender should be calibrated in two-dimensional-vector form, at least for features with weak gender tendency (such as wearing features).
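The label scheme of this step can be illustrated with a small helper. The label names and the mapping table are assumptions for the example; the vectors themselves, (1, 0), (0, 1) and the neutral (0.5, 0.5), are taken from the text.

```python
# Sketch of the two-dimensional gender-label encoding for the small
# training sets: strongly gendered features get a hard one-hot label,
# weakly gendered features (most garments) get the neutral soft label.
# The label vocabulary here is an illustrative assumption.

LABELS = {
    "male": (1.0, 0.0),     # (p_male, p_female)
    "female": (0.0, 1.0),
    "neutral": (0.5, 0.5),  # e.g. most jackets and shoes
}

def encode(label):
    return LABELS[label]

# A long-haired male artist: the global image label is male, but the
# cropped hair feature is re-labelled according to everyday common sense.
global_label = encode("male")
hair_label = encode("female")
jacket_label = encode("neutral")
```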
Preferably, in step 4, each convolutional neural network is trained with the fine-grained feature images as input and the two-dimensional vector labels from step 3 as output. The network structure follows ResNet. ResNet overcomes the gradient-vanishing and network-degradation problems of deep-network training: it can identify redundant layers that cause performance degradation and selectively skip them during propagation, so that information flows through intact. Using the ResNet structure avoids degradation caused by an overly long training schedule or an unsuitable architecture, and simplifies the parameter tuning of network training.
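The degradation-avoidance property of the residual structure referred to here can be shown in miniature: a residual block computes F(x) + x, so if the block's layers are redundant (F close to zero) the input passes through unchanged. The sketch below substitutes plain linear layers for convolutions; all sizes and weights are illustrative assumptions.

```python
# Minimal NumPy sketch of a ResNet-style residual block: output is
# relu(F(x) + x). When the block's weights are zero (a "redundant layer"),
# the block reduces to the identity on non-negative inputs, which is the
# skip-connection property the text relies on.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    f = relu(x @ w1) @ w2      # two linear layers stand in for convolutions
    return relu(f + x)         # identity shortcut added before activation

x = rng.standard_normal(8)
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)

# A zero-weight (redundant) block passes a non-negative input through intact.
identity_out = residual_block(relu(x), np.zeros((8, 8)), np.zeros((8, 8)))
```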
Preferably, in step 6, a fully connected layer is attached to the output layers of the convolutional neural networks of all the fine-grained features. In effect, the fully connected layer represents the importance of each feature in gender classification: after receiving the per-feature gender classification results, it weighs their merits and outputs the final gender judgment. As described in steps 2 and 3, each classification result is a two-dimensional vector whose dimensions give the probability that the pedestrian image is classified as male or female.
Preferably, in step 7, the image to be detected is automatically labelled and cropped by the AAM alignment model of step 2; each cropped fine-grained feature image is fed into the convolutional neural network of the corresponding feature, and the networks analyse the features in parallel. Finally, the fully connected layer receives the analysis results of all the networks and fuses them according to feature importance to obtain the final gender classification result. This is the gender identification principle of the D-MFCNN model.
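The feature-loss robustness claimed for this step can be illustrated by fusing only the branches that actually produced an output. The branch names, outputs and importance weights below are illustrative assumptions, not the patent's implementation.

```python
# Sketch of fusion under feature loss: if the alignment stage fails to
# find a feature (e.g. an occluded face), that branch is dropped and the
# remaining branch outputs are fused and renormalised.

def fuse_available(branch_outputs, weights):
    """Fuse only the branches that produced an output (non-None)."""
    pairs = [(w, p) for w, p in zip(weights, branch_outputs) if p is not None]
    total = sum(w for w, _ in pairs)
    male = sum(w * p[0] for w, p in pairs) / total
    female = sum(w * p[1] for w, p in pairs) / total
    return (male, female)

# Face branch missing (occluded); hair and jacket still available.
outputs = [None, (0.1, 0.9), (0.5, 0.5)]   # face, hair, jacket
weights = [2.0, 1.0, 0.2]

prediction = fuse_available(outputs, weights)
```

Even with the highest-weighted branch missing, the result is still a valid two-dimensional probability vector driven by the surviving features.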
By adopting the above technical scheme, the pedestrian gender identification method based on wearing features and human-body features in a community monitoring scene has the following beneficial effects: by analysing the wearing features and the specific local body features of the pedestrian image at multiple levels and from multiple angles, the invention can perform supervised learning on the training-set images. In addition, D-MFCNN judges the importance of each feature in the gender identification process, so the per-feature classification results can be fused according to the actual situation. Compared with existing pedestrian gender identification methods, D-MFCNN breaks through the single-feature limitation, achieves higher accuracy than earlier gender identification results, and can still effectively identify the gender of the pedestrian in an image to be detected when features are missing.
Drawings
FIG. 1 is a general flow chart of a method for identifying gender of a pedestrian according to an embodiment of the present invention;
FIG. 2 is a flow chart of the modeling of the AAM feature alignment model used in the present invention;
FIG. 3 is a specific framework structure of the D-MFCNN model proposed by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the pedestrian gender identification method based on wearing features and human-body features in a community monitoring scene includes the following steps. Step 1: perform primary extraction on a pedestrian image data source from a surveillance video, where each extracted image must contain only a single pedestrian. Step 2: calibrate the fine-grained features of each pedestrian image in the training set, correctly labelling each image with a gender label; following the idea of the AAM (Active Appearance Model), build a feature alignment model for the calibrated fine-grained features so that images to be detected can be labelled and cropped automatically. Step 3: preprocess the labelled training set again, cutting it by feature into several small training sets, and re-label each feature image in the small training sets with a gender label, which may differ from the gender label of the original image. Step 4: build a small convolutional neural network for each fine-grained feature defined in step 3, and train each network with the corresponding small training set from step 3 to output a correct gender classification result; the result is represented by a two-dimensional vector whose dimensions give the probability that the pedestrian is male or female. Step 5: build a fully connected layer trained with the labels calibrated for the fine-grained features in each training set as input and the labels of the corresponding original images as output, finally obtaining a correct fully connected layer representation; this layer fuses the per-feature gender classification results into a final recognition result of the same form as in step 4. Step 6: integrate the small convolutional neural networks trained in step 4 with the fully connected layer trained in step 5 to form the distributed multi-layer feature fusion convolutional neural network, D-MFCNN. Step 7: test the trained D-MFCNN model: crop the image to be tested according to the specified fine-grained features and feed it into the model to obtain the final gender classification result.
Aiming at the feature-limitation problem of existing pedestrian gender identification methods, the invention integrates the feature information that may be used for gender identification and, combining previous research results and technical experience, proposes a new model, D-MFCNN, for the gender identification scene. The model improves on the original gender identification methods, uses fine-grained feature information to raise the accuracy and precision of gender identification in the complex community environment, and maintains good robustness in special scenes with feature loss.
The following is a detailed description of the pedestrian gender identification method based on wearing features and human-body features in a community monitoring scene. Fig. 1 describes the specific flow of the method in a practical application scenario. The method builds on the AAM method and the ResNet network structure, with changes in the direction of fine-grained feature extraction; the D-MFCNN model proposed by the invention is a novel improved algorithm model for the gender identification scenario. The main exemplary implementation steps of this embodiment are summarised as follows:
First, the pedestrian image data set from the surveillance video is extracted and processed in several passes. The processing steps are mainly: segment the images by pedestrian so that each pedestrian image contains exactly one pedestrian; then label the preliminarily processed pedestrian image data set with gender labels and frame its fine-grained features with key feature points (the features concerned in this invention are detailed in the network structure of fig. 3); finally, cut the calibrated training set again by the calibrated features into several small training sets, and recalibrate gender labels for the images under each feature according to common sense in real life. For features with low gender sensitivity, the gender tendency should be labelled in two-dimensional-vector form; simply using 0 or 1 may worsen the accuracy of the network to be trained. For example, features that clearly indicate gender, such as long hair or a skirt, can be labelled male or female, but most garments cannot clearly distinguish gender and are labelled neutral, expressed as the two-dimensional vector (0.5, 0.5), meaning the garment feature is equally likely to belong to a male or a female.
In addition, since the gender identification process involves fine-grained feature extraction, correctly calibrating and extracting the features of a new pedestrian image to be identified is the most critical step. Therefore, in the preprocessing of the training set, a key-point feature alignment model should be built on the calibrated feature points following the idea of the AAM; by learning the feature-point contour and texture of each class of feature in the training set, the AAM model can align automatically on a new image and correctly frame the corresponding fine-grained features with key feature points. Based on the AAM model's automatic labelling, the new image is cut at the fine-grained feature level, and the feature images are used as input to the corresponding convolutional neural networks to obtain the final prediction. The approximate AAM model building process is shown in fig. 2.
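The cropping that follows alignment can be sketched as cutting out the bounding box of the aligned key feature points. The toy image, point coordinates and margin below are illustrative assumptions; a real AAM would additionally fit shape and texture models before the crop.

```python
# Sketch of the crop step after AAM alignment: once key feature points are
# placed on a new image, each fine-grained feature is cut out as the
# bounding box of its points plus a small margin, clamped to the image.

def crop_from_keypoints(image, points, margin=2):
    """Return the sub-image covering the bounding box of (row, col) points."""
    rows = [p[0] for p in points]
    cols = [p[1] for p in points]
    r0 = max(min(rows) - margin, 0)
    r1 = min(max(rows) + margin, len(image) - 1)
    c0 = max(min(cols) - margin, 0)
    c1 = min(max(cols) + margin, len(image[0]) - 1)
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]

# A toy 10x10 "image" and three aligned points outlining a hair region.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
hair_points = [(1, 3), (2, 5), (1, 6)]
patch = crop_from_keypoints(image, hair_points)
```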
As shown in fig. 3, for multi-feature information extraction the D-MFCNN model establishes a small convolutional neural network for each fine-grained feature; each network's structure and training parameters are optimised with reference to ResNet, so training is efficient and accuracy is not reduced by an overly redundant, complex structure. The ResNet structure draws on the idea of residual networks: it can spontaneously identify redundant layers and pass information across them by skip connections, ensuring the information is transmitted with high integrity. In addition, to further improve training efficiency, all the small convolutional neural networks are trained in parallel. Each convolutional neural network is trained with the cropped fine-grained feature images as input and the gender labels calibrated on those feature images as output. Meanwhile, to integrate the gender analysis results at each feature level, a fully connected layer is introduced to weigh the importance of the gender classification results of all the convolutional neural networks and merge them into a final gender classification result. The input of the fully connected layer is the gender label of each feature image and the output is the overall label of the original pedestrian image; both labels are two-dimensional vectors whose dimensions express the probability of classification as male or female. The data source from the surveillance video is processed in multiple steps, preparing for building the AAM feature-point alignment model and the convolutional neural networks under the fine-grained feature branches of D-MFCNN.
The AAM model must be built on the feature points calibrated in the original pedestrian images, and training each branch convolutional neural network of D-MFCNN requires the feature images and the local gender labels of the corresponding features. Meanwhile, the gender labels of the feature images and the overall labels of the pedestrian images serve respectively as the input and output of the fully connected layer of D-MFCNN to correct that layer's weights; the corrected fully connected layer can accurately measure feature importance and yields a more objective gender identification result when integrating the analysis information of the branch neural networks. To improve training efficiency, the above iterative process over the network structure is usually executed concurrently. The AAM model considers both shape and texture in its feature-point labelling, and describes the contours of key features more accurately than previous automatic labelling models; its role is mainly to accurately locate and extract the key features in a new image, and the extracted features are used as input to D-MFCNN. Gender labels are marked both for the overall pedestrian image data set and for the differentiated feature image data sets of the local features, and the two labellings differ. Although the gender of a pedestrian image can generally be determined from the global view, the features differentiated from it (such as the top or the hair) cannot be treated the same as the original image. For example, for couple outfits or a long-haired male artist, the local labels cannot be blindly assumed identical to the global label, and not every feature's label can be stated as simply male or female.
For features with weak gender relevance, such as most coats and shoes, the labels are calibrated as two-dimensional vectors wherever possible, with neutral represented as (0.5, 0.5). The model involves six features in total, and for each fine-grained feature a feature-specific convolutional neural network is built following the idea of the ResNet structure. Each feature-branch convolutional neural network is trained on the differentiated feature data set with the feature gender labels, and each network represents the analysis of the pedestrian image under the corresponding feature. Compared with traditional convolutional neural networks, ResNet keeps its performance from degrading over long training: it can automatically identify a redundant layer causing performance decline and skip it to transmit relatively complete image analysis information, which greatly simplifies the tuning of training parameters and removes the worry that ill-chosen parameters will degrade network performance. The branch feature convolutional neural networks trained in this way can analyse pedestrian gender from different feature angles, and retain good robustness when features are missing or blurred.
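One way to train a branch against such soft labels, sketched here as an assumption since the patent does not name its loss function, is cross-entropy with two-dimensional targets, which penalises over-confident predictions on neutral features:

```python
# Sketch of a branch's training signal with soft labels: standard
# cross-entropy generalises directly to two-dimensional targets such as
# the neutral (0.5, 0.5) garment label. Purely illustrative.
import math

def cross_entropy(pred, target, eps=1e-12):
    return -sum(t * math.log(p + eps) for p, t in zip(pred, target))

neutral = (0.5, 0.5)
confident = (0.95, 0.05)   # branch insists the jacket is "male"
hedged = (0.5, 0.5)        # branch matches the soft label

loss_confident = cross_entropy(confident, neutral)
loss_hedged = cross_entropy(hedged, neutral)
```

Against a neutral target, the hedged prediction incurs the lower loss, so the branch is discouraged from fabricating a gender signal from a gender-neutral garment.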
To integrate the gender analysis results of the branch feature convolutional neural networks while objectively weighing the importance of each feature's information for the gender judgment, the D-MFCNN model finally attaches a fully connected layer to the output layers of the branch networks. The fully connected layer is trained with the gender labels of the feature images as input and the overall labels of the pedestrian images as output; once trained, it can spontaneously measure the value of each feature's information for the gender identification problem and integrate the gender analysis information from the different features into a final gender identification result. Again, the result is a two-dimensional vector with each dimension between 0 and 1, representing the probability that the pedestrian is classified as male or female.
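A toy version of this fully connected layer's fitting, under assumed data and plain gradient descent on a squared error (the patent does not specify the optimisation), can be sketched as:

```python
# Sketch of fitting the fusion layer: inputs are concatenated per-feature
# label vectors, targets are the global image labels, and the weights
# (feature importances) are learned by gradient descent on MSE.
# The tiny data set and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

# (hair_m, hair_f, jacket_m, jacket_f) -> global (male, female) label.
X = np.array([
    [0.0, 1.0, 0.5, 0.5],   # female hair, neutral jacket -> female
    [1.0, 0.0, 0.5, 0.5],   # male hair,   neutral jacket -> male
    [0.0, 1.0, 0.5, 0.5],
    [1.0, 0.0, 0.5, 0.5],
])
Y = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

W = rng.standard_normal((4, 2)) * 0.01
for _ in range(500):                      # plain gradient descent on MSE
    grad = X.T @ (X @ W - Y) / len(X)
    W -= 0.5 * grad

final_pred = X @ W
```

On this toy data the layer learns to trust the hair branch and ignore the neutral jacket, recovering the global labels exactly.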
It is understood that the D-MFCNN model, while drawing on existing advanced technologies, is original in its overall network framework. Before an image to be recognised is input to the D-MFCNN model, feature alignment and segmentation are first performed by the AAM model; the images segmented by feature are then input to the convolutional neural networks of the corresponding feature branches; finally, the fully connected layer unique to D-MFCNN integrates the two-dimensional vector outputs of the networks, merging the several classification vectors into one that represents the pedestrian gender recognition result obtained after considering multiple feature information.
In model testing, the new image is automatically labelled and cropped by fine-grained feature with the established AAM model, and the segmented feature images are fed into the convolutional neural networks of the corresponding features. Each small convolutional network analyses gender on its fine-grained feature, yielding several gender classification results along the feature dimension. Finally, the fully connected layer weighs the importance of each feature and integrates these results into the final gender classification, expressed as a two-dimensional vector that directly reflects the probability that the pedestrian in the original image is male or female. Compared with methods that obtain a gender classification result from face analysis alone, the proposed D-MFCNN considers multiple features, achieves higher gender identification accuracy and precision, can analyse pedestrian gender from multiple angles even in complex scenes and varied pedestrian postures, and can still judge gender effectively when individual features are blurred or missing.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present invention shall be included within its scope of protection.

Claims (7)

1. A pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene, characterized in that the method comprises the following steps:
Step 1: extract a pedestrian image data source from surveillance video, where each extracted image must contain only a single pedestrian;
Step 2: calibrate the fine-grained features of each pedestrian image in the training set, and correctly label each image with a gender label; establish a feature alignment model for the calibrated fine-grained features following the idea of the AAM model, so that images to be tested can be automatically labeled and cropped;
Step 3: preprocess the training set labeled in step 2 again, cut it into several small training sets according to the features, and re-label each feature image in the small training sets with a gender label;
Step 4: establish several small convolutional neural networks for the fine-grained features defined in step 3, and train each network with its corresponding small training set from step 3 to output a correct gender classification result; the classification result is represented by a two-dimensional vector, each dimension representing the probability that the pedestrian's gender is male or female;
Step 5: establish a fully connected layer, trained with the labels calibrated for the fine-grained features in each training set as input and the labels of the original images corresponding to those features as output, finally obtaining a correct fully connected layer representation; this fully connected layer can integrate the gender classification results of all features to output a final recognition result;
Step 6: integrate the small convolutional neural networks trained in step 4 with the fully connected layer trained in step 5 to form a distributed multi-layer feature-fusion convolutional neural network, namely D-MFCNN;
Step 7: test the trained D-MFCNN model; the image to be tested is cropped according to the specified fine-grained features and input into the model to obtain the final gender classification result.
2. The pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene according to claim 1, characterized in that: step 1 further comprises multi-step processing of the data source from the surveillance video, in preparation for establishing the AAM feature-point alignment model and the convolutional neural networks under the fine-grained feature branches of the D-MFCNN; the AAM model must be established on the basis of feature points calibrated in the original pedestrian images, and the training of each branch convolutional neural network in the D-MFCNN requires each feature image together with the local gender label corresponding to that feature; meanwhile, the gender labels of the feature images and the overall labels of the pedestrian images are used respectively as the input and output of the fully connected layer in the D-MFCNN to correct the weights of that layer; the corrected fully connected layer can accurately measure the importance of each feature and, when integrating the analysis information of the branch neural networks, obtain a gender recognition result from a more objective angle; to improve training efficiency, the above iterative training of the network structures is usually executed concurrently.
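Claim 2 notes that the iterative training of the branch networks is usually executed concurrently for efficiency. A minimal sketch of that idea using Python's standard thread pool follows; the branch names and the stand-in `train_branch` body are illustrative assumptions, not the patented implementation:

```python
from concurrent.futures import ThreadPoolExecutor

FEATURES = ["face", "hair", "upper_clothing", "lower_clothing"]  # illustrative branches

def train_branch(feature_name):
    """Stand-in for training one per-feature convolutional network.
    A real implementation would iterate over that feature's small
    training set and its local gender labels."""
    epochs_run = 3  # placeholder amount of work
    return feature_name, epochs_run

# Train all fine-grained feature branches concurrently.
with ThreadPoolExecutor(max_workers=len(FEATURES)) as pool:
    results = dict(pool.map(train_branch, FEATURES))
```

Because each branch has its own training set and labels, the branch training jobs are independent and can be dispatched in parallel without coordination beyond the final fusion step.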
3. The pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene according to claim 1, characterized in that: in step 2, the AAM model considers both shape and texture characteristics when labeling feature points, and can describe the contours of key features more accurately than previous automatic labeling models; the role of the AAM model is mainly to accurately locate and extract the key features in a new image, and the extracted features are used as the input of the D-MFCNN.
4. The pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene according to claim 1, characterized in that: in step 3, gender labels are applied both to the overall pedestrian image data set and to the differentiated feature image data sets associated with the local features.
5. The pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene according to claim 1, characterized in that: in step 4, a feature-specific convolutional neural network is established for each fine-grained feature using the idea of the ResNet structure; each feature-branch convolutional neural network is trained on the basis of the differentiated feature data set and the feature gender labels, and each network represents the analysis process of the pedestrian image with respect to its corresponding feature.
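The "idea of the ResNet structure" invoked in claim 5 is the identity-shortcut residual block. A toy NumPy sketch of such a block follows; the weights are random placeholders, and the real feature branches are full convolutional networks rather than this dense toy version:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = relu(x + W2 · relu(W1 · x)): the skip connection adds the
    input back to the transformed signal, which is what lets each
    feature branch be made deeper without degrading its output."""
    return relu(x + w2 @ relu(w1 @ x))

rng = np.random.default_rng(0)
d = 8                                    # toy feature dimension
x = rng.normal(size=d)
w1 = rng.normal(scale=0.1, size=(d, d))  # placeholder weights
w2 = rng.normal(scale=0.1, size=(d, d))
y = residual_block(x, w1, w2)
```

A useful sanity property: with all-zero weights the block reduces to `relu(x)`, i.e. the shortcut path passes the input through untouched.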
6. The pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene according to claim 1, characterized in that: in step 6, in order to integrate the gender analysis results of the branch feature convolutional networks while objectively weighing the importance of each kind of feature information for gender judgment, the D-MFCNN model introduces a fully connected layer after the output layers of the branch networks; this fully connected layer is trained with the gender labels of the feature images as input and the overall labels of the pedestrian images as output, and the trained layer can spontaneously measure the value of each kind of feature information for the gender recognition problem, thereby integrating the gender analysis information from the different features to obtain the final gender recognition result; likewise, the gender recognition result is represented as a two-dimensional vector whose values lie between 0 and 1 and represent the probability that the pedestrian's gender is classified as male or female.
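Claim 6 describes fitting the fusion layer so that the per-feature predictions reproduce the overall image label, with the learned weights measuring each feature's value. A toy illustration on synthetic data follows; the branch-prediction model, the closed-form least-squares fit (the patent's layer would be trained iteratively), and all numbers are assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                                        # synthetic "pedestrian images"
y = rng.integers(0, 2, size=n).astype(float)   # overall gender label, 1 = male

# Per-branch P(male) estimates: the first branch tracks the true label
# closely, the second is pure noise, the third is weakly informative.
X = np.column_stack([
    0.8 * y + 0.1 + 0.1 * rng.random(n),
    rng.random(n),
    0.5 * y + 0.25 * rng.random(n) + 0.2,
])

# Fit fusion weights by least squares: weigh each feature branch by how
# well its prediction explains the overall image label.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
fused = X @ w
accuracy = ((fused > 0.5) == y).mean()
```

As expected, the informative branch receives the dominant weight, so the fused prediction recovers the overall label far better than any uninformative branch alone.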
7. The pedestrian gender identification method based on wearing characteristics and human body characteristics in a community monitoring scene according to claim 1, characterized in that: in step 7, before the image to be recognized is input into the D-MFCNN model, feature alignment and segmentation are first performed by the AAM model; the images segmented by feature are then input into the convolutional neural network under the corresponding feature branch of the D-MFCNN model; finally, the single fully connected layer in the D-MFCNN model integrates the two-dimensional vector outputs of the branch networks and merges the multiple classification vectors into one classification vector, representing the pedestrian gender recognition result obtained after considering multiple kinds of feature information.
CN202010966050.0A (priority date 2020-09-15, filed 2020-09-15) — Pedestrian gender identification method based on wearing characteristics and human body characteristics under community monitoring scene — Status: Pending — Publication: CN112200005A (en)

Publication: CN112200005A, published 2021-01-08

Patent Citations (3)

* Cited by examiner, † Cited by third party

Publication number Priority date Publication date Assignee Title
CN105938560A * 2016-03-23 2016-09-14 Jilin University Convolutional-neural-network-based vehicle model refined classification system
CN108510000A * 2018-03-30 2018-09-07 Beijing Technology and Business University Detection and recognition method for fine-grained pedestrian attributes in complex scenes
CN109086711A * 2018-07-27 2018-12-25 South China University of Technology Facial feature analysis method, apparatus, computer equipment and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115880627A * 2022-05-20 2023-03-31 Hangzhou Dianzi University Pedestrian risk assessment method in waterlogging scenes based on video images
CN115880627B * 2022-05-20 2023-06-16 Hangzhou Dianzi University Pedestrian risk assessment method in waterlogging scenes based on video images


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210108)