CN113963231A

CN113963231A - Pedestrian attribute identification method based on image enhancement and sample balance optimization

Info

Publication number: CN113963231A
Application number: CN202111203433.3A
Authority: CN
Inventors: 韦学艳; 吴杰; 李阳
Original assignee: China University of Petroleum (East China)
Current assignee: China University of Petroleum (East China)
Priority date: 2021-10-15
Filing date: 2021-10-15
Publication date: 2022-01-21

Abstract

The invention discloses a new deep learning network, the image enhancement and sample balance optimization model IEBO (image enhancement and sample balance optimization). Through color enhancement and a noise suppression method that extracts the pedestrian body region, the model removes useless background information while highlighting the core features of the pedestrian, preventing the background from interfering with attribute identification. In addition, the model applies weight adjustment to attributes with imbalanced samples, improving its ability to recognize these attributes. Experiments show that the new pedestrian attribute identification model achieves better performance on the Market-1501-attribute dataset.

Description

Pedestrian attribute identification method based on image enhancement and sample balance optimization

1. Field of the invention

The invention relates to a pedestrian attribute identification method and belongs to the technical field of computer vision.

2. Background of the invention

The goal of pedestrian attribute recognition is to mine the attribute features of the pedestrian in a given image, classify those attributes, and finally predict the attribute labels of the pedestrians in a test set, such as gender, age, clothing color and clothing style. Attribute recognition has long been an active field in computer vision and is widely applied; for example, face recognition technology based on attribute recognition is mature and used in many areas of daily life. With the worldwide spread of surveillance equipment over recent decades, pedestrian attribute identification in surveillance scenes has received increasing attention. Because images captured by surveillance cameras differ from those taken with ordinary cameras, the following problems often arise: (1) the images are small and of low resolution; (2) pedestrian images are snapshots and may suffer from pose changes, occlusion, blur and similar conditions; (3) the cameras are fixed and are affected by external factors such as weather, lighting and viewing angle; (4) datasets are usually collected in one region over one period of time and are therefore highly specific to that setting.

Since the advent of deep learning, many pedestrian attribute identification models have addressed the problems above. To handle occlusion and low resolution, Fabbri et al. proposed GAM, a model based on a generative adversarial network, which uses the generator to reconstruct occluded images and thereby enrich the feature information of the pedestrian. Li et al. considered the correlation among attributes and proposed VSGR, a graph-based reasoning framework that models the spatial and semantic relations region-to-region, attribute-to-attribute and region-to-attribute before predicting the attributes, which effectively improves attribute identification under occlusion and low resolution. However, most of these methods improve the image feature extraction scheme and ignore problems in the datasets themselves, such as inconspicuous pedestrian subjects, severe interference from background noise, and the imbalanced distribution of positive and negative samples caused by seasonal factors and local customs in real life.

From the dataset perspective, although each image contains a single pedestrian, the randomness of surveillance capture means that the resolution is generally low and the images contain a great deal of useless background noise, which usually harms attribute identification. In addition, with the advent of cross-task models, datasets built for other tasks have been annotated with attribute information so that they can be used for pedestrian attribute recognition. An example is the Market-1501-attribute dataset, which extends a traditional pedestrian re-identification dataset by combining attribute labels with pedestrian identity labels, so that it serves both pedestrian attribute recognition and pedestrian re-identification. However, because such a dataset was not designed for attribute identification, it usually suffers from a severe imbalance between positive and negative samples.

The invention provides an image enhancement and sample balance optimization model, IEBO (image enhancement and sample balance optimization). Before image feature extraction, color enhancement is performed first, and the interference of background noise with pedestrian information is reduced by extracting the pedestrian body region; the pedestrian body features are then extracted by a deep neural network, and attribute weight balance optimization is applied to the imbalanced attributes on the basis of identity information, ultimately improving attribute recognition performance.

3. Summary of the invention

The invention aims to solve the problem of low recognition rates in pedestrian attribute identification caused by low image resolution and sample imbalance.

The technical solution adopted by the invention is as follows:

S1, an image enhancement method based on color enhancement and noise suppression is proposed, which highlights the body features of the pedestrian, mines the whole pedestrian image, and suppresses background noise so that it does not interfere with attribute identification;

S2, a pedestrian attribute sample balance optimization algorithm based on identity information fusion is proposed, which compensates for the positive/negative sample imbalance of the cross-task dataset and improves the recognition capability of the model;

S3, better results are obtained on the sample-imbalanced attributes of the cross-task Market-1501-attribute dataset.

After an image is input into the network, its pixel values are first mapped to the [0,1] range and the R, G and B channels of the color image are processed separately. The image is then adjusted to obtain a global white balance: F(x) is stretched to the [0,1] range by min-max normalization and then restored to RGB pixel values:

After color enhancement, the pedestrian is segmented from the background with a DeepLabv3 model based on ResNet101. The front convolutional module follows the standard ResNet101 structure, with atrous (dilated) convolution at rate 2 in layer4. This is followed by an atrous spatial pyramid pooling (ASPP) module, which processes the features in parallel with four atrous convolutions at rates 1, 6, 12 and 18 and a global pooling branch. The parallel feature maps are concatenated, passed through a conventional convolutional layer, and upsampled back to the original image size.
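To make the effect of the atrous rates concrete, the following Python sketch computes the input footprint of a single dilated kernel, k + (k − 1)(r − 1). It assumes 3×3 kernels in every branch; in the original DeepLabv3 design the rate-1 branch is a 1×1 convolution, whose footprint would be a single pixel. The function name `effective_kernel_size` is illustrative only.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span (in input pixels) covered by one atrous convolution kernel.

    A k x k kernel with dilation rate r inserts (r - 1) gaps between
    adjacent taps, so its footprint grows to k + (k - 1) * (r - 1).
    """
    if kernel_size < 1 or rate < 1:
        raise ValueError("kernel_size and rate must be positive")
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Branch rates quoted for the ASPP module (rate 1 = ordinary convolution).
ASPP_RATES = (1, 6, 12, 18)

if __name__ == "__main__":
    for r in ASPP_RATES:
        k = effective_kernel_size(3, r)  # assumes 3x3 kernels in every branch
        print(f"rate {r:2d}: 3x3 kernel spans {k}x{k} pixels")
```

Under these assumptions, rates 1, 6, 12 and 18 give footprints of 3, 13, 25 and 37 pixels, and the rate-2 convolution in layer4 spans 5 pixels: each dilated 3×3 kernel keeps nine weights while covering a wider area, which is how the parallel branches see context at several scales.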

Since the dataset was not created for the attribute identification task, its attributes suffer from an imbalance between positive and negative samples. Weight optimization is therefore applied to the attributes whose positive and negative samples are imbalanced.

where n ∈ {1, 2, …, N} and N is the number of images in the training set; l ∈ {1, 2, …, L} and L is the number of attributes; x_nl is the predicted value of the l-th attribute of the n-th image, y_nl is its true value, and L_M is the reasonable proportion of samples for each attribute:

where i ∈ {1, 2, …, I} and I is the number of attribute types (for example, the common binary attributes and multi-class attributes such as four-class and eight-class attributes), and k ∈ {1, 2, …, K} and K is the number of values of each classification attribute. The specific values are as follows:

In conventional attribute identification, the recognition rate for some images may be low because of lighting, occlusion, low resolution and similar factors. In a surveillance-based pedestrian attribute dataset, however, the pedestrian images are different frames taken from different sequence segments of different surveillance videos, and within each sequence segment the pedestrian identity and attribute information are identical while the images differ. Using pedestrian identity information to assist sample balance optimization can therefore further improve pedestrian attribute recognition:

W₂ = λ * exp(-λ * epoch)    (8)

Loss = Loss_F + Loss_I    (9)

where i ∈ {1, 2, …, I} and I is the number of pedestrian identities in the training set, x_ni is the predicted value of the i-th pedestrian identity for the n-th image, y_n is the true pedestrian identity of the n-th image, and λ = 0.3.
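Equation (8) is a decaying schedule. The Python sketch below evaluates it with λ = 0.3 as stated; it assumes `epoch` counts from 0, and because the text does not specify which loss term W₂ multiplies, the sketch only computes the weight itself. The function name `w2` is hypothetical.

```python
import math

LAMBDA = 0.3  # value of lambda given in the text


def w2(epoch: int, lam: float = LAMBDA) -> float:
    """Equation (8): W2 = lambda * exp(-lambda * epoch)."""
    if epoch < 0:
        raise ValueError("epoch is assumed to count from 0")
    return lam * math.exp(-lam * epoch)


if __name__ == "__main__":
    for epoch in range(0, 11, 2):
        print(f"epoch {epoch:2d}: W2 = {w2(epoch):.4f}")
```

Each additional epoch multiplies W₂ by exp(−0.3) ≈ 0.74, so the weight starts at λ = 0.3 and shrinks steadily as training proceeds.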

The pedestrian attribute identification network based on image enhancement and sample balance optimization provided by the invention comprises an image enhancement (IE) module based on color enhancement and noise suppression and a pedestrian attribute sample balance optimization (SBO) module based on identity information fusion.

Finally, the training method of the pedestrian attribute identification network based on image enhancement and sample balance optimization comprises the following steps:

All models are pre-trained with a cross-entropy loss. For each image, image enhancement first raises the color contrast so that the global features of the pedestrian are captured better; semantic segmentation is then performed with a DeepLabv3 model based on a ResNet101 network, and the pedestrian background region is set to 0, which suppresses background noise and prevents it from interfering with attribute recognition. Pedestrian features are then extracted by a conventional ResNet50 network, while the sample-imbalanced attributes are optimized with the assistance of pedestrian identity information.

Compared with the prior art, the invention has the beneficial effects that:

1. The invention provides an image enhancement module based on color enhancement and noise suppression that optimizes low-resolution images, highlighting important feature information and suppressing useless noise, which benefits the extraction of global pedestrian features.

2. The invention provides a pedestrian attribute sample balance optimization algorithm based on identity information fusion, which adjusts for sample-imbalanced attributes and trains more heavily on attributes whose positive and negative sample counts differ greatly, which benefits pedestrian attribute identification.

4. Description of the drawings

Fig. 1 is a schematic diagram of an IEBO optimization model based on image enhancement and sample balance.

FIG. 2 is a schematic diagram of an image enhancement module based on color enhancement and noise suppression.

Fig. 3 is a schematic diagram of a pedestrian attribute sample balance optimization module based on identity information fusion.

5. Detailed description of the preferred embodiments

The drawings are for illustrative purposes only and are not to be construed as limiting the patent.

The invention is further illustrated below with reference to the figures and examples.

Fig. 1 is a schematic structural diagram of the pedestrian attribute identification network based on image enhancement and sample balance optimization. As shown in Fig. 1, after an image is input into the network, color enhancement is performed first; the interference of background noise with pedestrian information is then reduced by extracting the pedestrian body region; the core features of the pedestrian are extracted by a deep neural network; and finally, sample balance optimization based on identity information is applied to the imbalanced attributes.

Fig. 2 is a schematic diagram of the image enhancement module. As shown in Fig. 2, after a pedestrian image enters the image enhancement module, it is first processed by color enhancement, which makes the main features of the pedestrian more prominent and the colors more saturated and thus helps extract pedestrian feature information.

The first stage adapts to local image contrast. The pixel values are first mapped to the range [0,1], the R, G and B channels of the color image are then processed separately, and the value F(x) of each pixel is computed. The formula is as follows:

where Ω is the neighborhood formed by the current pixel x and its eight neighboring pixels; x₀ denotes any pixel in the neighborhood other than the current pixel, i.e. x₀ ∈ Ω and x₀ ≠ x, so that x together with all x₀ makes up Ω; ||x − x₀|| is the Euclidean distance between the two pixels; I(x) − I(x₀) is the difference between the two pixel values; and G(t) = min{max{αt, −1}, 1} is a slope (gradient) function, where α is the enhancement factor: the larger α is, the more pronounced the detail enhancement, and by default α ≥ 1.
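The description of F(x) resembles the local-contrast stage of automatic color equalization (ACE). Since the formula itself is missing from this text, the numpy sketch below assumes the common form F(x) = Σ G(I(x) − I(x₀)) / ||x − x₀||, summed over the eight neighbors x₀ of x, with edge replication at the image border; the exact form, the border handling, the value α = 5 and the function names are all assumptions for illustration.

```python
import numpy as np


def slope(t: np.ndarray, alpha: float) -> np.ndarray:
    """Slope function G(t) = min{max{alpha * t, -1}, 1}."""
    return np.clip(alpha * t, -1.0, 1.0)


def local_contrast(channel: np.ndarray, alpha: float = 5.0) -> np.ndarray:
    """Assumed F(x) for one channel whose values lie in [0, 1].

    F(x) = sum over the 8 neighbours x0 of G(I(x) - I(x0)) / ||x - x0||,
    with edge replication at the border (an assumption).
    alpha = 5.0 is an arbitrary illustrative value; the text only asks for alpha >= 1.
    """
    if alpha < 1:
        raise ValueError("alpha is expected to be >= 1")
    channel = np.asarray(channel, dtype=np.float64)
    h, w = channel.shape
    padded = np.pad(channel, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # x0 must differ from the current pixel x
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            dist = np.hypot(dy, dx)  # 1 for side neighbours, sqrt(2) for diagonals
            out += slope(channel - neighbour, alpha) / dist
    return out


def local_contrast_rgb(image_uint8: np.ndarray, alpha: float = 5.0) -> np.ndarray:
    """Map 8-bit pixels to [0, 1], then compute F(x) for R, G and B separately."""
    img = np.asarray(image_uint8, dtype=np.float64) / 255.0
    return np.stack([local_contrast(img[..., c], alpha) for c in range(3)], axis=-1)
```

Under this form, a single bright pixel on a dark background gets F = 4 + 4/√2 ≈ 6.83 (four side neighbors at distance 1, four diagonals at distance √2), while a flat region gives F = 0, so the stage amplifies local differences and leaves uniform areas at zero.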

The second stage adjusts the image to obtain a global white balance: F(x) is first stretched to the range [0,1] by min-max normalization and then restored to RGB pixel values. The formula is as follows:

where minF and maxF are the minimum and maximum values of F(x), respectively.
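The global white-balance step is plain min-max normalization followed by rescaling to 8-bit values. The numpy sketch below assumes the normalization is done per channel by default (the text processes R, G and B separately but does not say whether minF and maxF are per channel), exposes a `per_channel` flag for the global variant, and maps a constant input to 0 to avoid division by zero. The function name is hypothetical.

```python
import numpy as np


def stretch_to_rgb(f: np.ndarray, per_channel: bool = True) -> np.ndarray:
    """Min-max normalise F (H x W x 3) to [0, 1], then restore 8-bit RGB values.

    per_channel=True uses minF/maxF of each channel (assumed default);
    per_channel=False uses one global minF/maxF for the whole image.
    """
    f = np.asarray(f, dtype=np.float64)
    axes = (0, 1) if per_channel else None
    lo = f.min(axis=axes, keepdims=True)    # minF
    hi = f.max(axis=axes, keepdims=True)    # maxF
    span = np.where(hi > lo, hi - lo, 1.0)  # constant input -> span 1, result 0
    normalised = (f - lo) / span            # now in [0, 1]
    return np.round(normalised * 255.0).astype(np.uint8)
```

The per-channel variant stretches every channel to the full range independently, which is what produces the white-balancing effect; the global variant preserves the relative levels of R, G and B.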

Color enhancement improves the color saturation of the pedestrian image, which helps extract pedestrian feature information more fully, but the image still contains useless background information that may interfere with feature extraction. Noise suppression is therefore applied to the image.

After color enhancement, the pedestrian is segmented from the background with a DeepLabv3 model based on ResNet101, which processes the image with cascaded and parallel atrous convolutions. The convolutional module adopts the standard ResNet101 structure, with atrous convolution at rate 2 in layer4, followed by the ASPP module, which processes the features in parallel with convolutions at rates 1, 6, 12 and 18 (rate 1 being an ordinary convolution) and a global pooling branch. The parallel feature maps are concatenated, fed into a conventional convolutional layer, and upsampled to the original image size.

During pedestrian segmentation, the background region outside the detected pedestrian instance is set to 0. Setting the background to 0 removes useless information and effectively prevents background noise from interfering with the body features of the pedestrian during feature extraction.
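Background suppression reduces to masking the image with the segmentation result. The numpy sketch below assumes the segmentation network outputs a per-pixel class map and that it uses the Pascal VOC label set, in which class index 15 is "person"; both assumptions, and the function name, are illustrative.

```python
import numpy as np

PERSON_CLASS = 15  # assumption: Pascal VOC label set, where 15 = "person"


def suppress_background(image: np.ndarray, label_map: np.ndarray,
                        person_class: int = PERSON_CLASS) -> np.ndarray:
    """Set every pixel not labelled as a person to 0.

    image: H x W x C array; label_map: H x W array of class indices,
    e.g. the per-pixel argmax of a segmentation network's output.
    """
    if image.shape[:2] != label_map.shape:
        raise ValueError("image and label_map must have the same height and width")
    mask = (label_map == person_class)[..., None]  # H x W x 1, broadcast over channels
    return np.where(mask, image, 0).astype(image.dtype)
```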

FIG. 3 is a schematic diagram of the sample balance optimization module. As shown in Fig. 3, because the dataset was not created for the attribute identification task, positive and negative samples are imbalanced across its attributes. For example, for the upper-garment type attribute, the number of positive samples ("short sleeves") is far greater than the number of negative samples ("long sleeves"); for the attribute "lower-garment color is red", the negative samples ("not red") far outnumber the positive samples ("red"). As a result, the model is often insufficiently trained on such attributes during training, and their accuracy is low. We therefore apply weight optimization to attributes with imbalanced positive and negative samples. Because pedestrian attributes are converted into 30 binary classification problems in the Market-1501-attribute dataset, the attribute labels are handled with a binary cross-entropy loss function, as follows:

where n ∈ {1, 2, …, N} and N is the number of images in the training set; l ∈ {1, 2, …, L} and L is the number of attributes; x_nl is the predicted value of the l-th attribute of the n-th image, y_nl is its true value, and L_M is the reasonable proportion of samples for each attribute, computed as follows:

where i ∈ {1, 2, …, I} and I is the number of attribute types (for example, the common binary attributes and multi-class attributes such as four-class and eight-class attributes), and k ∈ {1, 2, …, K} and K is the number of values of each classification attribute. We count the positive and negative samples of each attribute in the dataset and compute the reasonable positive-sample proportion for each type of attribute, i.e. the reasonable probability that a sample is positive for that attribute. For example, "gender" is a binary attribute with the values "male" and "female", so its reasonable positive-sample proportion is 50%. "Age" is a multi-class attribute with four options, "child", "adolescent", "adult" and "elderly"; for ease of calculation it is split into four binary attributes ("is child", "is adolescent", "is adult" and "is elderly"), so its reasonable positive-sample proportion is 25%.
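The reasonable proportion 1/K and the split of a K-valued attribute into K binary attributes can be sketched in a few lines of numpy. The age labels in the usage example are made up for illustration, and the function names are hypothetical.

```python
import numpy as np


def reasonable_proportion(num_values: int) -> float:
    """Reasonable positive-sample proportion of an attribute with K values.

    A K-valued attribute is treated as K binary "is it value k?" attributes
    (K = 2 stays a single binary attribute), each positive 1/K of the time
    if the values were evenly represented.
    """
    if num_values < 2:
        raise ValueError("an attribute needs at least two values")
    return 1.0 / num_values


def expand_attribute(values: np.ndarray, num_values: int) -> np.ndarray:
    """Turn integer labels 0..K-1 into binary columns (N x 1 for K = 2, N x K otherwise)."""
    values = np.asarray(values)
    if num_values == 2:
        return values.reshape(-1, 1).astype(np.float64)  # label 1 = positive sample
    return (values[:, None] == np.arange(num_values)[None, :]).astype(np.float64)


def positive_ratio(binary_columns: np.ndarray) -> np.ndarray:
    """Observed share of positive samples for each binary column."""
    return binary_columns.mean(axis=0)


if __name__ == "__main__":
    # Made-up age labels: 0 child, 1 adolescent, 2 adult, 3 elderly.
    ages = np.array([2, 2, 2, 1, 2, 3, 2, 0])
    cols = expand_attribute(ages, 4)
    print("observed:", positive_ratio(cols))
    print("reasonable:", reasonable_proportion(4))
```

In this toy sample the "is adult" column is positive 62.5% of the time against a reasonable proportion of 25%, so it is the kind of deviating attribute whose weights the method adjusts.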

The specific values are as follows:

During training, the weights of imbalanced attributes whose positive/negative ratio deviates from the reasonable proportion are increased, so that these attributes receive more training and a better recognition result is achieved.
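The weight formula of the sample balance loss is not reproduced in this text, so the following numpy sketch uses a hypothetical weighting that follows the stated idea: each attribute's weights grow as its observed positive ratio p_l deviates from its reasonable proportion m_l, and the loss reduces to ordinary binary cross-entropy when p_l = m_l. Positive samples get exp(m_l − p_l) and negative samples exp(p_l − m_l), and x_nl is assumed to be a sigmoid probability. This exponential form and the function names are assumptions for illustration, not the patent's formula.

```python
import numpy as np


def balance_weights(y, pos_ratio, reasonable):
    """Hypothetical per-element weights for an N x L label matrix y.

    pos_ratio[l]  : observed positive ratio p_l of attribute l
    reasonable[l] : reasonable proportion m_l of attribute l
    Positives get exp(m_l - p_l), negatives exp(p_l - m_l); both equal 1
    when p_l == m_l, and the rarer side is up-weighted otherwise.
    """
    y = np.asarray(y, dtype=np.float64)
    gap = np.asarray(reasonable, dtype=np.float64) - np.asarray(pos_ratio, dtype=np.float64)
    return y * np.exp(gap) + (1.0 - y) * np.exp(-gap)


def weighted_bce(probs, y, pos_ratio, reasonable, eps=1e-7):
    """Weighted binary cross-entropy averaged over N images and L attributes.

    probs holds the predicted values x_nl (assumed sigmoid outputs), y the labels y_nl.
    """
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=np.float64)
    w = balance_weights(y, pos_ratio, reasonable)
    per_element = -w * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(per_element.mean())
```

For an attribute with only 10% positives and a reasonable proportion of 50%, a positive sample is weighted by exp(0.4) ≈ 1.49 and a negative one by exp(−0.4) ≈ 0.67, so the rare positives contribute more to the gradient.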

In addition, in conventional attribute identification the recognition rate for some images may be low because of lighting, occlusion, low resolution and similar factors. In a surveillance-based pedestrian attribute dataset, however, the pedestrian images are different frames taken from different sequence segments of different surveillance videos, and within each sequence segment the pedestrian identity and attribute information are identical while the images differ. Pedestrian identity information can therefore be used to assist sample balance optimization and further improve pedestrian attribute recognition. Because the dataset contains many pedestrian identities, the identity labels use a multi-class cross-entropy loss function. The formula is as follows:

W₂ = λ * exp(-λ * epoch)    (8)

Loss = Loss_F + Loss_I    (9)

Finally, the sample balance loss and the identity loss are fused to form the total loss of the model. This identity-fusion-based pedestrian attribute sample balance optimization method compensates for the positive/negative sample imbalance of the cross-task dataset and improves the recognition capability of the model.
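Equation (9) simply adds the two losses. The numpy sketch below pairs it with a standard softmax cross-entropy for the identity labels, treating x_ni as pre-softmax scores (an assumption); since the text does not say where W₂ from Eq. (8) enters, it is left out, and the function names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Multi-class cross-entropy over I identities, averaged over N images.

    logits: N x I scores (x_ni before softmax); labels: N true identity indices y_n.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def total_loss(loss_f: float, loss_i: float) -> float:
    """Equation (9): Loss = Loss_F + Loss_I (sample balance loss + identity loss)."""
    return loss_f + loss_i
```

With uniform scores over I identities the identity loss equals log I, which is a quick sanity check when wiring the two terms together.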

The invention provides an image enhancement and sample balance optimization model, IEBO, which comprises an image enhancement (IE) module based on color enhancement and noise suppression and a pedestrian attribute sample balance optimization (SBO) module based on identity information fusion. Before image feature extraction, color enhancement is performed first and the interference of background noise with pedestrian information is reduced by extracting the pedestrian body region; the pedestrian body features are then extracted by a deep neural network, and attribute weight balance optimization is applied to the imbalanced attributes on the basis of identity information, thereby improving attribute identification performance.

Finally, the details of the above examples serve only to illustrate the invention; any modification, improvement or replacement of these examples made by those skilled in the art shall fall within the scope of the claims of the invention.

Claims

1. A pedestrian attribute identification method based on image enhancement and sample balance optimization, characterized by comprising the following steps:

S1, constructing an image enhancement module based on color enhancement and noise suppression;

S2, constructing a pedestrian attribute sample balance optimization module based on identity information fusion.

2. The method for identifying pedestrian attributes based on image enhancement and sample balance optimization according to claim 1, wherein the specific process of S1 is as follows:

3. The method for identifying pedestrian attributes based on image enhancement and sample balance optimization according to claim 1, wherein the specific process of S2 is as follows:

W₂ = λ * exp(-λ * epoch)    (8)

Loss = Loss_F + Loss_I    (9)