CN111507272B

CN111507272B - Pedestrian attribute identification method and system in monitoring scene

Info

Publication number: CN111507272B
Application number: CN202010310527.XA
Authority: CN
Inventors: 黄凯奇; 陈晓棠; 贾健
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2020-04-20
Filing date: 2020-04-20
Publication date: 2023-09-26
Anticipated expiration: 2040-04-20
Also published as: CN111507272A

Abstract

The invention relates to a pedestrian attribute identification method and system in a monitoring scene. The method comprises the following steps: acquiring a pedestrian image to be detected in a monitoring scene; preprocessing the pedestrian image to obtain a processed image; obtaining convolution image features of the pedestrian image from the processed image through a deep neural network; determining the weight parameters of each attribute classifier according to a full connection layer and the convolution image features; determining, based on the convolution image features and the weight parameters, network attribute values of the pedestrian image under the different attribute classifiers; determining a predicted value of the corresponding attribute from each network attribute value; and determining the attribute types of the pedestrian image according to the predicted values. By extracting the convolution image features through the deep neural network, determining the weight parameters of each attribute classifier, and deriving the network attribute values and the predicted values from them, the invention can accurately determine the attribute types of the pedestrian image to be detected.

Description

Pedestrian attribute identification method and system in monitoring scene

Technical Field

The invention relates to the technical field of visual scene processing analysis, in particular to a pedestrian attribute identification method and system in a monitoring scene.

Background

In recent years, fields such as computer vision, artificial intelligence and machine perception have developed rapidly. With the widespread deployment of cameras, efficient pedestrian attribute recognition in monitoring scenes has attracted growing attention.

Pedestrian attribute identification in a monitoring scene uses computer algorithms to process and analyze pedestrian pictures in video and automatically obtain the attribute categories of a given pedestrian, such as age, gender, backpack and clothing, thereby providing support for downstream pedestrian picture retrieval and pedestrian re-identification techniques.

Traditional methods obtain the feature expression of a pedestrian picture by constructing manually designed picture features, and the performance of such features is insufficient for the requirements of real scenes. In recent years, with the wide use of deep learning, many pedestrian attribute algorithms have approached the problem from two directions, better feature expression and attribute relationship modeling, so that pedestrian attribute identification methods in monitoring scenes have continuously improved and the field has advanced.

Although a great deal of previous work has markedly improved the performance of pedestrian attribute identification by learning more discriminative visual feature expressions and better modeling the relations between attributes, each of these methods inevitably increases the parameter count and computational complexity of the model, which makes pedestrian attribute identification more difficult.

Disclosure of Invention

In order to solve the above problems in the prior art, that is, to improve pedestrian attribute identification, the invention aims to provide a method and a system for identifying pedestrian attribute in a monitoring scene.

In order to solve the technical problems, the invention provides the following scheme:

a pedestrian attribute identification method in a monitored scene, the attribute identification method comprising:

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

obtaining the convolution image characteristics of the pedestrian image to be detected according to the processing image through a deep neural network;

determining weight parameters of all attribute classifiers according to the full connection layer and the convolution image characteristics;

based on the convolution image characteristics and weight parameters, determining network attribute values of the pedestrian image to be detected under different attribute classifiers;

determining a predicted value of a corresponding attribute based on the network attribute value of each pedestrian image to be detected;

and determining the attribute type of the pedestrian image to be detected according to the predicted value of each attribute.

Optionally, the preprocessing the pedestrian image to be detected to obtain a processed image specifically includes:

scaling the pedestrian image to be detected to obtain a scaled image;

randomly flipping the scaled image horizontally to obtain a flipped image;

and zero-padding the flipped image to obtain the preprocessed image.
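The three preprocessing operations above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `preprocess`, the 256 x 192 output size (width-to-height ratio 0.75), the 10-pixel pad width, the 0.5 flip probability and the nearest-neighbour scaling are all assumptions chosen for the example.

```python
import numpy as np

def preprocess(image, out_h=256, out_w=192, pad=10, rng=None):
    """Scale, randomly flip, and zero-pad an H x W x C image array.

    All default sizes are illustrative assumptions, not values from the patent.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Nearest-neighbour scaling: choose a source row/column for each output pixel.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    scaled = image[rows][:, cols]
    # Random horizontal flip (assumed probability 0.5).
    if rng.random() < 0.5:
        scaled = scaled[:, ::-1]
    # Zero padding on the spatial borders only; channels are left untouched.
    return np.pad(scaled, ((pad, pad), (pad, pad), (0, 0)), mode="constant")

img = np.random.default_rng(0).integers(0, 256, size=(128, 64, 3), dtype=np.uint8)
out = preprocess(img, rng=np.random.default_rng(1))
print(out.shape)  # (276, 212, 3)
```

Passing an explicit `rng` makes the random flip reproducible, which is convenient when comparing runs.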

Optionally, the convolution image feature $X_{img}$ of the pedestrian image to be detected is obtained according to the following formula:

$X_{img} = f_{cnn}(I_{img}; \theta_{cnn})$;

wherein $X_{img} \in \mathbb{R}^{C_{feat}}$, $\mathbb{R}$ is the real space in which the pedestrian features lie, $C_{feat}$ is the number of layers (channels) of the convolution feature, i.e. the dimension of the pedestrian feature space, $f_{cnn}$ is the deep neural network, $I_{img}$ is the processed image, $I_{img} \in \mathbb{R}^{H \times W \times C}$, $H$ is the height of the convolution feature map, $W$ is the width of the convolution feature map, $C$ is the number of layers (channels) of the processed image input to the deep neural network, and $\theta_{cnn}$ is the learnable parameter of the deep neural network.

Optionally, determining the weight parameter of each attribute classifier according to the full connection layer and the convolution image feature specifically includes:

according to the full connection layer and the convolution image features, establishing an attribute classifier $Cls(X_{img}; \theta_{cls})$ for pedestrian image attribute identification; wherein $X_{img}$ is the convolution image feature, $\theta_{cls}$ is the weight parameter of the attribute classifier, $\theta_{cls} \in \mathbb{R}^{C_{feat} \times N_{attr}}$, $\mathbb{R}$ is the real space in which the weight parameters of the attribute classifier lie, $C_{feat}$ is the number of layers (channels) of the convolution feature, and $N_{attr}$ is the number of labeled attributes pre-stored in the database;

determining the weight parameter $\theta_{cls}^{i}$ of each attribute classifier based on the attribute classifier for pedestrian image attribute identification, wherein $\theta_{cls}^{i} \in \mathbb{R}^{C_{feat}}$, $i$ is the serial number of the current attribute classifier, the number of attribute classifiers equals the number of attribute categories, and $i = 1, 2, \ldots, N_{attr}$.

Optionally, the determining, based on the features of the convolution image and the weight parameters, a network attribute value of the pedestrian image to be detected under different attribute classifiers specifically includes:

respectively carrying out normalization processing on the convolution image characteristics and each weight parameter to obtain corresponding normalization characteristics and each normalization weight parameter;

and determining network attribute values of the pedestrian images to be detected under different attribute classifiers according to the normalization characteristics and the normalization weight parameters.

Optionally, the network attribute values of the pedestrian image to be detected under different attribute classifiers are determined according to the following formula:

$\mathrm{logits}_i = \alpha \cdot \hat{X}_{img}^{\top} \hat{\theta}_{cls}^{i}, \quad i = 1, 2, \ldots, N_{attr}$;

wherein $N_{attr}$ is the number of labeled attributes pre-stored in the database, $i$ is the serial number of the current attribute classifier, $\mathrm{logits}_i$ is the network attribute value of the pedestrian image to be detected under the $i$-th attribute classifier, $\alpha$ is a scaling factor, $\hat{X}_{img}$ is the normalized feature, and $\hat{\theta}_{cls}^{i}$ is the normalized weight parameter of the $i$-th attribute classifier.

Optionally, the predicted value of the corresponding attribute is determined according to the following formula:

$p_i = \mathrm{Sigmoid}(\mathrm{BN}(\mathrm{logits}_i)), \quad i = 1, 2, \ldots, N_{attr}$;

wherein $N_{attr}$ is the number of labeled attributes pre-stored in the database, $i$ is the serial number of the current attribute classifier, $p_i$ is the predicted value of the $i$-th attribute of the pedestrian image to be detected, $\mathrm{logits}_i$ is the network attribute value of the pedestrian image to be detected under the $i$-th attribute classifier, $\mathrm{BN}(\cdot)$ is the batch normalization layer processing function, and $\mathrm{Sigmoid}(\cdot)$ is the neural network activation function.
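As an illustration of the prediction step, the sketch below applies a batch normalization function and then a Sigmoid to a vector of network attribute values. It assumes that BN is applied before the Sigmoid and that BN runs in inference mode with stored per-attribute statistics; the statistics, the input values and the function names are hypothetical and chosen only for the example.

```python
import numpy as np

def batch_norm_inference(z, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batch normalization with stored (here: made-up) statistics."""
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

def sigmoid(t):
    """Logistic activation mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical network attribute values for five attributes.
logits = np.array([12.0, -7.5, 0.0, 18.0, -15.0])

# Hypothetical BN statistics and affine parameters, one entry per attribute.
mean = np.zeros(5)
var = np.full(5, 100.0)
gamma = np.ones(5)
beta = np.zeros(5)

pred = sigmoid(batch_norm_inference(logits, mean, var, gamma, beta))
print(pred)
```

Each entry of `pred` lies strictly between 0 and 1, so it can be read as a per-attribute score and compared against a threshold.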

In order to solve the technical problems, the invention also provides the following scheme:

a pedestrian attribute identification system in a monitoring scene, the attribute identification system comprising:

the acquisition unit is used for acquiring the pedestrian image to be detected in the monitoring scene;

the preprocessing unit is used for preprocessing the pedestrian image to be detected to obtain a processed image;

the characteristic determining unit is used for obtaining the convolution image characteristics of the pedestrian image to be detected according to the processing image through the deep neural network;

the parameter determining unit is used for determining weight parameters of each attribute classifier according to the full connection layer and the convolution image characteristics;

the computing unit is used for determining network attribute values of the pedestrian image to be detected under different attribute classifiers based on the convolution image characteristics and the weight parameters;

the prediction unit is used for determining a prediction value of a corresponding attribute based on the network attribute value of each pedestrian image to be detected;

and the attribute determining unit is used for determining the attribute type of the pedestrian image to be detected according to the predicted value of each attribute.

a pedestrian attribute identification system in a monitoring scene, comprising:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

a computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

According to the embodiment of the invention, the following technical effects are disclosed:

according to the invention, through the deep neural network, the convolution image characteristics of the pedestrian image to be detected are extracted from the preprocessed processed image, and the weight parameters of each attribute classifier are further obtained; based on the convolution image characteristics and the weight parameters, network attribute values of the pedestrian images to be detected under different attribute classifiers are determined, and further predicted values of corresponding attributes are obtained, so that attribute types of the pedestrian images to be detected can be accurately determined.

Drawings

FIG. 1 is a flow chart of a pedestrian attribute identification method in a monitoring scenario of the present invention;

fig. 2 is a schematic block diagram of a pedestrian attribute recognition system in a monitoring scene according to the present invention.

Symbol description:

the device comprises an acquisition unit-1, a preprocessing unit-2, a characteristic determining unit-3, a parameter determining unit-4, a calculating unit-5, a predicting unit-6 and an attribute determining unit-7.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

The invention aims to provide a pedestrian attribute identification method in a monitoring scene, which is characterized in that the characteristic of a convolution image of a pedestrian image to be detected is extracted from a preprocessed processing image through a deep neural network, and the weight parameters of each attribute classifier are further obtained; based on the convolution image characteristics and the weight parameters, determining network attribute values of the pedestrian image to be detected under different attribute classifiers, and further obtaining predicted values of corresponding attributes, so that the attribute type of the pedestrian image to be detected can be accurately determined.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

As shown in fig. 1, the pedestrian attribute identification method in the monitoring scene of the present invention includes:

step 100: acquiring a pedestrian image to be detected in a monitoring scene;

step 200: preprocessing the pedestrian image to be detected to obtain a processed image;

step 300: obtaining the convolution image characteristics of the pedestrian image to be detected according to the processing image through a deep neural network;

step 400: determining weight parameters of all attribute classifiers according to the full connection layer and the convolution image characteristics;

step 500: based on the convolution image characteristics and weight parameters, determining network attribute values of the pedestrian image to be detected under different attribute classifiers;

step 600: determining a predicted value of a corresponding attribute based on the network attribute value of each pedestrian image to be detected;

step 700: and determining the attribute type of the pedestrian image to be detected according to the predicted value of each attribute.

Further, in step 200, the preprocessing is performed on the pedestrian image to be detected to obtain a processed image, which specifically includes:

Step 201: scaling the pedestrian image to be detected to obtain a scaled image. For example, the pedestrian image $I_{pedes}$ to be detected is scaled so that its aspect ratio is 0.75, although the invention is not limited to this value.

Step 202: randomly flipping the scaled image horizontally to obtain a flipped image.

Optionally, in step 300, the convolution image feature $X_{img}$ of the pedestrian image to be detected is obtained according to the following formula:

$X_{img} = f_{cnn}(I_{img}; \theta_{cnn})$;

wherein $X_{img} \in \mathbb{R}^{C_{feat}}$, $\mathbb{R}$ is the real space in which the pedestrian features lie, $C_{feat}$ is the number of layers (channels) of the convolution feature, $f_{cnn}$ is the deep neural network, $I_{img}$ is the processed image, $I_{img} \in \mathbb{R}^{H \times W \times C}$, $H$ is the height of the convolution feature map, $W$ is the width of the convolution feature map, $C$ is the number of layers (channels) of the processed image input to the deep neural network, and $\theta_{cnn}$ is the learnable parameter of the deep neural network.
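To make the shapes in step 300 concrete, the following sketch maps a processed image of shape H x W x C to a feature vector of length C_feat. The function `toy_f_cnn` is a deliberately tiny stand-in (a single 1x1 convolution, a ReLU and global average pooling) and is an assumption for illustration only; the source does not specify the architecture of the deep neural network.

```python
import numpy as np

def toy_f_cnn(i_img, theta_cnn):
    """Toy stand-in for f_cnn: 1x1 convolution, ReLU, global average pooling.

    i_img:     processed image, shape (H, W, C)
    theta_cnn: learnable parameters, shape (C, C_feat)
    returns:   feature vector X_img, shape (C_feat,)
    """
    # A 1x1 convolution is a per-pixel linear map: (H, W, C) @ (C, C_feat).
    feature_map = np.maximum(i_img @ theta_cnn, 0.0)
    # Average over the two spatial axes to obtain one value per feature channel.
    return feature_map.mean(axis=(0, 1))

rng = np.random.default_rng(0)
H, W, C, C_FEAT = 256, 192, 3, 16          # illustrative sizes only
i_img = rng.random((H, W, C))
theta_cnn = rng.standard_normal((C, C_FEAT))

x_img = toy_f_cnn(i_img, theta_cnn)
print(x_img.shape)  # (16,)
```

A real deployment would replace `toy_f_cnn` with a trained convolutional backbone; only the input and output shapes are meant to carry over.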

Preferably, in step 400, the determining a weight parameter of each attribute classifier according to the full connection layer and the convolution image feature specifically includes:

Step 401: according to the full connection layer and the convolution image features, establishing an attribute classifier $Cls(X_{img}; \theta_{cls})$ for pedestrian image attribute identification,

wherein $X_{img}$ is the convolution image feature, $\theta_{cls}$ is the weight parameter of the attribute classifier, $\theta_{cls} \in \mathbb{R}^{C_{feat} \times N_{attr}}$, $\mathbb{R}$ is the real space in which the weight parameters of the attribute classifier lie, $C_{feat}$ is the number of layers (channels) of the convolution feature, and $N_{attr}$ is the number of labeled attributes pre-stored in the database.

Step 402: determining the weight parameter $\theta_{cls}^{i}$ of each attribute classifier based on the attribute classifier for pedestrian image attribute identification,

wherein $\theta_{cls}^{i} \in \mathbb{R}^{C_{feat}}$, $i$ is the serial number of the current attribute classifier, the number of attribute classifiers equals the number of attribute categories, and $i = 1, 2, \ldots, N_{attr}$.

Further, in step 500, the determining, based on the features of the convolution image and the weight parameters, the network attribute values of the pedestrian image to be detected under different attribute classifiers specifically includes:

Step 501: normalizing the convolution image features and each weight parameter respectively to obtain the corresponding normalized features and normalized weight parameters:

$\hat{X}_{img} = \dfrac{X_{img}}{\|X_{img}\|}, \qquad \hat{\theta}_{cls}^{i} = \dfrac{\theta_{cls}^{i}}{\|\theta_{cls}^{i}\|}$;

wherein $\hat{X}_{img}$ is the normalized feature, $\hat{\theta}_{cls}^{i}$ is the normalized weight parameter of the $i$-th attribute classifier, and $\|\cdot\|$ denotes taking the modulus (norm).

After normalization, the modulus length of the normalized feature and of each normalized weight parameter is 1.
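The normalization in step 501 can be checked numerically with a short sketch. It assumes the modulus is the Euclidean (L2) norm and that the weight parameters are stored as a matrix whose column i belongs to the i-th attribute classifier; the helper `l2_normalize`, the small `eps` guard and the toy sizes are illustrative assumptions.

```python
import numpy as np

def l2_normalize(v, axis, eps=1e-12):
    """Divide by the L2 norm along `axis`; eps guards against an all-zero vector."""
    norm = np.linalg.norm(v, axis=axis, keepdims=True)
    return v / np.maximum(norm, eps)

rng = np.random.default_rng(0)
c_feat, n_attr = 8, 5                                # toy sizes
x_img = rng.standard_normal(c_feat)                  # convolution image feature
theta_cls = rng.standard_normal((c_feat, n_attr))    # column i = classifier i

x_hat = l2_normalize(x_img, axis=0)
theta_hat = l2_normalize(theta_cls, axis=0)          # normalize each column

print(np.linalg.norm(x_hat))                 # modulus of the normalized feature
print(np.linalg.norm(theta_hat, axis=0))     # modulus of each normalized weight
```

Both printed moduli are 1 up to floating-point rounding, matching the statement that every normalized vector has modulus length 1.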

Step 502: and determining network attribute values of the pedestrian images to be detected under different attribute classifiers according to the normalization characteristics and the normalization weight parameters.

The network attribute values of the pedestrian image to be detected under different attribute classifiers are determined according to the following formula:

$\mathrm{logits}_i = \alpha \cdot \hat{X}_{img}^{\top} \hat{\theta}_{cls}^{i}, \quad i = 1, 2, \ldots, N_{attr}$;

wherein $N_{attr}$ is the number of labeled attributes pre-stored in the database, $i$ is the serial number of the current attribute classifier, $\mathrm{logits}_i$ is the network attribute value of the pedestrian image to be detected under the $i$-th attribute classifier, $\alpha$ is a scaling factor, $\hat{X}_{img}$ is the normalized feature, and $\hat{\theta}_{cls}^{i}$ is the normalized weight parameter of the $i$-th attribute classifier.

By introducing the scaling factor $\alpha$, the invention can balance the loss distribution of positive and negative samples of pedestrian attributes. In this embodiment, $\alpha$ is 20; however, the invention is not limited to this value, which may be adjusted according to actual needs.
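The effect of normalization and scaling can be illustrated with a small sketch. It assumes that the network attribute value is the scaling factor times the inner product of the normalized feature and the normalized weight (a scaled cosine similarity); the function name `scaled_cosine_logits` and the random toy data are hypothetical.

```python
import numpy as np

def scaled_cosine_logits(x_img, theta_cls, alpha=20.0):
    """Return alpha * cosine similarity between x_img and each classifier column."""
    x_hat = x_img / np.linalg.norm(x_img)
    theta_hat = theta_cls / np.linalg.norm(theta_cls, axis=0, keepdims=True)
    return alpha * (x_hat @ theta_hat)

rng = np.random.default_rng(0)
x_img = rng.standard_normal(8)            # toy feature, C_feat = 8
theta_cls = rng.standard_normal((8, 5))   # toy weights, N_attr = 5

logits = scaled_cosine_logits(x_img, theta_cls)

# Rescaling one classifier's weights does not change its network attribute value,
# because only the direction of the weight vector survives normalization.
rescaled = theta_cls.copy()
rescaled[:, 2] *= 10.0
logits_rescaled = scaled_cosine_logits(x_img, rescaled)

print(np.allclose(logits, logits_rescaled))  # True
print(np.abs(logits).max())                  # never exceeds alpha
```

Under this assumption every value lies in the interval from -alpha to alpha, so the magnitude of a classifier's weights, which tends to track how frequent an attribute is in the training data, no longer influences the score.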

The predicted value obtained in this way for a given attribute effectively improves the performance on that attribute, and the overall performance across all attributes is obtained by averaging the predicted values of all the attributes.

Specifically, the prediction threshold is the average of the predicted values of all the attributes. The predicted value of each attribute is compared with the prediction threshold, and the attributes whose predicted values exceed the threshold are selected as the attribute types of the pedestrian image to be detected. The attribute types include age, gender, backpack, clothing and the like.
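The thresholding rule described above can be sketched in a few lines. The attribute names and predicted values below are hypothetical and serve only to show the comparison against the mean.

```python
import numpy as np

def select_attributes(pred, names):
    """Keep the attributes whose predicted value exceeds the mean of all predictions."""
    threshold = pred.mean()
    selected = [name for name, p in zip(names, pred) if p > threshold]
    return selected, threshold

# Hypothetical attribute labels and predicted values.
names = ["age_adult", "gender_female", "backpack", "long_coat", "hat"]
pred = np.array([0.9, 0.2, 0.7, 0.1, 0.6])

selected, threshold = select_attributes(pred, names)
print(threshold)
print(selected)  # ['age_adult', 'backpack', 'hat']
```

Because the threshold is computed from the image's own predictions, it adapts per image rather than being a fixed constant such as 0.5.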

1) The invention realizes pedestrian attribute identification in a monitoring scene with a simple and efficient method, making it better suited to deployment on hardware facilities in monitoring scenes; 2) the invention normalizes the pedestrian image features and the weights of the different attribute classifiers before computing with them, which solves the problem that the weights of the pedestrian attribute classifiers depend on the prior distribution of pedestrian attributes in the scene; 3) by introducing the scaling factor, the invention further addresses the unbalanced distribution of positive and negative samples, so that the network model is easier to optimize and its performance is further improved.

In particular, the present invention has several distinct advantages over the prior art:

1) The computational cost and parameter count of all current algorithms are significantly higher than those of the invention, which achieves pedestrian attribute identification performance comparable to the current best method while using only 63.18% of the parameters and 46.18% of the computation.

2) The invention addresses the imbalance of positive and negative samples in pedestrian attribute recognition; by normalizing the pedestrian image features and the weights of each attribute classifier, it makes pedestrian attribute prediction performance independent of the prior distribution of pedestrian attributes in the scene.

3) The more efficient algorithm and lighter weight model enable the algorithm to be better applied to hardware facilities in a monitoring scenario than other algorithms.

In addition, the invention also provides a pedestrian attribute identification system in the monitoring scene, which can improve pedestrian attribute identification.

As shown in fig. 2, the pedestrian attribute identification system in the monitoring scene of the present invention includes an acquisition unit 1, a preprocessing unit 2, a feature determination unit 3, a parameter determination unit 4, a calculation unit 5, a prediction unit 6, and an attribute determination unit 7.

The acquisition unit 1 is used for acquiring an image of a pedestrian to be detected in a monitoring scene; the preprocessing unit 2 is used for preprocessing the pedestrian image to be detected to obtain a processed image; the feature determining unit 3 is used for obtaining the convolution image features of the pedestrian image to be detected according to the processing image through the deep neural network; the parameter determining unit 4 is used for determining weight parameters of each attribute classifier according to the full connection layer and the convolution image characteristics; the computing unit 5 is used for determining network attribute values of the pedestrian image to be detected under different attribute classifiers based on the convolution image characteristics and weight parameters; the prediction unit 6 is configured to determine a predicted value of a corresponding attribute based on the network attribute values of the pedestrian images to be detected; the attribute determining unit 7 is configured to determine an attribute type of the pedestrian image to be detected according to the predicted value of each attribute.

The invention further provides the following scheme:

a pedestrian attribute identification system in a monitoring scene, comprising:

a processor; and

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

The invention further provides the following scheme:

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

Compared with the prior art, the computer readable storage medium and the pedestrian attribute identification system in the monitoring scene have the same beneficial effects as the pedestrian attribute identification method in the monitoring scene, and are not repeated herein.

Compared with the prior art, the image retrieval system and the computer readable storage medium have the same beneficial effects as the image retrieval method, and are not repeated here.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims

1. The pedestrian attribute identification method in the monitoring scene is characterized by comprising the following steps of:

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

determining weight parameters of all attribute classifiers according to the full connection layer and the convolution image characteristics; the method specifically comprises the following steps:

according to the full connection layer and the convolution image features, establishing an attribute classifier $Cls(X_{img}; \theta_{cls})$ for pedestrian image attribute identification; wherein $X_{img}$ is the convolution image feature, $\theta_{cls}$ is the weight parameter of the attribute classifier, $\theta_{cls} \in \mathbb{R}^{C_{feat} \times N_{attr}}$, $\mathbb{R}$ is the real space in which the weight parameters of the attribute classifier lie, $C_{feat}$ is the number of layers (channels) of the convolution feature, and $N_{attr}$ is the number of labeled attributes pre-stored in the database;

determining the weight parameter $\theta_{cls}^{i}$ of each attribute classifier based on the attribute classifier for pedestrian image attribute identification, wherein $\theta_{cls}^{i} \in \mathbb{R}^{C_{feat}}$, $i$ is the serial number of the current attribute classifier, the number of attribute classifiers equals the number of attribute categories, and $i = 1, 2, \ldots, N_{attr}$;

2. The method for identifying pedestrian attributes in a monitored scene according to claim 1, wherein the preprocessing the pedestrian image to be detected to obtain a processed image specifically comprises:

scaling the pedestrian image to be detected to obtain a scaled image;

3. The method for identifying pedestrian attributes in a monitored scene as defined in claim 1, wherein the convolution image feature $X_{img}$ of the pedestrian image to be detected is obtained according to the following formula:

$X_{img} = f_{cnn}(I_{img}; \theta_{cnn})$;

4. The method for identifying pedestrian attributes in a monitored scene according to claim 1, wherein the determining network attribute values of the pedestrian images to be detected under different attribute classifiers based on the convolution image features and weight parameters specifically comprises:

5. The method for identifying pedestrian attributes in a monitored scene as defined in claim 4, wherein network attribute values of the pedestrian image under test under different attribute classifiers are determined according to the following formula:

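$\mathrm{logits}_i = \alpha \cdot \hat{X}_{img}^{\top} \hat{\theta}_{cls}^{i}, \quad i = 1, 2, \ldots, N_{attr}$;

wherein $N_{attr}$ is the number of labeled attributes pre-stored in the database, $i$ is the serial number of the current attribute classifier, $\mathrm{logits}_i$ is the network attribute value of the pedestrian image to be detected under the $i$-th attribute classifier, $\alpha$ is a scaling factor, $\hat{X}_{img}$ is the normalized feature, and $\hat{\theta}_{cls}^{i}$ is the normalized weight parameter of the $i$-th attribute classifier.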

6. The method of claim 1, wherein the predicted value of the corresponding attribute is determined according to the following formula:

$p_i = \mathrm{Sigmoid}(\mathrm{BN}(\mathrm{logits}_i)), \quad i = 1, 2, \ldots, N_{attr}$;

wherein $N_{attr}$ is the number of labeled attributes pre-stored in the database, $i$ is the serial number of the current attribute classifier, $p_i$ is the predicted value of the $i$-th attribute of the pedestrian image to be detected, $\mathrm{logits}_i$ is the network attribute value of the pedestrian image to be detected under the $i$-th attribute classifier, $\mathrm{BN}(\cdot)$ is the batch normalization layer processing function, and $\mathrm{Sigmoid}(\cdot)$ is the neural network activation function.

7. A pedestrian attribute identification system in a monitoring scene, the attribute identification system comprising:

the parameter determining unit is used for determining weight parameters of each attribute classifier according to the full connection layer and the convolution image characteristics; the method specifically comprises the following steps:

8. A pedestrian attribute identification system in a monitoring scene, comprising:

a processor; and

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;

according to the full connection layer and the convolution image features, establishing an attribute classifier $Cls(X_{img}; \theta_{cls})$ for pedestrian image attribute identification; wherein $X_{img}$ is the convolution image feature, $\theta_{cls}$ is the weight parameter of the attribute classifier, $\theta_{cls} \in \mathbb{R}^{C_{feat} \times N_{attr}}$, $\mathbb{R}$ is the real space in which the weight parameters of the attribute classifier lie, $C_{feat}$ is the number of layers (channels) of the convolution feature, and $N_{attr}$ is the number of labeled attributes pre-stored in the database;

9. A computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:

acquiring a pedestrian image to be detected in a monitoring scene;

preprocessing the pedestrian image to be detected to obtain a processed image;