CN107944403B - Method and device for detecting pedestrian attributes in an image


Info

Publication number: CN107944403B
Application number: CN201711230016.1A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN107944403A
Inventors: 刘浩, 王翔, 王彬, 陈雪梅, 孙英贺
Assignee (original and current): Hisense TransTech Co Ltd
Filing date / priority date: 2017-11-29
Publication of CN107944403A: 2018-04-20
Grant of CN107944403B: 2021-03-19
Legal status: Active
Prior art keywords: pedestrian, sub-region, image, detected

Classifications

    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06T7/13 — Image analysis; segmentation; edge detection
    • G06V20/52 — Scenes; context or environment of the image; surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H04N7/18 — Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G06T2207/10016 — Image acquisition modality: video; image sequence
    • G06T2207/30196 — Subject of image: human being; person
    • G06T2207/30232 — Subject of image: surveillance


Abstract

An embodiment of the application discloses a method and device for detecting pedestrian attributes in an image. The method comprises: detecting a pedestrian in an image to be detected to obtain a first pedestrian region including the pedestrian; extracting the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian; determining a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region; dividing the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian; and inputting the image to be detected corresponding to the N sub-regions into a convolutional neural network, which outputs the attribute features of the body sub-region or appendage in each sub-region.

Description

Method and device for detecting pedestrian attributes in an image
Technical Field
The present application relates to the field of machine learning, and in particular to a method and an apparatus for detecting pedestrian attributes in an image.
Background
With the development of video surveillance technology, intelligent video monitoring is being applied in more and more scenes, such as traffic, shopping malls, hospitals, residential communities and parks, and its spread lays the foundation for image-based pedestrian attribute detection across these scenarios.
Pedestrian attribute recognition is a technology that takes video containing the pedestrians to be detected as input and identifies various attributes of those pedestrians. It draws on several disciplines, including computer vision, image processing, pattern recognition and machine learning.
A traditional pedestrian attribute recognition system can identify pedestrian attributes only by applying a conventional machine learning method to the whole pedestrian image of a given video frame. When the pedestrian is not localized accurately, the recognition result suffers greatly, and the accuracy that conventional machine learning methods can reach is limited. Moreover, city-level security video volumes are enormous, and since the number of structured pedestrian attribute recognition models grows with the number of attributes, the computational load is huge.
How to identify pedestrian attributes accurately while improving recognition efficiency is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the application provide a method and a device for detecting pedestrian attributes in an image, which are used to improve the efficiency of pedestrian attribute detection.
An embodiment of the application provides a method for detecting pedestrian attributes in an image, the method comprising the following steps:
detecting a pedestrian in an image to be detected to obtain a first pedestrian region including the pedestrian;
extracting the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian;
determining a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region;
dividing the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian;
inputting the N sub-regions into a convolutional neural network, and outputting the attribute features of the body sub-region or appendage sub-region included in each sub-region.
In one possible implementation, obtaining a first pedestrian region including the pedestrian includes:
acquiring a histogram of oriented gradients of the image to be detected;
determining a feature vector of the image to be detected according to the gradient density distribution in that histogram;
comparing the feature vector of the image to be detected with a preset feature vector of a sample image that includes a pedestrian; if the difference between the two feature vectors is within a preset range, determining that a pedestrian exists in the image to be detected and calibrating the first pedestrian region where the pedestrian is located.
In one possible implementation, determining the edge of the pedestrian includes:
determining the edge of the pedestrian in the first pedestrian region using a color connected-region algorithm.
In one possible implementation, dividing the second pedestrian region into N sub-regions includes:
acquiring a histogram of oriented gradients of the second pedestrian region;
determining a feature vector of the second pedestrian region according to the gradient density distribution in that histogram;
for any one of the N sub-regions, if the difference between the feature vector of a sub-region in the second pedestrian region and the feature vector of the corresponding sub-region calibrated in a preset sample image is within a preset range, determining that the image features of the two sub-regions are the same and marking the position of that sub-region out of the second pedestrian region.
In one possible implementation, the attribute features of the head include at least one or more of: age, gender, headwear, face wear; the attribute features of the upper body include at least features of the upper garment; the attribute features of the lower body include at least the lower garment and shoes; and the attribute features of the carried item include at least whether an accessory such as a bag, cart, luggage case or pet is carried, and its color and type.
An embodiment of the application provides a device for detecting pedestrian attributes in an image, the device including:
an acquisition unit, configured to detect a pedestrian in an image to be detected;
a processing unit, configured to obtain a first pedestrian region including the pedestrian; extract the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian; determine a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region; divide the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian; input the N sub-regions into a convolutional neural network; and output the attribute features of the body sub-region or appendage sub-region included in each sub-region.
In one possible implementation, the processing unit is specifically configured to:
acquire a histogram of oriented gradients of the image to be detected; determine a feature vector of the image to be detected according to the gradient density distribution in that histogram; compare the feature vector of the image to be detected with a preset feature vector of a sample image that includes a pedestrian; and, if the difference between the two feature vectors is within a preset range, determine that a pedestrian exists in the image to be detected and calibrate the first pedestrian region where the pedestrian is located.
In one possible implementation, the processing unit is specifically configured to: determine the edge of the pedestrian in the first pedestrian region using a color connected-region algorithm.
In one possible implementation, the processing unit is specifically configured to:
acquire a histogram of oriented gradients of the second pedestrian region; determine a feature vector of the second pedestrian region according to the gradient density distribution in that histogram; and, for any one of the N sub-regions, if the difference between the feature vector of a sub-region in the second pedestrian region and the feature vector of the corresponding sub-region calibrated in a preset sample image is within a preset range, determine that the image features of the two sub-regions are the same and mark the position of that sub-region out of the second pedestrian region.
In one possible implementation, the attribute features of the head include at least one or more of: age, gender, headwear, face wear; the attribute features of the upper body include at least features of the upper garment; the attribute features of the lower body include at least the lower garment and shoes; and the attribute features of the carried item include at least whether an accessory such as a bag, cart, luggage case or pet is carried, and its color and type.
The present application provides a computer program product comprising computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method described in any one of the above.
An embodiment of the present application provides a chip connected to a memory, configured to read and execute a software program stored in the memory in order to implement the method in any one of the possible designs above.
The embodiments of the application thus provide a method and device for detecting pedestrian attributes in an image, where the method comprises detecting a pedestrian in an image to be detected to obtain a first pedestrian region including the pedestrian; extracting the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian; determining a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region; dividing the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian; and inputting the image to be detected corresponding to the N sub-regions into a convolutional neural network, which outputs the attribute features of the body sub-region or appendage in each sub-region. Because the second pedestrian region determined in the embodiments of the application is an accurately localized pedestrian region, the N divided sub-regions are more accurate; identifying the pedestrian attribute features of all N sub-regions in a single pass through the convolutional neural network improves both detection accuracy and detection efficiency.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting pedestrian attributes in an image according to an embodiment of the present application;
Fig. 2 is a schematic diagram of pedestrian attribute detection in an image according to an embodiment of the present application;
Fig. 3 is a schematic diagram of pedestrian attribute detection in an image according to an embodiment of the present application;
Fig. 4 is a schematic diagram of pedestrian attribute detection in an image according to an embodiment of the present application;
Fig. 5 is a schematic diagram of pedestrian attribute detection in an image according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a convolutional neural network according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a device for detecting pedestrian attributes in an image according to an embodiment of the present application.
Detailed Description
In the prior art, methods for detecting pedestrians in an image mainly detect the pedestrian target using the Histogram of Oriented Gradients (HOG) algorithm; the result has low detection accuracy and an oversized pedestrian bounding box, so the pedestrian attributes determined from that pedestrian region also have low accuracy. To effectively improve the efficiency of pedestrian attribute detection, improve its real-time performance, and facilitate its end-to-end optimization, the embodiments of the application provide a method and a device for detecting pedestrian attributes in an image.
The embodiments of the application apply to electronic equipment, which may be a desktop computer, a notebook computer, or another smart device with processing capability. The pedestrian attribute detection in the embodiments of the application may be performed on images of traffic scenes, and also in other video-surveillance scenes such as parks, apartment buildings and supermarkets. The method is widely applicable to services such as video investigation, pedestrian feature search and suspect search.
Fig. 1 is a schematic flowchart of a method for detecting pedestrian attributes in an image according to an embodiment of the present application, including the following steps:
Step 101: detecting a pedestrian in an image to be detected to obtain a first pedestrian region including the pedestrian;
Step 102: extracting the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian;
Step 103: determining a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region;
Step 104: dividing the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian;
Step 105: inputting the image to be detected corresponding to the N sub-regions into a convolutional neural network, and outputting the attribute features of the body sub-region or appendage in each sub-region.
In step 101, the first pedestrian region including the pedestrian may be determined by the Histogram of Oriented Gradients (HOG) feature extraction algorithm. The specific process is as follows:
Step one: acquire a histogram of oriented gradients of the image to be detected.
In the image to be detected, the shape features of the pedestrian can be determined from the directional density distribution of gradients or edges. A specific implementation is as follows:
divide the image into filter windows suited to detecting pedestrians, and collect a histogram of the gradient or edge direction at each pixel within each filter; the gradient directions of each filter are quantized into discrete orientation bins, and the gradient directions and magnitudes within the filter are projected with weights onto these bins to determine the feature vector the filter generates over the image to be detected.
Step two: compare the feature vector of the image to be detected with a preset feature vector of a sample image that includes a pedestrian; if the difference between the two feature vectors is within a preset range, determine that a pedestrian exists in the image to be detected and calibrate the first pedestrian region where the pedestrian is located.
In one possible implementation, the sample images may include positive samples, i.e., sample images in which a pedestrian region has been marked out (for example manually), and negative samples, i.e., sample images that contain no pedestrian. Training on the features of the pedestrian regions in the positive and negative sample images yields the feature vector of the pedestrian region. During detection, the feature vector of the image to be detected is compared with the trained feature vector of the pedestrian region; if their difference is within the preset range, a pedestrian is determined to exist in the image to be detected, and the first pedestrian region where the pedestrian is located is calibrated.
The comparison may be carried out with a support vector machine, in the same way as in the prior art, and is not described again here.
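As an illustration of this HOG-then-SVM pipeline, the following is a minimal Python sketch using OpenCV's built-in, pre-trained HOG people detector; the image file name is hypothetical, and in the patent's scheme the default detector would be replaced by an SVM trained on the positive and negative samples described above.

```python
import cv2

# HOG descriptor plus a linear-SVM people detector (a stand-in for a
# detector trained on the positive/negative pedestrian samples above).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("to_detect.jpg")  # hypothetical image to be detected

# Slide HOG windows over an image pyramid; each returned rectangle is a
# candidate first pedestrian region, each weight an SVM confidence.
rects, weights = hog.detectMultiScale(
    image,
    winStride=(8, 8),   # window step in pixels
    padding=(16, 16),   # padding added around each window
    scale=1.05,         # pyramid scale factor
)

for (x, y, w, h) in rects:
    # Calibrate (mark) a first pedestrian region where a pedestrian exists.
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```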
As shown in fig. 2, a first pedestrian region 201 is determined in an embodiment of the present application. The first pedestrian region 201 determined from the HOG features covers a fairly large area, and identifying pedestrian attributes from the HOG-determined region 201 alone yields low recognition accuracy.
In step 102, the edge of the pedestrian may be extracted with a color connected-region algorithm, specifically as follows:
Step one: convert the sampling format of the image to be detected to YCbCr.
YCbCr is a color space commonly used in continuous video processing in film and in digital photography systems. Y is the luma component, and Cb and Cr are the blue-difference and red-difference chroma components. Converting the sampling format compresses the size of the image to be detected, which reduces the size of the image-processing model and speeds up image processing.
Step two: obtain a binary image of the image to be detected, and apply dilation and erosion to the binary image.
A binary image is one in which each pixel has only two possible values or grayscale states; it may also be called a black-and-white, B&W, or monochrome image. Every pixel in a binary image takes one of two values, 0 or 1, corresponding to "off" and "on": off marks a background pixel and on marks a foreground pixel. A binary image keeps only the edge features of the image and discards its details; it occupies little space and makes the structural features of the image easier to recognize, for example whether the image shows a landscape or a pedestrian.
In a specific implementation, the edges of the image can be smoothed by eroding and dilating the binary image. Erosion followed by dilation is called opening; it removes fine objects, separates objects at thin connections, and smooths the boundaries of larger objects. Dilation followed by erosion is called closing; it fills small holes in an object, connects adjacent objects, and likewise smooths boundaries. Concretely, a small binary template (the structural element) is moved point by point over the binary image and compared with it, and the corresponding dilation and erosion operations are applied according to the comparison result. Removing foreground noise and holes in the region by such morphological filtering and connectivity detection effectively suppresses noise interference in the image to be detected and improves pedestrian detection accuracy.
Step three: determine a salient boundary between the pedestrian and the surrounding environment, and take this boundary as the edge of the pedestrian.
In a specific implementation, starting from a binary image consisting only of "1" pixels (foreground points) and "0" pixels (background points), adjacent "1" pixels are merged into regions by a connected-region labeling method, and each connected region is described by its boundary information.
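A minimal Python/OpenCV sketch of these three steps follows; the crop file name, Otsu thresholding on the luma channel, and the 3x3 structural element are illustrative assumptions (OpenCV names the color space YCrCb, with the chroma channels swapped relative to YCbCr).

```python
import cv2
import numpy as np

img = cv2.imread("first_pedestrian_region.jpg")  # hypothetical crop

# Step one: convert the sampling format to YCbCr (YCrCb in OpenCV).
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
luma = ycrcb[:, :, 0]

# Step two: binarize, then smooth the mask with opening (erosion then
# dilation) and closing (dilation then erosion) to remove fine noise
# and fill small holes.
_, binary = cv2.threshold(luma, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)  # the structural element
binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Step three: label connected foreground regions and keep the largest
# region's boundary as the pedestrian edge.
n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # label 0 is background
mask = np.uint8(labels == largest) * 255
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
pedestrian_edge = max(contours, key=cv2.contourArea)
```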
Fig. 3 is a schematic diagram of the pedestrian edge 301 determined according to an embodiment of the present application. Note that steps 101 and 102 may be executed in sequence: step 101 first, and then, once the first pedestrian region 201 has been determined, step 102 on the image to be detected within the first pedestrian region 201 to obtain the pedestrian edge 301. Alternatively, steps 101 and 102 may be executed simultaneously: after step 101 determines the first pedestrian region 201 and step 102 determines all edges in the image to be detected, the pedestrian edge 301 is selected from all the edges of the image using the determined first pedestrian region 201.
The edge 301 determined from the color connected regions is the tightest edge of the pedestrian; a body part of the pedestrian, or an appendage of the pedestrian, may even fall outside the edge 301, so a certain error remains. To improve recognition accuracy, the first pedestrian region 201 determined from the HOG features is combined with the pedestrian edge 301 to determine the second pedestrian region 401 shown in fig. 4.
In one possible implementation of step 103, the second pedestrian region 401 may be determined as follows:
Step one: determine the extreme points a, b, c and d of the pedestrian edge 301 in the four directions up, down, left and right.
In one possible implementation, the four directions up, down, left and right are perpendicular to the four borders of the first pedestrian region 201 and point outward; in each direction, the edge point of the pedestrian edge 301 that lies furthest in that direction is taken as the extreme point for that direction.
Step two: determine, from the four extreme points, a third pedestrian region 302 of the same rectangular form as the first pedestrian region 201.
Step three: determine the second pedestrian region 401 from the first pedestrian region 201 and the third pedestrian region 302.
In one possible implementation, as shown in fig. 4, the second pedestrian region 401 is a region smaller than the first pedestrian region 201 and larger than the third pedestrian region 302.
In one possible implementation, the second pedestrian region 401 is located midway between the first pedestrian region 201 and the third pedestrian region 302, and the size of the second pedestrian region 401 is the average of the sizes of the first pedestrian region 201 and the third pedestrian region 302; a sketch of this averaging is given below.
The second pedestrian region 401 determined in step 103 narrows the pedestrian detection region to a suitable extent and improves the accuracy and efficiency of the subsequent pedestrian attribute detection.
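A minimal Python sketch of the averaging option, assuming axis-aligned boxes in (x1, y1, x2, y2) form and purely illustrative coordinates; averaging the corner coordinates places the second region midway between the first and third regions, with the average of their sizes.

```python
import numpy as np

def second_region(first_box, edge_points):
    """Average the HOG box (first region) with the tight box spanned
    by the edge's extreme points (third region)."""
    # Extreme points of the pedestrian edge in the four directions
    # give the tight third pedestrian region.
    xs, ys = edge_points[:, 0], edge_points[:, 1]
    third_box = (xs.min(), ys.min(), xs.max(), ys.max())

    # Second region: corner-wise average of the first and third boxes.
    return tuple((f + t) / 2.0 for f, t in zip(first_box, third_box))

# Hypothetical usage: first_box from HOG detection, edge from step 102.
first_box = (40, 20, 200, 380)
edge = np.array([[60, 35], [75, 360], [180, 350], [170, 40]])
print(second_region(first_box, edge))  # midway box with averaged size
```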
Detecting the pedestrian's sub-regions on the image of the second pedestrian region 401 reduces the number of pixels per sub-region, and hence the size of the filters used to divide the sub-regions, which greatly increases recognition speed and thus the efficiency of image recognition.
In a specific implementation, the body sub-regions or appendages included in the N sub-regions may comprise at least one or more of: head, upper body, lower body, carried item, and the like; the choice of sub-regions may be determined by the actual application scenario and is not limited here.
In step 104, dividing the second pedestrian region 401 into N sub-regions may include the following steps:
Step one: acquire a histogram of oriented gradients of the second pedestrian region 401.
To improve the accuracy of identifying the N sub-regions, one possible implementation determines the histogram of oriented gradients of the second pedestrian region 401 using two classes of filters, as follows (see the sketch after this list):
Step 1: convolve the second pedestrian region 401 with a root filter to obtain the root filter's response map.
The root filter is a global filter; each root filter weights and superimposes the gradient directions according to a classification model based on a support vector machine, and a stronger response in a gradient direction can be read as a higher likelihood that a pedestrian exhibits that directional gradient. The response map obtained from the root filter approximately represents the overall features of a pedestrian.
Step 2: upsample the image of the second pedestrian region 401 by a factor of 2 via a Gaussian pyramid, and convolve the upsampled image to be detected corresponding to the second pedestrian region with each of N sub-region filters (part filters) to obtain the N part filters' response maps.
Detecting the sub-regions on the 2x-upsampled image improves the precision of detecting the N sub-regions. The N part filters are the filters of the N sub-regions determined from the trained sample images. Because the second pedestrian region 401 matches the size of the sample images used to train the N part filters, the part filters need only a small number of pixels, which effectively improves both the recognition accuracy and the recognition efficiency of the N sub-regions.
Step 3: downsample the response maps of the N part filters through the Gaussian pyramid, take a weighted average of the downsampled maps and the root filter's response map, and determine the direction and magnitude of the gradients of the second pedestrian region 401, i.e., its gradient density distribution.
The fine Gaussian-pyramid downsampling of the part filters' response maps brings them to the same resolution as the root filter's response map. Their weighted average gives the final response map, from which the histogram of oriented gradients of the second pedestrian region 401 is determined.
Step two: determine the feature vector of the second pedestrian region according to the gradient density distribution in its histogram of oriented gradients.
Step three: for any one of the N sub-regions, if the difference between the feature vector of a sub-region in the second pedestrian region and the feature vector of the corresponding sub-region calibrated in a preset sample image is within a preset range, determine that the image features of the two sub-regions are the same, and mark the position of that sub-region out of the second pedestrian region.
For example, as shown in fig. 5, the 4 sub-regions of the second pedestrian region 401 determined by step 104 in the embodiment of the present application are a head sub-region 501, an upper-body sub-region 502, a lower-body sub-region 503, and an appendage sub-region 504.
For the head sub-region 501, the attribute features of the head may include at least one or more of: age, gender, headwear, face wear, and the like; the headwear may be a hat, and the face wear may be glasses, for example.
For the upper-body sub-region 502, the attribute features of the upper body may include at least one or more of: upper-garment color, upper-garment type, and the like.
For the lower-body sub-region 503, the attribute features of the lower body may include at least one or more of: the color and type of the lower garment and shoes, and the like.
For the appendage sub-region 504, the attribute features of the appendage may include at least one or more of: whether an accessory such as a bag, cart, luggage case or pet is carried, and features such as its color and type.
In a specific implementation, the N pedestrian attribute features to be detected for the N sub-regions can be chosen as needed.
For example, for the 4 sub-regions 501-504 shown in fig. 5, the corresponding 4 pedestrian attribute features to detect may be: gender for sub-region 501, upper-garment color for sub-region 502, lower-garment color for sub-region 503, and whether a bag is carried for sub-region 504.
In step 105, the images to be detected corresponding to the N sub-regions are input into a convolutional neural network for the determined N pedestrian attribute features.
In the embodiment of the application, a large number of sample images, forming a sample image set, are used to train the convolutional neural network. A rectangular box may be used to mark the N sub-regions in each sample image. Each sub-region of the sample images corresponds to one sub-convolutional neural network, and each sub-network model identifies the pedestrian attribute feature to be recognized for its sub-region. For example, in the structural diagram of the convolutional neural network shown in fig. 6, the network comprises 4 sub-convolutional neural networks 601-604 corresponding respectively to the 4 sub-regions 501-504 of the second pedestrian region 401, i.e., the head sub-region 501, the upper-body sub-region 502, the lower-body sub-region 503, and the appendage sub-region 504.
In this embodiment, the convolutional neural network could be trained on all sample images in the sample image set. To improve training efficiency, however, the embodiment of the present application trains the sub-networks separately according to the pedestrian attribute feature of each sub-region in the sample images.
When the weight coefficients of the sub-network model for a given pedestrian attribute feature have not yet been determined, one of the N pedestrian attribute features of the N sub-regions may be selected at random and its sub-network trained first. The specific training process is:
select the sub-sample images in the sample image set, i.e., the sample images corresponding to that sub-region; train the sub-convolutional neural network on the selected sub-sample images; and keep updating the weight coefficients of the sub-network until the error between the predicted pedestrian attribute feature information and the labeled pedestrian attribute feature information converges.
After the pedestrian attribute feature of at least one sub-region has been trained, part of the weight coefficients of the trained attribute network are used as initial values for the weight coefficients of the next attribute to be trained and fed into that attribute's sub-convolutional neural network for training; for instance, 80% of the weight coefficients of an already-trained attribute network may serve as initial values for the next one. The rest of the training process is the same as for the first trained attribute and is not repeated here; a sketch of this warm start follows.
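A minimal PyTorch sketch of this warm start, assuming the sub-networks share one architecture; which 80% of the weights carries over is an illustrative choice (here simply the first tensors in the state dict).

```python
import torch.nn as nn

def warm_start(trained: nn.Module, target: nn.Module, fraction=0.8):
    """Copy a fraction of an already-trained attribute sub-network's
    weight coefficients into the next sub-network as initial values."""
    src = trained.state_dict()
    dst = target.state_dict()
    names = [n for n in dst if n in src and dst[n].shape == src[n].shape]
    for name in names[: int(len(names) * fraction)]:
        dst[name] = src[name].clone()
    target.load_state_dict(dst)

def make_subnet():
    # Tiny stand-in for one attribute's sub-convolutional network.
    return nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, 2))

gender_net = make_subnet()      # assume already trained to convergence
coat_color_net = make_subnet()  # next attribute feature to train
warm_start(gender_net, coat_color_net)
```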
With this training method, the weight coefficients obtained for the N pedestrian attribute features of the N sub-regions are partially shared, which greatly reduces the computation of the convolutional neural network model and improves image recognition efficiency.
In step 105, when detecting an image to be detected, the image is fed directly into the pre-trained convolutional neural network. Fig. 6 shows a convolutional neural network provided in an embodiment of the present application, comprising convolutional layers, down-sampling layers, and fully connected layers: the convolutional layers produce feature maps in which the pedestrian attribute features of each sub-region can be identified, the down-sampling layers down-sample each identified sub-region, and the fully connected layer determines the feature vector of the sub-region from the down-sampling result.
For example, as shown in fig. 6, sub-network 601 is a convolutional neural network model for detecting the gender feature, sub-network 602 detects the color of the upper garment, sub-network 603 detects the color of the lower garment, and sub-network 604 detects whether a bag is carried. Since the weight coefficients of the 4 sub-networks 601-604 are partially the same, the weight coefficients of the 4 sub-network models can be stored as the union of all their weight coefficients, and in the intermediate convolutional layers the 4 sub-networks are computed in parallel to improve detection efficiency.
Only in the last fully connected layer of the 4 sub-network models are the feature vectors of the 4 pedestrian attribute features classified separately to output the corresponding attribute feature values.
The probability that a sub-region has its corresponding pedestrian attribute feature is predicted from the pedestrian attribute feature map at the fully connected layer of the sub-network. In this embodiment of the application, when the pedestrian attribute is predicted to exist in the sub-region, the corresponding probability is 1, and otherwise 0; of course, when the attribute feature is predicted to exist, the probability may also take other values greater than 0.
In a specific implementation, the pedestrian attribute features are judged at the fully connected layer from the convolutional feature map produced by the last convolutional layer. The fully connected layer determines, from that feature map, the probability of the pedestrian attribute feature for each sub-region: for example, 0 means the sub-region's attribute feature is absent and 1 means it is present. The presence of each sub-region's attribute feature may also be recorded in other ways, for example by setting a probability threshold, where a probability above the threshold indicates that the attribute feature is present in the sub-region and a probability below it indicates that it is not.
Taking the head sub-region 501 as an example: after the image to be detected of the head sub-region 501 is input into sub-network 601 and passes through the convolutional and down-sampling layers, the feature map of the head sub-region 501 is obtained at the last fully connected layer; the fully connected layer determines that the probability that the gender feature of head sub-region 501 is female is 0.8 and male 0.2, and outputs female as the gender for the image to be detected. The other sub-regions 502-504 are input into the model together with the head sub-region 501, so the 4 pedestrian attribute features of the 4 sub-regions are obtained simultaneously.
In the embodiment of the present application, the convolutional neural network may be a GoogLeNet model with 22 convolutional layers, 5 down-sampling layers, and a final fully connected layer. A convolutional neural network has strong feature-learning capability and overcomes the imprecision of hand-crafted feature design; moreover, because the embodiment uses N sub-networks with partially shared weight coefficients, fewer weights are needed, the computation can be greatly reduced while accuracy is maintained, and the N pedestrian attribute features are obtained at the same time; a minimal sketch of such a shared-trunk, multi-head network follows.
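The following PyTorch sketch shows one shared convolutional trunk (the partially shared weight coefficients) plus one small classification head per attribute, evaluated in a single forward pass. The layer sizes and attribute names are illustrative assumptions, not the 22-layer GoogLeNet configuration the text describes.

```python
import torch
import torch.nn as nn

class MultiAttributeNet(nn.Module):
    def __init__(self, attributes=("gender", "coat_color",
                                   "trousers_color", "carries_bag")):
        super().__init__()
        # Shared trunk: the weights common to all N sub-networks.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One classification head per pedestrian attribute feature.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(64, 2) for name in attributes})

    def forward(self, sub_regions):
        # sub_regions: attribute name -> batch of (B, 3, H, W) crops.
        return {name: torch.softmax(self.heads[name](self.trunk(x)), dim=1)
                for name, x in sub_regions.items()}

net = MultiAttributeNet()
crops = {name: torch.randn(1, 3, 64, 64) for name in net.heads}
probs = net(crops)  # e.g. probs["gender"] -> [[p_female, p_male]]
```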
As shown in fig. 7, an embodiment of the present application provides a device for detecting pedestrian attributes in an image, the device including:
an acquisition unit 701, configured to detect a pedestrian in an image to be detected;
a processing unit 702, configured to obtain a first pedestrian region including the pedestrian; extract the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian; determine a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region; divide the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian; input the N sub-regions into a convolutional neural network; and output the attribute features of the body sub-region or appendage sub-region included in each sub-region.
In one possible implementation, the processing unit 702 is specifically configured to:
acquire a histogram of oriented gradients of the image to be detected; determine a feature vector of the image to be detected according to the gradient density distribution in that histogram; compare the feature vector of the image to be detected with a preset feature vector of a sample image that includes a pedestrian; and, if the difference between the two feature vectors is within a preset range, determine that a pedestrian exists in the image to be detected and calibrate the first pedestrian region where the pedestrian is located.
In one possible implementation, the processing unit 702 is specifically configured to: determine the edge of the pedestrian in the first pedestrian region using a color connected-region algorithm.
In one possible implementation, the processing unit 702 is specifically configured to:
acquire a histogram of oriented gradients of the second pedestrian region; determine a feature vector of the second pedestrian region according to the gradient density distribution in that histogram; and, for any one of the N sub-regions, if the difference between the feature vector of a sub-region in the second pedestrian region and the feature vector of the corresponding sub-region calibrated in a preset sample image is within a preset range, determine that the image features of the two sub-regions are the same and mark the position of that sub-region out of the second pedestrian region.
In one possible implementation, the attribute features of the head include at least one or more of: age, gender, headwear, face wear; the attribute features of the upper body include at least features of the upper garment; the attribute features of the lower body include at least the lower garment and shoes; and the attribute features of the carried item include at least whether an accessory such as a bag, cart, luggage case or pet is carried, and its color and type.
To summarize, the embodiments of the application provide a method and a device for detecting pedestrian attributes in an image, where the method comprises detecting a pedestrian in an image to be detected to obtain a first pedestrian region including the pedestrian; extracting the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian; determining a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region; dividing the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian; and inputting the image to be detected corresponding to the N sub-regions into a convolutional neural network, which outputs the attribute features of the body sub-region or appendage in each sub-region. Because the second pedestrian region determined in the embodiments of the application is an accurately localized pedestrian region, the N divided sub-regions are more accurate; identifying the pedestrian attribute features of the N sub-regions in a single pass through the convolutional neural network improves both detection accuracy and detection efficiency.
The present application provides a computer program product comprising computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method described in any one of the above.
An embodiment of the present application provides a chip connected to a memory, configured to read and execute a software program stored in the memory in order to implement the method in any one of the possible designs above.
For the system/apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for detecting pedestrian attributes in an image, the method comprising:
detecting a pedestrian in an image to be detected to obtain a first pedestrian region including the pedestrian;
extracting the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian;
determining a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region;
dividing the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian, specifically comprising: convolving the second pedestrian region with a root filter to obtain the root filter's response map; convolving the second pedestrian region with N sub-region filters to obtain the N sub-region filters' response maps; and determining a histogram of oriented gradients of the second pedestrian region from the root filter's response map and the N sub-region filters' response maps, the histogram being used to determine the N sub-regions of the second pedestrian region;
inputting the N sub-regions into a convolutional neural network, and outputting the attribute features of the body sub-region or appendage sub-region included in each sub-region, the convolutional neural network comprising N sub-convolutional neural networks, wherein, in the training process of the sub-convolutional neural networks for the pedestrian attribute features corresponding to the N sub-networks, part of the weight coefficients of an already-determined pedestrian attribute feature are used as the initial values of the weight coefficients of the pedestrian attribute feature of the sub-convolutional neural network to be trained.
2. The method of claim 1, wherein obtaining a first pedestrian region including the pedestrian comprises:
acquiring a histogram of oriented gradients of the image to be detected;
determining a feature vector of the image to be detected according to the gradient density distribution in that histogram;
comparing the feature vector of the image to be detected with a preset feature vector of a sample image that includes a pedestrian, and, if the difference between the two feature vectors is within a preset range, determining that a pedestrian exists in the image to be detected and calibrating the first pedestrian region where the pedestrian is located.
3. The method of claim 1, wherein determining the edge of the pedestrian comprises:
determining the edge of the pedestrian in the first pedestrian region using a color connected-region algorithm.
4. The method according to claim 1, wherein dividing the second pedestrian region into N sub-regions comprises:
acquiring a histogram of oriented gradients of the second pedestrian region;
determining a feature vector of the second pedestrian region according to the gradient density distribution in that histogram;
for any one of the N sub-regions, if the difference between the feature vector of a sub-region in the second pedestrian region and the feature vector of the corresponding sub-region calibrated in a preset sample image is within a preset range, determining that the image features of the two sub-regions are the same and marking the position of that sub-region out of the second pedestrian region.
5. The method according to any of claims 1-4, wherein the body sub-region comprises at least one of: head, upper body, or lower body; the attribute features of the head include at least one or more of: age, gender, headwear, face wear; the attribute features of the upper body include at least features of the upper garment; the attribute features of the lower body include at least the lower garment and shoes; the appendage sub-region comprises a carried item; and the attribute features of the carried item include at least whether an accessory such as a bag, cart, luggage case or pet is carried, and its color and type.
6. An apparatus for detecting pedestrian attributes in an image, the apparatus comprising:
an acquisition unit, configured to detect a pedestrian in an image to be detected;
a processing unit, configured to obtain a first pedestrian region including the pedestrian; extract the edge of the pedestrian in the first pedestrian region to obtain the edge of the pedestrian; determine a region that includes the edge of the pedestrian and is smaller than the first pedestrian region as a second pedestrian region; divide the second pedestrian region into N sub-regions, each sub-region comprising one body sub-region of the pedestrian or an appendage of the pedestrian, specifically by: convolving the second pedestrian region with a root filter to obtain the root filter's response map, convolving the second pedestrian region with N sub-region filters to obtain the N sub-region filters' response maps, and determining a histogram of oriented gradients of the second pedestrian region from the root filter's response map and the N sub-region filters' response maps, the histogram being used to determine the N sub-regions of the second pedestrian region; and input the N sub-regions into a convolutional neural network and output the attribute features of the body sub-region or appendage sub-region included in each sub-region, the convolutional neural network comprising N sub-convolutional neural networks, wherein, in the training process of the sub-convolutional neural networks for the pedestrian attribute features corresponding to the N sub-networks, part of the weight coefficients of an already-determined pedestrian attribute feature are used as the initial values of the weight coefficients of the pedestrian attribute feature of the sub-convolutional neural network to be trained.
7. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
acquire a gradient direction histogram of the image to be detected; determine a feature vector of the image to be detected according to the gradient density distribution in the gradient direction histogram of the image to be detected; and compare the feature vector of the image to be detected with a preset feature vector of a sample image including a pedestrian, and if the difference between the two feature vectors is within a preset range, determine that a pedestrian exists in the image to be detected and calibrate the first pedestrian region in which the pedestrian exists.
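The claim matches the image's HOG feature vector against that of a pedestrian sample image. As an indicative stand-in (substituting OpenCV's stock trained linear SVM for the claim's direct feature-vector comparison), the same image-to-first-pedestrian-region step can be exercised as follows:

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def first_pedestrian_regions(image_bgr):
    """Return candidate first pedestrian regions as (x, y, w, h) boxes."""
    rects, _weights = hog.detectMultiScale(image_bgr, winStride=(8, 8),
                                           padding=(8, 8), scale=1.05)
    return [tuple(int(v) for v in r) for r in rects]
```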
8. The apparatus according to claim 6, wherein the processing unit is specifically configured to: determine the edges of the pedestrian in the first pedestrian region according to a color connected component algorithm.
9. The apparatus according to claim 6, wherein the processing unit is specifically configured to:
acquire a gradient direction histogram of the second pedestrian region; determine a feature vector of the second pedestrian region according to the gradient density distribution in the gradient direction histogram of the second pedestrian region; and, for any one of the N sub-regions, if the difference between a feature vector of a sub-region in the second pedestrian region and the feature vector of the corresponding sub-region calibrated in a preset sample image is within a preset range, determine that the image features of that sub-region are the same as the image features of the sub-region calibrated in the sample image, and mark out the position of the sub-region in the second pedestrian region.
10. The apparatus according to any one of claims 6-9, wherein the body sub-region comprises at least one of: the head, the upper body, or the lower body; the attribute features of the head comprise at least one of: age, gender, head-worn items, or face-worn items; the attribute features of the upper body comprise at least features of the clothing; the attribute features of the lower body comprise at least clothing and shoes; the appendage sub-region comprises a hand-held object; and the attribute features of the hand-held object comprise at least whether a bag, a cart, a suitcase, or a pet is carried, or the color and type thereof.
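To illustrate claims 6 and 10 together: the sketch below (PyTorch) wires one small sub-convolutional network per sub-region and initializes a new sub-network from partial weights (here, the convolutional layers) of an already trained one, as the training step describes. The architecture, sub-region names, and attribute counts are illustrative assumptions only.

```python
import torch.nn as nn

class SubAttributeNet(nn.Module):
    """One sub-convolutional network: predicts the attribute features of a
    single body or appendage sub-region (e.g. head -> age, gender, ...)."""
    def __init__(self, n_attributes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, n_attributes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# N = 4 sub-networks, one per sub-region (attribute counts are made up).
sub_nets = {"head": SubAttributeNet(4), "upper_body": SubAttributeNet(2),
            "lower_body": SubAttributeNet(2), "hand_held": SubAttributeNet(3)}

# Partial weight transfer: reuse the trained convolutional weights of one
# sub-network as the initial values for another; each keeps its own head.
trained = sub_nets["head"]
sub_nets["upper_body"].features.load_state_dict(trained.features.state_dict())
```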
CN201711230016.1A 2017-11-29 2017-11-29 Method and device for detecting pedestrian attribute in image Active CN107944403B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711230016.1A CN107944403B (en) 2017-11-29 2017-11-29 Method and device for detecting pedestrian attribute in image

Publications (2)

Publication Number Publication Date
CN107944403A (en) 2018-04-20
CN107944403B (en) 2021-03-19

Family

ID=61946818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711230016.1A Active CN107944403B (en) 2017-11-29 2017-11-29 Method and device for detecting pedestrian attribute in image

Country Status (1)

Country Link
CN (1) CN107944403B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934081A * 2018-08-29 2019-06-25 厦门安胜网络科技有限公司 Pedestrian attribute recognition method, device and storage medium based on a deep neural network
CN109829356B (en) * 2018-12-05 2021-04-06 科大讯飞股份有限公司 Neural network training method and pedestrian attribute identification method based on neural network
CN109815842A * 2018-12-29 2019-05-28 上海依图网络科技有限公司 Method and device for determining attribute information of an object to be identified
CN109740537B (en) * 2019-01-03 2020-09-15 广州广电银通金融电子科技有限公司 Method and system for accurately marking attributes of pedestrian images in crowd video images
CN109784293B (en) * 2019-01-24 2021-05-14 苏州科达科技股份有限公司 Multi-class target object detection method and device, electronic equipment and storage medium
CN111753579A * 2019-03-27 2020-10-09 杭州海康威视数字技术股份有限公司 Detection method and device for a designated mobility device
CN111814513B (en) 2019-04-11 2024-02-13 富士通株式会社 Pedestrian article detection device and method and electronic equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845373A * 2017-01-04 2017-06-13 天津大学 Pedestrian attribute prediction method for surveillance video
CN106951872A * 2017-03-24 2017-07-14 江苏大学 Pedestrian re-identification method based on an unsupervised depth model and hierarchical attributes
CN107346414A * 2017-05-24 2017-11-14 北京航空航天大学 Pedestrian attribute recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-label convolutional neural network based pedestrian attribute classification; Jianqing Zhu et al.; Image and Vision Computing; 2016-07-25; pp. 224-229 *
Research on Pedestrian Detection Algorithms for Static Images; Chen Jinhui; China Master's Theses Full-text Database, Information Science and Technology; 2015-05-15 (No. 5); pp. I138-1104 *

Also Published As

Publication number Publication date
CN107944403A (en) 2018-04-20

Similar Documents

Publication Publication Date Title
CN107944403B (en) Method and device for detecting pedestrian attribute in image
Choi et al. Thermal image enhancement using convolutional neural network
CN109325954B (en) Image segmentation method and device and electronic equipment
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
JP6330385B2 (en) Image processing apparatus, image processing method, and program
KR101932009B1 (en) Image processing apparatus and method for multiple object detection
CN109214403B (en) Image recognition method, device and equipment and readable medium
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
US20140063275A1 (en) Visual saliency estimation for images and video
JP2017004480A (en) Conspicuity information acquisition device and conspicuity information acquisition method
US9443137B2 (en) Apparatus and method for detecting body parts
JP4098021B2 (en) Scene identification method, apparatus, and program
JP2012038318A (en) Target detection method and device
CN111814755A (en) Multi-frame image pedestrian detection method and device for night motion scene
WO2019197021A1 (en) Device and method for instance-level segmentation of an image
Sakpal et al. Adaptive background subtraction in images
Zhao et al. Automatic blur region segmentation approach using image matting
JP7350208B2 (en) Image processing device, image processing method, and program
CN113449606A (en) Target object identification method and device, computer equipment and storage medium
CN111814754A (en) Single-frame image pedestrian detection method and device for night scene
CN116681636A (en) Light infrared and visible light image fusion method based on convolutional neural network
CN111028263A (en) Moving object segmentation method and system based on optical flow color clustering
CN106446832B (en) Video-based pedestrian real-time detection method
US10706499B2 (en) Image processing using an artificial neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant