CN106778705B - Pedestrian individual segmentation method and device

Pedestrian individual segmentation method and device

Info

Publication number
CN106778705B
Authority
CN
China
Prior art keywords: segmentation, grained, coarse, shaped contour, image
Prior art date
Legal status
Active
Application number
CN201710065013.0A
Other languages
Chinese (zh)
Other versions
CN106778705A (en)
Inventor
王亮
黄永祯
宋纯锋
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201710065013.0A
Publication of CN106778705A
Application granted
Publication of CN106778705B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/192 Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V 30/194 References adjustable by an adaptive method, e.g. learning

Abstract

The invention discloses a pedestrian individual segmentation method and device. The method comprises the following steps: performing pedestrian segmentation on an image to be processed with a pre-trained coarse-grained human-shaped contour segmentation model to obtain a block segmentation result, the block segmentation result comprising a plurality of blocks labeled as background or foreground, where a block labeled as background contains no part of the pedestrian body and a block labeled as foreground contains a partial image of the pedestrian body; removing the background image from the portion of the image to be processed corresponding to the block segmentation result to obtain a coarse-grained segmentation image; inputting the coarse-grained segmentation image into a pre-trained fine-grained human-shaped contour segmentation model; and outputting an individual pedestrian segmentation result from the pre-trained fine-grained human-shaped contour segmentation model. Both the coarse-grained and fine-grained human-shaped contour segmentation models are obtained by training fully convolutional neural networks.

Description

Pedestrian individual segmentation method and device
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, and in particular to a pedestrian individual segmentation method and device based on combining coarse and fine granularity with fully convolutional neural networks.
Background
Pedestrian individual segmentation is one of the most important problems in scene understanding, biometric recognition and related fields. Most traditional pedestrian segmentation methods require that the background contain no other pedestrians, and obtain the segmentation result by distinguishing the human body from its surroundings. In real scenes, however, pedestrians frequently occlude one another, and traditional pedestrian segmentation methods cannot produce satisfactory results in such cases. Combining individual detection with pedestrian segmentation can partially solve this problem, but individual detection is time-consuming, and in many cases, even when the individual's position is detected accurately, a clean individual segmentation result still cannot be obtained because the detected region contains body information from several people. A method combining coarse and fine granularity can better solve this problem.
Disclosure of Invention
To address the problems encountered in individual pedestrian segmentation in the prior art, the invention combines a coarse-grained segmentation model with a fine-grained segmentation model: the coarse-grained model is used to mask out other pedestrians appearing in the background, and the fine-grained model then performs fine segmentation on that basis to obtain the individual segmentation result. First, a coarse-grained human-shaped contour segmentation model, a multi-layer fully convolutional neural network, is trained on a large number of labeled human-shape images. Next, this coarse-grained model produces human-shape segmentation results for all images, and the background regions are subtracted from the images according to these results (that is, the corresponding pixels are set to 0); the resulting images serve as input to the fine-grained segmentation model. Finally, the fine-grained human-shape segmentation model is trained using the background-masked images as input and the fine human-shape labels as supervision information.
In order to achieve the above object, a first aspect of the present invention provides a pedestrian individual segmentation method, including:
carrying out pedestrian segmentation on the image to be processed by utilizing a pre-trained coarse-grained human-shaped contour segmentation model to obtain a block segmentation result; the block segmentation result comprises a plurality of blocks labeled as background or foreground, wherein the blocks labeled as background contain no part of the pedestrian body and the blocks labeled as foreground contain a partial image of the pedestrian body;
removing the background image from the portion of the image to be processed corresponding to the block segmentation result to obtain a coarse-grained segmentation image;
inputting the coarse-grained segmentation image into a pre-trained fine-grained human-shaped contour segmentation model, which outputs an individual pedestrian segmentation result;
the coarse-grained human-shaped contour segmentation model and the fine-grained human-shaped contour segmentation model are both obtained by training fully convolutional neural networks.
The first fully convolutional neural network, corresponding to the coarse-grained human-shaped contour segmentation model, comprises a plurality of convolutional layers and one deconvolution layer; the second fully convolutional neural network, corresponding to the fine-grained human-shaped contour segmentation model, comprises a plurality of convolutional layers and a plurality of deconvolution layers, which form a centrally symmetric structure that combines into a funnel shape.
The method further comprises a step of training the coarse-grained human-shaped contour segmentation model, comprising:
performing block processing on the pedestrian-labeled training samples in the training data set to obtain block-processing results for the training samples;
normalizing the training samples to a uniform size, and then feeding the normalized training samples into the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model;
comparing the block segmentation result output by the coarse-grained human-shaped contour segmentation model with the block-processing result of the corresponding training sample to obtain a prediction error;
and reducing the prediction error with the back-propagation algorithm and stochastic gradient descent to train the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model, the final coarse-grained human-shaped contour segmentation model being obtained through multiple training iterations.
The method further comprises a step of training the fine-grained human-shaped contour segmentation model, comprising:
inputting the pedestrian-labeled training samples in the training data set into the trained coarse-grained human-shaped contour segmentation model to obtain block segmentation results;
subtracting the background image from the portion of each training sample corresponding to its block segmentation result to obtain a coarse-grained segmentation image;
normalizing the coarse-grained segmentation image to a uniform size;
feeding the normalized coarse-grained segmentation image into the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model;
comparing the fine segmentation result output by the second fully convolutional neural network with the fine segmentation label of the corresponding training sample to obtain a second prediction error;
and reducing the second prediction error with the back-propagation algorithm and stochastic gradient descent to train the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model, the final fine-grained human-shaped contour segmentation model being obtained through multiple training iterations.
The supervision information of the coarse-grained human-shaped contour segmentation model is the segmentation label produced by block processing, and it is used to mask out the background in the image.
A second aspect of the present invention provides a pedestrian individual segmentation apparatus including:
the block segmentation module is configured to perform pedestrian segmentation on the image to be processed by utilizing a pre-trained coarse-grained human-shaped contour segmentation model to obtain a block segmentation result; the block segmentation result comprises a plurality of blocks labeled as background or foreground, wherein the blocks labeled as background contain no part of the pedestrian body and the blocks labeled as foreground contain a partial image of the pedestrian body;
the background removing module is configured to remove the background image from the portion of the image to be processed corresponding to the block segmentation result to obtain a coarse-grained segmentation image;
a fine segmentation module configured to input the coarse-grained segmentation image to a pre-trained fine-grained human-shaped contour segmentation model;
the coarse-grained human-shaped contour segmentation model and the fine-grained human-shaped contour segmentation model are both obtained by training fully convolutional neural networks.
The first fully convolutional neural network, corresponding to the coarse-grained human-shaped contour segmentation model, comprises a plurality of convolutional layers and one deconvolution layer; the second fully convolutional neural network, corresponding to the fine-grained human-shaped contour segmentation model, comprises a plurality of convolutional layers and a plurality of deconvolution layers, which form a centrally symmetric structure that combines into a funnel shape.
The device further comprises a training module of the coarse-grained human-shaped contour segmentation model, comprising:
the marking sub-module is configured to perform block processing on the pedestrian-labeled training samples in the training data set to obtain block-processing results for the training samples;
a first normalization sub-module configured to normalize the training samples to a uniform size;
the first training sub-module is configured to feed the normalized training samples into the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model;
the first comparison sub-module is configured to compare the block segmentation result output by the coarse-grained human-shaped contour segmentation model with the block-processing result of the corresponding training sample to obtain a prediction error;
and the first iteration sub-module is configured to reduce the prediction error with the back-propagation algorithm and stochastic gradient descent so as to train the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model, the final coarse-grained human-shaped contour segmentation model being obtained through multiple training iterations.
The device further comprises a training module of the fine-grained human-shaped contour segmentation model, comprising:
the blocking sub-module is configured to input the pedestrian-labeled training samples in the training data set into the trained coarse-grained human-shaped contour segmentation model to obtain block segmentation results;
the background removal sub-module is configured to subtract the background image from the portion of each training sample corresponding to its block segmentation result to obtain a coarse-grained segmentation image;
a second normalization sub-module configured to normalize the coarse-grained segmentation image to a uniform size;
the second training sub-module is configured to feed the normalized coarse-grained segmentation image into the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model;
a second comparison sub-module configured to compare the fine segmentation result output by the second fully convolutional neural network with the fine segmentation label of the corresponding training sample to obtain a second prediction error;
and the second iteration sub-module is configured to reduce the second prediction error with the back-propagation algorithm and stochastic gradient descent so as to train the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model, the final fine-grained human-shaped contour segmentation model being obtained through multiple training iterations.
The supervision information of the coarse-grained human-shaped contour segmentation model is the segmentation label produced by block processing, and it is used to mask out the background in the image.
The pedestrian individual segmentation method provided by the invention, based on combining coarse and fine granularity in fully convolutional neural networks, trains the coarse-grained and fine-grained segmentation models separately with deep learning and uses the result of the coarse-grained model to mask part of the background, which improves the accuracy of human-shape segmentation and is particularly suitable for cases where other pedestrians appear in the background. In this technical scheme the results of the coarse-grained and fine-grained segmentation models are combined: the coarse-grained result is used to remove background blocks from the image and serves as input to the fine-grained segmentation model, which greatly reduces the difficulty of fine-grained segmentation and improves the segmentation effect. Both segmentation models are fully convolutional neural networks containing only convolutional and deconvolutional layers; their structures are simple and their parameters few, so they run fast. The coarse-grained network contains only one deconvolution layer and predicts the block segmentation result, while the fine-grained network has a front-to-back symmetric funnel-shaped structure and predicts the fine segmentation result. The supervision information of the proposed coarse-grained segmentation model is the block-processed segmentation label; through training, the background in the image, and in particular other pedestrians in the background, can be effectively masked, so that finally only the region containing a single pedestrian is retained.
Drawings
FIG. 1 is a schematic diagram of training data and labeling methods according to the present invention;
FIG. 2 is a flow chart illustrating a pedestrian individual segmentation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a coarse-grained segmentation model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a fine-grained segmentation model in an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
An embodiment of the invention provides a pedestrian individual segmentation method. The method comprises the following steps:
carrying out pedestrian segmentation on the image to be processed by utilizing a pre-trained coarse-grained human-shaped contour segmentation model to obtain a block segmentation result; the block segmentation result comprises a plurality of blocks labeled as background or foreground, wherein the blocks labeled as background contain no part of the pedestrian body and the blocks labeled as foreground contain a partial image of the pedestrian body;
removing a background image in a part corresponding to the blocked pedestrian segmentation result in the image to be processed to obtain a coarse-grained segmentation image;
inputting the coarse-grained segmentation image into a pre-trained fine-grained human-shaped contour segmentation model;
and outputting an individual pedestrian segmentation result by the pre-trained fine-grained human-shaped contour segmentation model.
In an embodiment, the block segmentation result divides the image to be processed into a plurality of blocks of equal size, each of which is labeled as a background block or a foreground block; the image corresponding to a background block contains no part of the pedestrian subject, while the image corresponding to a foreground block contains a partial image of the pedestrian subject. As shown in FIG. 1, d is the block segmentation result and e is the block segmentation image, that is, the block segmentation image corresponding to the block segmentation result.
In an embodiment, the coarse-grained human-shaped contour segmentation model and the fine-grained human-shaped contour segmentation model are both fully convolutional neural networks, that is, both models comprise only convolutional and deconvolutional layers; their structures are simple and their parameters few, so they run fast.
In an embodiment, the coarse-grained human-shaped contour segmentation model comprises a plurality of convolutional layers and one deconvolution layer and is used to predict the block segmentation result, while the fine-grained human-shaped contour segmentation model comprises a plurality of convolutional layers and a plurality of deconvolution layers; its convolutional part and deconvolutional part are centrally symmetric and combine into a funnel shape. That is, the innermost layer of the fine-grained human-shaped contour segmentation model is a convolutional layer, the first half of the network consists of convolutional layers, the second half consists of deconvolution layers, and the two halves are centrally symmetric.
In an embodiment, the supervision information of the coarse-grained human-shaped contour segmentation model is the segmentation label produced by block processing; through training, the background in the image, and especially other pedestrians in the background, can be effectively masked, so that finally only the region containing a single pedestrian is retained.
On the basis of the coarse-grained segmentation, the fine-grained model can obtain a very fine individual pedestrian segmentation result through its symmetric funnel-shaped fully convolutional network. The method is highly robust to various background changes in the image and can better solve the human-shape segmentation problem when multiple pedestrians occlude one another.
In the following, a large human-shape segmentation database is taken as an example; it contains 5,000 pedestrian images and the corresponding human-shape segmentation labels.
FIG. 2 is a flowchart of the individual pedestrian segmentation method of the present invention; as shown in the figure, the method specifically includes the following steps:
Step S0: perform block processing on the 5,000 pedestrian-labeled samples in the data set. As shown in FIG. 1, each pedestrian segmentation label image is first divided uniformly into 10 × 5 blocks, and each block is then labeled as foreground or background according to whether it contains any pedestrian segmentation label, yielding a 10 × 5 block segmentation label; in this way 5,000 pairs of pedestrian images and block segmentation labels are obtained;
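As an illustration of this block-labeling step, the following is a minimal sketch, not the patent's own code; the function name, the use of NumPy and the handling of grid boundaries are assumptions. It marks a grid cell as foreground whenever it contains at least one labeled pedestrian pixel.

```python
import numpy as np

def block_labels(seg_mask: np.ndarray, rows: int = 10, cols: int = 5) -> np.ndarray:
    """Divide a binary pedestrian segmentation mask into rows x cols blocks and
    mark each block as foreground (1) if it contains any pedestrian pixel."""
    h, w = seg_mask.shape
    labels = np.zeros((rows, cols), dtype=np.uint8)
    # Block boundaries; the mask height/width need not be exact multiples of the grid.
    row_edges = np.linspace(0, h, rows + 1, dtype=int)
    col_edges = np.linspace(0, w, cols + 1, dtype=int)
    for i in range(rows):
        for j in range(cols):
            block = seg_mask[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            labels[i, j] = 1 if block.any() else 0
    return labels

# Example: a 50 x 25 segmentation label yields a 10 x 5 block label map.
mask = np.zeros((50, 25), dtype=np.uint8)
mask[10:40, 8:18] = 1                 # a pedestrian-shaped blob
print(block_labels(mask))             # 10 x 5 array of 0/1 block marks
```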
Step S1: normalize the pedestrian images used for training to a uniform size (50 × 25 pixels) and feed them into a fully convolutional neural network (the coarse-grained segmentation network), which contains several convolutional layers and a deconvolution layer; its specific structure, shown in FIG. 3, comprises 4 convolutional layers and 1 deconvolution layer in total. The first convolutional layer contains 48 filters (of size 3 × 3) with a stride of 2; similarly, the second, third and fourth convolutional layers contain 96, 96 and 128 filters respectively (all of size 3 × 3), each with a stride of 2. The fifth layer is a deconvolution layer containing 1 filter (of size 10 × 5) with a stride of 1, and its output is the coarse-grained segmentation result.
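The coarse-grained network described above might be sketched as follows. The patent does not name a framework, so PyTorch is assumed here; the input is assumed to be a 3-channel image, and the padding and the transposed-convolution kernel are assumptions chosen so that a 50 × 25 input yields exactly the 10 × 5 block map (the patent itself specifies a single 10 × 5 deconvolution filter with a stride of 1, without giving padding values).

```python
import torch
import torch.nn as nn

class CoarseSegNet(nn.Module):
    """Coarse-grained segmentation network sketch: 4 convolutional layers
    (48/96/96/128 filters, 3x3, stride 2) followed by one deconvolution layer
    producing the 10 x 5 block segmentation map."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(48, 96, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 96, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # A 50 x 25 input gives a 4 x 2 feature map; this transposed convolution
        # (an assumed kernel/stride choice) expands it to the 10 x 5 block map.
        self.deconv = nn.ConvTranspose2d(128, 1, kernel_size=(4, 3), stride=2)

    def forward(self, x):                                # x: (N, 3, 50, 25)
        return self.deconv(self.features(x))            # logits of shape (N, 1, 10, 5)

coarse = CoarseSegNet()
print(coarse(torch.randn(1, 3, 50, 25)).shape)           # torch.Size([1, 1, 10, 5])
```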
Step S2: the last layer of the coarse-grained segmentation network outputs an image representation, i.e. the block segmentation result (of size 10 × 5);
Step S3: compare the output block segmentation result with the corresponding block segmentation label (shown as d in FIG. 1) to obtain a prediction error; the errors at each of the 10 × 5 positions are summed to obtain the final prediction error;
Step S4: reduce the prediction error with the back-propagation algorithm and stochastic gradient descent to train the coarse-grained segmentation network. A good coarse-grained segmentation model is obtained through multiple training iterations, about 10,000 being required; the error loss can be further reduced by adjusting the weight learning rate until the error no longer decreases, at which point training of the coarse-grained segmentation model is finished;
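Steps S3 and S4 could be realised with a training loop such as the hedged sketch below. The loss function (binary cross-entropy summed over the 10 × 5 positions), the momentum and learning-rate values, and the data-loader interface are all assumptions; the patent only states that the per-position errors are summed and that back-propagation with stochastic gradient descent is used.

```python
import torch
import torch.nn as nn

def train_coarse(model, loader, iters=10_000, lr=0.01, device="cpu"):
    """Minimal training sketch for the coarse-grained network."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.BCEWithLogitsLoss(reduction="sum")       # sum over the 10 x 5 positions
    it = 0
    while it < iters:
        for images, targets in loader:                     # images: (N,3,50,25), targets: (N,1,10,5)
            images, targets = images.to(device), targets.float().to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), targets)         # prediction error
            loss.backward()                                # back-propagation
            opt.step()                                     # stochastic gradient descent update
            it += 1
            if it >= iters:
                break
    return model
```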
Step S5: input the normalized pedestrian image (of size 50 × 25) into the trained coarse-grained model to obtain the block segmentation result (of size 10 × 5), and subtract the background region from the non-normalized original pedestrian image according to this segmentation result (i.e., set the corresponding pixels to 0) to obtain a background-free pedestrian image, as shown in FIG. 1;
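The background subtraction of step S5 can be sketched as follows; this is an illustrative helper, not the patent's code. The nearest-neighbour expansion of the 10 × 5 block map to the original image size is an assumption, since the patent only states that the corresponding pixels are set to 0.

```python
import numpy as np

def remove_background(image: np.ndarray, block_result: np.ndarray) -> np.ndarray:
    """Zero out the pixels of `image` (H x W x C) that fall in blocks predicted as
    background in the 10 x 5 `block_result` (1 = foreground, 0 = background)."""
    h, w = image.shape[:2]
    rows, cols = block_result.shape
    row_idx = np.arange(h) * rows // h            # map each pixel row to its block row
    col_idx = np.arange(w) * cols // w            # map each pixel column to its block column
    mask = block_result[row_idx[:, None], col_idx[None, :]].astype(bool)
    out = image.copy()
    out[~mask] = 0                                # background pixels set to 0
    return out
```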
Step S6: normalize the background-removed pedestrian image obtained in step S5 to a uniform size (e.g., 150 × 75 pixels) and feed it into a fully convolutional neural network (the fine-grained segmentation network). This network contains several convolutional and deconvolution layers; its convolutional and deconvolutional parts are centrally symmetric and combine into a funnel shape. As shown in FIG. 4, it comprises 4 convolutional layers and 3 deconvolution layers. The first convolutional layer contains 48 filters (of size 3 × 3) with a stride of 2; similarly, the second, third and fourth convolutional layers contain 64, 96 and 128 filters respectively (all of size 3 × 3), each with a stride of 2. The fifth layer is a deconvolution layer, symmetric to the third convolutional layer, containing 96 filters (of size 3 × 3) with a stride of 2; the sixth layer is a deconvolution layer, symmetric to the second convolutional layer, containing 64 filters (of size 3 × 3) with a stride of 2; the seventh layer is a deconvolution layer, symmetric to the first convolutional layer, containing 1 filter (of size 3 × 3) with a stride of 2, and it outputs the fine segmentation result of size 150 × 75;
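A hedged sketch of the funnel-shaped fine-grained network follows, again assuming PyTorch and a 3-channel input. The patent does not give padding values, and four stride-2 convolutions followed by only three stride-2 deconvolutions do not by themselves return to the 150 × 75 input resolution, so a final bilinear resize to 150 × 75 is added here as an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineSegNet(nn.Module):
    """Fine-grained funnel network sketch: 4 convolutional layers (48/64/96/128
    filters, 3x3, stride 2) and 3 deconvolution layers symmetric to convolutions
    3, 2 and 1 (96/64/1 filters, 3x3, stride 2)."""
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 48, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(48, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 96, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(96, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 96, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(96, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):                                  # x: (N, 3, 150, 75)
        y = self.up(self.down(x))
        # Final resize to the 150 x 75 target (an assumption; padding is unspecified in the patent).
        return F.interpolate(y, size=(150, 75), mode="bilinear", align_corners=False)

fine = FineSegNet()
print(fine(torch.randn(1, 3, 150, 75)).shape)               # torch.Size([1, 1, 150, 75])
```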
Step S7: the last layer of the fine-grained segmentation network outputs an image representation, i.e. the fine segmentation result (of size 150 × 75);
Step S8: compare the output fine segmentation result with the corresponding normalized segmentation label (of size 150 × 75, shown as b in FIG. 1) to obtain a prediction error, which is the sum of the errors over the 150 × 75 pixels; the normalized segmentation label is obtained by normalizing the exact segmentation label of the original sample to the size of the fine segmentation result (150 × 75);
Step S9: reduce the prediction error with the back-propagation algorithm and stochastic gradient descent to train the fine-grained segmentation network, and obtain the final fine-grained segmentation model through multiple training iterations. Because the network is relatively large, about 100,000 iterations are usually needed; the error loss can be further reduced by adjusting the weight learning rate until the error no longer decreases, at which point training of the fine-grained segmentation model is finished;
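Training of the fine-grained network mirrors the coarse-grained loop; the short sketch below sums the per-pixel error over the 150 × 75 output. As before, the binary cross-entropy loss and the optimiser settings are assumptions rather than values given in the patent.

```python
import torch
import torch.nn as nn

def train_fine(model, loader, iters=100_000, lr=0.01, device="cpu"):
    """Minimal training sketch for the fine-grained network (steps S8 and S9)."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.BCEWithLogitsLoss(reduction="sum")        # sum of the 150 x 75 pixel errors
    it = 0
    while it < iters:
        for masked_images, fine_labels in loader:           # (N,3,150,75), (N,1,150,75)
            masked_images = masked_images.to(device)
            fine_labels = fine_labels.float().to(device)
            opt.zero_grad()
            loss = loss_fn(model(masked_images), fine_labels)
            loss.backward()
            opt.step()
            it += 1
            if it >= iters:
                break
    return model
```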
Step S10: test with the trained coarse-grained and fine-grained segmentation models. First normalize the test image containing a pedestrian to 50 × 25 pixels and feed it into the coarse-grained segmentation model to obtain a coarse-grained segmentation result (i.e., the block segmentation result, of size 10 × 5);
Step S11: use the block segmentation result obtained in S10 to subtract the background region from the original pedestrian image (i.e., set the corresponding pixels to 0) to obtain a background-free pedestrian image, then normalize the pedestrian image to a uniform size (150 × 75 pixels), and finally feed the image into the fine-grained segmentation network;
Step S12: a refined individual pedestrian segmentation result can now be obtained at the output of the fine-grained segmentation network.
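Chaining the two trained models for testing (steps S10 to S12) might look like the sketch below. It reuses the CoarseSegNet, FineSegNet and remove_background helpers sketched earlier; the 0.5 threshold, the resizing calls and the conversion between NumPy arrays and tensors are assumptions made for illustration.

```python
import numpy as np
import torch
import torch.nn.functional as F

def segment_pedestrian(image: np.ndarray, coarse, fine) -> np.ndarray:
    """End-to-end test sketch: coarse block segmentation, background removal,
    then fine segmentation. `image` is an H x W x 3 uint8 array; `coarse` and
    `fine` are trained models as in the sketches above."""
    def to_tensor(a, size):
        t = torch.from_numpy(a).float().permute(2, 0, 1)[None] / 255.0
        return F.interpolate(t, size=size, mode="bilinear", align_corners=False)

    with torch.no_grad():
        # S10: normalise to 50 x 25 and predict the 10 x 5 block segmentation result.
        blocks = torch.sigmoid(coarse(to_tensor(image, (50, 25))))[0, 0] > 0.5
        # S11: mask the background on the original image, then normalise to 150 x 75.
        masked = remove_background(image, blocks.numpy().astype(np.uint8))
        # S12: the fine-grained network outputs the refined segmentation.
        fine_out = torch.sigmoid(fine(to_tensor(masked, (150, 75))))[0, 0]
    return (fine_out.numpy() > 0.5).astype(np.uint8)        # 150 x 75 binary mask
```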
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A pedestrian individual segmentation method comprising:
carrying out pedestrian segmentation on the image to be processed by utilizing a pre-trained coarse-grained human-shaped contour segmentation model to obtain a block segmentation result; the block segmentation result comprises a plurality of blocks labeled as background or foreground, the blocks labeled as background in the image to be processed containing no part of the pedestrian body and the blocks labeled as foreground containing a partial image of the pedestrian body;
removing the background image from the portion of the image to be processed corresponding to the block segmentation result to obtain a coarse-grained segmentation image;
inputting the coarse-grained segmentation image into a pre-trained fine-grained human-shaped contour segmentation model;
outputting an individual pedestrian segmentation result by the pre-trained fine-grained human-shaped contour segmentation model;
the coarse-grained human-shaped contour segmentation model and the fine-grained human-shaped contour segmentation model are obtained by training fully convolutional neural networks;
the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model comprises a plurality of convolutional layers and one deconvolution layer; the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model comprises a plurality of convolutional layers and a plurality of deconvolution layers, which form a centrally symmetric structure that combines into a funnel shape.
2. The method of claim 1, further comprising the step of training the coarse-grained human-shaped contour segmentation model, comprising:
performing block processing on the pedestrian-labeled training samples in the training data set to obtain block-processing results for the training samples;
normalizing the training samples to a uniform size, and then feeding the normalized training samples into the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model;
comparing the block segmentation result output by the coarse-grained human-shaped contour segmentation model with the block-processing result of the corresponding training sample to obtain a prediction error;
and reducing the prediction error with the back-propagation algorithm and stochastic gradient descent to train the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model, the final coarse-grained human-shaped contour segmentation model being obtained through multiple training iterations.
3. The method of claim 1, further comprising the step of training the fine-grained human-shaped contour segmentation model, comprising:
inputting the pedestrian-labeled training samples in the training data set into the trained coarse-grained human-shaped contour segmentation model to obtain block segmentation results;
subtracting the background image from the portion of each training sample corresponding to its block segmentation result to obtain a coarse-grained segmentation image;
normalizing the coarse-grained segmentation image to a uniform size;
feeding the normalized coarse-grained segmentation image into the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model;
comparing the fine segmentation result output by the second fully convolutional neural network with the fine segmentation label of the corresponding training sample to obtain a second prediction error;
and reducing the second prediction error with the back-propagation algorithm and stochastic gradient descent to train the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model, the final fine-grained human-shaped contour segmentation model being obtained through multiple training iterations.
4. The method according to claim 1, wherein the supervision information of the coarse-grained human-shaped contour segmentation model is the segmentation label produced by block processing, which is used to mask out the background in the image.
5. A pedestrian individual segmentation apparatus comprising:
the block segmentation module is configured to perform pedestrian segmentation on the image to be processed by utilizing a pre-trained coarse-grained human-shaped contour segmentation model to obtain a block segmentation result; the block segmentation result comprises a plurality of blocks labeled as background or foreground, wherein the blocks labeled as background contain no part of the pedestrian body and the blocks labeled as foreground contain a partial image of the pedestrian body;
the background removing module is configured to remove the background image from the portion of the image to be processed corresponding to the block segmentation result to obtain a coarse-grained segmentation image;
a fine segmentation module configured to input the coarse-grained segmentation image to a pre-trained fine-grained human-shaped contour segmentation model;
the result output module is configured to output an individual pedestrian segmentation result from the pre-trained fine-grained human-shaped contour segmentation model;
the coarse-grained human-shaped contour segmentation model and the fine-grained human-shaped contour segmentation model are obtained by training fully convolutional neural networks;
the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model comprises a plurality of convolutional layers and one deconvolution layer; the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model comprises a plurality of convolutional layers and a plurality of deconvolution layers, which form a centrally symmetric structure that combines into a funnel shape.
6. The apparatus of claim 5, further comprising a training module of the coarse-grained human-shaped contour segmentation model, comprising:
the marking sub-module is configured to perform block processing on the pedestrian-labeled training samples in the training data set to obtain block-processing results for the training samples;
a first normalization sub-module configured to normalize the training samples to a uniform size;
the first training sub-module is configured to feed the normalized training samples into the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model;
the first comparison sub-module is configured to compare the block segmentation result output by the coarse-grained human-shaped contour segmentation model with the block-processing result of the corresponding training sample to obtain a prediction error;
and the first iteration sub-module is configured to reduce the prediction error with the back-propagation algorithm and stochastic gradient descent so as to train the first fully convolutional neural network corresponding to the coarse-grained human-shaped contour segmentation model, the final coarse-grained human-shaped contour segmentation model being obtained through multiple training iterations.
7. The apparatus of claim 5, further comprising a training module of the fine-grained human-shaped contour segmentation model, comprising:
the blocking sub-module is configured to input the pedestrian-labeled training samples in the training data set into the trained coarse-grained human-shaped contour segmentation model to obtain block segmentation results;
the background removal sub-module is configured to subtract the background image from the portion of each training sample corresponding to its block segmentation result to obtain a coarse-grained segmentation image;
a second normalization sub-module configured to normalize the coarse-grained segmentation image to a uniform size;
the second training sub-module is configured to feed the normalized coarse-grained segmentation image into the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model;
a second comparison sub-module configured to compare the fine segmentation result output by the second fully convolutional neural network with the fine segmentation label of the corresponding training sample to obtain a second prediction error;
and the second iteration sub-module is configured to reduce the second prediction error with the back-propagation algorithm and stochastic gradient descent so as to train the second fully convolutional neural network corresponding to the fine-grained human-shaped contour segmentation model, the final fine-grained human-shaped contour segmentation model being obtained through multiple training iterations.
8. The apparatus according to claim 5, wherein the supervision information of the coarse-grained human-shaped contour segmentation model is the segmentation label produced by block processing, which is used to mask out the background in the image.
CN201710065013.0A 2017-02-04 2017-02-04 Pedestrian individual segmentation method and device Active CN106778705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710065013.0A CN106778705B (en) 2017-02-04 2017-02-04 Pedestrian individual segmentation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710065013.0A CN106778705B (en) 2017-02-04 2017-02-04 Pedestrian individual segmentation method and device

Publications (2)

Publication Number Publication Date
CN106778705A CN106778705A (en) 2017-05-31
CN106778705B true CN106778705B (en) 2020-03-17

Family

ID=58955591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710065013.0A Active CN106778705B (en) 2017-02-04 2017-02-04 Pedestrian individual segmentation method and device

Country Status (1)

Country Link
CN (1) CN106778705B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766890B (en) * 2017-10-31 2021-09-14 天津大学 Improved method for discriminant graph block learning in fine-grained identification
CN107944399A (en) * 2017-11-28 2018-04-20 广州大学 A kind of pedestrian's recognition methods again based on convolutional neural networks target's center model
CN109993187A (en) * 2017-12-29 2019-07-09 深圳市优必选科技有限公司 A kind of modeling method, robot and the storage device of object category for identification
CN108198192A (en) * 2018-01-15 2018-06-22 任俊芬 A kind of quick human body segmentation's method of high-precision based on deep learning
CN108510000B (en) * 2018-03-30 2021-06-15 北京工商大学 Method for detecting and identifying fine-grained attribute of pedestrian in complex scene
CN108711150B (en) * 2018-05-22 2022-03-25 电子科技大学 End-to-end pavement crack detection and identification method based on PCA
CN110689542A (en) * 2018-07-04 2020-01-14 清华大学 Portrait segmentation processing method and device based on multi-stage convolution neural network
CN108960190B (en) * 2018-07-23 2021-11-30 西安电子科技大学 SAR video target detection method based on FCN image sequence model
CN110855875A (en) * 2018-08-20 2020-02-28 珠海格力电器股份有限公司 Method and device for acquiring background information of image
CN109636806B (en) * 2018-11-22 2022-12-27 浙江大学山东工业技术研究院 Three-dimensional nuclear magnetic resonance pancreas image segmentation method based on multi-step learning
CN109635812B (en) * 2018-11-29 2019-11-08 中国科学院空间应用工程与技术中心 The example dividing method and device of image
CN110084156B (en) * 2019-04-12 2021-01-29 中南大学 Gait feature extraction method and pedestrian identity recognition method based on gait features
CN110516583A (en) * 2019-08-21 2019-11-29 中科视语(北京)科技有限公司 A kind of vehicle recognition methods, system, equipment and medium again
CN111368788B (en) * 2020-03-17 2023-10-27 北京迈格威科技有限公司 Training method and device for image recognition model and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609686A (en) * 2012-01-19 2012-07-25 宁波大学 Pedestrian detection method
CN106355188A (en) * 2015-07-13 2017-01-25 阿里巴巴集团控股有限公司 Image detection method and device
CN105760835A (en) * 2016-02-17 2016-07-13 天津中科智能识别产业技术研究院有限公司 Gait segmentation and gait recognition integrated method based on deep learning
CN106022221A (en) * 2016-05-09 2016-10-12 腾讯科技(深圳)有限公司 Image processing method and processing system
CN106127164A (en) * 2016-06-29 2016-11-16 北京智芯原动科技有限公司 The pedestrian detection method with convolutional neural networks and device is detected based on significance

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fully convolutional networks for semantic segmentation; E. Shelhamer et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2016-05-31; pp. 640-649 *
深度卷积神经网络在计算机视觉中的应用研究综述 (A review of applications of deep convolutional neural networks in computer vision); 卢宏涛, 张秦川; 数据采集与处理 (Journal of Data Acquisition and Processing); 2016-01-30; vol. 31, no. 1; pp. 1-17 *

Also Published As

Publication number Publication date
CN106778705A (en) 2017-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant