CN107729805A - Neural network for pedestrian re-identification and deep-learning-based pedestrian re-identification algorithm - Google Patents
Neural network for pedestrian re-identification and deep-learning-based pedestrian re-identification algorithm
- Publication number
- CN107729805A CN107729805A CN201710780179.0A CN201710780179A CN107729805A CN 107729805 A CN107729805 A CN 107729805A CN 201710780179 A CN201710780179 A CN 201710780179A CN 107729805 A CN107729805 A CN 107729805A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- neural network
- identification
- image
- cnn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000013135 deep learning Methods 0.000 title claims abstract description 8
- 230000009466 transformation Effects 0.000 claims abstract description 24
- 238000013528 artificial neural network Methods 0.000 claims description 74
- 238000013527 convolutional neural network Methods 0.000 claims description 30
- 238000012549 training Methods 0.000 claims description 7
- 238000011176 pooling Methods 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 15
- 238000000605 extraction Methods 0.000 description 6
- 238000012545 processing Methods 0.000 description 5
- 238000000034 method Methods 0.000 description 3
- 238000012360 testing method Methods 0.000 description 2
- 210000000746 body region Anatomy 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/14—Transformations for image registration, e.g. adjusting or mapping for alignment of images
- G06T3/147—Transformations for image registration, e.g. adjusting or mapping for alignment of images using affine transformations
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a neural network for pedestrian re-identification and a deep-learning-based pedestrian re-identification algorithm. The neural network includes: a first neural network that takes the original whole-body image of a pedestrian as a first input and outputs a first identification feature; and a second neural network that takes an affine-transformed image of the human body parts extracted from the original whole-body image as a second input and outputs a second identification feature, wherein the human body parts comprise at least the head, torso, and four limbs, and the first and second identification features are combined into a total identification feature. The invention provides more robust pedestrian feature matching, thereby improving the correct recognition rate and/or reducing the false recognition rate.
Description
Technical Field
The present invention relates to the field of convolutional neural networks and image recognition. More particularly, the present invention relates to a neural network for pedestrian re-identification and a deep-learning-based pedestrian re-identification algorithm.
Background
With the popularization of video surveillance technology, pedestrian re-identification plays an increasingly important role: it can help people automatically complete the task of searching for specific persons in massive image or video data.
Feature extraction and feature matching using convolutional neural networks are two important components of pedestrian re-identification technology. However, images of the same pedestrian captured by different cameras usually exhibit large pose changes and complicated viewing-angle changes, both of which greatly increase the difficulty of matching for a pedestrian re-identification algorithm.
To solve at least the above technical problems, a new pedestrian re-identification algorithm that adapts to pose changes needs to be provided, so that the algorithm can better accommodate the pedestrian's pose changes and extract more robust features, thereby improving the final correct recognition rate and/or reducing the false recognition rate.
Disclosure of Invention
The purpose of the invention is realized by the following technical scheme.
A neural network for pedestrian re-identification, comprising:
a first neural network that uses an original whole-body image of a pedestrian as a first input and outputs a first recognition feature;
a second neural network that uses an affine transformation image of a human body part image extracted from an original whole-body image of a pedestrian as a second input and outputs a second recognition feature,
the human body part at least comprises a head, a trunk and four limbs, and the first identification feature and the second identification feature are combined into a total identification feature.
The neural network for pedestrian re-recognition according to the present invention further comprises:
a feature embedding sub-neural network (FEN), which includes a pose transformation network (PTN) for applying an affine transformation to each human body part in the body part map, so as to obtain a more robust affine-transformed image of the body part image.
The neural network for pedestrian re-recognition according to the present invention further comprises:
a feature weighting sub-neural network (FWN) for weighting and biasing the second identifying features output by the second neural network to combine the first identifying features and the second identifying features into a total identifying feature.
According to the neural network for pedestrian re-identification of the present invention, the first neural network includes a first Convolutional Neural Network (CNN) and a second CNN, and the second neural network includes a third CNN and a fourth CNN, wherein the first CNN and the third CNN share a weight.
According to the neural network for pedestrian re-recognition of the present invention, the second CNN and the fourth CNN adopt independent weights.
The neural network for pedestrian re-identification according to the invention is characterized in that the first neural network and the second neural network each further comprise a Convolution (CONV) layer and a Global Average Pooling (GAP) layer at the respective output.
The pedestrian re-identification algorithm based on deep learning comprises the following steps:
constructing a neural network for pedestrian re-identification according to the above;
training the neural network for pedestrian re-recognition using a pedestrian re-recognition dataset;
pedestrian re-recognition is performed using a trained neural network for pedestrian re-recognition.
The advantage of the invention is its more robust pedestrian feature matching capability, through which the correct recognition rate can be improved and/or the false recognition rate reduced.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the specific embodiments. The drawings are only for purposes of illustrating the particular embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a conceptual schematic diagram of a first neural network for pedestrian re-identification according to an embodiment of the present invention.
Fig. 2 shows an overall schematic diagram of a second type of neural network for pedestrian re-identification according to an embodiment of the invention.
Fig. 3 shows a schematic block diagram of an FEN comprised by a second neural network for pedestrian re-identification according to an embodiment of the present invention.
Fig. 4 shows a partial processing result of the FEN included in the second neural network for pedestrian re-recognition according to the embodiment of the present invention.
Fig. 5 shows a schematic block diagram of a PTN in an FEN included in a second neural network for pedestrian re-recognition according to an embodiment of the present invention.
Fig. 6 shows a detailed view of a third neural network for pedestrian re-identification according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a conceptual schematic diagram of a first neural network 100 for pedestrian re-identification according to an embodiment of the present invention.
As shown in fig. 1, a first type of neural network 100 for pedestrian re-identification includes a first neural network 101 and a second neural network 103.
As shown in fig. 1, the first neural network 101 uses an original whole-body image of a pedestrian as a first input and outputs a first recognition feature F_global.
The second neural network 103 uses an affine-transformed image of the human body part image extracted from the original whole-body image of the pedestrian as a second input and outputs a second recognition feature F_part.
The human body parts comprise at least the head, torso, and four limbs, and the first identification feature F_global and the second identification feature F_part are combined into a total identification feature F_fusion.
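As a sketch only, the two-branch data flow — whole-body input to F_global, affine-transformed part input to F_part, concatenation into F_fusion — might look like the following PyTorch snippet. The layer counts and channel sizes are illustrative assumptions; the patent does not specify them:

```python
import torch
import torch.nn as nn

def make_branch(feat_dim):
    # Illustrative small CNN branch; the patent does not fix these layers.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class TwoStreamReID(nn.Module):
    """Two-branch sketch: a global branch on the whole-body image and a
    part branch on the affine-transformed body-part image."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.global_branch = make_branch(feat_dim)
        self.part_branch = make_branch(feat_dim)

    def forward(self, whole_body, part_image):
        f_global = self.global_branch(whole_body)    # F_global
        f_part = self.part_branch(part_image)        # F_part
        return torch.cat([f_global, f_part], dim=1)  # F_fusion

net = TwoStreamReID()
f_fusion = net(torch.randn(2, 3, 128, 64),   # whole-body crops
               torch.randn(2, 3, 128, 64))   # affine-transformed part images
```

The plain concatenation here stands in for the fused feature; the specification later refines F_part with the FWN before fusing.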
For example, as known to those skilled in the art, the original whole-body images of pedestrians may be taken from pedestrian re-identification datasets such as CUHK03, Market-1501, and VIPeR, for training the weights of the first neural network or for testing the correct recognition rate.
For example, as known to those skilled in the art, the body part images may be extracted manually from the original whole-body image of the pedestrian, with the affine transformation parameters adjusted by hand to obtain the affine-transformed image, which serves as the second input for training the weights of the second neural network or for testing the correct recognition rate.
The extraction of the body part images may also be performed by a pose estimation algorithm commonly used in the art (e.g., one based on a fully convolutional network (FCN)); for a specific technical solution, refer to the article "Fully convolutional networks for semantic segmentation" published by J. Long, E. Shelhamer, and T. Darrell at CVPR 2015.
Fig. 2 shows an overall schematic block diagram of a second type of neural network 200 for pedestrian re-identification according to an embodiment of the present invention.
As shown in fig. 2, in order to automatically acquire the affine-transformed image of the human body part image from the original whole-body image of the pedestrian, the second type of neural network 200 for pedestrian re-identification includes an FEN 201 in addition to the first neural network 101 and the second neural network 103.
The FEN 201 is configured to automatically acquire the human body part image from the original whole-body image of the pedestrian, apply an affine transformation to the body part image using the PTNs (which take the original whole-body image of the pedestrian as input), and output the resulting PTN affine-transformed body part image to the second neural network 103.
Fig. 3 shows a schematic block diagram of an FEN 201 comprised by a second neural network 200 for pedestrian re-identification according to an embodiment of the present invention.
As shown in fig. 3, the FEN 201 includes an FCN-based pose estimation module 301 using existing techniques (e.g., as disclosed by J. Long et al.; it may be trained separately and incorporated into the FEN 201), a human body part image extraction module 303 using techniques customary in the art, and first through fifth PTNs 305, 307, 309, 311, and 313. The first PTN 305, second PTN 307, third PTN 309, fourth PTN 311, and fifth PTN 313 respectively apply affine transformations to the images of the left arm, right arm, torso, left leg, and right leg of the pedestrian extracted by the human body part image extraction module 303, finally yielding affine-transformed images of the human body parts other than the head. The FEN 201 then combines the original head image with the affine-transformed images of the other body parts and outputs the combined result as the PTN affine-transformed image.
That is, the FEN 201 includes PTNs for applying an affine transformation to each human body part in the body part map, so as to obtain a more robust affine-transformed image of the body part image.
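The FEN data flow just described (part extraction, one PTN per limb/torso part, head passed through unchanged) can be sketched as follows. All helper names, the toy masks, and the identity placeholder for the PTN warp are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

# Illustrative sketch of the FEN data flow: the head region passes through
# unchanged while the five limb/torso parts each go through their own PTN.
PARTS = ["left_arm", "right_arm", "torso", "left_leg", "right_leg"]

def extract_part(image, mask):
    """Keep only the pixels of one body part (all others zeroed)."""
    return image * mask[..., None]

def ptn_warp(part_image, theta):
    """Placeholder for one PTN: identity warp here; a trained PTN would
    resample the part image with its learned 6-parameter affine grid."""
    return part_image

def fen_forward(image, head_mask, part_masks, thetas):
    out = extract_part(image, head_mask)          # head: no transformation
    for name in PARTS:                            # one PTN per body part
        out = out + ptn_warp(extract_part(image, part_masks[name]),
                             thetas[name])
    return out

h, w = 8, 4
image = np.ones((h, w, 3))
head_mask = np.zeros((h, w)); head_mask[:2] = 1     # toy head mask
part_masks = {n: np.zeros((h, w)) for n in PARTS}
part_masks["torso"][2:5] = 1                        # toy torso mask
thetas = {n: [1, 0, 0, 0, 1, 0] for n in PARTS}     # identity parameters
combined = fen_forward(image, head_mask, part_masks, thetas)
```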
Fig. 4 shows a partial processing result of the FEN 201 included in the second neural network 200 for pedestrian re-recognition according to the embodiment of the present invention.
As shown in fig. 4, sub-figure (a) is a pair of original whole-body images of the same pedestrian (captured by cameras at different viewing angles). Sub-figure (b) shows the human pose estimation results obtained, for example, by the FCN-based pose estimation module 301. Sub-figure (c) is a pair of joint maps labeling 14 human joints, likewise obtained by the pose estimation module 301. Sub-figure (d) shows the pixel regions corresponding to each of 6 human body parts, including the head, torso, and limbs. Sub-figure (e) is, for example, an image containing only the above 6 body parts, obtained by the human body part image extraction module 303. Sub-figure (f) is, for example, the affine-transformed image obtained by processing the image in sub-figure (e) with the PTNs; optionally, the parts in (e) may be rotated and scaled, and the images in (f) normalized.
Fig. 5 shows a schematic block diagram of a PTN in the FEN 201 included in the second neural network for pedestrian re-recognition according to the embodiment of the present invention.
As shown in fig. 5, the PTN networks (i.e., the first PTN 305, second PTN 307, third PTN 309, fourth PTN 311, and fifth PTN 313 shown in fig. 3) learn through training the parameters A_θ for applying an affine transformation to the limb images (left arm, right arm, left leg, and right leg) and to the torso image. The PTN network is modified from the spatial transformer network (STN) disclosed in the article "Spatial transformer networks" published by M. Jaderberg, K. Simonyan, A. Zisserman et al. at NIPS 2015.
As shown in formula (1), the affine transformation parameter A_θ obtained by the PTN network for a human body part image is 6-dimensional, where θ1, θ2, θ4, and θ5 are scaling and rotation parameters, and θ3 and θ6 are translation parameters; (x_s, y_s) are the pixel coordinates in the original body part image, and (x_t, y_t) are the pixel coordinates in the affine-transformed image of the body part:

    (x_s, y_s)^T = A_θ · (x_t, y_t, 1)^T,  A_θ = [θ1 θ2 θ3; θ4 θ5 θ6]    (1)
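A minimal numerical sketch of this 6-parameter affine sampling, in plain NumPy with nearest-neighbour lookup (in the patent the PTN learns θ from the whole-body image rather than receiving it by hand):

```python
import numpy as np

def affine_sample(src, theta):
    """Warp `src` with the 6-parameter affine A_theta: for every target
    pixel (x_t, y_t), look up the source pixel
    (x_s, y_s) = A_theta @ [x_t, y_t, 1]
    (nearest-neighbour; out-of-bounds pixels stay zero)."""
    A = np.asarray(theta, dtype=float).reshape(2, 3)
    h, w = src.shape
    out = np.zeros_like(src)
    for yt in range(h):
        for xt in range(w):
            xs, ys = A @ np.array([xt, yt, 1.0])
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[yt, xt] = src[yi, xi]
    return out

img = np.arange(16.0).reshape(4, 4)
identity = [1, 0, 0, 0, 1, 0]                      # theta1..theta6: identity
shifted = affine_sample(img, [1, 0, 1, 0, 1, 0])   # theta3 = 1: sample one
                                                   # pixel to the right
assert np.allclose(affine_sample(img, identity), img)
```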
Optionally, as shown in fig. 2, the second neural network 200 for pedestrian re-identification further comprises FWN 203.
FWN 203 is used to weight and bias the second identifying features output by the second neural network 103 to combine the first identifying features and the second identifying features into a total identifying feature. The weighting and biasing are implemented by equation (2).
    tanh(F_part ⊙ W + B)    (2)
wherein W and B are the weight vector and bias vector, respectively, each having the same dimension as the second identification feature F_part, and ⊙ denotes the Hadamard (element-wise) product.
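Equation (2) is a simple element-wise operation; a NumPy sketch with toy dimensions (m = 3, n = 2) and arbitrary illustrative values. Joining the weighted part feature to the global feature by concatenation is an assumption, consistent with the m + n total dimensions used in the text:

```python
import numpy as np

def fwn_fuse(f_global, f_part, W, B):
    """Feature Weighting sub-Network sketch: weight and bias the part
    feature element-wise, squash with tanh, then join with the global
    feature into the total feature."""
    weighted = np.tanh(f_part * W + B)           # tanh(F_part ⊙ W + B), eq. (2)
    return np.concatenate([f_global, weighted])  # total feature F_fusion

f_global = np.array([0.5, -1.0, 2.0])   # m = 3
f_part = np.array([1.0, 0.0])           # n = 2
W = np.array([2.0, 1.0])                # same dimension as F_part
B = np.array([0.0, 0.5])
f_fusion = fwn_fuse(f_global, f_part, W, B)
```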
Also, to implement error back-propagation, a gradient update formula may be defined for training FWN 203, where f_i ∈ F_fusion (i = 1, 2, …, m+n), g_j ∈ F_global (j = 1, 2, …, m), p_k ∈ F_part (k = 1, 2, …, n), w_k ∈ W and b_k ∈ B (k = 1, 2, …, n), and m and n are the dimensions of F_global and F_part, respectively.
Fig. 6 shows a detailed schematic diagram of a third neural network 300 for pedestrian re-identification according to an embodiment of the present invention.
As shown in FIG. 6, the first neural network used by the third neural network 300 includes a first CNN and a second CNN (CNN_g in FIG. 6), the second neural network includes a third CNN and a fourth CNN (CNN_p in FIG. 6), and the first CNN and the third CNN share weights.
Alternatively, as shown in fig. 6, the second CNN and the fourth CNN used by the third neural network 300 use independent weights.
Optionally, as shown in fig. 6, the first neural network and the second neural network used by the third neural network 300 each further include a CONV layer and a GAP layer at their respective outputs, to map the output features to appropriate dimensions for subsequent processing and to support input images of different sizes.
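Why GAP supports input images of different sizes can be seen in a few lines of NumPy: averaging each channel over all spatial positions yields a fixed-length vector whatever the height and width (channel-last layout and sizes are illustrative):

```python
import numpy as np

def global_average_pool(feature_map):
    """GAP: average each channel over all spatial positions (H, W),
    collapsing any H x W x C map to a length-C vector."""
    return feature_map.mean(axis=(0, 1))

vecs = []
for h, w in [(16, 8), (32, 12)]:          # two different input sizes
    fmap = np.random.rand(h, w, 32)       # 32 channels
    vecs.append(global_average_pool(fmap))
```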
The invention also provides a deep-learning-based pedestrian re-identification algorithm that uses the above neural network for pedestrian re-identification. The algorithm comprises the following steps:
a neural network for pedestrian re-identification according to the above is constructed.
Training the neural network for pedestrian re-recognition using a pedestrian re-recognition dataset.
Pedestrian re-recognition is performed using a trained neural network for pedestrian re-recognition.
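The three steps might be sketched as a minimal PyTorch training iteration. The identity-classification loss, the toy linear model, and all sizes are assumptions for illustration; the specification does not fix a loss function or optimizer:

```python
import torch
import torch.nn as nn

num_ids = 10                                  # number of pedestrian identities
# Step 1: construct a stand-in for the re-identification network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 16, num_ids))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Step 2: one training step on a stand-in batch from a re-ID dataset.
images = torch.randn(4, 3, 32, 16)
labels = torch.randint(0, num_ids, (4,))      # pedestrian identity labels
loss = loss_fn(model(images), labels)
opt.zero_grad(); loss.backward(); opt.step()

# Step 3: inference with the trained network.
with torch.no_grad():
    query_feature = model(torch.randn(1, 3, 32, 16))
```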
The technical scheme of the invention has better capability of feature extraction and feature matching, thereby improving the final correct recognition rate and/or reducing the false recognition rate.
The above description is only an exemplary embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (7)
1. A neural network for pedestrian re-identification, comprising:
a first neural network that uses an original whole-body image of a pedestrian as a first input and outputs a first recognition feature;
a second neural network that uses an affine transformation image of a human body part image extracted from an original whole-body image of a pedestrian as a second input and outputs a second recognition feature,
the human body part at least comprises a head, a trunk and four limbs, and the first identification feature and the second identification feature are combined into a total identification feature.
2. The neural network for pedestrian re-identification according to claim 1, further comprising:
the characteristic embedding sub-neural network FEN comprises a posture transformation neural network PTN, and the PTN is used for applying affine transformation to each human body part in the human body part map to obtain a more robust affine transformation image of the human body part map image.
3. The neural network for pedestrian re-identification according to claim 1 or 2, further comprising:
the feature weighting sub-neural network FWN is used to weight and bias the second identifying features output by the second neural network, thereby combining the first identifying features and the second identifying features into a total identifying feature.
4. The neural network for pedestrian re-identification according to claim 3, wherein the first neural network comprises a first Convolutional Neural Network (CNN) and a second CNN, and the second neural network comprises a third CNN and a fourth CNN, wherein the first CNN and the third CNN share weight values.
5. The neural network for pedestrian re-identification, as claimed in claim 4, wherein the second CNN and the fourth CNN use independent weights.
6. The neural network for pedestrian re-identification according to claim 1 or 2, characterized in that the first and second neural networks each further comprise a convolutional CONV layer and a global average pooling GAP layer at the respective output.
7. A pedestrian re-identification algorithm based on deep learning, comprising:
constructing a neural network for pedestrian re-identification according to any one of claims 1-6;
training the neural network for pedestrian re-recognition using a pedestrian re-recognition dataset;
pedestrian re-recognition is performed using a trained neural network for pedestrian re-recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710780179.0A CN107729805B (en) | 2017-09-01 | 2017-09-01 | Neural network for pedestrian re-identification and deep-learning-based pedestrian re-identification algorithm
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710780179.0A CN107729805B (en) | 2017-09-01 | 2017-09-01 | Neural network for pedestrian re-identification and deep-learning-based pedestrian re-identification algorithm
Publications (2)
Publication Number | Publication Date |
---|---|
CN107729805A true CN107729805A (en) | 2018-02-23 |
CN107729805B CN107729805B (en) | 2019-09-13 |
Family
ID=61205446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710780179.0A Active CN107729805B (en) | 2017-09-01 | 2017-09-01 | The neural network identified again for pedestrian and the pedestrian based on deep learning recognizer again |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107729805B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140177946A1 (en) * | 2012-12-21 | 2014-06-26 | Electronics and Telecommunications Research Institute | Human detection apparatus and method |
CN104915643A (en) * | 2015-05-26 | 2015-09-16 | 中山大学 | Deep-learning-based pedestrian re-identification method |
CN106022392A (en) * | 2016-06-02 | 2016-10-12 | 华南理工大学 | Deep neural network sample automatic accepting and rejecting training method |
CN106529442A (en) * | 2016-10-26 | 2017-03-22 | 清华大学 | Pedestrian identification method and apparatus |
CN106845415A (en) * | 2017-01-23 | 2017-06-13 | 中国石油大学(华东) | A kind of pedestrian based on deep learning becomes more meticulous recognition methods and device |
CN106951872A (en) * | 2017-03-24 | 2017-07-14 | 江苏大学 | A kind of recognition methods again of the pedestrian based on unsupervised depth model and hierarchy attributes |
- 2017-09-01: application CN201710780179.0A filed in CN; granted as CN107729805B (Active)
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10783394B2 (en) | 2017-06-20 | 2020-09-22 | Nvidia Corporation | Equivariant landmark transformation for landmark localization |
US10783393B2 (en) | 2017-06-20 | 2020-09-22 | Nvidia Corporation | Semi-supervised learning for landmark localization |
CN110688873A (en) * | 2018-07-04 | 2020-01-14 | 上海智臻智能网络科技股份有限公司 | Multi-target tracking method and face recognition method |
CN109101901A (en) * | 2018-07-23 | 2018-12-28 | 北京旷视科技有限公司 | Human action identification and its neural network generation method, device and electronic equipment |
CN109101901B (en) * | 2018-07-23 | 2020-10-27 | 北京旷视科技有限公司 | Human body action recognition method and device, neural network generation method and device and electronic equipment |
CN109934081A (en) * | 2018-08-29 | 2019-06-25 | 厦门安胜网络科技有限公司 | A kind of pedestrian's attribute recognition approach, device and storage medium based on deep neural network |
CN110956576A (en) * | 2018-09-27 | 2020-04-03 | 北京小米移动软件有限公司 | Image processing method, device, equipment and storage medium |
CN110956576B (en) * | 2018-09-27 | 2024-03-22 | 北京小米移动软件有限公司 | Image processing method, device, equipment and storage medium |
CN109886113A (en) * | 2019-01-17 | 2019-06-14 | 桂林远望智能通信科技有限公司 | A kind of spacious view pedestrian recognition methods again based on region candidate network |
CN109977837A (en) * | 2019-03-20 | 2019-07-05 | 常熟理工学院 | Pedestrian based on human body attitude invariant features recognition methods again |
CN110543817A (en) * | 2019-07-25 | 2019-12-06 | 北京大学 | Pedestrian re-identification method based on posture guidance feature learning |
TWI740624B (en) * | 2019-10-28 | 2021-09-21 | 中國商深圳市商湯科技有限公司 | Image processing method, device and storage medium |
CN111920436A (en) * | 2020-07-08 | 2020-11-13 | 浙江大学 | Dual-tracer PET (positron emission tomography) separation method based on multi-task learning three-dimensional convolutional coding and decoding network |
CN112990144A (en) * | 2021-04-30 | 2021-06-18 | 德鲁动力科技(成都)有限公司 | Data enhancement method and system for pedestrian re-identification |
CN113255615A (en) * | 2021-07-06 | 2021-08-13 | 南京视察者智能科技有限公司 | Pedestrian retrieval method and device for self-supervision learning |
Also Published As
Publication number | Publication date |
---|---|
CN107729805B (en) | 2019-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107729805B (en) | Neural network for pedestrian re-identification and deep-learning-based pedestrian re-identification algorithm | |
CN109684924B (en) | Face living body detection method and device | |
Wang et al. | 360sd-net: 360 stereo depth estimation with learnable cost volume | |
CN111862296B (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, three-dimensional reconstruction system, model training method and storage medium | |
GB2608975A (en) | Person identification across multiple captured images | |
CN103607554A (en) | Fully-automatic face seamless synthesis-based video synthesis method | |
CN107767339B (en) | Binocular stereo image splicing method | |
CN112528902B (en) | Video monitoring dynamic face recognition method and device based on 3D face model | |
CN111260543A (en) | Underwater image splicing method based on multi-scale image fusion and SIFT features | |
CN108171735B (en) | Billion pixel video alignment method and system based on deep learning | |
Wang et al. | A segmentation based robust deep learning framework for multimodal retinal image registration | |
CN110263605A (en) | Pedestrian's dress ornament color identification method and device based on two-dimension human body guise estimation | |
Charco et al. | Deep learning based camera pose estimation in multi-view environment | |
CN111553939A (en) | Image registration algorithm of multi-view camera | |
CN111898571A (en) | Action recognition system and method | |
Ruan et al. | Image stitching algorithm based on SURF and wavelet transform | |
Zheng et al. | Online subspace learning from gradient orientations for robust image alignment | |
CN114708617A (en) | Pedestrian re-identification method and device and electronic equipment | |
CN111582036A (en) | Cross-view-angle person identification method based on shape and posture under wearable device | |
CN111126250A (en) | Pedestrian re-identification method and device based on PTGAN | |
Mo et al. | A Robust Infrared and Visible Image Registration Method for Dual Sensor UAV System | |
CN110321452A (en) | A kind of image search method based on direction selection mechanism | |
CN111047513B (en) | Robust image alignment method and device for cylindrical panorama stitching | |
CN114066954A (en) | Feature extraction and registration method for multi-modal images | |
Alzohairy et al. | Image mosaicing based on neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||