CN113627272A

CN113627272A - Serious misalignment pedestrian re-identification method and system based on normalization network

Info

Publication number: CN113627272A
Application number: CN202110812700.0A
Authority: CN
Inventors: 杨华; 何远航
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2021-11-09
Anticipated expiration: 2041-07-19
Also published as: CN113627272B

Abstract

The invention discloses a serious misalignment pedestrian re-identification method and a serious misalignment pedestrian re-identification system based on a normalization network, wherein the method comprises the following steps: according to each input image I^wCollecting corresponding randomly clipped non-aligned images I with the same identity^u(ii) a For input image I^w、I^uProcessing and obtaining characteristic diagram

For input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u(ii) a Using a gesture guidance generator, from an input feature map

And attitude map P^w、P^uGenerating a feature map

Using feature maps

And

training the posture guide generator and the posture guide discriminator; for characteristic diagram

Processing and acquiring high-order characteristic diagram

For high-order characteristic diagram

Processing to obtain feature vector

And

based on the attitude map P^w、P^uComputing feature vectors

(ii) visibility of; acting on the feature vector with classification loss and distance loss

And

thereby enabling supervision of network training. The method of the invention is robust and generates alignments for similarity measurementsThe feature map enhances the robustness under complex situations.

Description

Serious misalignment pedestrian re-identification method and system based on normalization network

Technical Field

The invention relates to the technical field of computer vision, in particular to a serious misalignment pedestrian re-identification method and system based on a normalization network.

Background

People Re-identification (Re-ID) is intended to match people images in different camera views. Most research has focused on the traditional Re-ID task. Sometimes, the pedestrian is blocked by an object or other person. Partial Re-ID studied this problem, making it closer to a real scene. It is still assumed that the pedestrian detection is perfect and that the detected pedestrian images are spatially aligned. In fact, a large number of real person images are not cropped out of the exact bounding box, especially when the detector is affected by different camera heights, complex lighting conditions or severe occlusion, resulting in meaningless background filling or severe body part loss. We call this task a severely misaligned pedestrian Re-ID.

When the traditional integral Re-ID tasks (such as Tianlong Chen, Shaojin Ding, Jingyi Xie, Y e Y uon, Wuyang Chen, Y ang Y ang, Zhou Ren, and Zhangyang Wang, Abd-net: extensive but reverse person Re-identification, "in ICCV,2019, pp. 8351-8361.) are directly applied to Re-ID with serious misalignment, the existing methods often cannot work well due to weak alignment capability. Existing partial Re-ID task methods (e.g., Guan' an Wang, Shuo Y ang, Huang u Liu, Zherching Wang, Y ang Y ang, Shuliang Wang, Gang Y u, Erjin Zhou, and Jian Sun, "High-order information models: Learning registration and topology for encapsulated person Re-identification," in CVPR,2020, pp.6449-6458) typically use an attention-based convolution sub-network to segment a feature map into different parts to improve anti-occlusion capability. However, due to the lack of segmentation labels, inaccuracy of pose key points, non-correspondence of spaces between different hierarchical feature maps, and huge data deviation between different scenes, a subnetwork is generally difficult to train well. These limit the stability of the segmentation, especially in complex and realistic scenes where there is severe misalignment.

Disclosure of Invention

The invention provides a serious misalignment pedestrian re-identification method and system based on a normalization network aiming at the problems in the prior art, which summarize the defects of the existing pedestrian re-identification method, do not adopt a mode of dynamically dividing a feature map to align parts, but combine the stability of the whole method and generate an aligned feature map for similarity measurement, thereby enhancing the robustness under a complex situation.

In order to solve the technical problems, the invention is realized by the following technical scheme:

the invention provides a serious misalignment pedestrian re-identification method based on a normalization network, which comprises the following steps:

s11: according to each input image I^wCollecting corresponding randomly clipped non-aligned images I with the same identity^u；

S12: for the input image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w；

S13, for the input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u；

S14: using the gesture guidance generator, from the input feature map F_i ^uAnd attitude map P^w、P^uGenerating a feature map

S15: using feature maps F_i ^wAnd

training the posture guide generator and the posture guide discriminator;

s16: for the characteristic diagram F_i ^u、F_i ^wProcessing and acquiring a high-order characteristic diagram F_o ^u、F_o ^w；

S17: for the high-order characteristic diagram

Processing to obtain a feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^u；

S18: based on the attitude map P^w、P^uCalculating the feature vector f₁ ^u...f_n ^u(ii) visibility of;

s19: by using the effects of classification loss and distance lossThe feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^uThereby enabling supervision of network training.

Preferably, the S12 includes: using a convolutional neural network to the input image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w(ii) a Further, the air conditioner is provided with a fan,

the convolutional neural network in S12 is a ResNet50 front three-layer network.

Preferably, the S13 includes: for the processed input image I by using attitude estimation algorithm^w、I^uProcessing and acquiring a posture graph P^w、P^u(ii) a Further, the air conditioner is provided with a fan,

the attitude estimation algorithm is OpenPose.

Preferably, the posture guidance generator in S14 includes: t blocks, wherein each block contains a spatial recomposition operation.

Preferably, the training process in S15 is applied to the L2 distance loss and discrimination loss; further, the air conditioner is provided with a fan,

the calculation method of the L2 distance loss is as follows:

preferably, the S16 includes: using a convolutional neural network to the feature map F_i ^u、F_i ^wProcessing and acquiring a high-order characteristic diagram F_o ^u、F_o ^w(ii) a Further, the air conditioner is provided with a fan,

the convolutional neural network in S16 is a ResNet50 layer four network.

Preferably, the S17 includes: using convolution and pooling operations on the high-order feature map F_o ^u、F_o ^wProcessing to obtain a feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^u(ii) a Further, the air conditioner is provided with a fan,

the convolution is a convolution with a kernel size of 1 × 1, and the pooling operation is global average pooling.

Preferably, the classification loss in S19 is a cross-entropy loss, which is structured as follows:

wherein [ a, b ] is the visible interval in [1, n ]; the Class represents that the scale of the feature vector is scaled to a classification scale through a full-connection layer network, and the cross entropy represents cross entropy calculation;

the distance loss is:

preferably, the step S19 is followed by:

s110: the network re-identification effectiveness after S19 was tested using CMC and mag as characterizations.

The invention also provides a serious misalignment pedestrian re-identification system based on the normalization network, which comprises the following components: the system comprises a data collection module, a feature map acquisition module, a posture guide generator, a posture guide training module, a high-order feature acquisition module, a feature vector acquisition module, a component visibility calculation module and a network training supervision module; wherein the content of the first and second substances,

the data gathering module is used for collecting I according to each input image^wCollecting corresponding randomly clipped non-aligned images I with the same identity^u；

The characteristic diagram acquisition module is used for acquiring the input image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w；

The attitude map acquisition module is used for acquiring the input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u；

The above-mentionedThe gesture guidance generator is used for utilizing the gesture guidance generator according to the input feature diagram F_i ^uAnd attitude map P^w、P^uGenerating a feature map

The posture guide training module is used for utilizing a feature map F_i ^wAnd

training the posture guide generator and the posture guide discriminator;

the high-order characteristic acquisition module is used for acquiring the characteristic diagram F_i ^u、F_i ^wProcessing and acquiring high-order characteristic diagram

The feature vector acquisition module is used for acquiring the high-order feature map

Processing to obtain a feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^u；

The component visibility calculation module is to calculate a component visibility based on the pose graph P^w、P^uCalculating the feature vector f₁ ^u...f_n ^u(ii) visibility of;

the network training supervision module is used for acting on the feature vector f by using classification loss and distance loss₁ ^w...f_n ^wAnd f₁ ^u...f_n ^uThereby enabling supervision of network training.

Compared with the prior art, the embodiment of the invention has at least one of the following advantages:

(1) according to the serious misalignment pedestrian re-identification method and system based on the normalization network, the aligned characteristic diagram is obtained through the generation mode, and the variable part positions are positioned in the non-segmentation mode, so that the robustness under the complex environment is enhanced;

(2) according to the serious non-alignment pedestrian re-identification method and system based on the normalization network, the network can be trained through the alignment/non-alignment image pair subjected to random cutting processing, and an accurate human body segmentation data label is not needed;

(3) the serious misalignment pedestrian re-identification method and system based on the normalization network, provided by the invention, are used for feature filtering and alignment through the space reconstruction operation of the attitude guide generator, so that the space dependence among modules is reduced; and the credibility of different generation areas is distinguished through visibility calculation;

(4) according to the serious non-alignment pedestrian re-recognition method and system based on the normalization network, provided by the invention, the training process does not depend on the attitude point information excessively, the attitude point is higher in tolerance, and the training process does not need additional segmentation information to participate in training.

Drawings

Embodiments of the invention are further described below with reference to the accompanying drawings:

FIG. 1 is a flowchart of a method for re-identifying severely misaligned pedestrians based on a normalization network according to an embodiment of the present invention;

FIG. 2 is a block diagram of a preferred embodiment of a gesture guidance generator;

fig. 3 is a schematic diagram illustrating a specific framework and a generation manner of the spatial recomposition module according to a preferred embodiment of the invention.

Detailed Description

The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.

Fig. 1 is a flowchart illustrating a method for re-identifying severely misaligned pedestrians based on a normalization network according to an embodiment of the present invention.

Referring to fig. 1, the method for re-identifying severely misaligned pedestrians based on a normalization network of the present embodiment includes:

the method for collecting and primarily processing the paired data specifically comprises the following steps:

In particular, from the input image I^wIn the same data set, other images of the person are collected and randomly cut to obtain an image I^u；

S12: for input image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w；

S13 for input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u；

Applying supervision to the gesture generator, specifically comprising:

S15: using feature maps F_i ^wAnd

training the posture guide generator and the posture guide discriminator;

applying supervision to the network end, specifically comprising:

s16: for feature map F_i ^u、F_i ^wProcessing and acquiring high-order characteristic diagram

S17: for high-order characteristic diagram

The treatment is carried out, and the treatment is carried out,obtain the feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^u；

s19: applying classification loss and distance loss to feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^uThereby enabling supervision of network training.

According to the embodiment of the invention, a mode of dynamically segmenting the feature map to align the parts is not adopted, but the stability of the whole method is combined, and the aligned feature map is generated for the similarity measurement, so that the robustness under a complex situation is enhanced.

In one embodiment, S12 includes: using a convolutional neural network to the input image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w. Preferably, the convolutional neural network in S12 is a ResNet50 top three-layer network. The selection is beneficial to considering the corresponding relation between the feature expression capability and the position of the spatial feature map in the subsequent feature map comparison process.

In one embodiment, S13 includes: using attitude estimation algorithm to input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u. Preferably, the pose estimation algorithm is openpos.

In a preferred embodiment, the gesture guidance generator in S14 includes: and T blocks, as shown in fig. 2, are a specific framework and a schematic diagram of a generation manner of the gesture guidance generator in an embodiment. Each block includes a spatial recomposition module, and fig. 3 is a schematic diagram illustrating a specific framework and a generation manner of the spatial recomposition module in an embodiment. In the attitude guidance generator, two-way characteristic modes of attitude point characteristics and images are used. In the spatial recombination module, the features of a certain height at each stage are obtained by weighting the features of all the heights from the previous stage according to independent weight coefficients. The operation breaks up the dependency relationship between the front and the rear stages in the height direction, and is more beneficial to the spatial recombination in the height direction.

In one embodiment, the training process in S15 applies to the L2 distance loss and discrimination loss. Preferably, the L2 distance loss is calculated by:

the L2 distance loss shortens the space distance of the same ID personnel image, and identifies the characteristic expression capability of the loss constraint to make the loss constraint meet the original classification characteristic.

In one embodiment, S16 includes: using a convolutional neural network, for the feature map F_i ^u、F_i ^wProcessing and acquiring high-order characteristic diagram

Preferably, the convolutional neural network in S16 is a ResNet50 layer four network. Similar to the selection in S12, this selection is beneficial to considering the feature expression ability and the spatial feature map position correspondence relationship in the subsequent feature map comparison process.

In one embodiment, S17 includes: using convolution and pooling operations on high-order feature maps

Processing to obtain a feature vector f₁ ^w...f_n ^wAnd f₁ ^u...f_n ^u. Preferably, the convolution is a convolution with a kernel size of 1 × 1, and the pooling operation is a global average pooling.

In the preferred embodiment, the classification penalty in S19 is a cross-entropy penalty, which is structured as follows:

wherein [ a, b ] is a visible interval in [1, n ], Class represents scaling of the feature vector to a classification scale through a full-connection layer network, and cross entropy represents cross entropy calculation. The classification loss constrains the feature expression ability to satisfy the original classification characteristics.

The distance loss shortens the space distance of the images of the persons with the same ID, and is as follows:

in a preferred embodiment, S19 is followed by:

The technical solution in the above embodiment of the present invention is further described in detail below with reference to a specific example.

The images used in this particular example were from database Market1501 (see: Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, and Qi Tian, "Scalable person re-identification: Abenchmark," in ICCV,2015, pp.1116-1124.) for anomaly detection performance evaluation.

As shown in Table 1, the method is a schematic diagram of the pedestrian re-identification result under the final serious misalignment based on the method of the above embodiment of the present invention, and CMC values (Rank-1, Rank-5, Rank-10) and mAP are used as the characteristics. Conventional systemic methods, such as PCB (see Yifang Sun, Huang Zheng, Yi Yang, Qi Tian, and Shengjin Wang, "Beyond and part models: Person regenerative with regenerative partial discharge (and a linear coherent bank)," in ECCV,2018, pp.480-496 "), MGN (see Guianshu Wang, Yoefeng Yuan, Xiong Chen, Jiweli, and Xi Zhou," guiding discrete reactive feeds with multiple nuclear discharge-identification, "in ACM, 2018, 274-282 ], st-Reid (see Guingcon, Jian Yang Cheng, Lang Huang, variance, ye Yuan, Wuyang Chen, Yang Yang Yang, Zhou Ren, and Zhang Yang Wang, "Abd-net: attention but not both sides of the person who is walking, in ICCV,2019, pp.8351-8361.), resolution adaptive algorithm (see Lingxiao He, Jian Liang, Haiqing Li, and Zhang Sun," Deep specific feature acquisition for partial person re-identification: Alignment-free adaptation, "in CVPR,2018, pp.7073-7082)," Lingxiao He, Zhang Sun, Yuha Zhu, and Yunbo, "recording medicine for diagnosis, Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang Yang, Yang Ren, Zhang Yang Ren, Yang Yang Xin, Yang Zhu, and Yang for strengthening, Yang Yang Yang Yang Yang Yang Yang, Yang Yang, Yang Yigan, and Yang Yigan, J-Pose, J-Yang Gui-Pose, J-Yang-Pose, J-P3, J-Pose, J-Yigan, J-Pose, ping Liu, Yuhang Ding, and Yi Yang, "position-defined feature alignment for enclosed person re-identification," in ICCV,2019, pp.542-551 "), posture guidance part pedestrian re-identification method (see Shang Gao, Jingya Wang, Huchuan Lu, and Zimo Liu," position-defined visual feature alignment for enclosed person re, "in CVPR,2020, pp.11744-11752.), (see Guan' Wang, Shuo Yang, Huang Liu, Zhing Wang, Yang Yang Yang Yang Yang, Shuliang Wang, Gang Yu, Erjin Zhou, and Jian Sun, and" High-order features: laser alignment, and CVJ found in the invention, 6449, which is more specific than the other methods.

Table 1 comparison of the performance of the method proposed by the present invention with existing algorithms

In another embodiment of the present invention, a system for re-identifying severely misaligned pedestrians based on a normalized network is further provided, which is used to implement the method for re-identifying severely misaligned pedestrians based on a normalized network of the foregoing embodiment, and includes: the device comprises a data collection module, a feature map acquisition module, a posture guide generator, a posture guide training module, a high-order feature acquisition module, a feature vector acquisition module, a component visibility calculation module and a network training supervision module.

Wherein the data collecting module is used for collecting the data according to each input image I^wCollecting corresponding randomly clipped non-aligned images I with the same identity^u；

The characteristic diagram acquisition module is used for inputting an image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w；

The attitude map acquisition module is used for acquiring an input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u；

The gesture guidance generator is used for utilizing the gesture guidance generator according to the input feature diagram F_i ^uAnd attitude map P^w、P^uGenerating a feature map

The posture guiding training module is used for utilizing the feature map F_i ^wAnd

training the posture guide generator and the posture guide discriminator;

the high-order characteristic acquisition module is used for acquiring a characteristic diagram F_i ^u、F_i ^wProcessing and acquiring a high-order characteristic diagram F_o ^u、F_o ^w；

The feature vector acquisition module is used for carrying out high-order feature map F_o ^u、F_o ^wProcessing to obtain a feature vector f₁ ^w...f_n ^wAnd

the component visibility calculation module is used for calculating the visibility of the component based on the attitude map P^w、P^uComputing feature vectors

(ii) visibility of;

the network training supervision module is used for acting on the feature vector f by using classification loss and distance loss₁ ^w...f_n ^wAnd

thereby enabling supervision of network training.

According to the severe misalignment pedestrian re-identification method and system based on the normalized network, the low-order alignment feature map is generated, and cross-image supervision and pre-training integral model supervision is used for carrying out antagonistic training on the network, so that the universality and the robustness of the network are improved; the network can be trained by randomly clipping the aligned/non-aligned image pairs without the need for accurate human segmentation data labels.

In the above embodiments of the present invention, the description of each embodiment has a respective emphasis, and reference may be made to related descriptions of other embodiments for parts that are not described in detail in a certain embodiment.

In the embodiments provided in the present invention, it should be understood that the disclosed technical contents can be implemented in other manners. The above-described embodiments of the system are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed, another point, and the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between units or modules, and may be electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software function and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The second aforementioned storage medium includes: u disk, ROM, RAM, removable hard disk, magnetic or optical disk, etc. for storing program codes.

The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and not to limit the invention. Any modifications and variations within the scope of the description, which may occur to those skilled in the art, are intended to be within the scope of the invention.

Claims

1. A serious misalignment pedestrian re-identification method based on a normalization network is characterized by comprising the following steps:

s11: according to each input image I^wCollecting corresponding randomly cropped non-aligned images of the same identityI^u；

S13: for the input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u；

S15: using feature maps F_i ^wAnd

training the posture guide generator and the posture guide discriminator;

s16: for the characteristic diagram F_i ^u、F_i ^wProcessing and acquiring high-order characteristic diagram

S17: for the high-order characteristic diagram

Processing to obtain feature vector

And

s18: based on the attitude map P^w、P^uCalculating the feature vector

Can seeSex;

s19: acting on the feature vector with classification loss and distance loss

And

thereby enabling supervision of network training.

2. The method for re-identifying severely misaligned pedestrians based on a normalized network as claimed in claim 1, wherein the S12 includes: using a convolutional neural network to the input image I^w、I^uProcessing and obtaining a characteristic diagram F_i ^u、F_i ^w(ii) a The convolutional neural network is a ResNet50 top three-layer network.

3. The method for re-identifying severely misaligned pedestrians based on a normalized network as claimed in claim 1, wherein the S13 includes: using an attitude estimation algorithm on the input image I^w、I^uProcessing and acquiring a posture graph P^w、P^u(ii) a The attitude estimation algorithm is OpenPose.

4. The normalization network based severe misalignment pedestrian re-recognition method according to claim 1, wherein the gesture guidance generator in S14 comprises: t blocks, wherein each block contains a spatial recomposition operation.

5. The method of claim 1, wherein the training in S15 applies to L2 distance loss and discrimination loss; the calculation method of the L2 distance loss is as follows:

6. the method for re-identifying severely misaligned pedestrians based on a normalized network as claimed in claim 1, wherein the S16 includes: using a convolutional neural network to the feature map F_i ^u、F_i ^wProcessing and acquiring high-order characteristic diagram

The convolutional neural network in S16 is a ResNet50 layer four network.

7. The method for re-identifying severely misaligned pedestrians based on a normalized network as claimed in claim 1, wherein the S17 includes: using convolution and pooling operations on the high-order feature map

Processing to obtain feature vector

And

8. The method of claim 1, wherein the classification penalty in S19 is a cross entropy penalty as follows:

wherein [ a, b ] is a visible interval in [1, n ], Class represents that the scale of the feature vector is scaled to a classification scale through a full-connection layer network, and cross entry represents cross entropy calculation;

the distance loss is:

9. the method for re-identifying severely misaligned pedestrians based on a normalized network as claimed in claim 1, wherein said S19 is followed by further comprising:

10. A normalization network based severe misalignment pedestrian re-identification system, comprising: the system comprises a data collection module, a feature map acquisition module, a posture guide generator, a posture guide training module, a high-order feature acquisition module, a feature vector acquisition module, a component visibility calculation module and a network training supervision module; wherein the content of the first and second substances,

The gesture guidance generator is used for utilizing the gesture guidance generator to generate the input feature diagram F_i ^uAnd attitude map P^w、P^uGenerating a feature map

The posture guide training module is used for utilizing a feature map F_i ^wAnd

training the posture guide generator and the posture guide discriminator;

Processing to obtain feature vector

And

the component visibility calculation module is to calculate a component visibility based on the pose graph P^w、P^uCalculating the feature vector

(ii) visibility of;

the network training supervision module is used for acting on the feature vector by using classification loss and distance loss

And

thereby enabling supervision of network training.