Pedestrian re-identification method and system based on background graying
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a pedestrian re-identification method and system based on background graying.
Background
Pedestrian re-identification (ReID) is a well-studied problem in computer vision, the task being to retrieve images of a specific pedestrian from a large number of images captured by different cameras. It is one of the most fundamental visual recognition problems in video surveillance and has broad application prospects. Conventional ReID methods mostly rely on low-level features such as color histograms and texture histograms, and use metric learning to find a distance function that minimizes the distance between images of the same identity and maximizes the distance between different identities. Although ReID has been studied in academia for many years, significant breakthroughs have only come in recent years with the development of deep learning. However, a problem that cannot be ignored is that deep convolutional features are high-dimensional and are easily affected by factors such as human pose, object occlusion, varying illumination intensity, and background clutter.
Many methods have been proposed over the past few years to obtain more robust features, but these methods tend to take the entire image directly as input. Such global information contains not only pedestrian features but also clutter features of the background. Currently, effective methods for alleviating the influence of background clutter mainly fall into two categories: 1) methods based on body region detection, such as pose and keypoint estimation, which extract body information from images by detecting body regions; and 2) methods based on human body segmentation. Recent image segmentation methods, including FCN, Mask R-CNN, and JPPNet, have achieved excellent results in removing the background.
However, in practical unconstrained scenarios, pedestrian re-identification remains a very challenging task. How to extract discriminative and robust features that are invariant to background clutter is a core problem, because the non-pedestrian parts of the background can greatly interfere with the features of the foreground information.
Most existing methods for addressing background interference are based on body region detection or use segmentation to filter out the background, but they still suffer from one of the following limitations. First, additional detection and segmentation models need to be pre-trained, together with additional data acquisition effort. Second, the potential dataset bias between pose estimation and ReID can cause partitioning errors and destroy the original complete body-shape characteristics of pedestrians. Third, although the background is partially cut off, background still remains around the pedestrian and participates in model training with a weight equivalent to that of the pedestrian region; these methods therefore do not truly address the background clutter problem and offer no fundamental solution. Fourth, hard segmentation not only destroys the original structure and smoothness of the image but also discards all background information. Some background information can be useful context, and discarding all of it may throw away clues that are relevant to the pedestrian re-identification task.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method and system based on background graying, which help to weaken background interference and improve the accuracy of pedestrian re-identification.
In order to achieve the above purpose, the invention adopts the following technical scheme: a pedestrian re-identification method based on background graying, comprising the following steps: S1, performing background graying processing on the original image while keeping the foreground information of the image unchanged, to obtain a processed BGg image, namely a background-grayed image;
S2, constructing a two-stream network based on ResNet50, in which one stream is the background-grayed stream (BGg-Stream), which performs feature extraction on the background-grayed image obtained in step S1, and the other stream is the global stream (G-Stream), which performs feature extraction on the original image;
S3, making the two streams obtained in step S2 interact through cascade connection;
S4, connecting the two streams and combining their features: the feature maps produced by the two streams at each stage are fused by a spatial-channel attention block (SCAB) module, and the fused feature map is used as the input feature map of the next convolution layer of the BGg-Stream;
and S5, after feature extraction, updating the network parameters with a triplet loss function, then performing similarity calculation, and finally outputting the ranking list.
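To make step S5 concrete, the following is a minimal PyTorch sketch, an illustration only and not the claimed implementation: the triplet loss is PyTorch's built-in nn.TripletMarginLoss, Euclidean distance is assumed for the similarity calculation, and model stands for any feature extractor that returns one embedding per image; the margin value 0.3 and the function names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Sketch of step S5: triplet-loss parameter update, then similarity
# calculation and ranking.  The margin (0.3) and the use of Euclidean
# distance are illustrative assumptions, not values from the text.
triplet_loss = nn.TripletMarginLoss(margin=0.3)

def training_step(model, optimizer, anchor, positive, negative):
    """One parameter update with the triplet loss."""
    f_a, f_p, f_n = model(anchor), model(positive), model(negative)
    loss = triplet_loss(f_a, f_p, f_n)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def rank_gallery(query_feat, gallery_feats):
    """Similarity calculation: sort gallery indices by ascending
    Euclidean distance to the query feature (the output ranking list)."""
    dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return torch.argsort(dists)
```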
Further, the input of the BGg-Stream is the BGg image, and the background graying formula of the image is as follows:
BGg(i,j)=0.299×R(i,j)+0.587×G(i,j)+0.114×B(i,j)
where BGg(i, j) is the pixel value of the BGg image at row i and column j, and R, G, and B are the three channels of the RGB image.
Further, the SCAB module comprises a channel attention module and a spatial attention module.
Further, the channel attention module is implemented as follows: given an input feature map F_i ∈ R^(C×H×W), the spatial information of the feature map is first aggregated by an average pooling operation to generate a spatial context descriptor F_i ∈ R^(C×1×1), which compresses the spatial information into the channel dimension; the hidden activation size is then set to R^(C/r×1×1) to reduce the parameter overhead, where r is the reduction ratio. Thus, the channel attention module is expressed as:
F_ii = σ(θ(R(ζ(δ(F_i)))))
where F_i denotes the feature of the BGg-Stream at each stage, σ is the sigmoid activation function, R is the ReLU activation function, δ is the average pooling operation, θ and ζ denote two distinct fully connected layers, and ⊗ denotes element-wise multiplication.
Further, the spatial attention module is implemented as follows: for an input feature map F_i ∈ R^(C×H×W), where C is the total number of channels and H×W is the size of the feature map, the spatial attention module is expressed as:
F_iii = σ(C(F_i))
where σ is the sigmoid activation function, and the output of the spatial attention module is F_iii ∈ R^(1×H×W).
Further, a local feature response maximization (LRM) strategy is applied in the testing phase; during testing, the feature map is horizontally divided into an appropriate number n of regions, and for each part the feature with the maximum response is extracted as the feature of that part.
The invention also provides a pedestrian re-identification system based on background graying, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the computer program is run by the processor, the steps of the method are realized.
Compared with the prior art, the invention has the following beneficial effects:
1. the present invention does not require additional training and data sets.
2. The invention preserves the integrity and effectiveness of the pedestrian's original body-shape information.
3. The invention can accurately locate the human body region and process all of the background, including the background within the pedestrian's whole-body range.
4. The invention is not disturbed by strong colors in the background and still retains useful background information.
5. The method and system enable the model to concentrate on learning the foreground information during training, thereby further weakening background interference.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a channel attention module in the embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a spatial attention module according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a pedestrian re-identification method based on background graying, which includes the following steps:
S1, performing background graying processing on the original image by using a mask, while keeping the foreground information of the image unchanged, to obtain a processed BGg image, i.e., a background-grayed image (an image in which the foreground is an RGB image and the background is a grayscale image).
S2, constructing a two-stream network based on ResNet50, in which one stream is the background-grayed stream (BGg-Stream), which performs feature extraction on the background-grayed image obtained in step S1, and the other stream is the global stream (G-Stream), which performs feature extraction on the original image;
S3, making the two streams obtained in step S2 interact through cascade connection;
S4, connecting the two streams and combining their features: the feature maps produced by the two streams at each stage are fused by a spatial-channel attention block (SCAB) module, and the fused feature map is used as the input feature map of the next convolution layer of the BGg-Stream (a sketch of this two-stream wiring follows step S5 below);
and S5, after feature extraction, updating the network parameters with a triplet loss function, then performing similarity calculation, and finally outputting the ranking list.
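The two-stream wiring of steps S2 to S4 can be sketched in PyTorch as follows. This is a sketch under assumptions, not the patented implementation: two torchvision ResNet50 backbones act as G-Stream and BGg-Stream, a caller-supplied SCAB fusion block combines the stage-wise feature maps, and the fused map feeds the next BGg-Stream stage. The class name TwoStreamReID, the argument scab_blocks, and the final concatenation of the two global descriptors are illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TwoStreamReID(nn.Module):
    """Sketch of the two-stream backbone (S2-S4).  G-Stream sees the
    original image, BGg-Stream sees the background-grayed image; at every
    ResNet50 stage the two feature maps are fused by an SCAB block, and the
    fused map becomes the input of the next BGg-Stream stage."""

    def __init__(self, scab_blocks):
        super().__init__()
        g, b = resnet50(weights=None), resnet50(weights=None)
        self.g_stem = nn.Sequential(g.conv1, g.bn1, g.relu, g.maxpool)
        self.b_stem = nn.Sequential(b.conv1, b.bn1, b.relu, b.maxpool)
        self.g_stages = nn.ModuleList([g.layer1, g.layer2, g.layer3, g.layer4])
        self.b_stages = nn.ModuleList([b.layer1, b.layer2, b.layer3, b.layer4])
        # one channel-preserving fusion block per stage, supplied by the caller
        self.scabs = nn.ModuleList(scab_blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x_rgb, x_bgg):
        fg, fb = self.g_stem(x_rgb), self.b_stem(x_bgg)
        for g_stage, b_stage, scab in zip(self.g_stages, self.b_stages, self.scabs):
            fg = g_stage(fg)
            fb = b_stage(fb)
            fb = scab(fg, fb)  # fused map feeds the next BGg-Stream stage
        # concatenate the two global descriptors (one possible choice)
        return torch.cat([self.pool(fg).flatten(1), self.pool(fb).flatten(1)], dim=1)
```

In practice the four SCAB blocks would be sized to the stage channel widths of ResNet50 (256, 512, 1024, and 2048) so that the fused map matches the input expected by the next stage.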
In this embodiment, the input of the BGg-Stream is the BGg image, and the background graying formula of the image is:
BGg(i,j)=0.299×R(i,j)+0.587×G(i,j)+0.114×B(i,j)
where BGg(i, j) is the pixel value of the BGg image at row i and column j, and R, G, and B are the three channels of the RGB image.
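As a concrete illustration, the background graying of step S1 can be sketched in Python/NumPy as follows. The sketch assumes that a binary foreground mask (1 for pedestrian pixels, 0 for background) is already available, as mentioned in step S1; how that mask is obtained is not detailed here, and the function name background_graying is an illustrative choice.

```python
import numpy as np

def background_graying(img_rgb: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """img_rgb: H x W x 3 uint8 image; fg_mask: H x W binary foreground mask.
    Background pixels are replaced by their luminance (formula above) while
    foreground pixels keep their RGB values, producing the BGg image."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b            # per-pixel graying formula
    gray3 = np.repeat(gray[..., None], 3, axis=2)        # broadcast to 3 channels
    mask3 = np.repeat(fg_mask[..., None].astype(bool), 3, axis=2)
    return np.where(mask3, img_rgb, gray3).astype(np.uint8)
```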
The SCAB module comprises a channel attention module and a spatial attention module.
In this embodiment, the complete structure of the channel attention module is shown in Fig. 2. The channel attention module is implemented as follows: given an input feature map F_i ∈ R^(C×H×W), the spatial information of the feature map is first aggregated by an average pooling operation to generate a spatial context descriptor F_i ∈ R^(C×1×1), which compresses the spatial information into the channel dimension; the hidden activation size is then set to R^(C/r×1×1) to reduce the parameter overhead, where r is the reduction ratio. Thus, the channel attention module is expressed as:
F_ii = σ(θ(R(ζ(δ(F_i)))))
where F_i denotes the feature of the BGg-Stream at each stage, σ is the sigmoid activation function, R is the ReLU activation function, δ is the average pooling operation, θ and ζ denote two distinct fully connected layers, and ⊗ denotes element-wise multiplication.
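A minimal PyTorch sketch of the channel attention module described above, as an illustration rather than the exact claimed structure: δ is average pooling, ζ and θ are the two fully connected layers with a C/r bottleneck, R is ReLU, σ is sigmoid, and the resulting weights re-scale the input feature element-wise. The reduction ratio r = 16 is an assumed default, since no value is given in the text.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention sketch: F_ii = sigma(theta(R(zeta(delta(F_i))))),
    applied to the input feature by element-wise re-weighting."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)           # delta: C x H x W -> C x 1 x 1
        self.zeta = nn.Linear(channels, channels // reduction)   # bottleneck C -> C/r
        self.relu = nn.ReLU(inplace=True)                 # R
        self.theta = nn.Linear(channels // reduction, channels)  # restore C
        self.sigmoid = nn.Sigmoid()                       # sigma

    def forward(self, f_i: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = f_i.shape
        w = self.avg_pool(f_i).view(n, c)                 # spatial context descriptor
        w = self.sigmoid(self.theta(self.relu(self.zeta(w))))
        return f_i * w.view(n, c, 1, 1)                   # element-wise re-weighting
```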
In this embodiment, the complete structure of the spatial attention module is shown in Fig. 3. The spatial attention module is implemented as follows: for an input feature map F_i ∈ R^(C×H×W), where C is the total number of channels and H×W is the size of the feature map, the spatial attention module is expressed as:
F_iii = σ(C(F_i))
where σ is the sigmoid activation function, and the output of the spatial attention module is F_iii ∈ R^(1×H×W).
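A corresponding sketch of the spatial attention module F_iii = σ(C(F_i)). The text does not spell out the operator C; a 1×1 convolution that collapses the C input channels into a single H×W map is assumed here, and the resulting map is used to re-weight every spatial position of the input.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention sketch: F_iii = sigma(C(F_i)), with the operator C
    assumed to be a 1x1 convolution producing a 1 x H x W attention map."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)  # C x H x W -> 1 x H x W
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_i: torch.Tensor) -> torch.Tensor:
        attn = self.sigmoid(self.conv(f_i))                 # F_iii in R^(1 x H x W)
        return f_i * attn                                    # re-weight each spatial position
```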
In this embodiment, a local feature response maximization (LRM) strategy is applied in the testing phase; during testing, the feature map is horizontally divided into an appropriate number n of regions, where n is 8 in this embodiment, and for each part the feature with the maximum response is extracted as the feature of that part.
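The LRM test-time strategy can be sketched as follows. Assumptions: the horizontal split is applied to the final feature map, the maximum response is taken per channel within each stripe, and the n = 8 stripe descriptors are concatenated; the text does not state how the per-part features are combined, so the concatenation is an illustrative choice.

```python
import torch

def local_response_max(feat_map: torch.Tensor, n_parts: int = 8) -> torch.Tensor:
    """LRM sketch: split the final feature map (N x C x H x W) into n_parts
    horizontal stripes along H, keep the maximum response per channel in each
    stripe, and concatenate the stripe descriptors into N x (C * n_parts)."""
    parts = torch.chunk(feat_map, n_parts, dim=2)       # split along the height axis
    pooled = [p.amax(dim=(2, 3)) for p in parts]         # max response per part, per channel
    return torch.cat(pooled, dim=1)
```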
The embodiment also provides a pedestrian re-identification system based on background graying, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the computer program is run by the processor, the steps of the method are realized.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. However, any simple modification, equivalent change and modification of the above embodiments according to the technical essence of the present invention are within the protection scope of the technical solution of the present invention.