CN110110689B - Pedestrian re-identification method
- Publication number
- CN110110689B (application CN201910403777.5A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature map
- channel
- feature
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
An embodiment of the disclosure relates to a pedestrian re-identification method comprising the following steps: extracting a pedestrian CNN feature map from a plurality of pictures; performing model training by using adversarial erasing learning to simulate the situation in which the discriminative region of the pedestrian CNN feature map is occluded, thereby obtaining a training model; and performing pedestrian re-identification by combining the training model with a target pedestrian image and pedestrian images to be identified, to obtain a pedestrian re-identification result. The method provides a feature-level data enhancement strategy: the input feature map of an auxiliary classifier is partially erased, which increases the variation of pedestrian features, counters the occlusion of pedestrians, and improves the generalization ability of the deep pedestrian re-identification model.
Description
Technical Field
The disclosure relates to the technical field of computer vision, and in particular to a pedestrian re-identification method.
Background
Pedestrian re-identification matches and identifies pedestrians across a non-overlapping multi-camera surveillance system, and plays an important role in intelligent video surveillance, crime prevention, the maintenance of public security, and the like. However, when human-body attributes such as posture, gait and clothing or environmental factors such as illumination and background change, the appearance of the same pedestrian differs markedly across surveillance videos, while the appearances of different pedestrians can be similar under certain conditions.
In recent years, deep learning methods have been widely used and can achieve better performance than conventional hand-crafted methods. However, deep pedestrian re-identification models typically have a large number of network parameters yet are optimized on limited data sets, which increases the risk of overfitting and reduces generalization ability. Improving the generalization ability of the model is therefore a significant and important issue for deep pedestrian re-identification.
To improve the generalization ability of a deep convolutional neural network, one can increase the variation of the training data set and collect a large number of pedestrian images containing occlusions; however, this only realizes data enhancement at the image level and provides no data enhancement beyond the image level with which to improve the generalization ability of the network.
It is therefore desirable for those skilled in the art to overcome the above drawbacks.
Disclosure of Invention
(I) Technical problem to be solved
In order to solve the above-mentioned problems of the prior art, the present disclosure provides a pedestrian re-identification method that performs data enhancement at the feature level to improve the generalization ability of a deep convolutional neural network.
(II) Technical solution
In order to achieve the above purpose, the main technical solution adopted by the present disclosure includes:
an embodiment of the present disclosure provides a pedestrian re-identification method, including:
extracting a pedestrian CNN feature map from a plurality of pictures;
performing model training by using adversarial erasing learning to simulate the situation in which the discriminative region of the pedestrian CNN feature map is occluded, to obtain a training model;
and performing pedestrian re-identification by combining the training model with a target pedestrian image and pedestrian images to be identified, to obtain a pedestrian re-identification result.
In one embodiment of the present disclosure, the extracting a pedestrian CNN feature map from a plurality of pictures includes:
randomly selecting the plurality of pictures from a training dataset;
inputting the plurality of pictures into a plurality of different semantic layers of a ResNet50 model for extraction to obtain feature maps of a plurality of channels;
processing the feature maps of the plurality of channels by using a channel attention module to obtain a channel-processed feature map;
and processing, by using a spatial attention module, the spatial context information of the channel-processed feature map at different positions to obtain the pedestrian CNN feature map.
In one embodiment of the disclosure, the processing the feature maps of the plurality of channels by using the channel attention module to obtain the channel-processed feature map includes:
obtaining a channel feature descriptor according to the feature map of each channel among the feature maps of the plurality of channels;
obtaining a channel attention feature map by applying activation function operations to the channel feature descriptor;
and multiplying the channel attention feature map by the input feature map to obtain the channel-processed feature map.
In one embodiment of the disclosure, the feature descriptor comprises statistics of the plurality of channels, the feature descriptor being $s = [s_1, s_2, \ldots, s_N]$;
the statistic of each channel is:

$$s_n = \frac{1}{A \times B} \sum_{i=1}^{A} \sum_{j=1}^{B} S_n(i, j)$$

wherein N is the number of channels, n is the channel index, and A and B are the length and width of the feature map, respectively;
the channel attention feature map is:

$$e = \sigma(W_2\, \delta(W_1 s))$$

wherein σ and δ denote the Sigmoid activation function and the ReLU activation function respectively, $W_1 \in \mathbb{R}^{(N/r) \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times (N/r)}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.
In one embodiment of the disclosure, the processing, by using the spatial attention module, the spatial context information of the channel-processed feature map at different positions to obtain the pedestrian CNN feature map includes:
performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U;
performing a matrix multiplication operation on the transpose of the first spatial information feature map T and the second spatial information feature map U to obtain a spatial attention feature map;
performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V;
performing a matrix multiplication operation on the third spatial information feature map V and the transpose of the spatial attention feature map to obtain a spatially processed feature map;
and obtaining the pedestrian CNN feature map according to the channel processing and the spatial processing.
In one embodiment of the present disclosure, performing model training by using adversarial erasing learning to simulate the situation in which the discriminative region of the pedestrian CNN feature map is occluded, to obtain the training model, includes:
inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, and outputting pedestrian-category-specific feature maps from the main classifier and the auxiliary classifier;
performing partial erasure at the auxiliary classifier to obtain an erased feature map;
calculating, through a loss function, the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier to obtain a loss value;
and updating the parameters of the training model according to the loss value.
In one embodiment of the disclosure, the main classifier and the auxiliary classifier comprise the same number of convolution layers and global average pooling layers, the number of channels of the convolution layers is the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map represents the body response heat map of the pedestrian image for one category.
In one embodiment of the disclosure, performing the partial erasure at the auxiliary classifier includes:
determining the region of the body response heat map whose heat value is higher than a set adversarial-erasing threshold as the discriminative region;
and erasing, in an adversarial manner, the part corresponding to the discriminative region in the pedestrian-category-specific feature map output by the auxiliary classifier by replacing its response values with 0.
In one embodiment of the present disclosure, performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian images to be identified, to obtain the pedestrian re-identification result, includes:
inputting the target pedestrian image and the pedestrian images to be identified into the training model to obtain their corresponding depth features respectively;
calculating a cosine distance according to the depth features of the target pedestrian image and the depth features of the pedestrian images to be identified;
and determining the similarity between the target pedestrian image and the pedestrian images to be identified according to the magnitude of the cosine distance, wherein the pedestrian image to be identified with the greatest similarity is the pedestrian re-identification result.
In one embodiment of the present disclosure, the calculation formula for the cosine distance between the depth feature of the target pedestrian image and the depth feature of a pedestrian image to be identified is:

$$d(\mathrm{feature1}, \mathrm{feature2}) = 1 - \frac{\mathrm{feature1} \cdot \mathrm{feature2}}{\|\mathrm{feature1}\|\,\|\mathrm{feature2}\|}$$

wherein feature1 is the depth feature of the target pedestrian image and feature2 is the depth feature of the pedestrian image to be identified.
(III) Beneficial effects
The beneficial effects of the present disclosure are as follows: according to the pedestrian re-identification method provided by the embodiments of the disclosure, a feature-level data enhancement strategy is provided in which the input feature map of the auxiliary classifier is partially erased, thereby increasing the variation of pedestrian features, countering the occlusion of pedestrians, and improving the generalization ability of the deep pedestrian re-identification model.
Drawings
FIG. 1 is a flow chart of a pedestrian re-identification method provided in one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a network architecture for implementing the method of FIG. 1 in one embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating step S110 in FIG. 1 according to one embodiment of the present disclosure;
FIG. 4 is a flowchart of step S303 in FIG. 3 according to one embodiment of the present disclosure;
FIG. 5 is a schematic illustration of channel attention in one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of spatial attention in one embodiment of the present disclosure;
FIG. 7 is a flowchart of step S304 in FIG. 3 according to one embodiment of the present disclosure;
FIG. 8 is a schematic diagram of anti-erasure learning in an embodiment of the present disclosure;
FIG. 9 is a flowchart of step S120 in FIG. 1 according to one embodiment of the present disclosure;
fig. 10 is a flowchart of step S130 in fig. 1 according to an embodiment of the disclosure.
Detailed Description
For a better explanation of the present disclosure and for ease of understanding, the present disclosure is described in detail below through specific embodiments in conjunction with the accompanying drawings.
All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In other embodiments of the present disclosure, increasing the variation of the training data set is an effective way to improve the generalization ability of a deep convolutional neural network. However, unlike generic object-recognition tasks, pedestrian re-identification must collect image data across cameras, and labeling pedestrians is very difficult, so it is hard to build a sufficiently large data set, and existing data sets contain relatively few labeled pedestrians. To address this problem, data enhancement can use only the current data set to augment the variation of the training samples without additional collection cost. Recent data enhancement studies have used generative adversarial networks (Generative Adversarial Networks, GAN for short) to generate pedestrian images with different human poses and camera styles, but this approach suffers from long training times, difficult convergence, low quality of the generated images, and the like. Besides explicitly generating new images, common approaches also enhance the training images by jittering pixel values, random cropping, flipping the original image, and so on.
In addition, occlusion is an important factor affecting the generalization ability of convolutional neural networks. Collecting a large number of pedestrian images containing occlusions is one way to effectively address the occlusion problem, but it requires a high cost investment. Another reasonable approach is to simulate the situation in which pedestrians are occluded. For example, a rectangular box of random size and random position can be placed on the training image, with the pixel values of the rectangular area replaced by random values, so as to simulate occlusion and increase the variation of the data set. However, such an occluded area is selected randomly. Alternatively, a pedestrian re-identification classification model is first trained, the discriminative region of the image is then found with the aid of network visualization and multiple classifiers, that region is occluded on the original image to generate a new sample, and the new sample is finally added to the original data set to retrain the pedestrian re-identification model.
Both of the above methods add sample variation by occluding the original pedestrian image and therefore belong to image-level data enhancement.
Fig. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes the following steps:
as shown in fig. 1, in step S110, a pedestrian CNN feature map is extracted from a plurality of pictures;
as shown in fig. 1, in step S120, model training is performed by using adversarial erasing learning to simulate the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model;
as shown in fig. 1, in step S130, pedestrian re-identification is performed by combining the training model with the target pedestrian image and the pedestrian images to be identified, so as to obtain a pedestrian re-identification result.
The specific implementation of the steps of the embodiment shown in fig. 1 is described in detail below.
With reference to the flowchart shown in fig. 1, fig. 2 is a schematic diagram of a network structure for implementing the method shown in fig. 1 in an embodiment of the present disclosure. As shown in fig. 2, each branch applies the complementary attention of channel attention and spatial attention during processing, and adversarial erasing learning and Softmax loss calculation are then performed on the resulting feature map. In addition, as shown in fig. 2, the three branches are divided into two kinds: mid-level semantic branches and a high-level semantic branch.
In step S110, a pedestrian CNN feature map is extracted from a plurality of pictures.
Fig. 3 is a flowchart of step S110 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 3, in step S301, the plurality of pictures are randomly selected from the training data set.
as shown in fig. 3, in step S302, the plurality of pictures are input into a plurality of different semantic layers of the ResNet50 model for extraction, so as to obtain feature maps of a plurality of channels.
In one embodiment of the present disclosure, an image batch is first input: a batch of pictures, i.e., the plurality of pictures, is randomly selected from the training data set. Next, the pictures are resized to 384×128 and fed into different semantic layers (res_conv5a, res_conv5b, res_conv5c) of the backbone network ResNet50 to extract the pedestrian CNN feature maps; as shown in fig. 2, res_conv5a and res_conv5b correspond to the mid-level semantic branches and res_conv5c corresponds to the high-level semantic branch.
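For illustration, the following is a minimal PyTorch sketch of this multi-semantic extraction, under the assumption that res_conv5a, res_conv5b and res_conv5c are the three bottleneck blocks of torchvision ResNet50's layer4; the class and variable names are hypothetical, not from the disclosure.

```python
import torch
import torchvision

class MultiSemanticBackbone(torch.nn.Module):
    """Taps the outputs of the three layer4 blocks of ResNet50."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
        # stem: everything up to and including layer3 (res_conv4_x)
        self.stem = torch.nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3,
        )
        # the three blocks of layer4: res_conv5a, res_conv5b, res_conv5c
        self.conv5a, self.conv5b, self.conv5c = resnet.layer4

    def forward(self, x):
        x = self.stem(x)
        fa = self.conv5a(x)   # mid-level semantic branch
        fb = self.conv5b(fa)  # mid-level semantic branch
        fc = self.conv5c(fb)  # high-level semantic branch
        return fa, fb, fc

batch = torch.randn(8, 3, 384, 128)  # pictures resized to 384x128
fa, fb, fc = MultiSemanticBackbone()(batch)
```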
As shown in fig. 3, in step S303, the channel attention module is used to process the feature maps of the channels, so as to obtain a feature map processed by the channels.
Fig. 4 is a flowchart of step S303 in fig. 3 according to an embodiment of the present disclosure, which specifically includes the following steps:
as shown in fig. 4, in step S401, a channel feature descriptor is obtained according to the feature map of each channel among the feature maps of the plurality of channels.
as shown in fig. 4, in step S402, the channel attention feature map is obtained by applying activation function operations to the channel feature descriptor.
as shown in fig. 4, in step S403, the channel attention feature map is multiplied by the input feature map to obtain the channel-processed feature map.
In one embodiment of the present disclosure, step S303 may employ a channel attention module to explore the relationships among the channels of the pedestrian CNN feature map, capturing and describing the discriminative areas of the input image.
Fig. 5 is a schematic view of channel attention in an embodiment of the disclosure. As shown in fig. 5, the extracted feature map S has spatial size A×B per channel, where A and B are the length and width of the feature map, N is the number of channels, and n is the channel index.
First, a global average pooling (GAP) operation aggregates the spatial information of each channel of the feature map S, generating the channel feature descriptor $s = [s_1, s_2, \ldots, s_N]$. The descriptor comprises the statistics of the plurality of channels, the statistic of each channel being:

$$s_n = \frac{1}{A \times B} \sum_{i=1}^{A} \sum_{j=1}^{B} S_n(i, j) \qquad (1)$$

Secondly, s is passed through a gating module to obtain the channel attention feature map $e \in \mathbb{R}^{N}$:

$$e = \sigma(W_2\, \delta(W_1 s)) \qquad (2)$$

where σ and δ denote the Sigmoid activation function and the ReLU activation function respectively, $W_1 \in \mathbb{R}^{(N/r) \times N}$ is the weight of the first fully connected layer Fc1, $W_2 \in \mathbb{R}^{N \times (N/r)}$ is the weight of the second fully connected layer Fc2, and r is the channel reduction ratio.
Finally, the channel attention e is multiplied channel-wise with the original input feature map S to obtain the corrected feature map S′:

$$S'_n = e_n \cdot S_n \qquad (3)$$
Because the channel attention feature map e encodes the dependencies and relative importance among the channel feature maps, the neural network learns which types of feature maps are important by dynamically updating e, while down-weighting feature maps that are less useful.
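As an illustration, a minimal PyTorch sketch of this channel attention module follows; the module name and the default reduction ratio r = 16 are assumptions, not values fixed by the disclosure.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention over a feature map S of shape (b, N, A, B)."""
    def __init__(self, num_channels: int, r: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                     # statistics s_n, eq. (1)
        self.fc1 = nn.Linear(num_channels, num_channels // r)  # W1
        self.fc2 = nn.Linear(num_channels // r, num_channels)  # W2

    def forward(self, S):
        b, n, _, _ = S.shape
        s = self.gap(S).view(b, n)                             # channel descriptor s
        e = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))   # eq. (2)
        return S * e.view(b, n, 1, 1)                          # corrected map S', eq. (3)
```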
As shown in fig. 3, in step S304, the spatial context information of the channel-processed feature map at different positions is processed by using a spatial attention module, so as to obtain the pedestrian CNN feature map.
In one embodiment of the present disclosure, step S304 may use a spatial attention module to integrate the spatial context information of different positions of the feature map into the pedestrian local features, so as to enhance the spatial correlation of pedestrian local areas. Fig. 6 is a schematic diagram of spatial attention in an embodiment of the disclosure. As shown in fig. 6, the channel-processed feature map is convolved to obtain a first spatial information feature map T, a second spatial information feature map U, and a third spatial information feature map V; the transpose of T is multiplied by U to obtain D; V is multiplied by D to obtain X; and X, scaled by a certain factor, is added to the channel-processed feature map to realize the spatial processing of the feature map and obtain the final pedestrian CNN feature map.
Fig. 7 is a flowchart of step S304 in fig. 3 according to an embodiment of the present disclosure, which specifically includes the following steps:
as shown in fig. 7, in step S701, a 1×1 convolution operation is performed on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U: the channel-attention-corrected feature map (i.e., the channel-processed feature map) S′ is fed into two 1×1 convolutions $f_{key}$ and $f_{query}$, yielding the two feature maps T and U.
as shown in fig. 7, in step S702, a matrix multiplication operation is performed on the transpose of the first spatial information feature map T and the second spatial information feature map U to obtain the spatial attention feature map. The shapes of T and U are adjusted so that each has Z = A × B columns, Z being the number of spatial positions; T is then transposed and matrix-multiplied with U, and a Softmax function is applied along the row direction to obtain the spatial attention feature map $D \in \mathbb{R}^{Z \times Z}$, each element $d_{j,i}$ of which is:

$$d_{j,i} = \frac{\exp(T_i \cdot U_j)}{\sum_{i=1}^{Z} \exp(T_i \cdot U_j)} \qquad (4)$$

where $d_{j,i}$ represents the correlation of the i-th position with the j-th position: the more similar the feature expressions of two positions, the higher the correlation between them.
As shown in fig. 7, in step S703, a convolution operation of 1×1 is performed on the channel-processed feature map, to obtain a third spatial information feature map V.
The channel-processed feature map S′ is fed into a 1×1 convolution layer $f_{value}$ to obtain a new feature map V, whose shape is likewise adjusted to have Z columns.
As shown in fig. 7, in step S704, the third spatial information feature map V is subjected to matrix multiplication with the transpose of the spatial attention feature map, and a spatially processed feature map is obtained.
In this step, V is first matrix-multiplied with the transpose of D, the result is reshaped back to the spatial size of the feature map, and it is passed through a 1×1 convolution $f_{up}$ to obtain the feature map X.
as shown in fig. 7, in step S705, the pedestrian CNN feature map is obtained from the channel processing, i.e., steps S401 to S403, and the spatial processing, i.e., steps S701 to S704.
In this step, X is multiplied by a scaling parameter α and added element-wise to the channel-processed feature map S′ to obtain the feature map S″, namely:

$$S'' = \alpha X + S' \qquad (5)$$

Based on the above, the element at each position j of the feature map S″ can be expressed as:

$$S''_j = \alpha \sum_{i=1}^{Z} d_{j,i} V_i + S'_j \qquad (6)$$

where α is a learnable parameter, initially set to 0, which can gradually learn larger weights starting from 0. As can be seen from equation (6), the feature $S''_j$ at each position of the feature map S″ is a weighted sum of the features at all positions plus the channel-processed feature $S'_j$, so that it possesses a global receptive field; according to each element $d_{j,i}$ of the spatial attention feature map D, the associated local regions $V_i$ are selectively aggregated, which can strengthen the links between the different local features of the pedestrian.
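A minimal PyTorch sketch of this spatial attention module is given below; the channel reduction factor of $f_{key}$ and $f_{query}$ (assumed 8) and the module name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Non-local spatial attention over the channel-processed map S' (b, N, A, B)."""
    def __init__(self, num_channels: int, reduction: int = 8):
        super().__init__()
        c = num_channels // reduction
        self.f_key = nn.Conv2d(num_channels, c, kernel_size=1)
        self.f_query = nn.Conv2d(num_channels, c, kernel_size=1)
        self.f_value = nn.Conv2d(num_channels, num_channels, kernel_size=1)
        self.f_up = nn.Conv2d(num_channels, num_channels, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable scale, init 0

    def forward(self, S_prime):
        b, n, a, w = S_prime.shape
        z = a * w                                  # Z = A x B positions
        T = self.f_key(S_prime).view(b, -1, z)
        U = self.f_query(S_prime).view(b, -1, z)
        V = self.f_value(S_prime).view(b, n, z)
        # M[i, j] = T_i . U_j; normalizing over i gives d_{j,i} of eq. (4)
        D = torch.softmax(torch.bmm(T.transpose(1, 2), U), dim=1)
        X = self.f_up(torch.bmm(V, D).view(b, n, a, w))  # X_j = sum_i d_{j,i} V_i
        return self.alpha * X + S_prime            # S'' = alpha*X + S', eq. (5)
```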
Based on the foregoing steps, it is more effective to use channel attention and spatial attention in series to correct the CNN feature map, letting the neural network automatically learn which types of features and which positions of features to focus on. Thus, in the present disclosure, the channel attention module $M_c$ and the spatial attention module $M_s$ are used in combination, giving full play to both. As shown in fig. 5, the feature map S of the present disclosure is corrected by the complementary attention, first by the channel attention module and then by the spatial attention module:

$$S' = M_c(S), \quad S'' = M_s(S') \qquad (7)$$
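Building on the two sketches above, the complementary attention of equation (7) can be expressed as a simple composition (again a hypothetical wrapper, not code from the disclosure):

```python
import torch.nn as nn

class ComplementaryAttention(nn.Module):
    """Applies channel attention first, then spatial attention: S'' = M_s(M_c(S))."""
    def __init__(self, num_channels: int):
        super().__init__()
        self.m_c = ChannelAttention(num_channels)   # from the sketch above
        self.m_s = SpatialAttention(num_channels)   # from the sketch above

    def forward(self, S):
        return self.m_s(self.m_c(S))                # eq. (7)
```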
In step S120, model training is performed by using adversarial erasing learning to simulate the situation in which the discriminative region of the pedestrian CNN feature map is occluded, so as to obtain a training model.
Fig. 8 is a schematic diagram of adversarial erasing learning in an embodiment of the present disclosure. As shown in fig. 8, the pedestrian CNN feature map is processed through convolution, GAP and the Softmax loss function by the main classifier and the auxiliary classifier, respectively.
Fig. 9 is a flowchart of step S120 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 9, in step S901, the pedestrian CNN feature map is input into a main classifier and an auxiliary classifier respectively for classification training, and pedestrian-category-specific feature maps are output from the main classifier and the auxiliary classifier.
The main classifier and the auxiliary classifier comprise the same number of convolution layers and global average pooling (Global Average Pooling, GAP for short) layers; the number of channels of the convolution layers is the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map represents the body response heat map of the pedestrian image for one category.
In this step, the fully connected layer of the classification model is replaced by a 1×1 convolution layer to form a classification model based on a fully convolutional network; the corrected feature map (i.e., the pedestrian CNN feature map) is fed into the 1×1 convolution layer to directly obtain the pedestrian-category-specific feature maps. In the training stage, the category label of the pedestrian image is available, and the feature map of the channel corresponding to that category label is indexed out to obtain the pedestrian-category-specific feature map, i.e., the body response heat map of the pedestrian image.
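The following PyTorch sketch illustrates such a fully convolutional classifier head; the class name and interface are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FullyConvClassifier(nn.Module):
    """1x1 conv replaces the FC layer; heat maps come out of the forward pass."""
    def __init__(self, num_channels: int, num_classes: int):
        super().__init__()
        # one output channel per pedestrian category in the training set
        self.conv1x1 = nn.Conv2d(num_channels, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)

    def forward(self, feat, labels=None):
        class_maps = self.conv1x1(feat)            # (b, num_classes, A, B)
        logits = self.gap(class_maps).flatten(1)   # Softmax inputs
        heat_maps = None
        if labels is not None:
            # index each sample's own category channel: its body response heat map
            heat_maps = class_maps[torch.arange(feat.size(0)), labels]
        return logits, heat_maps
```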
As shown in fig. 9, in step S902, the sub classifier is partially erased, so as to obtain an erased feature map.
Firstly, determining a region with a heat map value higher than a set anti-erasure threshold value in the body response heat map as a discrimination region; and secondly, erasing the part corresponding to the discriminant area in the characteristic diagram exclusive to the pedestrian category output by the auxiliary classifier by a countermeasure mode that the response value is replaced by 0.
In this step, the input feature map of the auxiliary classifier is partially erased: the main classifier of step S901 generates the pedestrian-category-specific feature map, the part of the body response heat map whose value is higher than the adversarial-erasing threshold is taken as the discriminative region, and the corresponding region in the input feature map of the auxiliary classifier is erased adversarially by replacing its response values with 0. Partially erasing the feature map input to the auxiliary classifier increases the variation of the feature map and simulates the situation in which pedestrians are occluded.
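A minimal sketch of this erasing operation follows; `tau`, the adversarial-erasing threshold, is a hyperparameter whose value the disclosure does not fix, and the function name is hypothetical.

```python
import torch

def adversarial_erase(feat: torch.Tensor, heat_map: torch.Tensor, tau: float):
    """Erase the discriminative region of the auxiliary classifier's input.

    feat:     input feature map of the auxiliary classifier, (b, N, A, B)
    heat_map: body response heat map from the main classifier, (b, A, B)
    """
    # discriminative region: positions whose heat value exceeds the threshold
    mask = (heat_map > tau).unsqueeze(1)   # (b, 1, A, B), broadcast over N
    # adversarial erasing: replace the response values of that region with 0
    return feat.masked_fill(mask, 0.0)
```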
as shown in fig. 9, in step S903, a loss function is used to calculate the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier, so as to obtain a loss value.
as shown in fig. 9, in step S904, the parameters of the training model are updated according to the loss value.
In this step, the two branches of the main classifier and the auxiliary classifier are both updated under the supervision of a Softmax loss function, whose expression is:

$$L = -\frac{1}{P} \sum_{p=1}^{P} \sum_{m=1}^{M} \sum_{k=1}^{K} \lambda_k \log \frac{\exp\!\big(z^{p,m,k}_{l_p}\big)}{\sum_{c=1}^{C} \exp\!\big(z^{p,m,k}_{c}\big)}$$

where P represents the batch size, M represents the number of branches, K represents the number of classifiers in adversarial erasing learning (2 in this embodiment), C represents the number of classes, and $z^{p,m,k}_{l_p}$ represents the Softmax input node value of the $l_p$-th class produced, with the fully convolutional classification network, by the k-th classifier of the m-th branch for the p-th sample, $l_p$ being the class of the p-th sample. The first classifier of each branch is the main classifier and the second classifier is the auxiliary classifier; the parameter $\lambda_k$ is the weight assigned to the loss of each classifier, with $\lambda_1 = 1$ corresponding to the main classifier and $\lambda_2 = 0.5$ corresponding to the auxiliary classifier.
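As an illustration, a PyTorch sketch of this weighted multi-classifier loss is given below, assuming `branch_logits[m][k]` holds the (batch, C) Softmax inputs of the k-th classifier (k = 0 main, k = 1 auxiliary) of the m-th branch; the function name and data layout are assumptions.

```python
import torch
import torch.nn.functional as F

def multi_classifier_loss(branch_logits, labels, lambdas=(1.0, 0.5)):
    """Weighted Softmax loss over M branches and K classifiers per branch."""
    loss = torch.zeros((), device=labels.device)
    for branch in branch_logits:             # M branches
        for k, z in enumerate(branch):       # K classifiers (main, auxiliary)
            # cross_entropy already averages over the batch (the 1/P factor)
            loss = loss + lambdas[k] * F.cross_entropy(z, labels)
    return loss
```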
In step S130, pedestrian re-identification is performed by combining the training model with the target pedestrian image and the pedestrian images to be identified, so as to obtain a pedestrian re-identification result.
Fig. 10 is a flowchart of step S130 in fig. 1 according to an embodiment of the disclosure, which specifically includes the following steps:
as shown in fig. 10, in step S1001, the target pedestrian image and the pedestrian images to be identified are input into the training model to obtain their corresponding depth features respectively. In this step, the target pedestrian image and the pedestrian images to be identified are fed into the CNN model trained in step S120 to extract image features; specifically, the features of the different semantic levels in fig. 2 (res_conv5a, res_conv5b, res_conv5c) are concatenated as the final feature descriptor.
as shown in fig. 10, in step S1002, the cosine distance is calculated from the depth feature of the target pedestrian image and the depth feature of a pedestrian image to be identified, the calculation formula being:

$$d(\mathrm{feature1}, \mathrm{feature2}) = 1 - \frac{\mathrm{feature1} \cdot \mathrm{feature2}}{\|\mathrm{feature1}\|\,\|\mathrm{feature2}\|}$$

where feature1 is the depth feature of the target pedestrian image and feature2 is the depth feature of the pedestrian image to be identified.
As shown in fig. 10, in step S1003, the similarity between the target pedestrian image and the pedestrian image to be identified is determined according to the magnitude of the cosine distance, wherein the pedestrian image to be identified with the greatest similarity is the pedestrian re-identification result.
Since the similarity of the image pair composed of the target pedestrian image and a pedestrian image to be identified is negatively and linearly correlated with the feature cosine distance, the smaller the feature cosine distance, the higher the similarity of the pair. Accordingly, the cosine distances can be computed and arranged in ascending order, which ranks the images in descending order of similarity, and the pedestrian image to be identified with the greatest similarity is taken as the pedestrian re-identification result.
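A short sketch of this retrieval step follows (a hypothetical helper, assuming `query` is the target image's depth feature and `gallery` stacks the features of the images to be identified):

```python
import torch
import torch.nn.functional as F

def rank_by_cosine_distance(query: torch.Tensor, gallery: torch.Tensor):
    """Rank gallery features (num_images, dim) by cosine distance to query (dim,)."""
    sim = F.cosine_similarity(query.unsqueeze(0), gallery, dim=1)
    dist = 1.0 - sim             # smaller distance = higher similarity
    order = torch.argsort(dist)  # ascending distance, descending similarity
    return order                 # order[0] indexes the re-identification result
```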
In summary, with the pedestrian re-identification method provided by the embodiments of the present disclosure, on the one hand, a feature-level data enhancement strategy is provided in which the input feature map of the auxiliary classifier is partially erased, increasing the variation of pedestrian features, countering the occlusion of pedestrians, and improving the generalization ability of the deep pedestrian re-identification model. On the other hand, the spatial attention module of the disclosure integrates spatial context information into the pedestrian local features, enhancing the spatial correlation between different positions of the pedestrian, and forms a complementary attention model together with the channel attention module; combining the two corrects the feature map along both the channel and spatial directions, so the discriminative region can be captured better. Moreover, the classification model based on a fully convolutional network directly produces the body response heat map during forward propagation, guiding the erasing of the discriminative body region and realizing feature-level data enhancement.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (6)
1. A pedestrian re-identification method, characterized in that it comprises:
extracting a pedestrian CNN feature map from a plurality of pictures, wherein the extracting comprises the following steps:
randomly selecting the plurality of pictures from a training dataset;
inputting the plurality of pictures into a plurality of different semantic layers of a ResNet50 model for extraction to obtain feature maps of a plurality of channels;
processing the feature maps of the plurality of channels by using a channel attention module to obtain a channel-processed feature map;
processing, by using a spatial attention module, the spatial context information of the channel-processed feature map at different positions to obtain the pedestrian CNN feature map; and performing model training by using adversarial erasing learning to simulate the situation in which the discriminative region of the pedestrian CNN feature map is occluded, to obtain a training model, wherein the model training comprises the following steps:
inputting the pedestrian CNN feature map into a main classifier and an auxiliary classifier respectively for classification training, and outputting pedestrian-category-specific feature maps from the main classifier and the auxiliary classifier;
the auxiliary classifier being an auxiliary classifier added on the basis of ResNet50;
the main classifier and the auxiliary classifier comprising the same number of convolution layers and global average pooling layers, the number of channels of the convolution layers being the same as the number of pedestrian categories in the training data set, and each channel of the pedestrian-category-specific feature map representing the body response heat map of the pedestrian image for one category; performing partial erasure at the auxiliary classifier to obtain an erased feature map;
the performing partial erasure at the auxiliary classifier comprising:
determining the region of the body response heat map whose heat value is higher than a set adversarial-erasing threshold as the discriminative region;
erasing, in an adversarial manner, the part corresponding to the discriminative region in the pedestrian-category-specific feature map output by the auxiliary classifier by replacing its response values with 0;
calculating, through a loss function, the pedestrian-category-specific feature map output by the main classifier and the erased feature map output by the auxiliary classifier to obtain a loss value;
updating the parameters of the training model according to the loss value;
and performing pedestrian re-identification by combining the training model with a target pedestrian image and pedestrian images to be identified to obtain a pedestrian re-identification result.
2. The pedestrian re-identification method of claim 1, wherein the processing the feature maps of the plurality of channels with the channel attention module to obtain the channel-processed feature map comprises:
obtaining a channel feature descriptor according to the feature map of each channel among the feature maps of the plurality of channels;
obtaining a channel attention feature map by applying activation function operations to the channel feature descriptor;
and multiplying the channel attention feature map by the input feature map to obtain the channel-processed feature map.
3. The pedestrian re-identification method of claim 2, wherein the feature descriptor comprises statistics of the plurality of channels, the feature descriptor being $s = [s_1, s_2, \ldots, s_N]$;
the statistic of each channel is:

$$s_n = \frac{1}{A \times B} \sum_{i=1}^{A} \sum_{j=1}^{B} S_n(i, j)$$

wherein N is the number of channels, n is the channel index, and A and B are the length and width of the feature map, respectively;
the channel attention feature map is:

$$e = \sigma(W_2\, \delta(W_1 s))$$

wherein σ and δ denote the Sigmoid activation function and the ReLU activation function respectively, $W_1 \in \mathbb{R}^{(N/r) \times N}$ is the weight of the first fully connected layer, $W_2 \in \mathbb{R}^{N \times (N/r)}$ is the weight of the second fully connected layer, and r is the channel reduction ratio.
4. The pedestrian re-identification method of claim 1, wherein the processing, by using the spatial attention module, the spatial context information of the channel-processed feature map at different positions to obtain the pedestrian CNN feature map comprises:
performing a 1×1 convolution operation on the channel-processed feature map to obtain a first spatial information feature map T and a second spatial information feature map U;
performing a matrix multiplication operation on the transpose of the first spatial information feature map T and the second spatial information feature map U to obtain a spatial attention feature map;
performing a 1×1 convolution operation on the channel-processed feature map to obtain a third spatial information feature map V;
performing a matrix multiplication operation on the third spatial information feature map V and the transpose of the spatial attention feature map to obtain a spatially processed feature map;
and obtaining the pedestrian CNN feature map according to the channel processing and the spatial processing.
5. The pedestrian re-identification method according to claim 2, wherein performing pedestrian re-identification by combining the training model with the target pedestrian image and the pedestrian images to be identified to obtain the pedestrian re-identification result comprises:
inputting the target pedestrian image and the pedestrian images to be identified into the training model to obtain their corresponding depth features respectively;
calculating a cosine distance according to the depth features of the target pedestrian image and the depth features of the pedestrian images to be identified;
and determining the similarity between the target pedestrian image and the pedestrian images to be identified according to the magnitude of the cosine distance, wherein the pedestrian image to be identified with the greatest similarity is the pedestrian re-identification result.
6. The pedestrian re-identification method of claim 5, wherein the calculation formula for the cosine distance between the depth features of the target pedestrian image and the depth features of the pedestrian image to be identified is:

$$d(\mathrm{feature1}, \mathrm{feature2}) = 1 - \frac{\mathrm{feature1} \cdot \mathrm{feature2}}{\|\mathrm{feature1}\|\,\|\mathrm{feature2}\|}$$

wherein feature1 is the depth feature of the target pedestrian image and feature2 is the depth feature of the pedestrian image to be identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910403777.5A CN110110689B (en) | 2019-05-15 | 2019-05-15 | Pedestrian re-identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910403777.5A CN110110689B (en) | 2019-05-15 | 2019-05-15 | Pedestrian re-identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110110689A CN110110689A (en) | 2019-08-09 |
CN110110689B true CN110110689B (en) | 2023-05-26 |
Family
ID=67490255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910403777.5A Active CN110110689B (en) | 2019-05-15 | 2019-05-15 | Pedestrian re-identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110689B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516603B (en) * | 2019-08-28 | 2022-03-18 | 北京百度网讯科技有限公司 | Information processing method and device |
CN112633459B (en) * | 2019-09-24 | 2024-09-20 | 华为技术有限公司 | Method for training neural network, data processing method and related device |
CN112784648B (en) * | 2019-11-07 | 2022-09-06 | 中国科学技术大学 | Method and device for optimizing feature extraction of pedestrian re-identification system of video |
CN111160096A (en) * | 2019-11-26 | 2020-05-15 | 北京海益同展信息科技有限公司 | Method, device and system for identifying poultry egg abnormality, storage medium and electronic device |
CN111198964B (en) * | 2020-01-10 | 2023-04-25 | 中国科学院自动化研究所 | Image retrieval method and system |
CN111461038B (en) * | 2020-04-07 | 2022-08-05 | 中北大学 | Pedestrian re-identification method based on layered multi-mode attention mechanism |
CN111582587B (en) * | 2020-05-11 | 2021-06-04 | 深圳赋乐科技有限公司 | Prediction method and prediction system for video public sentiment |
CN111814618B (en) * | 2020-06-28 | 2023-09-01 | 浙江大华技术股份有限公司 | Pedestrian re-recognition method, gait recognition network training method and related devices |
CN112131943B (en) * | 2020-08-20 | 2023-07-11 | 深圳大学 | Dual-attention model-based video behavior recognition method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IT201600068348A1 (en) * | 2016-07-01 | 2018-01-01 | Octo Telematics Spa | Procedure for determining the status of a vehicle by detecting the vehicle's battery voltage. |
CN107563313A (en) * | 2017-08-18 | 2018-01-09 | 北京航空航天大学 | Multiple target pedestrian detection and tracking based on deep learning |
CN107679483A (en) * | 2017-09-27 | 2018-02-09 | 北京小米移动软件有限公司 | Number plate recognition methods and device |
CN107992882A (en) * | 2017-11-20 | 2018-05-04 | 电子科技大学 | A kind of occupancy statistical method based on WiFi channel condition informations and support vector machines |
WO2018153322A1 (en) * | 2017-02-23 | 2018-08-30 | 北京市商汤科技开发有限公司 | Key point detection method, neural network training method, apparatus and electronic device |
CN109583379A (en) * | 2018-11-30 | 2019-04-05 | 常州大学 | A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359559B (en) * | 2018-09-27 | 2021-11-12 | 天津师范大学 | Pedestrian re-identification method based on dynamic shielding sample |
CN109583502B (en) * | 2018-11-30 | 2022-11-18 | 天津师范大学 | Pedestrian re-identification method based on anti-erasure attention mechanism |
- 2019-05-15: application CN201910403777.5A filed in CN; granted as CN110110689B (status: active)
Non-Patent Citations (1)
Title |
---|
Pedestrian re-identification algorithm based on sparse learning; Zhang Wenwen et al.; Journal of Data Acquisition and Processing (《数据采集与处理》); Vol. 33, No. 5; pp. 855-864 *
Also Published As
Publication number | Publication date |
---|---|
CN110110689A (en) | 2019-08-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110110689B (en) | Pedestrian re-identification method | |
CN109740419B (en) | Attention-LSTM network-based video behavior identification method | |
CN108830252B (en) | Convolutional neural network human body action recognition method fusing global space-time characteristics | |
Sun et al. | Lattice long short-term memory for human action recognition | |
CN108960080B (en) | Face recognition method based on active defense image anti-attack | |
CN108229338A (en) | A kind of video behavior recognition methods based on depth convolution feature | |
CN110889375B (en) | Hidden-double-flow cooperative learning network and method for behavior recognition | |
US20220292394A1 (en) | Multi-scale deep supervision based reverse attention model | |
CN112434608B (en) | Human behavior identification method and system based on double-current combined network | |
CN110378208B (en) | Behavior identification method based on deep residual error network | |
CN114549913B (en) | Semantic segmentation method and device, computer equipment and storage medium | |
CN112861970B (en) | Fine-grained image classification method based on feature fusion | |
Zhu et al. | Attentive multi-stage convolutional neural network for crowd counting | |
CN110852199A (en) | Foreground extraction method based on double-frame coding and decoding model | |
CN106874825A (en) | The training method of Face datection, detection method and device | |
Cai et al. | A real-time smoke detection model based on YOLO-smoke algorithm | |
CN116229323A (en) | Human body behavior recognition method based on improved depth residual error network | |
CN114241456A (en) | Safe driving monitoring method using feature adaptive weighting | |
Gao et al. | Adaptive random down-sampling data augmentation and area attention pooling for low resolution face recognition | |
CN110728238A (en) | Personnel re-detection method of fusion type neural network | |
CN114492634A (en) | Fine-grained equipment image classification and identification method and system | |
CN112528077B (en) | Video face retrieval method and system based on video embedding | |
Kumar et al. | Content based movie scene retrieval using spatio-temporal features | |
CN110852272B (en) | Pedestrian detection method | |
CN116433909A (en) | Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |