CN110659589A - Pedestrian re-identification method, system and device based on attitude and attention mechanism - Google Patents

Pedestrian re-identification method, system and device based on attitude and attention mechanism

Info

Publication number
CN110659589A
CN110659589A
Authority
CN
China
Prior art keywords
pedestrian
feature
attention
image
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910840108.4A
Other languages
Chinese (zh)
Other versions
CN110659589B (en)
Inventor
王坤峰
王飞跃
李雪松
刘雅婷
颜拥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science, State Grid Zhejiang Electric Power Co Ltd filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910840108.4A priority Critical patent/CN110659589B/en
Publication of CN110659589A publication Critical patent/CN110659589A/en
Application granted granted Critical
Publication of CN110659589B publication Critical patent/CN110659589B/en
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image recognition, and particularly relates to a pedestrian re-identification method, system and device based on a posture and attention mechanism, aiming at solving the problem that, because of dataset deviation between different tasks, image key point information cannot be accurately acquired and the pedestrian re-identification precision cannot meet the expected requirement. The method comprises the following steps: extracting the posture of the pedestrian and generating pedestrian key points; deleting redundant background information and correcting the pedestrian detection frame; extracting a first feature map and acquiring a hard attention map with a hard attention mechanism module; fusing the first feature map and the hard attention map to obtain a second feature map; acquiring a soft attention map with a soft attention mechanism module and fusing again; and performing global average pooling and feature dimension reduction on the fused third feature map to obtain a feature vector for pedestrian re-identification. The invention combines a hard attention mechanism and a soft attention mechanism, effectively enhances the foreground information of the feature map, suppresses background noise, and improves the accuracy and stability of pedestrian re-identification.

Description

Pedestrian re-identification method, system and device based on attitude and attention mechanism
Technical Field
The invention belongs to the technical field of computer image recognition, and particularly relates to a pedestrian re-recognition method, system and device based on a posture and attention mechanism.
Background
Pedestrian re-identification is a technology that uses computer vision to find the same target under different cameras. It is regarded as a sub-problem of image retrieval, is widely applied in fields such as intelligent video surveillance and intelligent security, and is an indispensable part of building smart cities.
Pedestrian re-identification technology has received increasing attention. With the development of computer vision theory and the support of hardware systems, it has advanced considerably. Early pedestrian re-identification techniques relied on traditional methods with hand-designed features, but could only be applied to specific scenes; the feature representation capability was insufficient and the model generalization capability was weak. With the development of deep learning, a large number of deep learning techniques have been applied to the pedestrian re-identification task, mainly divided into two classes of methods: feature-based learning and distance-metric-based learning. Although recognition accuracy has improved greatly, some drawbacks remain. The main problems faced in pedestrian re-identification are viewing angle changes, pedestrian mismatching due to inaccurate detection, occlusion, similar appearance, and so on. Although some methods use posture information or attention mechanisms to address these problems, the attitude estimation network is trained on a pose estimation dataset that deviates to some extent from the pedestrian re-identification dataset; for example, pedestrian key points cannot be accurately acquired on some images, and this deviation may degrade pedestrian re-identification performance.
In general, under the condition that data set deviations exist among different tasks, the prior art cannot accurately acquire the key point information of the image, and the re-identification precision of pedestrians cannot meet the expected requirement.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, the prior art cannot accurately acquire the image key point information and the accuracy of pedestrian re-identification cannot meet the expected requirements under the condition that data set deviations exist among different tasks, the invention provides a pedestrian re-identification method based on an attitude and attention mechanism, which comprises the following steps:
step S10, acquiring a pedestrian image to be recognized as a first image;
step S20, extracting pedestrian attitude information of the first image by adopting an attitude estimation network, and generating pedestrian key points;
step S30, based on the pedestrian key points, deleting the redundant background information of the first image, and correcting a pedestrian detection frame to obtain a second image;
step S40, generating a feature map from the second image through a feature extraction network to obtain a first feature map; and generating a hard attention map with the same size as the feature map by applying Gaussian mapping, binarization and normalization to the pedestrian key points;
step S50, fusing the first feature map and the hard attention map to obtain a second feature map;
step S60, acquiring a soft attention map with the same size as the second feature map through a soft attention network, and fusing the soft attention map with the second feature map to obtain a third feature map;
and step S70, performing global average pooling and feature dimension reduction on the third feature map to obtain a feature vector for calculating similarity to realize pedestrian matching, namely the feature vector for re-identifying pedestrians.
In some preferred embodiments, the redundant background information of the first image is:
the regions above, below, to the left and to the right of the pedestrian in the first image.
In some preferred embodiments, in step S50, "fuse the first feature map and the hard attention map to obtain a second feature map", the method includes:
F2 = (F1 ⊗ Mask_h) ⊕ F1

wherein F1 is the first feature map, F2 is the second feature map, Mask_h is the hard attention map, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
In some preferred embodiments, in step S60, "obtain a soft attention map with the same size as the second feature map through a soft attention network, and obtain a third feature map by fusing with the second feature map", the method includes:
step S61, obtaining a soft attention map with the same size as the second feature map through a soft attention network:
Mask_s = Sigmoid(BN(Conv(ReLU(Conv(F2)))))

wherein Mask_s represents the soft attention map, F2 is the second feature map, Conv represents a 1 × 1 convolution operation, BN represents batch normalization, and Sigmoid and ReLU represent activation functions;
step S62, fusing the obtained soft attention map and the second feature map to obtain a third feature map:
F3 = (F2 ⊗ Mask_s) ⊕ F2

wherein F2 is the second feature map, F3 is the third feature map, Mask_s is the soft attention map, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
In some preferred embodiments, in the training process of the network model, after "performing global average pooling and feature dimension reduction on the third feature map to obtain a feature vector for calculating similarity to realize pedestrian matching, i.e., a feature vector for pedestrian re-identification" in step S70, a step of supervised training is further provided, in which the method includes:
and performing supervised training on the extracted feature vectors on the acquired data set labeled with the pedestrian category by adopting cross entropy loss and triple loss.
In some preferred embodiments, the cross-entropy loss is:

L_softmax = -(1/N) Σ_{i=1}^{N} log( exp(w_i^T f_i) / Σ_{k=1}^{C} exp(w_k^T f_i) )

wherein L_softmax represents the cross-entropy loss function, w_k represents the weight of the k-th class, w_i is the weight of the class corresponding to the i-th image in one Batchsize, C represents the number of pedestrian classes in the acquired dataset labeled with pedestrian classes, N represents the number of images contained in one Batchsize, and f_i represents the feature vector corresponding to the i-th image in one Batchsize.
In some preferred embodiments, the triplet loss is:

L_triplet = Σ_{a=1}^{P×K} max( 0, α + ||f^a - f^p||_2 - ||f^a - f^n||_2 )

wherein L_triplet represents the triplet loss function, f^a represents the feature vector extracted from a reference (anchor) pedestrian image in the training image set, f^p represents the feature vector extracted from another image of the same person as the reference pedestrian, used as the positive sample, f^n represents the feature vector extracted from an image of a different person, used as the negative sample, α represents the margin of the triplet constraint, P indicates that there are P IDs in one Batchsize, and K indicates that K images are selected for each ID.
In another aspect of the invention, a pedestrian re-identification system based on a posture and attention mechanism is provided, and comprises an image acquisition module, a posture extraction module, a correction module, a hard attention diagram generation module, a soft attention diagram generation module, a fusion module, a feature vector acquisition module and an output module;
the image acquisition module is configured to acquire a pedestrian image to be identified as a first image and input the first image to the attitude extraction module;
the attitude extraction module is configured to extract pedestrian attitude information of the first image sent by the image acquisition module by adopting an attitude estimation network, and generate pedestrian key points;
the correction module is configured to delete the redundant background information of the first image based on the pedestrian key point, and correct a pedestrian detection frame to obtain a second image;
the hard attention map generation module is configured to generate a feature map from the second image through a feature extraction network to obtain a first feature map, and to generate a hard attention map with the same size as the feature map by applying Gaussian mapping, binarization and normalization to the pedestrian key points;
the fusion module is configured to fuse the first feature map and the hard attention map to obtain a second feature map;
the soft attention map generation module is configured to acquire a soft attention map with the same size as the second feature map through a soft attention network, and fuse the soft attention map and the second feature map by using the fusion module to obtain a third feature map;
the feature vector acquisition module is configured to perform global average pooling and feature dimension reduction on the third feature map to obtain feature vectors for pedestrian re-identification;
the output module is configured to output the obtained feature vectors for calculating the similarity to realize pedestrian matching, namely the feature vectors for re-identifying the pedestrians.
In a third aspect of the invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-described pedestrian re-identification method based on the attitude and attention mechanism.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described pedestrian re-identification method based on the attitude and attention mechanism.
The invention has the beneficial effects that:
(1) The pedestrian re-identification method based on the posture and attention mechanism combines a hard attention mechanism and a soft attention mechanism to fuse the features extracted from the image. This alleviates the inaccurate extraction of image key point information caused by dataset deviation between different tasks, effectively enhances the foreground information of the feature map, suppresses background noise, and enhances the discriminability and robustness of the extracted features, thereby improving the accuracy and stability of pedestrian re-identification.
(2) The pedestrian re-identification method based on the posture and attention mechanism trains the acquired feature vectors for pedestrian re-identification under the supervision of cross-entropy loss and triplet loss, which encourages smaller intra-class distances and larger inter-class distances and improves the robustness of pedestrian re-identification.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow diagram of a pedestrian re-identification method based on an attitude and attention mechanism in accordance with the present invention;
FIG. 2 is a schematic diagram of a hard attention mechanism and a visualization effect diagram of an embodiment of a pedestrian re-identification method based on attitude and attention mechanisms according to the invention;
FIG. 3 is a schematic diagram of a soft attention mechanism and a visualization effect diagram of an embodiment of a pedestrian re-identification method based on attitude and attention mechanisms according to the invention;
FIG. 4 is a network diagram of a combination of a hard attention mechanism and a soft attention mechanism according to an embodiment of the pedestrian re-identification method based on attitude and attention mechanisms.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention discloses a pedestrian re-identification method based on an attitude and attention mechanism, which comprises the following steps of:
step S10, acquiring a pedestrian image to be recognized as a first image;
step S20, extracting pedestrian attitude information of the first image by adopting an attitude estimation network, and generating pedestrian key points;
step S30, based on the pedestrian key points, deleting the redundant background information of the first image, and correcting a pedestrian detection frame to obtain a second image;
step S40, generating a feature map from the second image through a feature extraction network to obtain a first feature map; and generating a hard attention map with the same size as the feature map by applying Gaussian mapping, binarization and normalization to the pedestrian key points;
step S50, fusing the first feature map and the hard attention map to obtain a second feature map;
step S60, acquiring a soft attention map with the same size as the second feature map through a soft attention network, and fusing the soft attention map with the second feature map to obtain a third feature map;
and step S70, performing global average pooling and feature dimension reduction on the third feature map to obtain a feature vector for calculating similarity to realize pedestrian matching, namely the feature vector for re-identifying pedestrians.
In order to more clearly explain the pedestrian re-identification method based on the posture and attention mechanism of the present invention, the following describes the steps in the embodiment of the method of the present invention in detail with reference to fig. 1.
The pedestrian re-identification method based on the attitude and attention mechanism comprises the steps of S10-S70, wherein the steps are described in detail as follows:
in step S10, an image of a pedestrian to be recognized is acquired as a first image.
Common pedestrian re-identification datasets include DukeMTMC-reID, Market-1501, CUHK03, MSMT17, LPW, etc.
In one embodiment of the invention, two pedestrian datasets, Market-1501 and DukeMTMC-reID, are selected as the source of pedestrian images to be identified.
And step S20, extracting pedestrian attitude information of the first image by adopting an attitude estimation network, and generating pedestrian key points.
Posture information may be extracted using a pose estimation network such as AlphaPose or OpenPose pre-trained on the COCO dataset. In one embodiment of the invention, the attitude estimation network AlphaPose, trained in advance on the COCO dataset, is used to extract posture information and generate pedestrian key points.
The pedestrian key points include:
nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left waist, right waist, left knee, right knee, left ankle, right ankle.
And step S30, based on the pedestrian key points, deleting the redundant background information of the first image, and correcting a pedestrian detection frame to obtain a second image.
The redundant background information of the first image is:
four areas of the pedestrian in the first image are up, down, left and right.
Image preprocessing is then carried out: redundant background information is removed and the pedestrian detection frame is corrected, which facilitates pedestrian alignment and further improves the identification precision. In one embodiment of the invention, the processed second image has a size of 384 × 128 × 3.
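As an illustrative sketch of this preprocessing step (not the patent's reference implementation), the following Python snippet crops the detection box to the extent of the detected key points plus a small margin and resizes the crop to 384 × 128; the margin value and the helper name are assumptions introduced here for illustration.

```python
import numpy as np
import cv2  # OpenCV is used here for cropping and resizing

def correct_detection_box(image, keypoints, margin=0.1, out_hw=(384, 128)):
    """Remove redundant background and correct the pedestrian detection box.

    image:     H x W x 3 uint8 array (the first image).
    keypoints: N x 2 array of (x, y) pedestrian key points in image coordinates.
    margin:    fraction of the key-point extent kept around the body (assumed value).
    Returns the corrected second image of size out_hw = (height, width).
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    h, w = image.shape[:2]
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    dx, dy = (x_max - x_min) * margin, (y_max - y_min) * margin
    # Clip the enlarged box to the image borders.
    x0, x1 = int(max(0, x_min - dx)), int(min(w, x_max + dx))
    y0, y1 = int(max(0, y_min - dy)), int(min(h, y_max + dy))
    crop = image[y0:y1, x0:x1]
    return cv2.resize(crop, (out_hw[1], out_hw[0]))  # cv2.resize expects (width, height)
```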
Step S40, generating a feature map from the second image through a feature extraction network to obtain a first feature map; and generating a hard attention diagram with the same size as the feature diagram by Gaussian transformation, binarization and normalization of the pedestrian key points.
In one embodiment of the invention, the preprocessed 384 × 128 × 3 second image is input into the feature extraction network ResNet-50 to generate a 2048 × 24 × 8 first feature map, i.e., a feature map with a spatial size of 24 × 8 and 2048 channels. A Gaussian map is generated centered at each of the 17 pedestrian key points, giving 17 Gaussian maps, with the standard deviation σ of the Gaussian set to 16. With a threshold of 0.8, binarization yields 17 binary maps. The 17 binary maps are summed and normalized to produce a 24 × 8 hard attention map of the same size as the first feature map.
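The following Python sketch illustrates one possible implementation of this hard attention map construction with NumPy. It assumes the key-point coordinates and σ are expressed on the same grid as the feature map and that the final normalization divides by the maximum value; both are assumptions, as the patent does not fix these details.

```python
import numpy as np

def hard_attention_map(keypoints, hw=(24, 8), sigma=16, thresh=0.8):
    """Build a hard attention map from pedestrian key points.

    keypoints: iterable of (x, y) coordinates scaled to the feature-map grid.
    hw:        (height, width) of the feature map, here 24 x 8.
    sigma:     standard deviation of the Gaussian placed at each key point.
    thresh:    binarization threshold applied to each Gaussian map.
    """
    h, w = hw
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=np.float32)
    for kx, ky in keypoints:
        g = np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2.0 * sigma ** 2))
        mask += (g >= thresh).astype(np.float32)  # binarize each Gaussian map, then sum
    if mask.max() > 0:
        mask /= mask.max()  # normalize the summed binary maps
    return mask
```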
As shown in fig. 2, which is a schematic diagram and a visualization effect diagram of the hard attention mechanism according to an embodiment of the pedestrian re-identification method based on the posture and attention mechanism of the present invention, the attitude estimation network is used to extract the pedestrian posture from the input image and generate the pedestrian key points, and the pedestrian key points are Gaussian-mapped, binarized and normalized to obtain the hard attention map; the white regions in the middle of the figure are the Gaussian binary regions generated by the pedestrian key points, that is, the hard attention regions.
Step S50, fusing the first feature map and the hard attention map to obtain a second feature map, as shown in formula (1):
F2 = (F1 ⊗ Mask_h) ⊕ F1    (1)

wherein F1 is the first feature map, F2 is the second feature map, Mask_h is the hard attention map, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
In one embodiment of the invention, a first feature map generated based on a feature extraction network and a hard attention map are fused to generate a second feature map, and the size of the second feature map is 2048 × 24 × 8.
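Formula (1) amounts to simple tensor broadcasting; the following PyTorch-style sketch (an illustration, not the patent's reference implementation) applies a 24 × 8 hard attention map to a 2048 × 24 × 8 feature map.

```python
import torch

def fuse_hard_attention(f1: torch.Tensor, mask_h: torch.Tensor) -> torch.Tensor:
    """Fuse the first feature map with the hard attention map, as in formula (1).

    f1:     feature map of shape (N, 2048, 24, 8).
    mask_h: hard attention map of shape (24, 8), broadcast over batch and channels.
    """
    mask_h = mask_h.to(f1.dtype).unsqueeze(0).unsqueeze(0)  # -> (1, 1, 24, 8)
    return f1 * mask_h + f1  # element-wise multiplication, then element-wise addition
```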
And step S60, acquiring a soft attention map with the same size as the second feature map through a soft attention network, and fusing the soft attention map with the second feature map to obtain a third feature map.
Step S61, obtaining a soft attention map with the same size as the second feature map through a soft attention network, as shown in formula (2):
Mask_s = Sigmoid(BN(Conv(ReLU(Conv(F2)))))    (2)

wherein Mask_s represents the soft attention map, F2 is the second feature map, Conv represents a 1 × 1 convolution operation, BN represents batch normalization, and Sigmoid and ReLU represent activation functions.
Step S62, fusing the obtained soft attention map with the second feature map to obtain a third feature map, as shown in equation (3):
F3 = (F2 ⊗ Mask_s) ⊕ F2    (3)

wherein F2 is the second feature map, F3 is the third feature map, Mask_s is the soft attention map, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
In an embodiment of the present invention, a third feature map is generated by fusing the fused second feature map and the soft attention map, and the size of the third feature map is 2048 × 24 × 8.
As shown in fig. 3, which is a schematic diagram and a visualization effect diagram of the soft attention mechanism of an embodiment of the pedestrian re-identification method based on the posture and attention mechanism of the present invention, a feature map is extracted from the input image through the convolutional neural network, and the soft attention network applies a sequence of operations to the feature map, namely convolution, ReLU activation, convolution, batch normalization and Sigmoid activation, to obtain the soft attention map. Here Conv denotes the convolution operation, ReLU the ReLU activation, BN the batch normalization operation, and Sigmoid the Sigmoid activation.
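Assuming Conv is a 1 × 1 convolution, the soft attention branch of formula (2) and the fusion of equation (3) can be sketched as the following PyTorch module. The hidden channel width and the use of a single-channel mask broadcast across channels are assumptions; the patent only states that the soft attention map has the same size as the second feature map.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Soft attention branch: Conv 1x1 -> ReLU -> Conv 1x1 -> BN -> Sigmoid, as in formula (2)."""

    def __init__(self, channels: int = 2048, hidden: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(hidden, 1, kernel_size=1)
        self.bn = nn.BatchNorm2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f2: torch.Tensor) -> torch.Tensor:
        mask_s = self.sigmoid(self.bn(self.conv2(self.relu(self.conv1(f2)))))
        return f2 * mask_s + f2  # equation (3): element-wise multiplication, then addition
```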
And step S70, performing global average pooling and feature dimension reduction on the third feature map to obtain a feature vector for calculating similarity to realize pedestrian matching, namely the feature vector for re-identifying pedestrians.
In one embodiment of the invention, the third feature map obtained from the final fusion is subjected to global average pooling to obtain a 2048-dimensional feature vector; the kernel size of the global average pooling equals the height and width of the feature map. The 2048-dimensional feature vector is then reduced to a 256-dimensional feature vector, where the feature dimension reduction uses a 1 × 1 convolution operation, followed by a Batch Normalization layer for feature normalization and a ReLU activation function for nonlinear mapping of the feature map.
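A minimal PyTorch sketch of this embedding head, assuming the 1 × 1 convolution, Batch Normalization and ReLU are applied before the spatial dimensions are squeezed out:

```python
import torch
import torch.nn as nn

class EmbeddingHead(nn.Module):
    """Global average pooling followed by 1x1-convolution dimension reduction to 256-D."""

    def __init__(self, in_channels: int = 2048, out_dim: int = 256):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # pooling kernel spans the full 24 x 8 map
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, out_dim, kernel_size=1),
            nn.BatchNorm2d(out_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, f3: torch.Tensor) -> torch.Tensor:
        g1 = self.gap(f3)      # (N, 2048, 1, 1) pooled feature
        g2 = self.reduce(g1)   # (N, 256, 1, 1) reduced feature
        return g2.flatten(1)   # (N, 256) feature vector for pedestrian re-identification
```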
FIG. 4 is a schematic diagram of the network combining the hard attention mechanism and the soft attention mechanism according to an embodiment of the pedestrian re-identification method based on the posture and attention mechanism of the present invention, wherein F1, F2 and F3 are the first, second and third feature maps respectively, and G1 and G2 represent the feature vector after global average pooling and the feature vector after feature dimension reduction, respectively. The attitude estimation network is trained offline on a pose estimation dataset; in the schematic diagram it extracts the posture information of the input pedestrian image to obtain the pedestrian key points. GAP represents global average pooling, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
In the training process of the network model, after "performing global average pooling and feature dimension reduction on the third feature map to obtain a feature vector for calculating similarity to realize pedestrian matching, i.e., a feature vector for pedestrian re-recognition" in step S70, a step of supervised training is further provided, in which the method includes:
and performing supervised training on the extracted feature vectors on the acquired data set labeled with the pedestrian category by adopting cross entropy loss and triple loss.
In one embodiment of the invention, a single GPU (Nvidia 1080p) is used during training, the Batchsize is set to 32, the optimizer is Adam, the number of training epochs is set to 500, and the initial learning rate is set to 2e-4; the learning rate decreases continuously as the number of training epochs increases, and the accuracy rises.
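As a hedged illustration of this training configuration (the exact learning-rate schedule and the model interface are not specified in the patent; the step decay and the assumption that the model returns both feature vectors and classification logits are introduced here for illustration):

```python
import torch

# model, train_loader, cross_entropy_loss and triplet_loss are assumed to be defined elsewhere.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=150, gamma=0.1)  # assumed decay

for epoch in range(500):                      # 500 training epochs
    for images, pids in train_loader:         # Batchsize = 32 images with identity labels
        features, logits = model(images)      # 256-D features and identity logits (assumed interface)
        loss = cross_entropy_loss(logits, pids) + triplet_loss(features, pids)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                          # learning rate decreases as training progresses
```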
The feature vectors are used to represent pedestrians. The same pedestrian may appear in multiple pictures; the feature vectors extracted from pictures of the same pedestrian should be close in the vector space, while feature vectors of different identities should be as far apart as possible. Different pictures of the same pedestrian share the same pedestrian ID.
Calculating the cross entropy of the pedestrian category predicted by the extracted feature vector and the pedestrian category label corresponding to the feature vector, wherein the formula (4) is as follows:
L_softmax = -(1/N) Σ_{i=1}^{N} log( exp(w_i^T f_i) / Σ_{k=1}^{C} exp(w_k^T f_i) )    (4)

wherein L_softmax represents the cross-entropy loss function, w_k represents the weight of the k-th class, w_i is the weight of the class corresponding to the i-th image in one Batchsize, C represents the number of pedestrian classes in the acquired dataset labeled with pedestrian classes, N represents the number of images contained in one Batchsize, and f_i represents the feature vector corresponding to the i-th image in one Batchsize.
The triple loss is shown in equation (5):
L_triplet = Σ_{a=1}^{P×K} max( 0, α + ||f^a - f^p||_2 - ||f^a - f^n||_2 )    (5)

wherein L_triplet represents the triplet loss function, f^a represents the feature vector extracted from a reference (anchor) pedestrian image in the training image set, f^p represents the feature vector extracted from another image of the same person as the reference pedestrian, used as the positive sample, f^n represents the feature vector extracted from an image of a different person, used as the negative sample, α represents the margin of the triplet constraint, P indicates that there are P IDs in one Batchsize, and K indicates that K images are selected for each ID.
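A minimal sketch of formula (5), using batch-hard mining within a P × K batch; the mining strategy and the margin value are assumptions, since the patent does not state how the positive and negative samples are selected:

```python
import torch

def triplet_loss(features: torch.Tensor, pids: torch.Tensor, margin: float = 0.3) -> torch.Tensor:
    """Formula (5) with batch-hard mining over a P x K batch (averaged over anchors)."""
    dist = torch.cdist(features, features, p=2)        # pairwise Euclidean distances
    same_id = pids.unsqueeze(0) == pids.unsqueeze(1)   # (N, N) mask of same-identity pairs
    # Hardest positive: farthest image with the same ID; hardest negative: closest other ID.
    d_pos = (dist * same_id.float()).max(dim=1).values
    d_neg = dist.masked_fill(same_id, float('inf')).min(dim=1).values
    return torch.clamp(margin + d_pos - d_neg, min=0).mean()
```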
In the model testing stage, the obtained 256-dimensional feature vectors are extracted for the pedestrian images in the query set and the gallery set, cosine similarity or Euclidean similarity is computed directly between them, and matching and ranking are carried out according to the similarity. Pedestrian images with high similarity are more likely to be the same target, while those with low similarity are less likely, thereby realizing pedestrian re-identification.
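A sketch of this matching step using cosine similarity (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feats: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
    """Return, for each query image, gallery indices sorted from most to least similar.

    query_feats:   (Q, 256) feature vectors of the query images.
    gallery_feats: (G, 256) feature vectors of the gallery images.
    """
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    similarity = q @ g.t()                             # (Q, G) cosine similarities
    return similarity.argsort(dim=1, descending=True)  # ranked gallery indices per query
```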
The pedestrian re-recognition system based on the attitude and attention mechanism comprises an image acquisition module, an attitude extraction module, a correction module, a hard attention diagram generation module, a soft attention diagram generation module, a fusion module, a feature vector acquisition module and an output module;
the image acquisition module is configured to acquire a pedestrian image to be identified as a first image and input the first image to the attitude extraction module;
the attitude extraction module is configured to extract pedestrian attitude information of the first image sent by the image acquisition module by adopting an attitude estimation network, and generate pedestrian key points;
the correction module is configured to delete the redundant background information of the first image based on the pedestrian key point, and correct a pedestrian detection frame to obtain a second image;
the hard attention map generation module is configured to generate a feature map from the second image through a feature extraction network to obtain a first feature map, and to generate a hard attention map with the same size as the feature map by applying Gaussian mapping, binarization and normalization to the pedestrian key points;
the fusion module is configured to fuse the first feature map and the hard attention map to obtain a second feature map;
the soft attention map generation module is configured to acquire a soft attention map with the same size as the second feature map through a soft attention network, and fuse the soft attention map and the second feature map by using the fusion module to obtain a third feature map;
the feature vector acquisition module is configured to perform global average pooling and feature dimension reduction on the third feature map to obtain feature vectors for pedestrian re-identification;
the output module is configured to output the obtained feature vectors for calculating the similarity to realize pedestrian matching, namely the feature vectors for re-identifying the pedestrians.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the pedestrian re-identification system based on the gesture and attention mechanism provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the above embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device of a third embodiment of the present invention has stored therein a plurality of programs adapted to be loaded and executed by a processor to implement the above-described pedestrian re-identification method based on the attitude and attention mechanism.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described pedestrian re-identification method based on the attitude and attention mechanism.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A pedestrian re-identification method based on attitude and attention mechanisms is characterized by comprising the following steps:
step S10, acquiring a pedestrian image to be recognized as a first image;
step S20, extracting pedestrian attitude information of the first image by adopting an attitude estimation network, and generating pedestrian key points;
step S30, based on the pedestrian key points, deleting the redundant background information of the first image, and correcting a pedestrian detection frame to obtain a second image;
step S40, generating a feature map from the second image through a feature extraction network to obtain a first feature map; and generating a hard attention map with the same size as the feature map by applying Gaussian mapping, binarization and normalization to the pedestrian key points;
step S50, fusing the first feature map and the hard attention map to obtain a second feature map;
step S60, acquiring a soft attention map with the same size as the second feature map through a soft attention network, and fusing the soft attention map with the second feature map to obtain a third feature map;
and step S70, performing global average pooling and feature dimension reduction on the third feature map to obtain a feature vector for calculating similarity to realize pedestrian matching, namely the feature vector for re-identifying pedestrians.
2. The pedestrian re-identification method based on the attitude and attention mechanism according to claim 1, wherein the redundant background information of the first image is:
the regions above, below, to the left and to the right of the pedestrian in the first image.
3. The pedestrian re-identification method based on the attitude and attention mechanism according to claim 1, wherein in step S50, "fuse the first feature map and the hard attention map to obtain a second feature map", the method comprises:
F2 = (F1 ⊗ Mask_h) ⊕ F1

wherein F1 is the first feature map, F2 is the second feature map, Mask_h is the hard attention map, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
4. The pedestrian re-identification method based on the posture and attention mechanism according to claim 1, wherein in step S60, "obtaining the soft attention map with the same size as the second feature map through the soft attention network and fusing the soft attention map with the second feature map to obtain a third feature map" is performed by:
step S61, obtaining a soft attention map with the same size as the second feature map through a soft attention network:
Mask_s = Sigmoid(BN(Conv(ReLU(Conv(F2)))))

wherein Mask_s represents the soft attention map, F2 is the second feature map, Conv represents a 1 × 1 convolution operation, BN represents batch normalization, and Sigmoid and ReLU represent activation functions;
step S62, fusing the obtained soft attention map and the second feature map to obtain a third feature map:
F3 = (F2 ⊗ Mask_s) ⊕ F2

wherein F2 is the second feature map, F3 is the third feature map, Mask_s is the soft attention map, and ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
5. The pedestrian re-identification method based on the attitude and attention mechanism according to claim 1, wherein in step S70, after "global average pooling and feature dimensionality reduction on the third feature map to obtain feature vectors for pedestrian re-identification", there is further provided a step of enhancing identification, and the method comprises:
and performing supervised training on the extracted feature vectors on the acquired data set labeled with the pedestrian category by adopting cross entropy loss and triple loss.
6. The pedestrian re-identification method based on the attitude and attention mechanism according to claim 5, wherein the cross-entropy loss is:

L_softmax = -(1/N) Σ_{i=1}^{N} log( exp(w_i^T f_i) / Σ_{k=1}^{C} exp(w_k^T f_i) )

wherein L_softmax represents the cross-entropy loss function, w_k represents the weight of the k-th class, w_i is the weight of the class corresponding to the i-th image in one Batchsize, C represents the number of pedestrian classes in the acquired dataset labeled with pedestrian classes, N represents the number of images contained in one Batchsize, and f_i represents the feature vector corresponding to the i-th image in one Batchsize.
7. The pedestrian re-identification method based on the attitude and attention mechanism according to claim 1, wherein the triplet loss is:

L_triplet = Σ_{a=1}^{P×K} max( 0, α + ||f^a - f^p||_2 - ||f^a - f^n||_2 )

wherein L_triplet represents the triplet loss function, f^a represents the feature vector extracted from a reference (anchor) pedestrian image in the training image set, f^p represents the feature vector extracted from another image of the same person as the reference pedestrian, used as the positive sample, f^n represents the feature vector extracted from an image of a different person, used as the negative sample, α represents the margin of the triplet constraint, P indicates that there are P IDs in one Batchsize, and K indicates that K images are selected for each ID.
8. A pedestrian re-recognition system based on a posture and attention mechanism is characterized by comprising an image acquisition module, a posture extraction module, a correction module, a hard attention diagram generation module, a soft attention diagram generation module, a fusion module, a feature vector acquisition module and an output module;
the image acquisition module is configured to acquire a pedestrian image to be identified as a first image and input the first image to the attitude extraction module;
the attitude extraction module is configured to extract pedestrian attitude information of the first image sent by the image acquisition module by adopting an attitude estimation network, and generate pedestrian key points;
the correction module is configured to delete the redundant background information of the first image based on the pedestrian key point, and correct a pedestrian detection frame to obtain a second image;
the hard attention map generation module is configured to generate a feature map from the second image through a feature extraction network to obtain a first feature map, and to generate a hard attention map with the same size as the feature map by applying Gaussian mapping, binarization and normalization to the pedestrian key points;
the fusion module is configured to fuse the first feature map and the hard attention map to obtain a second feature map;
the soft attention map generation module is configured to acquire a soft attention map with the same size as the second feature map through a soft attention network, and fuse the soft attention map and the second feature map by using the fusion module to obtain a third feature map;
the feature vector acquisition module is configured to perform global average pooling and feature dimension reduction on the third feature map to obtain feature vectors for pedestrian re-identification;
the output module is configured to output the obtained feature vectors for calculating the similarity to realize pedestrian matching, namely the feature vectors for re-identifying the pedestrians.
9. A storage device having stored thereon a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method of pedestrian re-identification based on the gesture and attention mechanism of any one of claims 1 to 7.
10. A processing apparatus, comprising
A processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
a pedestrian re-identification method based on attitude and attention mechanisms as claimed in any one of claims 1 to 7.
CN201910840108.4A 2019-09-06 2019-09-06 Pedestrian re-identification method, system and device based on attitude and attention mechanism Expired - Fee Related CN110659589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910840108.4A CN110659589B (en) 2019-09-06 2019-09-06 Pedestrian re-identification method, system and device based on attitude and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910840108.4A CN110659589B (en) 2019-09-06 2019-09-06 Pedestrian re-identification method, system and device based on attitude and attention mechanism

Publications (2)

Publication Number Publication Date
CN110659589A true CN110659589A (en) 2020-01-07
CN110659589B CN110659589B (en) 2022-02-08

Family

ID=69038056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910840108.4A Expired - Fee Related CN110659589B (en) 2019-09-06 2019-09-06 Pedestrian re-identification method, system and device based on attitude and attention mechanism

Country Status (1)

Country Link
CN (1) CN110659589B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401265A (en) * 2020-03-19 2020-07-10 重庆紫光华山智安科技有限公司 Pedestrian re-identification method and device, electronic equipment and computer-readable storage medium
CN111428675A (en) * 2020-04-02 2020-07-17 南开大学 Pedestrian re-recognition method integrated with pedestrian posture features
CN111488797A (en) * 2020-03-11 2020-08-04 北京交通大学 Pedestrian re-identification method
CN111652035A (en) * 2020-03-30 2020-09-11 武汉大学 Pedestrian re-identification method and system based on ST-SSCA-Net
CN111898431A (en) * 2020-06-24 2020-11-06 南京邮电大学 Pedestrian re-identification method based on attention mechanism part shielding
CN112463924A (en) * 2020-11-27 2021-03-09 齐鲁工业大学 Text intention matching method for intelligent question answering based on internal correlation coding
CN112528059A (en) * 2021-02-08 2021-03-19 南京理工大学 Deep learning-based traffic target image retrieval method and device and readable medium
CN112800967A (en) * 2021-01-29 2021-05-14 重庆邮电大学 Posture-driven shielded pedestrian re-recognition method
WO2022160772A1 (en) * 2021-01-27 2022-08-04 武汉大学 Person re-identification method based on view angle guidance multi-adversarial attention
US20230098817A1 (en) * 2021-09-27 2023-03-30 Uif (University Industry Foundation), Yonsei University Weakly supervised object localization apparatus and method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107484017A (en) * 2017-07-25 2017-12-15 天津大学 Supervision video abstraction generating method is had based on attention model
US20180114435A1 (en) * 2016-10-26 2018-04-26 Microsoft Technology Licensing, Llc Pedestrian alerts for mobile devices
CN108345837A (en) * 2018-01-17 2018-07-31 浙江大学 A kind of pedestrian's recognition methods again based on the study of human region alignmentization feature representation
CN108520226A (en) * 2018-04-03 2018-09-11 东北大学 A kind of pedestrian's recognition methods again decomposed based on body and conspicuousness detects
CN108764308A (en) * 2018-05-16 2018-11-06 中国人民解放军陆军工程大学 A kind of recognition methods again of the pedestrian based on convolution loop network
CN108805078A (en) * 2018-06-11 2018-11-13 山东大学 Video pedestrian based on pedestrian's average state recognition methods and system again
CN108829677A (en) * 2018-06-05 2018-11-16 大连理工大学 A kind of image header automatic generation method based on multi-modal attention
CN109598225A (en) * 2018-11-29 2019-04-09 浙江大学 Sharp attention network, neural network and pedestrian's recognition methods again

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180114435A1 (en) * 2016-10-26 2018-04-26 Microsoft Technology Licensing, Llc Pedestrian alerts for mobile devices
CN107484017A (en) * 2017-07-25 2017-12-15 天津大学 Supervision video abstraction generating method is had based on attention model
CN108345837A (en) * 2018-01-17 2018-07-31 浙江大学 A kind of pedestrian's recognition methods again based on the study of human region alignmentization feature representation
CN108520226A (en) * 2018-04-03 2018-09-11 东北大学 A kind of pedestrian's recognition methods again decomposed based on body and conspicuousness detects
CN108764308A (en) * 2018-05-16 2018-11-06 中国人民解放军陆军工程大学 A kind of recognition methods again of the pedestrian based on convolution loop network
CN108829677A (en) * 2018-06-05 2018-11-16 大连理工大学 A kind of image header automatic generation method based on multi-modal attention
CN108805078A (en) * 2018-06-11 2018-11-13 山东大学 Video pedestrian based on pedestrian's average state recognition methods and system again
CN109598225A (en) * 2018-11-29 2019-04-09 浙江大学 Sharp attention network, neural network and pedestrian's recognition methods again

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐阳: "Research on pedestrian re-identification algorithms based on convolutional neural networks", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488797A (en) * 2020-03-11 2020-08-04 北京交通大学 Pedestrian re-identification method
CN111488797B (en) * 2020-03-11 2023-12-05 北京交通大学 Pedestrian re-identification method
CN111401265B (en) * 2020-03-19 2020-12-25 重庆紫光华山智安科技有限公司 Pedestrian re-identification method and device, electronic equipment and computer-readable storage medium
CN111401265A (en) * 2020-03-19 2020-07-10 重庆紫光华山智安科技有限公司 Pedestrian re-identification method and device, electronic equipment and computer-readable storage medium
CN111652035A (en) * 2020-03-30 2020-09-11 武汉大学 Pedestrian re-identification method and system based on ST-SSCA-Net
CN111428675A (en) * 2020-04-02 2020-07-17 南开大学 Pedestrian re-recognition method integrated with pedestrian posture features
CN111898431B (en) * 2020-06-24 2022-07-26 南京邮电大学 Pedestrian re-identification method based on attention mechanism part shielding
CN111898431A (en) * 2020-06-24 2020-11-06 南京邮电大学 Pedestrian re-identification method based on attention mechanism part shielding
CN112463924A (en) * 2020-11-27 2021-03-09 齐鲁工业大学 Text intention matching method for intelligent question answering based on internal correlation coding
WO2022160772A1 (en) * 2021-01-27 2022-08-04 武汉大学 Person re-identification method based on view angle guidance multi-adversarial attention
US11804036B2 (en) 2021-01-27 2023-10-31 Wuhan University Person re-identification method based on perspective-guided multi-adversarial attention
CN112800967B (en) * 2021-01-29 2022-05-17 重庆邮电大学 Posture-driven shielded pedestrian re-recognition method
CN112800967A (en) * 2021-01-29 2021-05-14 重庆邮电大学 Posture-driven shielded pedestrian re-recognition method
CN112528059A (en) * 2021-02-08 2021-03-19 南京理工大学 Deep learning-based traffic target image retrieval method and device and readable medium
US20230098817A1 (en) * 2021-09-27 2023-03-30 Uif (University Industry Foundation), Yonsei University Weakly supervised object localization apparatus and method

Also Published As

Publication number Publication date
CN110659589B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110659589B (en) Pedestrian re-identification method, system and device based on attitude and attention mechanism
Ranjan et al. Unconstrained age estimation with deep convolutional neural networks
Carmona et al. Human action recognition by means of subtensor projections and dense trajectories
CN111310705A (en) Image recognition method and device, computer equipment and storage medium
Ravì et al. Real-time food intake classification and energy expenditure estimation on a mobile device
Khan et al. A review of human pose estimation from single image
Nuevo et al. RSMAT: Robust simultaneous modeling and tracking
US10007678B2 (en) Image processing apparatus, image processing method, and recording medium
Liu et al. Adaptive cascade regression model for robust face alignment
Liu et al. Adaptive compressive tracking via online vector boosting feature selection
CN110909565B (en) Image recognition and pedestrian re-recognition method and device, electronic and storage equipment
Gawande et al. SIRA: Scale illumination rotation affine invariant mask R-CNN for pedestrian detection
CN111291612A (en) Pedestrian re-identification method and device based on multi-person multi-camera tracking
Kim et al. Real-time facial feature extraction scheme using cascaded networks
CN110826534B (en) Face key point detection method and system based on local principal component analysis
CN115690803A (en) Digital image recognition method and device, electronic equipment and readable storage medium
CN111104911A (en) Pedestrian re-identification method and device based on big data training
Du et al. Discriminative hash tracking with group sparsity
CN114168768A (en) Image retrieval method and related equipment
CN116778533A (en) Palm print full region-of-interest image extraction method, device, equipment and medium
Dutra et al. Re-identifying people based on indexing structure and manifold appearance modeling
JP6486084B2 (en) Image processing method, image processing apparatus, and program
CN114202659A (en) Fine-grained image classification method based on spatial symmetry irregular local region feature extraction
Ye et al. Person Re-Identification for Robot Person Following with Online Continual Learning
Li et al. Object tracking based on bit-planes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220208

CF01 Termination of patent right due to non-payment of annual fee