CN115620338A - Method and device for re-identifying clothes-changing pedestrians guided by black clothes and head images - Google Patents
- Publication number
- CN115620338A (application number CN202211258905.XA)
- Authority
- CN
- China
- Prior art keywords
- clothes
- pedestrian
- black
- image
- branch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and a device for re-identifying clothes-changing pedestrians guided by black clothes and head images. The method comprises the following steps: first, the original image is processed with a newly designed clothes-masking method to obtain the corresponding black clothes image, and the black clothes image is fed into a black clothes branch for pre-training; then the framework is trained jointly: the original pedestrian image is fed into the original branch, whose learning is guided by the pre-trained black clothes branch; meanwhile, the pedestrian head image is fed into the head branch to obtain finer-grained pedestrian features. Because the clothes of all pedestrians are masked to obtain black clothes images, the clothing colors of all pedestrians become uniform, the model focuses more on the parts other than clothing color, and the robustness of the model is improved. In addition, the information in the original image is used effectively, which reduces the information lost when the black clothes image is obtained and improves the robustness of the features.
Description
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a clothes-changing pedestrian re-identification method and device guided by black clothes and head images.
Background
The purpose of pedestrian re-identification is to retrieve pedestrians under different conditions, such as different cameras, different lighting or different viewing angles. Research on pedestrian re-identification covers various sub-fields, such as lightweight networks, domain generalization and unsupervised learning, and has achieved good results in recent years. These methods generally assume that a person's clothing remains consistent over a long period of time.
However, in the real world, people's clothes do not stay the same. For example, people wear different clothing over a long period of time, and some suspects may evade tracking by changing their clothing within a short period of time. Therefore, a different version of the pedestrian re-identification problem has been proposed, called long-term clothes-changing pedestrian re-identification, which has become a hot topic. The core of solving the clothes-changing pedestrian re-identification problem is to extract discriminative features that are related only to identity. To remove the interference of clothing, researchers typically employ two general strategies.
The first is a data strategy. A common approach is to construct a large data set in which each person has multiple pictures in many different clothes, and then force the model to learn clothes-independent features from these pictures. However, constructing such a clothes-changing data set purely by hand is very laborious and almost impossible. Thus, some researchers have expanded the original data set using GANs or other means.
The second is a feature separation strategy. A common operation is to separate the clothing features from the other identity features, so that features other than clothing can be used for identity determination. For example, Yang et al. use the pedestrian's contour as the query and gallery and use polar coordinates to better obtain the pedestrian's contour features. However, while learning from the contour may yield clothing-independent features, it also discards a significant portion of the clothing-independent features (e.g., the head). In addition, Hong et al. propose the use of an appearance branch and a shape branch to extract fine-grained features. However, this method is often affected by the different colors of the clothing and does not extract more robust features that are unrelated to clothing.
The main problems existing in the prior art are as follows:
1. The existing clothes-changing pedestrian re-identification methods require a large amount of image generation work and training time.
2. The existing clothes-changing pedestrian re-identification method is often influenced by different colors of clothes, and more stable features irrelevant to the clothes cannot be extracted.
3. Most of the existing clothes-changing pedestrian re-identification methods adopt a traditional convolutional neural network as a training network, and the traditional convolutional neural network brings certain loss due to downsampling and pooling.
4. The existing clothes-changing pedestrian re-identification method mostly ignores the influence of the head characteristics on the overall judgment.
Disclosure of Invention
Aiming at the problems in the background art, the invention provides a method and a device for re-identifying clothes-changing pedestrians guided by black clothes and head images. The invention uses a non-GAN method to extract clothes-irrelevant features from images, provides a new clothes-masking strategy that makes the clothes of all pedestrians consistent and forces the model to learn robust clothes-irrelevant features, adopts an improved Transformer as the training network, and designs a separate head branch to obtain fine-grained head features of the original image.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a clothes-changing pedestrian re-identification method guided by black clothes and head images, which comprises the following steps:
step 1: deleting clothes features from the original pedestrian image to obtain a black clothes pedestrian image;
step 2: processing the original pedestrian image by using a pre-trained HRNet to obtain a pedestrian head image in the original pedestrian image;
step 3: constructing a clothes-changing pedestrian re-identification network, wherein the network consists of three network branches, namely an original branch, a black clothes branch and a head branch, which are respectively used for learning original pedestrian image features, black clothes pedestrian image features and pedestrian head image features; the backbone networks of the three branches have the same structure and do not share parameters;
step 4: inputting the black clothes pedestrian image obtained in step 1 into the black clothes branch for training to obtain a pre-trained black clothes branch;
step 5: training the original branch under the guidance of the pre-trained black clothes branch to obtain features that are related to the pedestrian but unrelated to clothes in the original pedestrian image;
step 6: inputting the pedestrian head image obtained in step 2 into the head branch for learning, and combining the learned pedestrian head image features with the pedestrian-related, clothes-unrelated features obtained in step 5 to obtain the overall features that are related to the pedestrian but unrelated to clothes, thereby completing the training of the clothes-changing pedestrian re-identification network;
step 7: performing clothes-changing pedestrian re-identification based on the trained clothes-changing pedestrian re-identification network.
Further, the step 1 comprises:
adopting a pre-trained human parsing model to obtain body part images of the pedestrian in the original pedestrian image, and regrouping the obtained body part images into six parts: background, head, jacket, pants, arms and legs; extracting the pixels of the jacket and pants to form a clothes region; and setting all pixels of the clothes region to zero to obtain the black clothes pedestrian image.
Further, the backbone networks of the three branches are all imViT networks; each imViT network has two outputs, which extract global features and local features respectively; the backbone networks are optimized by triplet losses and ID losses on the global features and the local features respectively, and the ID loss is a cross-entropy loss without label smoothing.
Further, the step 5 comprises:
training the original branch under the guidance of the pre-trained black clothes branch according to a knowledge distillation algorithm, and regularizing the training of the original branch with a mean square error loss so as to learn more features that are related to identity but unrelated to clothes.
In another aspect, the present invention provides a pedestrian re-identification device for changing clothes guided by black clothes and head images, comprising:
the black clothes image obtaining module is used for deleting clothes characteristics from the original pedestrian image to obtain a black clothes pedestrian image;
the head image obtaining module is used for processing the original pedestrian image by using the pre-trained HRNet to obtain a pedestrian head image in the original pedestrian image;
the clothes-changing pedestrian re-identification network construction module is used for constructing a clothes-changing pedestrian re-identification network, wherein the network consists of three network branches, namely an original branch, a black clothes branch and a head branch, which are respectively used for learning original pedestrian image features, black clothes pedestrian image features and pedestrian head image features; the backbone networks of the three branches have the same structure and do not share parameters;
the black clothing branch training module is used for inputting the black clothing pedestrian image obtained by the black clothing image obtaining module into the black clothing branch for training to obtain a pre-trained black clothing branch;
the original branch training module is used for training an original branch under the guidance of the pre-trained black clothes branch to obtain the characteristics related to the pedestrian but not related to clothes in the original pedestrian image;
the head branch training module is used for inputting the pedestrian head image obtained by the head image obtaining module into a head branch for learning, combining the learned pedestrian head image characteristics with the characteristics which are obtained by the original branch training module and are related to pedestrians but not related to clothes to obtain the overall characteristics which are related to pedestrians but not related to clothes, and finishing the clothes-changing pedestrian re-recognition network training;
and the clothes changing pedestrian re-recognition module is used for carrying out clothes changing pedestrian re-recognition based on the trained clothes changing pedestrian re-recognition network.
Further, the black clothes image obtaining module is specifically configured to:
adopting a pre-trained human parsing model to obtain body part images of the pedestrian in the original pedestrian image, and regrouping the obtained body part images into six parts: background, head, jacket, pants, arms and legs; extracting the pixels of the jacket and pants to form a clothes region; and setting all pixels of the clothes region to zero to obtain the black clothes pedestrian image.
Further, the backbone networks of the three branches are all imViT networks; each imViT network has two outputs, which extract global features and local features respectively; the backbone networks are optimized by triplet losses and ID losses on the global features and the local features respectively, and the ID loss is a cross-entropy loss without label smoothing.
Further, the original branch training module is specifically configured to:
training the original branch under the guidance of the pre-trained black clothes branch according to a knowledge distillation algorithm, and regularizing the training of the original branch with a mean square error loss so as to learn more features that are related to identity but unrelated to clothes.
Compared with the prior art, the invention has the following beneficial effects:
1. In contrast to methods that use GANs to generate large numbers of clothes-changing images to expand the data set, the present invention uses a non-GAN approach. The method uses a human semantic parsing model to obtain the jacket and pants regions of the human body and masks them, so that the model can concentrate on learning clothes-irrelevant features without expanding the data set, which saves space and time.
2. Compared with other methods for separating clothes characteristics from identity characteristics, the method has the advantages that all the clothes of the pedestrians are shielded, so that black clothes images are obtained, the colors of the clothes of the pedestrians are uniform, the model pays more attention to parts except the colors of the clothes, and the robustness of the model is improved.
3. Compared with the method of directly learning the features from the images after the clothes are removed, the method of the invention utilizes the black clothes branch to guide the original branch to directly learn the identity features irrelevant to the clothes from the original RGB images, thus effectively utilizing the information in the original images, effectively reducing the information loss in the process of obtaining the black clothes images and improving the robustness of the features.
4. Compared with methods that extract clothes-irrelevant features directly from the original image, the invention adds dedicated processing of the head image patch, so that more discriminative fine-grained features can be extracted, which complement the global features well.
5. The test result on the PRCC data set shows that the method of the invention achieves excellent re-identification effect on clothes-changing pedestrians.
Drawings
FIG. 1 is a basic flow chart of a pedestrian re-identification method for changing clothes guided by black clothes and head images according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a pedestrian re-identification network architecture for clothes change constructed according to an embodiment of the present invention;
FIG. 3 is a comparative example of a feature activation map derived by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a clothes-changing pedestrian re-identification device guided by black clothes and head images according to an embodiment of the invention.
Detailed Description
The invention is further illustrated by the following embodiments in conjunction with the accompanying drawings:
as shown in FIG. 1, a method for re-identifying clothes-changing pedestrians guided by black clothes and head images includes:
step 1: deleting clothes features from the original pedestrian image to obtain a black clothes pedestrian image (simply referred to as a black clothes image);
step 2: processing the original pedestrian image by using a pre-trained HRNet to obtain a pedestrian head image in the original pedestrian image;
step 3: constructing a clothes-changing pedestrian re-identification network, wherein, as shown in FIG. 2, the network consists of three network branches, namely an original branch, a black clothes branch and a head branch, which are respectively used for learning original pedestrian image features, black clothes pedestrian image features and pedestrian head image features; the backbone networks of the three branches have the same structure and do not share parameters, so that the discriminative feature space is better utilized;
step 4: inputting the black clothes pedestrian image obtained in step 1 into the black clothes branch for training to obtain a pre-trained black clothes branch;
step 5: training the original branch under the guidance of the pre-trained black clothes branch to obtain features that are related to the pedestrian but unrelated to clothes in the original pedestrian image;
step 6: inputting the pedestrian head image obtained in step 2 into the head branch for learning, and combining the learned pedestrian head image features with the pedestrian-related, clothes-unrelated features obtained in step 5 to obtain the overall features that are related to the pedestrian but unrelated to clothes, thereby completing the training of the clothes-changing pedestrian re-identification network;
step 7: performing clothes-changing pedestrian re-identification based on the trained clothes-changing pedestrian re-identification network.
Further, the step 1 comprises:
adopting a pre-trained human parsing model to obtain body part images of the pedestrian in the original pedestrian image, and regrouping the obtained body part images into six parts: background, head, jacket, pants, arms and legs; extracting the pixels of the jacket and pants to form a clothes region; and setting all pixels of the clothes region to zero to obtain the black clothes pedestrian image.
Further, the backbone networks of the three branches are all imViT networks; each imViT network has two outputs, which extract global features and local features respectively; the backbone networks are optimized by triplet losses and ID losses on the global features and the local features respectively, and the ID loss is a cross-entropy loss without label smoothing.
Further, the step 5 comprises:
training the original branch under the guidance of the pre-trained black clothes branch according to a knowledge distillation algorithm, and regularizing the training of the original branch with a mean square error loss so as to learn more features that are related to identity but unrelated to clothes.
Specifically, the method comprises the following steps:
1. obtaining black clothing image
In order to eliminate the influence of clothes on feature extraction, clothes features are deleted from the original image to obtain a black clothes pedestrian image. First, we use HRNet, a pre-trained human parsing model, to obtain images of the parts of the body. Since this model divides the body into 20 parts, we regroup them into six parts: background, head, jacket, pants, arms and legs. Of these, we use only two (jacket and pants), which together form the clothes region.
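The regrouping of 20 parsing classes into six parts can be sketched as a lookup table. The source does not list the 20 classes or their grouping, so the LIP-style label order and every assignment in `REGROUP` below are assumptions made only for illustration; only the six target indices (with 2 for the jacket and 3 for the pants) follow the text.

```python
import numpy as np

# Target part indices used in the text: 0 background, 1 head, 2 jacket,
# 3 pants, 4 arms, 5 legs.
# ASSUMPTION: a LIP-style 20-class label order; the source does not state
# which 20 classes the parsing model predicts or how they are grouped.
REGROUP = np.array([
    0,  # 0  background
    1,  # 1  hat
    1,  # 2  hair
    4,  # 3  glove
    1,  # 4  sunglasses
    2,  # 5  upper clothes
    2,  # 6  dress
    2,  # 7  coat
    5,  # 8  socks
    3,  # 9  pants
    2,  # 10 jumpsuit
    2,  # 11 scarf
    3,  # 12 skirt
    1,  # 13 face
    4,  # 14 left arm
    4,  # 15 right arm
    5,  # 16 left leg
    5,  # 17 right leg
    5,  # 18 left shoe
    5,  # 19 right shoe
], dtype=np.uint8)


def regroup_parsing(parsing):
    """Map an integer parsing map with values 0..19 to six parts (0..5)."""
    return REGROUP[np.asarray(parsing)]
```

Indexing the table with the parsing map regroups every pixel in one vectorized step, whatever the shape of the map.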
First, consider a given input batch of samples $x_i$, $i = 1, \dots, B$, where $B$ is the batch size. Each $x_i$ is an image, $x_i \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$ and $C$ denote its height, width and number of channels, respectively. We denote the semantic map produced for $x_i$ by the human parsing model as $s_i$, $i = 1, \dots, B$, with $s_i \in \mathbb{R}^{1 \times H \times W}$. Each element of $s_i$ takes a value in $\{0, 1, 2, 3, 4, 5\}$, representing the six parts of the body.
Second, the pixels of the jacket and the pants are obtained. A pixel vector of $x_i$ is denoted $v_j$ and has $C$ values; each input sample $x_i$ has $W \times H$ pixel vectors in total. We express all pixel vectors of the jacket and pants in $x_i$ as:
$$B(\text{clothes and pants}) = \{\, v_j \mid v_j = x_i[s_i == 2 \,\|\, s_i == 3],\ i \in [1, B],\ j \in [1, N] \,\} \tag{1}$$
where $N$ is the total number of jacket and pants pixel vectors in $x_i$ and differs from one $x_i$ to another, $j$ indexes the $j$-th pixel vector, $s_i$ is the semantic segmentation map, 2 is the index of the jacket, 3 is the index of the pants, and $x_i[s_i == 2 \,\|\, s_i == 3]$ denotes the jacket and pants pixel vectors of $x_i$. Thus, the original image $x_i$ can be regarded as $x_i = [v_1, v_2, \dots, v_{c1}, \dots, v_{cn}, \dots, v_{last}]$, where $[v_{c1}, \dots, v_{cn}]$ are the pixels belonging to the pants and jacket obtained from equation (1).
Finally, the black clothes image is obtained by setting the pixels of the pants and the jacket to zero. Specifically, we set $[v_{c1}, \dots, v_{cn}] = 0$ and obtain $x_i' = [v_1, v_2, \dots, 0, \dots, v_{last}]$, which is the black clothes image corresponding to $x_i$.
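The masking step of equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patented implementation: the function name `black_clothes_image` is hypothetical, and the array layout (image as H×W×C, semantic map as H×W or 1×H×W) is taken from the definitions above.

```python
import numpy as np

JACKET, PANTS = 2, 3  # part indices given in the text


def black_clothes_image(x, s):
    """Return a copy of image x (H x W x C) with jacket and pants pixels zeroed.

    s is the six-part semantic map, shape (H, W) or (1, H, W).
    """
    s = np.asarray(s)
    if s.ndim == 3:            # accept the 1 x H x W layout used in the text
        s = s[0]
    clothes = (s == JACKET) | (s == PANTS)   # boolean clothes region
    out = x.copy()             # leave the original image untouched
    out[clothes] = 0           # every channel of each clothes pixel -> 0
    return out
```

Because the mask is boolean with shape H×W, the assignment zeroes all C channels of each selected pixel vector at once, which matches setting $[v_{c1}, \dots, v_{cn}] = 0$.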
2. Backbone network of each branch
We select imViT (see He, Shuting, et al. "TransReID: Transformer-based object re-identification." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021) as the backbone network of each branch. Each imViT has two outputs, which extract global and local features. The number of local sub-features that make up the local feature is configurable; in the experiments of the present invention, 4 local sub-features are used. Thus, for the $i$-th branch ($i \in \{1, 2, 3\}$), we obtain the local feature $F_{li}$ and the global feature $F_{gi}$, where $F_{li} = [F_{li1}, F_{li2}, F_{li3}, F_{li4}]$.
The backbone network is optimized by triplet losses and ID losses on the global and local features, respectively. The ID loss is a cross-entropy loss without label smoothing. The triplet loss $L_{tri}$ is defined as follows:
$$L_{tri} = \log\left(1 + \exp\left(\|f_a - f_p\|_2 - \|f_a - f_n\|_2\right)\right)$$
where $f_a$ represents the anchor, $f_p$ represents the positive sample, and $f_n$ represents the negative sample. Thus, each branch of our framework has two sets of loss functions, $L_{tri-gi}$, $L_{id-gi}$ and $L_{tri-li}$, $L_{id-li}$, where $L_{tri-gi}$ and $L_{id-gi}$ are the triplet loss and ID loss of the global feature $F_{gi}$, and $L_{tri-li}$ and $L_{id-li}$ are the triplet loss and ID loss of the local feature $F_{li}$.
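The two per-branch losses can be written out as a small NumPy sketch. The function names are hypothetical, and averaging over the batch is an assumption, since the formula above is stated for a single triplet; the ID loss is plain cross-entropy with one-hot targets, that is, without label smoothing.

```python
import numpy as np


def soft_margin_triplet_loss(f_a, f_p, f_n):
    """log(1 + exp(||f_a - f_p||_2 - ||f_a - f_n||_2)), averaged over the batch.

    f_a, f_p, f_n: arrays of shape (batch, dim).
    """
    d_ap = np.linalg.norm(f_a - f_p, axis=-1)   # anchor-positive distance
    d_an = np.linalg.norm(f_a - f_n, axis=-1)   # anchor-negative distance
    # logaddexp(0, z) computes log(1 + exp(z)) without overflow
    return float(np.mean(np.logaddexp(0.0, d_ap - d_an)))


def id_loss(logits, labels):
    """Cross-entropy on identity logits with hard (unsmoothed) labels.

    logits: (batch, num_identities); labels: (batch,) integer identity IDs.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))
```

When the positive and the negative are equally far from the anchor, the triplet term equals $\log 2$; it approaches zero as the negative moves much farther away than the positive.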
3. Original branch guided by black clothes branch
The black clothes branch can learn features unrelated to clothes. However, some important identity information, which is present in the original image, may be discarded in the process of obtaining the black clothes image. At the same time, it is not feasible to extract clothes-unrelated identity features directly from the original image. Following the knowledge distillation algorithm, we therefore train the original branch under the guidance of the pre-trained black clothes branch. In particular, we use a mean square error (MSE) loss to regularize the training of the original branch so that it learns more features that are related to identity but unrelated to clothes. The mean square error loss $L_{mse}$ is defined as
$$L_{mse} = L_{mse-opg} + L_{mse-opl}$$
where $F_{l1}$ and $F_{g1}$ denote the local and global features obtained from the black clothes branch, $F_{l2}$ and $F_{g2}$ denote the concatenated local feature and the global feature obtained from the original branch guided by the black clothes branch, and $L_{mse-opg}$ and $L_{mse-opl}$ denote the mean square error losses corresponding to the global feature pair $F_{opg} = [F_{g1}, F_{g2}]$ and the local feature pair $F_{opl} = [F_{l1}, F_{l2}]$, respectively.
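A minimal sketch of this guidance loss follows, under the assumption that each MSE term is the mean squared difference between a black clothes branch feature (acting as a fixed teacher) and the matching original branch feature (the student). The function names are hypothetical.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two feature arrays of equal shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def guidance_loss(F_g1, F_l1, F_g2, F_l2):
    """L_mse = L_mse-opg + L_mse-opl.

    F_g1, F_l1: global / local features of the pre-trained black clothes
                branch (treated as constants during this phase).
    F_g2, F_l2: global / local features of the original branch.
    """
    l_mse_opg = mse(F_g2, F_g1)   # global feature pair [F_g1, F_g2]
    l_mse_opl = mse(F_l2, F_l1)   # local feature pair [F_l1, F_l2]
    return l_mse_opg + l_mse_opl
```

In an autograd framework the teacher features would be detached so that gradients reach only the original branch; that detail is outside this NumPy illustration.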
4. Extraction of head features
For the input original images, we use the pre-trained HRNet to obtain the head image region. The obtained head image is fed into the imViT of the head branch (imViT3) to obtain the concatenated local feature $F_{l3}$ and the global feature $F_{g3}$, where the former is the combination of four local sub-features $F_{l31}, F_{l32}, F_{l33}, F_{l34}$. Likewise, the original branch guided by the black clothes branch yields the concatenated local feature $F_{l2}$ and the global feature $F_{g2}$. To obtain more features that are unrelated to clothes, we perform an element-wise weighted sum of $F_{l2}$ and $F_{l3}$, and of $F_{g2}$ and $F_{g3}$, defined as follows:
$$F_{ohg} = w F_{g2} + (1 - w) F_{g3}$$
$$F_{ohl} = w F_{l2} + (1 - w) F_{l3}$$
where $F_{ohg}$ and $F_{ohl}$ denote the global overall feature and the local overall feature that are related to the pedestrian but unrelated to clothes, and $w$ is a weight coefficient with $w \in (0, 1)$.
The triplet loss and ID loss are also applied to these features, giving $L_{id-ohg}$, $L_{tri-ohg}$, $L_{id-ohl}$ and $L_{tri-ohl}$, where $L_{id-ohg}$ and $L_{tri-ohg}$ are the ID loss and triplet loss of $F_{ohg}$, and $L_{id-ohl}$ and $L_{tri-ohl}$ are the ID loss and triplet loss of $F_{ohl}$.
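The fusion of original branch and head branch features is a convex combination, sketched below. The function name `fuse` is hypothetical, and the default `w=0.5` is only a placeholder because the text does not state the value of $w$ used.

```python
import numpy as np


def fuse(F_orig, F_head, w=0.5):
    """Element-wise weighted sum w * F_orig + (1 - w) * F_head.

    F_orig: feature from the original branch (F_g2 or F_l2).
    F_head: feature from the head branch (F_g3 or F_l3), same shape.
    w:      weight in the open interval (0, 1); 0.5 is a placeholder value.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    F_orig = np.asarray(F_orig, dtype=float)
    F_head = np.asarray(F_head, dtype=float)
    if F_orig.shape != F_head.shape:
        raise ValueError("features must have the same shape")
    return w * F_orig + (1.0 - w) * F_head
```

The same function would serve both equations: `fuse(F_g2, F_g3, w)` gives $F_{ohg}$ and `fuse(F_l2, F_l3, w)` gives $F_{ohl}$.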
5. Joint training
Training has two phases. First, we feed the black clothes images into the black clothes branch and train it with the ID loss and the triplet loss on its global feature $F_{g1}$ and local feature $F_{l1}$ (for simplicity, these two losses are omitted in the black clothes branch of FIG. 1). This yields the pre-trained black clothes branch.
Second, we freeze the learned weights of the black clothes branch and train the other branches jointly. Thus, the total loss function is defined as:
$$L_{total}(\theta) = l_1 L_{id}(\theta) + l_2 L_{tri}(\theta) + l_3 L_{mse}(\theta)$$
where $L_{id}(\theta)$ is the ID loss, $L_{tri}(\theta)$ is the triplet loss, and $L_{mse}(\theta)$ is the mean square error loss. $l_1$, $l_2$, $l_3$ are trade-off parameters that balance the contributions of the losses. In the experiments of the present invention, $l_1$, $l_2$, $l_3$ are set to 0.25, 0.25 and 0.5, respectively.
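As a worked instance of the total loss, the sketch below combines three already-computed scalar losses with the weights reported for the experiments. How the individual ID and triplet terms of the branches are aggregated into $L_{id}$ and $L_{tri}$ is not specified in the text, so the function simply takes the aggregated values as inputs; the name `total_loss` is hypothetical.

```python
def total_loss(l_id, l_tri, l_mse, weights=(0.25, 0.25, 0.5)):
    """L_total = l1 * L_id + l2 * L_tri + l3 * L_mse.

    l_id, l_tri, l_mse: scalar loss values, already aggregated over branches.
    weights:            (l1, l2, l3); defaults are the values from the text.
    """
    l1, l2, l3 = weights
    return l1 * l_id + l2 * l_tri + l3 * l_mse
```

For example, with $L_{id} = 4$, $L_{tri} = 2$ and $L_{mse} = 1$, the total is $0.25 \cdot 4 + 0.25 \cdot 2 + 0.5 \cdot 1 = 2$.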
To verify the effect of the present invention, the following experiment was performed:
We conducted experiments on the PRCC dataset (PRCC contains 33698 images of 221 people captured from 3 different angles and also provides contour sketches of the people, which facilitates the extraction of contour information) under both the different-clothes setting and the same-clothes setting. In the different-clothes setting, images from camera A form the gallery set and images from camera C form the query set. In the same-clothes setting, the gallery images also come from camera A, but the query images come from camera B. The results are shown in Table 1.
TABLE 1 results of the experiment
We compared the method of the present invention with several state-of-the-art methods on the PRCC dataset, including representative traditional pedestrian re-identification methods (PCB [Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, "Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 480–496.], Zheng et al.'s method [Z. Zheng, L. Zheng, and Y. Yang, "A discriminatively learned CNN embedding for person reidentification," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 1, pp. 1–20, 2017.], HPM [Y. Fu, Y. Wei, Y. Zhou, H. Shi, G. Huang, X. Wang, Z. Yao, and T. Huang, "Horizontal pyramid matching for person re-identification," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 8295–8302.], HACNN [W. Li, X. Zhu, and S. Gong, "Harmonious attention network for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2285–2294.]) and clothes-changing pedestrian re-identification methods (PRCC (sketch) [Q. Yang, A. Wu, and W.-S. Zheng, "Person re-identification by contour sketch under moderate clothing change," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 6, pp. 2029–2046, 2019.], GI-ReID (OSNet) [X. Jin, T. He, K. Zheng, Z. Yin, X. Shen, Z. Huang, R. Feng, J. Huang, Z. Chen, and X.-S. Hua, "Cloth-changing person re-identification from a single image with gait prediction and regularization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14278–14287.], LightMBN [F. Herzog, X. Ji, T. Teepe, S. J. Gilg, and G. Rigoll, "Lightweight multi-branch network for person re-identification," in 2021 IEEE International Conference on Image Processing (ICIP). IEEE, 2021, pp. 1129–1133.]). The results of all comparison methods are taken from their published papers.
It can be seen from Table 1 that the method of the invention achieves very good performance. In the same-clothes setting, the method of the present invention is slightly inferior to LightMBN but exceeds almost all other methods, which indicates that it also generalizes well to same-clothes pedestrian re-identification. In the different-clothes setting, the method of the invention performs clearly better: its mAP exceeds that of LightMBN by 4.7%, and its rank-1 accuracy (R@1 in Table 1) exceeds it by 9.6%. (mAP evaluates the overall effect of a pedestrian re-identification algorithm: AP is the average precision of one query sample and reflects the effect of the model on that sample, and mAP is the mean of the AP over all query samples and reflects the overall effect of the model.) This may be because the method of the present invention is able to extract more identity features that are unrelated to clothing.
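The two metrics can be illustrated with a short NumPy sketch that computes rank-1 accuracy and mAP from a query-to-gallery distance matrix. It is a simplified illustration, not the evaluation code behind Table 1: the function name `evaluate` is hypothetical, and the same-camera filtering used in some re-identification protocols is omitted.

```python
import numpy as np


def evaluate(dist, q_ids, g_ids):
    """Return (rank-1 accuracy, mAP).

    dist:  (num_query, num_gallery) distances, smaller means more similar.
    q_ids: (num_query,) identity label of each query image.
    g_ids: (num_gallery,) identity label of each gallery image.
    """
    dist, q_ids, g_ids = map(np.asarray, (dist, q_ids, g_ids))
    rank1_hits, aps = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])               # gallery sorted by distance
        matches = g_ids[order] == q_ids[i]        # True where identity matches
        if not matches.any():
            continue                              # no true match: skip query
        rank1_hits.append(float(matches[0]))      # is the top result correct?
        hit_ranks = np.flatnonzero(matches) + 1   # 1-based ranks of true matches
        precision_at_hits = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        aps.append(precision_at_hits.mean())      # AP of this query
    return float(np.mean(rank1_hits)), float(np.mean(aps))
```

In a small case with two queries, where the first query ranks both of its true matches at the top (AP = 1) and the second ranks its single true match third (AP = 1/3), the function gives a rank-1 accuracy of 0.5 and an mAP of 2/3.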
For visual analysis, we randomly selected 3 images from PRCC and show in fig. 3 the corresponding class activation maps (CAM) produced by the baseline and by the method of the invention. As can be seen from the first row, the activation regions of the baseline are concentrated on clothing and background; many of these regions are activated and used to identify a person, which may confuse the identification results. In addition, the baseline pays less attention to the head. As can be seen from the second row, the activations of the proposed method are more concentrated and cover less of the background and clothing areas, which reduces the influence of background and clothing on recognition. Furthermore, the proposed model pays more attention to the head and to body shape.
On the basis of the above embodiment, as shown in fig. 4, the present invention further provides a clothes-changing pedestrian re-identification device guided by black clothes and head images, comprising:
a black-clothes image obtaining module, configured to remove clothing features from the original pedestrian image to obtain a black-clothes pedestrian image;
a head image obtaining module, configured to process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image in the original pedestrian image;
a clothes-changing pedestrian re-identification network construction module, configured to construct a clothes-changing pedestrian re-identification network consisting of three network branches, namely an original branch, a black-clothes branch and a head branch, which learn original pedestrian image features, black-clothes pedestrian image features and pedestrian head image features, respectively; the backbone networks of the three branches have the same structure but do not share parameters;
a black-clothes branch training module, configured to input the black-clothes pedestrian image obtained by the black-clothes image obtaining module into the black-clothes branch for training to obtain a pre-trained black-clothes branch;
an original branch training module, configured to train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but unrelated to clothes;
a head branch training module, configured to input the pedestrian head image obtained by the head image obtaining module into the head branch for learning, and to combine the learned pedestrian head image features with the pedestrian-related but clothes-unrelated features obtained by the original branch training module to obtain overall features that are related to the pedestrian but unrelated to clothes, thereby completing training of the clothes-changing pedestrian re-identification network;
and a clothes-changing pedestrian re-identification module, configured to perform clothes-changing pedestrian re-identification based on the trained clothes-changing pedestrian re-identification network.
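As an illustrative, non-limiting sketch of how the head features and the clothes-unrelated features of the original branch might be combined and then used for re-identification, the following Python code concatenates the two L2-normalized feature vectors and ranks gallery images by cosine similarity. Concatenation, L2 normalization, cosine similarity and all function names are assumptions made for illustration; the text above only states that the two kinds of features are combined.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fuse_features(original_feat, head_feat):
    """Combine original-branch and head-branch features.

    Assumption: the combination is a concatenation of the two normalized
    feature vectors.
    """
    return np.concatenate(
        [l2_normalize(original_feat), l2_normalize(head_feat)], axis=-1
    )


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from most to least similar to the query."""
    q = l2_normalize(query_feat)
    g = l2_normalize(gallery_feats)
    sims = g @ q                       # cosine similarity per gallery image
    return np.argsort(-sims, kind="stable")


# Toy example (hypothetical 2-D features for one query and two gallery images).
query = fuse_features([1.0, 0.0], [0.0, 1.0])
gallery = fuse_features([[0.0, 1.0], [1.0, 0.0]],
                        [[1.0, 0.0], [0.0, 1.0]])
ranking = rank_gallery(query, gallery)
```

In the toy example the second gallery image has the same fused feature as the query, so it is ranked first.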
Further, the black clothes image obtaining module is specifically configured to:
using a pre-trained human parsing model to obtain body part images of the pedestrian in the original pedestrian image, and regrouping the obtained body part images into six parts: background, head, jacket, pants, arms, and legs; extracting the pixels of the jacket and pants parts to form a clothing region; and setting all pixels of the clothing region to zero to obtain the black-clothes pedestrian image.
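As an illustrative, non-limiting sketch of the masking operation described above, the following Python code sets every pixel labeled as jacket or pants to zero, given a per-pixel parsing map. The label ids and the function name are hypothetical; an actual human parsing model uses its own label set, which would first have to be regrouped into the six parts.

```python
import numpy as np

# Hypothetical label ids for the six regrouped parts.
BACKGROUND, HEAD, JACKET, PANTS, ARMS, LEGS = range(6)


def make_black_clothes_image(image, parsing_map, clothing_labels=(JACKET, PANTS)):
    """Return a copy of `image` with all clothing pixels set to zero.

    image:       H x W x 3 array (original pedestrian image)
    parsing_map: H x W array of part labels from a human parsing model
    """
    image = np.asarray(image)
    parsing_map = np.asarray(parsing_map)
    if image.shape[:2] != parsing_map.shape:
        raise ValueError("image and parsing_map must have the same height and width")
    clothing_mask = np.isin(parsing_map, clothing_labels)  # clothing region
    black = image.copy()            # leave the original image untouched
    black[clothing_mask] = 0        # all channels of clothing pixels become zero
    return black


# Toy example (hypothetical 2 x 2 image with uniform pixel value 200).
toy_image = np.full((2, 2, 3), 200, dtype=np.uint8)
toy_parsing = np.array([[HEAD, JACKET],
                        [PANTS, BACKGROUND]])
toy_black = make_black_clothes_image(toy_image, toy_parsing)
```

Only the jacket and pants pixels are zeroed; head, arms, legs and background keep their original values, so no new images have to be generated.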
Further, the backbone networks of the three branches are all imViT networks. Each imViT network has two outputs, used to extract global features and local features, respectively. The backbone network is optimized with a triplet loss and an ID loss on both the global features and the local features, where the ID loss is a cross-entropy loss without label smoothing.
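As an illustrative, non-limiting sketch of the loss described above, the following Python code evaluates a cross-entropy ID loss with hard (unsmoothed) labels and a margin-based triplet loss, and sums both over the global and local outputs of one branch. The margin value, the equal weighting of the terms, the use of pre-selected triplets and all names are assumptions made for illustration.

```python
import numpy as np


def id_loss(logits, labels):
    """Cross-entropy with one-hot targets, i.e. without label smoothing."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The margin of 0.3 is an assumed value.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())


def branch_loss(outputs):
    """Sum ID and triplet losses over the outputs (global, local) of a branch.

    Each entry of `outputs` is a dict with keys
    'logits', 'labels', 'anchor', 'positive', 'negative'.
    """
    total = 0.0
    for out in outputs:
        total += id_loss(out["logits"], out["labels"])
        total += triplet_loss(out["anchor"], out["positive"], out["negative"])
    return total


# Toy example (hypothetical values) with identical global and local outputs.
toy_output = {
    "logits": [[0.0, 0.0]], "labels": [0],
    "anchor": [[0.0, 0.0]], "positive": [[0.0, 0.0]], "negative": [[0.1, 0.0]],
}
toy_total = branch_loss([toy_output, toy_output])
```

For the toy values, each output contributes log 2 from the ID loss and 0.2 from the triplet loss, so the total is 2 × (log 2 + 0.2).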
Further, the original branch training module is specifically configured to:
training the original branch under the guidance of the pre-trained black-clothes branch by means of a knowledge distillation algorithm, and regularizing the training of the original branch with a mean squared error loss, so that the original branch learns more features that are identity-related but clothes-unrelated.
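As an illustrative, non-limiting sketch of the distillation term described above, the following Python code computes the mean squared error between the features of the original branch (student) and those of the pre-trained black-clothes branch (teacher), together with the gradient with respect to the student features only, since the teacher is treated as fixed. Which features are matched, the learning rate and all names are assumptions made for illustration.

```python
import numpy as np


def distillation_mse(student_feat, teacher_feat):
    """Return (loss, gradient w.r.t. the student features).

    The teacher features come from the pre-trained black-clothes branch and
    receive no gradient.
    """
    student = np.asarray(student_feat, dtype=float)
    teacher = np.asarray(teacher_feat, dtype=float)
    diff = student - teacher
    loss = float(np.mean(diff ** 2))
    grad = 2.0 * diff / diff.size      # derivative of the mean of squares
    return loss, grad


# Toy example (hypothetical features): one gradient step on the student features.
student = np.array([[1.0, 2.0]])
teacher = np.array([[0.0, 0.0]])
loss_before, grad = distillation_mse(student, teacher)
student_updated = student - 0.5 * grad          # assumed learning rate of 0.5
loss_after, _ = distillation_mse(student_updated, teacher)
```

In the toy example the loss falls from 2.5 to 0.625 after one step, showing how the term pulls the features that the original branch extracts from the RGB image toward those the black-clothes branch extracts from the masked image.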
In summary, in contrast to methods that use a GAN to generate large numbers of clothes-changing images to expand the dataset, the present invention uses a non-GAN approach. The method uses a human semantic parsing model to locate the jacket and pants regions of the human body and masks them, so that the model can concentrate on learning clothes-unrelated features without expanding the dataset, which saves space and time. Compared with other methods that separate clothing features from identity features, the method masks all of a pedestrian's clothing to obtain black-clothes images, so that the clothing color of all pedestrians is uniform, the model pays more attention to cues other than clothing color, and the robustness of the model is improved. Compared with methods that learn features directly from clothes-removed images, the method of the invention uses the black-clothes branch to guide the original branch to learn clothes-unrelated identity features directly from the original RGB images, which makes effective use of the information in the original images, reduces the information lost when obtaining the black-clothes images, and improves the robustness of the features. Compared with methods that extract clothes-unrelated features directly from the original image, the method adds dedicated processing of the head image patch, so that more discriminative fine-grained features can be extracted to complement the global features. The test results on the PRCC dataset show that the method of the invention achieves excellent re-identification performance on clothes-changing pedestrians.
While only the preferred embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the invention.
Claims (8)
1. A method for re-identifying clothes-changing pedestrians guided by black clothes and head images, characterized by comprising the following steps:
step 1: removing clothing features from the original pedestrian image to obtain a black-clothes pedestrian image;
step 2: processing the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image in the original pedestrian image;
step 3: constructing a clothes-changing pedestrian re-identification network, wherein the network consists of three network branches, namely an original branch, a black-clothes branch and a head branch, which learn original pedestrian image features, black-clothes pedestrian image features and pedestrian head image features, respectively; the backbone networks of the three branches have the same structure but do not share parameters;
step 4: inputting the black-clothes pedestrian image obtained in step 1 into the black-clothes branch for training to obtain a pre-trained black-clothes branch;
step 5: training the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but unrelated to clothes;
step 6: inputting the pedestrian head image obtained in step 2 into the head branch for learning, and combining the learned pedestrian head image features with the pedestrian-related but clothes-unrelated features obtained in step 5 to obtain overall features that are related to the pedestrian but unrelated to clothes, thereby completing training of the clothes-changing pedestrian re-identification network;
step 7: performing clothes-changing pedestrian re-identification based on the trained clothes-changing pedestrian re-identification network.
2. The method for re-identifying clothes-changing pedestrians guided by black clothes and head images according to claim 1, characterized in that step 1 comprises:
using a pre-trained human parsing model to obtain body part images of the pedestrian in the original pedestrian image, and regrouping the obtained body part images into six parts: background, head, jacket, pants, arms, and legs; extracting the pixels of the jacket and pants parts to form a clothing region; and setting all pixels of the clothing region to zero to obtain the black-clothes pedestrian image.
3. The method for re-identifying clothes-changing pedestrians guided by black clothes and head images according to claim 1, characterized in that the backbone networks of the three branches are all imViT networks, each imViT network has two outputs used to extract global features and local features, respectively, and the backbone network is optimized with a triplet loss and an ID loss on both the global features and the local features, wherein the ID loss is a cross-entropy loss without label smoothing.
4. The method for re-identifying clothes-changing pedestrians guided by black clothes and head images according to claim 1, characterized in that step 5 comprises:
training the original branch under the guidance of the pre-trained black-clothes branch by means of a knowledge distillation algorithm, and regularizing the training of the original branch with a mean squared error loss, so that the original branch learns more features that are identity-related but clothes-unrelated.
5. A device for re-identifying clothes-changing pedestrians guided by black clothes and head images, comprising:
a black-clothes image obtaining module, configured to remove clothing features from the original pedestrian image to obtain a black-clothes pedestrian image;
a head image obtaining module, configured to process the original pedestrian image with a pre-trained HRNet to obtain the pedestrian head image in the original pedestrian image;
a clothes-changing pedestrian re-identification network construction module, configured to construct a clothes-changing pedestrian re-identification network consisting of three network branches, namely an original branch, a black-clothes branch and a head branch, which learn original pedestrian image features, black-clothes pedestrian image features and pedestrian head image features, respectively; the backbone networks of the three branches have the same structure but do not share parameters;
a black-clothes branch training module, configured to input the black-clothes pedestrian image obtained by the black-clothes image obtaining module into the black-clothes branch for training to obtain a pre-trained black-clothes branch;
an original branch training module, configured to train the original branch under the guidance of the pre-trained black-clothes branch to obtain features in the original pedestrian image that are related to the pedestrian but unrelated to clothes;
a head branch training module, configured to input the pedestrian head image obtained by the head image obtaining module into the head branch for learning, and to combine the learned pedestrian head image features with the pedestrian-related but clothes-unrelated features obtained by the original branch training module to obtain overall features that are related to the pedestrian but unrelated to clothes, thereby completing training of the clothes-changing pedestrian re-identification network;
and a clothes-changing pedestrian re-identification module, configured to perform clothes-changing pedestrian re-identification based on the trained clothes-changing pedestrian re-identification network.
6. The device for re-identifying clothes-changing pedestrians guided by black clothes and head images according to claim 5, wherein the black-clothes image obtaining module is specifically configured for:
using a pre-trained human parsing model to obtain body part images of the pedestrian in the original pedestrian image, and regrouping the obtained body part images into six parts: background, head, jacket, pants, arms, and legs; extracting the pixels of the jacket and pants parts to form a clothing region; and setting all pixels of the clothing region to zero to obtain the black-clothes pedestrian image.
7. The device for re-identifying clothes-changing pedestrians guided by black clothes and head images according to claim 5, wherein the backbone networks of the three branches are all imViT networks, each imViT network has two outputs used to extract global features and local features, respectively, and the backbone network is optimized with a triplet loss and an ID loss on both the global features and the local features, wherein the ID loss is a cross-entropy loss without label smoothing.
8. The device for re-identifying clothes-changing pedestrians guided by black clothes and head images according to claim 5, wherein the original branch training module is specifically configured for:
training the original branch under the guidance of the pre-trained black-clothes branch by means of a knowledge distillation algorithm, and regularizing the training of the original branch with a mean squared error loss, so that the original branch learns more features that are identity-related but clothes-unrelated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211258905.XA CN115620338A (en) | 2022-10-14 | 2022-10-14 | Method and device for re-identifying clothes-changing pedestrians guided by black clothes and head images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211258905.XA CN115620338A (en) | 2022-10-14 | 2022-10-14 | Method and device for re-identifying clothes-changing pedestrians guided by black clothes and head images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115620338A true CN115620338A (en) | 2023-01-17 |
Family
ID=84861950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211258905.XA Pending CN115620338A (en) | 2022-10-14 | 2022-10-14 | Method and device for re-identifying clothes-changing pedestrians guided by black clothes and head images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115620338A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116129473A (en) * | 2023-04-17 | 2023-05-16 | 山东省人工智能研究院 | Identity-guide-based combined learning clothing changing pedestrian re-identification method and system |
CN116129473B (en) * | 2023-04-17 | 2023-07-14 | 山东省人工智能研究院 | Identity-guide-based combined learning clothing changing pedestrian re-identification method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ning et al. | JWSAA: joint weak saliency and attention aware for person re-identification | |
Matsukawa et al. | Person re-identification using CNN features learned from combination of attributes | |
Zhang et al. | Content-adaptive sketch portrait generation by decompositional representation learning | |
CN111783521B (en) | Pedestrian re-identification method based on low-rank prior guidance and based on domain invariant information separation | |
CN109508663A (en) | A kind of pedestrian's recognition methods again based on multi-level supervision network | |
Feng et al. | Deep-masking generative network: A unified framework for background restoration from superimposed images | |
Chen et al. | Learning discriminative and generalizable representations by spatial-channel partition for person re-identification | |
Hu et al. | Dual face alignment learning network for NIR-VIS face recognition | |
CN113158739B (en) | Method for solving re-identification of replacement person by twin network based on attention mechanism | |
Wang et al. | Finger vein recognition based on multi-receptive field bilinear convolutional neural network | |
CN114782977B (en) | Pedestrian re-recognition guiding method based on topology information and affinity information | |
CN115620338A (en) | Method and device for re-identifying clothes-changing pedestrians guided by black clothes and head images | |
CN116030495A (en) | Low-resolution pedestrian re-identification algorithm based on multiplying power learning | |
Zhao et al. | Visible-infrared person re-identification based on frequency-domain simulated multispectral modality for dual-mode cameras | |
Kanwal et al. | Person re-identification using adversarial haze attack and defense: A deep learning framework | |
Pan et al. | Disentangled representation and enhancement network for vein recognition | |
Zhu et al. | Contactless Palmprint Image Recognition across Smartphones with Self-paced CycleGAN | |
Chan et al. | Diverse-Feature Collaborative Progressive Learning for Visible-Infrared Person Re-Identification | |
CN117576729A (en) | Visible light-infrared pedestrian re-identification method based on multi-stage auxiliary learning | |
Hong et al. | Camera-specific Informative Data Augmentation Module for Unbalanced Person Re-identification | |
Oh et al. | Visual adversarial attacks and defenses | |
Liu et al. | Similarity preserved camera-to-camera GAN for person re-identification | |
Li et al. | Criminal investigation image classification based on spatial cnn features and elm | |
Xu et al. | Cross domain person re-identification with large scale attribute annotated datasets | |
Gong et al. | Dynamically Adaptive Instance Normalization and Attention-Aware Incremental Meta-Learning for Generalizable Person Re-identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||