CN116129473A - Identity-guided joint learning method and system for cloth-changing pedestrian re-identification - Google Patents


Info

Publication number
CN116129473A
Authority
CN
China
Prior art keywords
pedestrian
image
clothing
identity
clothes
Prior art date
Legal status
Granted
Application number
CN202310401773.XA
Other languages
Chinese (zh)
Other versions
CN116129473B (en)
Inventor
高赞
魏盛旬
赵一博
薛彦兵
卓涛
李志慧
程志勇
Current Assignee
Tianjin University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Tianjin University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Tianjin University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202310401773.XA
Publication of CN116129473A
Application granted
Publication of CN116129473B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention belongs to the technical field of computer vision and provides an identity-guided joint learning method and system for cloth-changing pedestrian re-identification. The method comprises the following steps: acquiring a pedestrian image to be retrieved; inputting the image into a pre-trained cloth-changing pedestrian re-identification model and extracting clothing-irrelevant, identity-robust features as retrieval features; and matching the retrieval features against the pedestrian image features in a gallery by similarity. In the re-identification model, a clothes attention degradation network guides the model to adaptively weaken the interference introduced by clothing information, a human semantic attention and jigsaw module highlights human semantic information, and a pedestrian identity enhancement module guides the model to extract a more favorable identity-robust representation. The invention has low test cost and high efficiency and can effectively cope with cloth-changing scenes.

Description

Identity-guided joint learning method and system for cloth-changing pedestrian re-identification
Technical Field
The invention relates to an identity-guided joint learning method and system for cloth-changing pedestrian re-identification, and belongs to the technical field of computer vision.
Background
Person re-identification (Re-ID) technology is increasingly combined with technologies such as face recognition and pedestrian detection, and is widely applied in fields such as intelligent security, target association and intelligent retail. It is an important fulcrum for combining artificial intelligence with industry and has attracted the attention of more and more researchers. Pedestrian re-identification mainly solves the problem of matching pedestrians across cameras and across scenes, retrieving pedestrians with the same identity from images captured by different cameras. In practical application scenarios, extracting effective identity features is hindered by many objective factors such as image resolution, viewing angle, illumination and pose change, which makes re-identification a challenging task.
To date, researchers have proposed many pedestrian re-identification methods. Current algorithms mainly extract image features with convolutional neural networks and then classify or match those features. For example, considering the consistency of content within each stripe, Sun et al. proposed the PCB method, which uses a uniform feature-partition strategy to learn local features and outputs a convolutional feature consisting of six stripes, enhancing the content consistency of each partitioned feature and thereby keeping the stripes spatially aligned; because the model cannot distinguish occluded regions from non-occluded regions, it may return incorrect retrieval results. Zhou et al. designed the full-scale network OSNet, composed of residual blocks with multiple convolutional feature streams, which captures features at different spatial scales while encapsulating multi-scale cooperative features. Such methods are limited to optimizing the feature extraction and their effect is limited, so many researchers consider fine-grained features or introduce joint learning of new features.
These studies mainly focus on short time spans and usually assume that a pedestrian wears the same clothing under different cameras, so that the visual appearance remains highly similar. The appearance of clothes is thus still an important basis for the features these models extract. In real scenes, however, such methods do not apply well, because pedestrians are very likely to change clothes, or different pedestrians may wear the same clothes: in daily life people change into suitable clothes in different places and weather, and club members wear the same uniform when participating in activities.
Therefore, cloth-changing pedestrian re-identification, which targets the clothing-change problem, is a more challenging and urgent problem to be solved.
Disclosure of Invention
The invention aims to provide an identity-guided joint learning method for cloth-changing pedestrian re-identification so as to overcome the above technical problems in the prior art.
In order to achieve the above purpose, the present invention is realized by the following technical scheme:
An identity-guided joint learning method for cloth-changing pedestrian re-identification comprises the following steps:
S110, acquiring a pedestrian image to be retrieved;
S120, inputting the pedestrian image to be retrieved into a pre-trained cloth-changing pedestrian re-identification model and extracting clothing-irrelevant, identity-robust features as retrieval features, wherein the model is trained on a pedestrian image dataset together with pre-acquired clothing-region mask images, pedestrian foreground images and upper-clothes occlusion images; a clothes attention degradation network guides the model to adaptively weaken the interference introduced by clothing information, a human semantic attention and jigsaw module highlights human semantic information, and a pedestrian identity enhancement module guides the model to extract a more favorable identity-robust representation;
S130, matching the retrieval features against the pedestrian image features in the gallery by similarity, sorting the retrieval results from high to low similarity score, and outputting the sorted results as the re-identification result.
In the above identity-guided joint learning method for cloth-changing pedestrian re-identification, the method for training the cloth-changing pedestrian re-identification model comprises the following steps:
S1201, acquiring an original pedestrian image to be processed from a pedestrian image dataset, and labeling the semantic information of each human body part in the original pedestrian image with a pre-trained human semantic parsing model to obtain the corresponding human semantic segmentation map;
S1202, acquiring a clothing-region mask image from the human semantic segmentation map; acquiring a pedestrian foreground image from the human semantic segmentation map and the original pedestrian image; and acquiring an upper-clothes occlusion image from the human semantic segmentation map and the pedestrian foreground image;
S1203, acquiring a degraded feature representation from the original pedestrian image and the clothing-region mask image through the clothes attention degradation network; acquiring an original feature representation from the original pedestrian image through the backbone network; acquiring a human semantic feature representation from the pedestrian foreground image through the human semantic attention and jigsaw module; and acquiring an identity-enhanced feature representation from the upper-clothes occlusion image through the pedestrian identity enhancement module;
S1204, applying a joint training constraint to the degraded, original, human semantic and identity-enhanced feature representations with a loss function;
S1205, obtaining the trained cloth-changing pedestrian re-identification model.
In the above method, the clothing-region mask image, the pedestrian foreground image and the upper-clothes occlusion image are pre-acquired as follows:
Acquiring an original pedestrian image to be processed from a pedestrian image dataset;
labeling the semantic information of each human body part in the original pedestrian image with the pre-trained human semantic parsing model to obtain the corresponding human semantic segmentation map;
acquiring the clothing-region mask image: locating the clothing region in the pedestrian image using the clothing label information of the human semantic segmentation map; distinguishing the clothing region from other regions by binarization, setting pixels belonging to the clothing region in the human semantic segmentation map to 1 and all other pixels to 0; and taking the binarized result as the clothing-region mask image;
acquiring the pedestrian foreground image: binarizing the human semantic segmentation map to locate the foreground region where the human body is located, setting the background part of the map to 0 and all remaining parts containing body parts and accessories to 1; and multiplying the pedestrian image element-wise by the binarized segmentation map to obtain a pedestrian foreground image that keeps all information except the background as foreground information;
acquiring the upper-clothes occlusion image: locating the upper-body clothes region in the pedestrian foreground image through the human semantic segmentation map, setting that region to 1 to occlude it, and taking the occluded image as the upper-clothes occlusion image.
In a preferred scheme of the above method, the clothes attention degradation network acquires the degraded feature representation as follows:
locating the clothes region of the corresponding original pedestrian image with the clothing-region mask image and down-weighting all pixels belonging to the clothes region to obtain three clothes-weakening feature maps of different scales;
inputting the original pedestrian image into a ResNet50 network, which outputs three intermediate feature maps of different scales at different stages; obtaining a spatial attention feature map at each stage through convolution operations; applying consistency-constraint learning to the spatial attention feature maps with the clothes-weakening feature maps; weighting the intermediate feature map of each stage with its spatial attention feature map to obtain a new intermediate feature map with the clothes region weakened; and pooling the weakened intermediate feature map of the last stage to obtain the degraded feature representation;
wherein the spatial attention feature map is acquired by the formula $A_i = W_2 * (W_1 * F_i)$ and the spatial attention weighting is performed by the formula $\tilde{F}_i = A_i \odot F_i$, where $F_i$ denotes the intermediate feature map of the $i$-th stage, $W_1$ and $W_2$ denote two $1 \times 1$ convolution filters, $*$ denotes the convolution operation, $A_i$ denotes the spatial attention feature map, and $\odot$ denotes the Hadamard matrix product.
In a preferred scheme of the above method, the consistency-constraint learning of the spatial attention feature maps with the clothes-weakening feature maps is performed as follows: the multi-scale clothes-weakening feature maps guide the weakening of the clothes regions in the spatial attention feature maps of the corresponding scales, realized through the semantic loss function

$$\mathcal{L}_{sem} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{H_k W_k} \left\| A_k - M_k \right\|_2^2,$$

where $K$ denotes the number of feature maps of different scales, $A_k$ and $M_k$ denote the spatial attention feature map and the clothes-weakening feature map of the $k$-th scale, and $H_k$ and $W_k$ denote the height and width of the feature map.
In a preferred scheme of the above method, the original feature representation is acquired from the original pedestrian image through the backbone network, the human semantic feature representation from the pedestrian foreground image through the human semantic attention and jigsaw module, and the identity-enhanced feature representation from the upper-clothes occlusion image through the pedestrian identity enhancement module, specifically as follows: a Vision Transformer (ViT) pre-trained on ImageNet is used as the backbone network, and the original pedestrian image is input into ViT to obtain the original feature representation; the pedestrian foreground images of the same identity within a training batch are randomly shuffled and an intra-identity jigsaw is performed to obtain new pedestrian foreground images; the new pedestrian foreground images are input into a ViT model sharing weight parameters with the backbone network to obtain the human semantic feature representation; the head-neck-shoulder region of the upper-clothes occlusion image is located through the localization layer of a pre-trained spatial transformer network to acquire a locally robust image; and the locally robust image is input into a ViT model to obtain the identity-enhanced feature representation.
In a preferred scheme of the above method, the loss function is realized by the following formula:

$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{tri} + \mathcal{L}_{sem} + \mathcal{L}_{sc},$$

where $\mathcal{L}_{cls}$ is the classification loss constraining the feature representations; $\mathcal{L}_{tri}$ is the triplet metric loss measuring the distance between sample pairs; $\mathcal{L}_{sem}$ is the semantic loss with which the multi-scale clothes-weakening feature maps guide the weakening of the clothes regions in the spatial attention feature maps of the corresponding scales; and $\mathcal{L}_{sc}$ is the semantic consistency constraint loss measuring the high-level semantic differences between the degraded, original and human semantic feature representations.

The high-level semantic consistency constraint loss is realized by the following formula:

$$\mathcal{L}_{sc} = \left\| \mu(f_{cad}) - \mu(f_{org}) \right\|_2 + \left\| \sigma^2(f_{cad}) - \sigma^2(f_{org}) \right\|_2 + \left\| \mu(f_{hsa}) - \mu(f_{org}) \right\|_2 + \left\| \sigma^2(f_{hsa}) - \sigma^2(f_{org}) \right\|_2,$$

where $\mu(\cdot)$ is the mean function; $\sigma^2(\cdot)$ is the variance function; $f_{cad}$ comes from the clothes attention degradation network, $f_{org}$ from the backbone network and $f_{hsa}$ from the human semantic attention and jigsaw module; and $\|\cdot\|_2$ denotes the L2 norm.
The invention also provides an identity-guided joint learning system for cloth-changing pedestrian re-identification, comprising:
a data acquisition unit, which acquires the pedestrian image to be retrieved, wherein the retrieval image and the pedestrian images in the gallery are captured by different cameras;
a feature extraction unit, which inputs the pedestrian image to be retrieved into the pre-trained cloth-changing pedestrian re-identification model and extracts clothing-irrelevant, identity-robust features as retrieval features, the model being trained on a pedestrian image dataset together with pre-acquired clothing-region mask images, pedestrian foreground images and upper-clothes occlusion images, wherein the clothes attention degradation network guides the model to adaptively weaken the interference introduced by clothing information, the human semantic attention and jigsaw module highlights human semantic information, and the pedestrian identity enhancement module guides the model to extract a more favorable identity-robust representation;
and a result identification unit, which matches the retrieval features against the pedestrian image features in the gallery by similarity, sorts the retrieval results from high to low similarity score, and outputs the sorted results as the re-identification result.
The invention has the following beneficial effects:
1) Through the joint training of the clothes attention degradation network, the backbone network, the human semantic attention and jigsaw module and the pedestrian identity enhancement module, the identity-guided joint learning method weakens the clothing-region information of pedestrian images, makes full use of human semantic information, enriches the view/pose variation of pedestrians, and highlights discriminative pedestrian identity features, so that features that are robust and discriminative in cloth-changing scenes can be extracted.
2) The method achieves excellent results on the related cloth-changing pedestrian re-identification datasets, with higher recognition performance and stability.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification, and together with the embodiments serve to explain the invention.
FIG. 1 is a flow diagram of an identity-guided joint learning method for cloth-changing pedestrian re-identification according to an embodiment of the invention;
FIG. 2 is a schematic framework diagram of the method according to an embodiment of the invention;
FIG. 3 is an example presentation of model retrieval results of the method according to an embodiment of the invention;
FIG. 4 is a logic block diagram of an identity-guided joint learning system for cloth-changing pedestrian re-identification according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the internal structure of an electronic device implementing the method according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The embodiments of the application can acquire and process the related data based on artificial intelligence and computer vision technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. It is a branch of computer science concerned with systems that perceive their environment and take actions that maximize their chance of success; such systems can also learn from past experience, make reasonable decisions and respond quickly. The scientific goal of artificial intelligence researchers is therefore to understand intelligence by constructing computer programs capable of meaningful symbolic reasoning. Artificial intelligence is a comprehensive subject involving a wide range of fields, at both the hardware and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems and mechatronics. AI software technology mainly comprises computer vision, speech processing, natural language processing and machine learning/deep learning.
Computer vision (CV) is the science of how to make machines "see": using cameras and computers instead of human eyes to recognize, track and measure targets, and further processing images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and technologies for building artificial intelligence systems that obtain "information" from images or multidimensional data. Since perception can be seen as the extraction of information from sensory signals, computer vision can also be seen as the science of making artificial systems "perceive" from images or multidimensional data. As an engineering discipline, it seeks to build computer vision systems based on the relevant theories and models, and can be regarded as a complement to biological vision: biological vision studies the physical models underlying perception in humans and animals, while computer vision studies the artificial systems implemented in software and hardware, and the exchange between the two fields is of great mutual value. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
Specifically, as an example, FIG. 1 is a schematic flow chart of an identity-guided joint learning method for cloth-changing pedestrian re-identification according to an embodiment of the present invention. As shown in FIG. 1, the method can be performed by a device implemented in software and/or hardware, and comprises steps S110-S130. S110, acquiring a pedestrian image to be retrieved, where the retrieval image and the pedestrian images in the gallery are captured by different cameras. S120, inputting the pedestrian image to be retrieved into a pre-trained cloth-changing pedestrian re-identification model and extracting clothing-irrelevant, identity-robust features as retrieval features; the model is trained on a pedestrian image dataset together with pre-acquired clothing-region mask images, pedestrian foreground images and upper-clothes occlusion images; a clothes attention degradation network guides the model to adaptively weaken the interference introduced by clothing information, a human semantic attention and jigsaw module highlights human semantic information, and a pedestrian identity enhancement module guides the model to extract a more favorable identity-robust representation. S130, matching the retrieval features against the pedestrian image features in the gallery by similarity, sorting the retrieval results from high to low similarity score, and outputting the sorted results as the re-identification result.
FIG. 2 is a schematic framework diagram of the method according to an embodiment of the invention. As shown in FIG. 2, the invention provides a cloth-changing pedestrian re-identification model comprising a clothes attention degradation network, a backbone network, a human semantic attention and jigsaw module, and a pedestrian identity enhancement module.
First, the cloth-changing pedestrian re-identification model is trained. The pedestrian image dataset is processed in mini-batches through a pre-trained human semantic parsing model, labeling the semantic information of each human body part to obtain the corresponding human semantic segmentation maps. The human semantic parsing model may be, but is not limited to, the Self-Correction Human Parsing (SCHP) model, which performs the fine-grained semantic segmentation task of assigning each pixel of a human image to a semantic class such as arm, leg or skirt. To obtain finer-grained segmentation labels, the pedestrian image is divided into 18 semantic parts: background, head, hair, sunglasses, jacket, shirt, trousers, skirt, belt, left shoe, right shoe, face, left leg, right leg, left arm, right arm, backpack and scarf; these 18 semantic parts are merged and used in combination according to the specific task, as shown in FIG. 2.
Then, three enhanced identity semantic representations are obtained for each preprocessed image through the encoder, guided by feature-driven differences of attention. First, a soft-attention knowledge distillation strategy is proposed: a clothing-region mask image is obtained from the human semantic segmentation map and combined with the original pedestrian image to produce multi-scale feature maps with the clothing region weakened. By gradually guiding the attention weight of the clothing region in the original feature maps downward, the model's attention to the clothing region of the pedestrian image is weakened and a clothing-region attention-degraded feature representation (the degraded feature) is extracted. This avoids the absolute damage to visual semantic information caused by hard masking, effectively reduces the model's attention to the clothing region, fully mines the information of the pedestrian image at different scales so that the deep neural network can capture richer low-level visual features and fuse them more comprehensively with high-level semantics, and forces the model to extract discriminative identity features from the non-clothing regions. Second, the human semantic segmentation map is combined with the original pedestrian image to obtain a pedestrian foreground image; the human semantic attention and jigsaw module highlights human semantic information and simulates different poses of the same identity to cope with the disturbance caused by cluttered backgrounds and moderate pose change, yielding a human-semantics-enhanced feature representation (the human semantic feature). Third, the human semantic segmentation map and the pedestrian foreground image are used to obtain an upper-clothes occlusion image, and the pedestrian identity enhancement module guides the model to extract an identity feature representation that is more robust to identity information (the identity-enhanced feature). That is, three enhanced identity semantic representations are obtained through identity semantic guidance: the clothes attention degradation network guides the model to adaptively weaken the interference introduced by clothing information, the human semantic attention and jigsaw module highlights human semantic information, and the pedestrian identity enhancement module further strengthens the model's attention to identity-robust regions.
Finally, the degraded, original, human semantic and identity-enhanced features are jointly trained with a loss function to obtain the trained cloth-changing pedestrian re-identification model. Specifically, the loss function includes a classification loss constraining the original, human semantic and identity-enhanced features; a triplet metric loss measuring the distance between sample pairs; a semantic loss guiding the weakening of the clothing regions; and a semantic consistency regularization loss measuring the high-level semantic differences between the degraded, original and human semantic feature representations. Summing the four weighted losses weakens the model's attention to the clothing region, mines consistent features unaffected by clothing change, and strengthens the model's focus on the robust human regions of the image. The joint loss constrains the network during training, and the optimized deep network is used for pedestrian feature extraction, yielding the trained cloth-changing pedestrian re-identification model.
The trained model is then used to extract pedestrian features from the image to be retrieved in order to match pedestrians of a specific identity. Specifically, the pedestrian image to be retrieved is input into the trained cloth-changing pedestrian re-identification model to obtain a feature vector representing it; the feature is matched against the pedestrian image features in the gallery by similarity; the retrieval results are sorted from high to low similarity score and output as the re-identification result; and the top-ranked pedestrian images are returned as the retrieval result.
In a specific implementation, the identity-guided joint learning method for cloth-changing pedestrian re-identification comprises steps S110-S130.
S110, acquiring the pedestrian image to be retrieved, wherein the retrieval image and the pedestrian images in the gallery are captured by different cameras;
in particular, matching is difficult in scenes where capturing a face image is hard, for example when the subject wears a mask, is seen from a side angle, or is far away. The pedestrian image to be retrieved, i.e. the image of the pedestrian to be identified, is acquired. The acquisition device can be, but is not limited to, a digital camera, a surveillance camera, a tablet computer or a mobile phone;
S120, inputting the pedestrian image to be retrieved into the pre-trained cloth-changing pedestrian re-identification model and extracting clothing-irrelevant, identity-robust features as retrieval features; the model is trained on a pedestrian image dataset together with pre-acquired clothing-region mask images, pedestrian foreground images and upper-clothes occlusion images; the clothes attention degradation network guides the model to adaptively weaken the interference introduced by clothing information, the human semantic attention and jigsaw module highlights human semantic information, and the pedestrian identity enhancement module guides the model to extract a more favorable identity-robust representation.
Specifically, the training of the cloth-changing pedestrian re-identification model comprises steps S1201-S1207.
S1201, acquiring an original pedestrian image to be processed from a pedestrian image dataset, and labeling the semantic information of each human body part with the pre-trained human semantic parsing model to obtain the corresponding human semantic segmentation map;
S1202, acquiring a clothing-region mask image from the human semantic segmentation map; acquiring a pedestrian foreground image from the human semantic segmentation map and the original pedestrian image; and acquiring an upper-clothes occlusion image from the human semantic segmentation map and the pedestrian foreground image;
the clothing region mask image acquisition method comprises S120201, positioning to a local position of clothing according to clothing label information of semantic segmentation by means of a human body semantic segmentation map with semantic labels; s120202 distinguishing the clothing region from other regions by performing binarization processing; wherein, the pixel part belonging to the clothing region in the human body semantic segmentation map is set as 1, and the pixels of other parts are set as 0; s120203, taking the binarized image after binarization processing as a clothing region mask map; the pedestrian foreground image acquisition method comprises S120211, performing binarization processing on a human body semantic segmentation map for positioning a foreground region of a human body; wherein the background part of the human body semantic segmentation graph is set to 0, and all the other parts containing human body parts and accessories are set to 1; s120212, performing matrix multiplication operation on the pedestrian image and the binarized human semantic segmentation image to obtain a pedestrian foreground image with all information except the background reserved as foreground information; it should be noted that, for example, shoes, portable objects, scarves, etc. may be sometimes used as effective distinguishable human body characteristic information, so that such visual information is retained; the upper garment mask image acquisition method includes S120221, positioning an upper garment region in a pedestrian foreground image through a human semantic segmentation map; s120222, setting an upper body clothes area in a pedestrian foreground image to be 1 for shielding treatment; s120223, taking the image after shielding treatment as an upper garment shielding image;
S1203, the clothes attention degradation network addresses the interference caused by clothing change in cloth-changing pedestrian re-identification: by gradually guiding the attention weight of the clothing region in the original feature map downward, the model's attention to the clothing region of the pedestrian image is weakened, and the clothing-region attention-degraded feature representation $f_{cad}$ is extracted.
In a specific embodiment, the key to gradually reducing the model's attention to the clothing region of the original feature map is to acquire the multi-scale weakening feature maps and apply regularization-constraint learning to the original feature maps; the training step comprises S120301-S120303.
S120301, acquiring the clothing-region mask images as shown in steps S120201-S120203. The pedestrian images of the dataset are divided into mini-batches; each batch of input images is denoted $X = \{x_1, \ldots, x_B\}$ and the clothing-region mask maps of the corresponding batch are denoted $M = \{m_1, \ldots, m_B\}$. The pixel values of $m_i$ range over $\{0, 1\}$, representing the clothing-region part and the other parts respectively. Each pixel of an image is represented by a vector of length 3 whose values come from the (R, G, B) channels, so an image $x_i$ can be represented by $N$ pixel vectors. Locating through the clothing-region mask map $m_i$, all pixels of the clothing region in each batch are extracted; the clothing-region pixel vectors can be expressed as $\{p_c \mid c \in \Omega\}$, where $\Omega$ is the pixel index space of the weakened clothing area of the current batch (the indices equal to 1 in the clothing mask map $m_i$), $|\Omega|$ is the number of clothing-related pixel vectors, and $c$ is the index of the $c$-th pixel vector. A hyperparameter $\lambda$ controls the weakening weight of the clothing pixels. Each clothing-related pixel vector is replaced by the weakened pixel vector $\lambda p_c$, so the pixel vectors of the generated image with the weakened clothing region can be expressed as $\tilde{p}_j = \lambda p_j$ for $j \in \Omega$ and $\tilde{p}_j = p_j$ otherwise. After the clothing-region pixels are down-weighted, the result is rescaled to obtain three clothes-weakening feature maps of different scales. The advantage is that the pixel values of the clothing region differ greatly from those of the other, non-clothing regions, so the model's attention to the clothing region can be weakened while avoiding the loss of important semantic information that an overly hard masking of the clothing region would cause.
S120302, the original pedestrian image is input into the ResNet50 network, which outputs three intermediate feature maps $F_1$, $F_2$, $F_3$ of different scales at different stages. A spatial attention feature map is obtained at each stage through convolution operations, consistency-constraint learning is applied to the spatial attention feature maps with the clothes-weakening feature maps, and the intermediate feature map of each stage is weighted with its spatial attention feature map to obtain a new intermediate feature map with the clothes region weakened; pooling the weakened intermediate feature map of the last stage yields the degraded feature representation. The spatial attention feature map is acquired by the formula $A_i = W_2 * (W_1 * F_i)$, and the spatial attention weighting is performed by the formula $\tilde{F}_i = A_i \odot F_i$, where $F_i$ denotes the intermediate feature map of the $i$-th stage, $W_1$ and $W_2$ denote two $1 \times 1$ convolution filters, $*$ denotes the convolution operation, $A_i$ denotes the spatial attention feature map, and $\odot$ denotes the Hadamard matrix product.
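A sketch of one stage of the clothes attention degradation branch under the reconstructed formulas above. The sigmoid squashing of the attention map is an assumption; the patent names only the two 1x1 convolution filters, the convolution operation and the Hadamard product.

```python
import torch
import torch.nn as nn

class StageAttention(nn.Module):
    """One stage: A_i = sigmoid(W2 * (W1 * F_i)), then F~_i = A_i (Hadamard) F_i."""

    def __init__(self, channels: int):
        super().__init__()
        self.w1 = nn.Conv2d(channels, channels, kernel_size=1)  # first 1x1 filter
        self.w2 = nn.Conv2d(channels, 1, kernel_size=1)         # second 1x1 filter

    def forward(self, feat: torch.Tensor):
        attn = torch.sigmoid(self.w2(self.w1(feat)))  # spatial attention map A_i
        return attn * feat, attn                      # Hadamard-weighted features
```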
S120303, the three clothes-weakening feature maps of different scales apply multi-level clothing-region weakening guidance to the spatial attention feature maps of the corresponding scales through the semantic loss function

$$\mathcal{L}_{sem} = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{H_k W_k} \left\| A_k - M_k \right\|_2^2,$$

so that the multi-scale weakened feature maps gradually guide the model to reduce its attention to the clothing region, effectively resisting the interference caused by clothing information while avoiding excessive loss of important identity information. Here $K$ denotes the number of feature maps of different scales, $A_k$ and $M_k$ denote the spatial attention feature map and the clothes-weakening feature map of the $k$-th scale, and $H_k$ and $W_k$ denote the height and width of the feature map.
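A sketch of the multi-scale semantic loss, assuming each clothes-weakening map has been reduced to a single-channel target of the same spatial size as its attention map; the mean-squared form follows the reconstructed formula above.

```python
import torch.nn.functional as F

def semantic_loss(attn_maps, weak_maps):
    """L_sem: average over K scales of the per-pixel squared difference
    between the attention map A_k and the weakening target M_k."""
    loss = 0.0
    for a_k, m_k in zip(attn_maps, weak_maps):   # K pairs of maps
        loss = loss + F.mse_loss(a_k, m_k)       # mean over H_k * W_k pixels
    return loss / len(attn_maps)
```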
S1204, the original feature representation is acquired from the original pedestrian image through the backbone network: a Vision Transformer (ViT) pre-trained on ImageNet is used as the backbone, and the original pedestrian image is input into ViT to acquire the original feature representation $f_{org}$.
S1205, because complex background information in pedestrian re-identification data prevents the features from concentrating on the human body region, the human semantic attention and jigsaw module locates the foreground human region and, together with a random jigsaw strategy, assists in generating new poses, fully exploiting human semantic information while enhancing the model's adaptability to moderate view/pose changes of pedestrians, and acquires the human semantic feature representation $f_{hsa}$. The model's attention is further focused on clothing-irrelevant visual cues through regularized learning. Specifically, the pedestrian foreground images are obtained as in steps S120211-S120212. All images with the same identity in a batch of pedestrian foreground images are first collected; for each identity, the first image is taken, all images of that identity are randomly shuffled to increase the randomness of pose change, and a second image is taken from the new shuffled index order; the lower-half pixel values of the first pedestrian image and of the randomly paired pedestrian image are then exchanged and stitched to obtain new pedestrian foreground jigsaw images. The processed batch is input into a ViT model that shares weight parameters with the backbone network to obtain the human semantic feature representation. Because the human semantic features are obtained from foreground images, the human feature channels can be selectively emphasized and large view differences after the jigsaw are avoided; human semantic information receives more focus while the negative effect of background information is reduced as much as possible, making the feature representation more discriminative and robust to moderate pose change. To make full use of human semantic information, the original feature representation and the human semantic feature representation undergo joint learning of high-level semantics.
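A minimal sketch of the intra-identity jigsaw just described, assuming foreground images as a (B, 3, H, W) tensor and identity labels `pids` as a length-B tensor; the tensor layout is an assumption, the lower-half swap within each identity follows the text.

```python
import torch

def intra_identity_jigsaw(fg_images: torch.Tensor, pids: torch.Tensor) -> torch.Tensor:
    """Swap the lower-half pixels of each image with those of a randomly
    shuffled image of the same identity, simulating new poses."""
    out = fg_images.clone()
    half = fg_images.shape[-2] // 2
    for pid in pids.unique():
        idx = (pids == pid).nonzero(as_tuple=True)[0]   # images of this identity
        perm = idx[torch.randperm(len(idx))]            # random intra-identity shuffle
        out[idx, :, half:, :] = fg_images[perm, :, half:, :]  # exchange lower halves
    return out
```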
S1206, because the overall features of the original image cannot effectively highlight the robust identity information of a pedestrian, the pedestrian identity enhancement module locates and extracts local features of the head, neck and shoulders to obtain the identity-enhanced feature representation $f_{ide}$. Specifically, the upper-clothes occlusion image is acquired as in steps S120221-S120223. The head-neck-shoulder area of the upper-clothes occlusion image is located, extracted and resized to the original image size through a pre-trained local image extraction network, yielding a locally robust image; this locally robust image, as an enhancement signal, is further input into a ViT model to derive the identity-enhanced feature representation. Under the gain of the identity-enhanced features, the model pays more attention to the locally identity-robust information of the head, neck and shoulders; the available face details, neck and shoulder contours, hairstyle, skin color and so on in these partial regions supplement the pedestrian features. A ViT model similar to the backbone is used, but to avoid overfitting its weights are learned independently. The local image extraction network may be, but is not limited to, the localization layer of a Spatial Transformer Network (STN) model. An STN consists of three components: a localization network, a grid generator and a sampler. The localization network predicts the transformation parameters, the grid generator performs the coordinate mapping, and the sampler gathers the pixels. After a feature map is input into the STN, the localization network first extracts features with convolution operations and infers the spatial transformation parameters through some hidden layers; the grid generator then obtains the correspondence of pixel coordinates before and after the transformation according to these parameters; finally the sampler generates the spatially transformed feature map by bilinear interpolation.
S1207, applying a joint training constraint to the degraded, original, human semantic and identity-enhanced feature representations with a loss function.
The loss function is realized by the following formula:

$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{tri} + \mathcal{L}_{sem} + \mathcal{L}_{sc},$$

where $\mathcal{L}_{cls}$ is the classification loss constraining the feature representations, computed with cross-entropy; $\mathcal{L}_{tri}$ is the metric loss measuring the distance between sample pairs, computed with the triplet loss so that the distance between a sample and a positive sample is as small as possible and the distance to a negative sample is as large as possible; $\mathcal{L}_{sem}$ is the semantic loss with which the multi-scale clothes-weakening feature maps guide the weakening of the clothing regions in the spatial attention feature maps of the corresponding scales, computed as in step S120303; and $\mathcal{L}_{sc}$ is the semantic consistency constraint loss measuring the high-level semantic differences between the degraded, original and human semantic feature representations, realized by the formula

$$\mathcal{L}_{sc} = \left\| \mu(f_{cad}) - \mu(f_{org}) \right\|_2 + \left\| \sigma^2(f_{cad}) - \sigma^2(f_{org}) \right\|_2 + \left\| \mu(f_{hsa}) - \mu(f_{org}) \right\|_2 + \left\| \sigma^2(f_{hsa}) - \sigma^2(f_{org}) \right\|_2,$$

where $\mu(\cdot)$ is the mean function, $\sigma^2(\cdot)$ is the variance function, $f_{cad}$ comes from the clothes attention degradation network, $f_{org}$ from the backbone network and $f_{hsa}$ from the human semantic attention and jigsaw module, and $\|\cdot\|_2$ denotes the L2 norm.
Summing the four weighted losses weakens the model's attention to the clothing region, mines consistent features unaffected by clothing change, and strengthens the model's focus on the robust human regions of the image. The joint loss constrains the network during training, and the optimized deep network is used for pedestrian feature extraction, yielding the trained cloth-changing pedestrian re-identification model. In a specific implementation, the weight of each loss in the joint loss function is determined according to actual needs and is not specifically limited here. The loss function reduces the influence of cloth-changing scenes on pedestrian recognition.
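A sketch of the joint loss under the reconstructed formulas above, with unit weights by default. Taking the statistics over the feature dimension is an assumption, as is the pairing of terms in the consistency loss.

```python
import torch

def consistency_loss(f_cad: torch.Tensor, f_org: torch.Tensor,
                     f_hsa: torch.Tensor) -> torch.Tensor:
    """L_sc: align mean/variance statistics of the degraded and human semantic
    features with the original features under the L2 norm."""
    def stats(f: torch.Tensor):
        return f.mean(dim=-1), f.var(dim=-1)
    mu_c, var_c = stats(f_cad)
    mu_o, var_o = stats(f_org)
    mu_h, var_h = stats(f_hsa)
    return ((mu_c - mu_o).norm(2) + (var_c - var_o).norm(2)
            + (mu_h - mu_o).norm(2) + (var_h - var_o).norm(2))

def total_loss(l_cls, l_tri, l_sem, l_sc, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four losses; the text leaves the weights to be
    chosen according to actual needs."""
    w = weights
    return w[0] * l_cls + w[1] * l_tri + w[2] * l_sem + w[3] * l_sc
```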
S130, matching the retrieval features against the pedestrian image features in the gallery by similarity, sorting the retrieval results from high to low similarity score, and outputting the sorted results as the re-identification result.
Specifically, the pedestrian image to be retrieved is input into the trained cloth-changing pedestrian re-identification model to obtain a feature vector representing it; the feature is matched against the pedestrian image features in the gallery by similarity; the retrieval results are sorted from high to low similarity score and output as the re-identification result; and the top-ranked pedestrian images are returned as the retrieval result. The similarity measure between features is the Euclidean distance of the normalized features.
In a specific implementation, given a pedestrian image, other pedestrian images with the same identity are retrieved from the test set and a ranked list is returned. The retrieval matching process is as follows: the feature vector representations of all images in the test set are extracted through the trained model; the similarity between the given query image and every image in the test set is computed; the retrieval results are sorted from high to low similarity score; and the retrieval result list is returned in that order.
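A minimal sketch of the retrieval matching just described: features are L2-normalized, Euclidean distances to the gallery are computed, and gallery indices are returned from best to worst match.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feat: torch.Tensor, gallery_feats: torch.Tensor) -> torch.Tensor:
    """query_feat: (D,) feature of the query image;
    gallery_feats: (N, D) features of the gallery images."""
    q = F.normalize(query_feat, dim=-1).unsqueeze(0)   # (1, D)
    g = F.normalize(gallery_feats, dim=-1)             # (N, D)
    dists = torch.cdist(q, g).squeeze(0)               # (N,) Euclidean distances
    return torch.argsort(dists)                        # smaller distance = more similar
```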
In summary, by establishing a cloth-changing pedestrian re-identification model, the identity-guided joint learning method adaptively weakens the interference caused by clothing information through the clothes attention degradation network, highlights human semantic information with the human semantic attention and jigsaw module, simulating different poses of the same identity to cope with the interference of cluttered backgrounds and moderate pose change, and extracts a more favorable identity-robust representation through the pedestrian identity enhancement module. All modules of the method are trained jointly in a unified framework, and at test time only the original features extracted from the original pedestrian image are used as the identification features. The invention has low test cost and high efficiency, copes effectively with cloth-changing scenes, and achieves systematic, scientific, robust and generalizable technical effects.
To verify the validity of the invention, FIG. 3 presents retrieval examples of the model on the LTCC, PRCC and NKUP datasets. Each row is a retrieval example: given a query image, the similarity distances between all images in the gallery and the query are computed, and the ten images most similar to the query are returned. Each returned result may be correct or incorrect; in the visualized results, correct retrievals are marked with T and incorrect ones with F according to the labels. It can be seen that the correct results retrieved by the model are ranked near the front of the retrieval list, demonstrating that the identity-guided joint learning method is very useful for cloth-changing pedestrian re-identification and that the extracted features are effective and robust.
Corresponding to the above identity-guided joint learning method for cloth-changing pedestrian re-identification, the invention further provides an identity-guided joint learning system for cloth-changing pedestrian re-identification. FIG. 4 shows the functional modules of the system according to an embodiment of the invention.
As shown in fig. 4, the identity-guided joint learning clothing-changing pedestrian re-identification system 400 provided by the invention can be installed in an electronic device. Depending on the functions implemented, the system 400 may include a data acquisition unit 410, a feature extraction unit 420, and a result identification unit 430. A unit of the invention, which may also be referred to as a module, is a series of computer program segments stored in a memory of the electronic device that can be executed by a processor of the electronic device and perform a fixed function.
In this embodiment, the functions of the respective modules/units are as follows:
a data acquisition unit 410, configured to acquire a pedestrian image to be retrieved, wherein the query image and the pedestrian images in the retrieval library are captured by different cameras;
a feature extraction unit 420, configured to input the pedestrian image to be retrieved into a pre-trained clothing-changing pedestrian re-identification model and extract clothing-irrelevant identity-robust features as retrieval features, wherein the clothing-changing pedestrian re-identification model is obtained by training on a pedestrian image dataset together with pre-acquired clothing region mask images, pedestrian foreground images, and upper-garment occlusion images; the re-identification model adaptively weakens the interference caused by clothing information under the guidance of a clothing attention degradation network, highlights human semantic information using human semantic attention and a jigsaw module, and extracts a more robust identity representation under the guidance of a pedestrian identity enhancement module;
a result identification unit 430, configured to perform similarity matching between the retrieval features and the pedestrian image features in the retrieval library, sort the retrieval results by similarity score from high to low, and output the ranking as the re-identification result.
For a more specific implementation of the identity-guided joint learning clothing-changing pedestrian re-identification system 400 provided by the invention, reference may be made to the above description of the embodiments of the corresponding method, which is not repeated here.
According to the identity-guided joint learning clothing-changing pedestrian re-identification system, joint collaborative learning among the clothing attention degradation network, the backbone network, the human semantic attention and jigsaw module, and the pedestrian identity enhancement module focuses the model on identity-robust, clothing-irrelevant features. Specifically, to counter the interference caused by clothing changes, the model's attention to the clothing region of a pedestrian image is weakened by progressively guiding down the attention weights of the clothing region in the original feature map, yielding clothing-region attention degradation features. To address the problem that complex background information in pedestrian re-identification data prevents the features from concentrating on the human body region, the human semantic attention and jigsaw module locates the foreground human body region, and a random jigsaw strategy assists in generating new postures, so that human semantic information is fully exploited while the model's adaptability to moderate changes of pedestrian viewpoint and posture is enhanced; the model's attention is further focused on clothing-irrelevant visual cues through regularized learning. Meanwhile, since the global features of the original image cannot highlight robust pedestrian identity information, the pedestrian identity enhancement module locates and extracts local features of the head, neck, and shoulders, further yielding an identity-enhanced feature representation. The identity-guided joint learning clothing-changing pedestrian re-identification method achieves excellent results on the relevant clothing-change re-identification datasets.
As shown in fig. 5, the invention provides an electronic device 5 implementing the identity-guided joint learning clothing-changing pedestrian re-identification method. The electronic device 5 may comprise a processor 50, a memory 51, and a bus, and may further comprise a computer program stored in the memory 51 and executable on the processor 50, such as an identity-guided joint learning clothing-changing pedestrian re-identification program 52. The memory 51 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments the memory 51 may be an internal storage unit of the electronic device 5, such as a hard disk of the electronic device 5. In other embodiments the memory 51 may be an external storage device of the electronic device 5, for example a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 may be used not only for storing application software installed in the electronic device 5 and various kinds of data, such as the code of the identity-guided joint learning clothing-changing pedestrian re-identification program, but also for temporarily storing data that has been or will be output.
The processor 50 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital signal processing chips, graphics processors, combinations of various control chips, and the like. The processor 50 is the control unit of the electronic device: it connects the components of the entire electronic device using various interfaces and lines, runs or executes programs or modules stored in the memory 51 (e.g., the identity-guided joint learning clothing-changing pedestrian re-identification program), and invokes data stored in the memory 51 to perform the various functions of the electronic device 5 and to process data.
The bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory 51 and the at least one processor 50, among other components.
Fig. 5 shows only an electronic device with certain components; those skilled in the art will understand that the structure shown in fig. 5 does not limit the electronic device 5, which may include fewer or more components than shown, combine certain components, or arrange the components differently. For example, although not shown, the electronic device 5 may further include a power source (such as a battery) for supplying power to the components; preferably, the power source may be logically connected to the at least one processor 50 through a power management device, so that charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power source may also include one or more DC or AC power supplies, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 5 may further include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described here.
Further, the electronic device 5 may also comprise a network interface, optionally comprising a wired interface and/or a wireless interface (e.g., a Wi-Fi interface or a Bluetooth interface), typically used to establish a communication connection between the electronic device 5 and other electronic devices.
Optionally, the electronic device 5 may further comprise a user interface, which may be a display or an input unit such as a keyboard, and which may be a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch screen, or the like. The display may also be referred to as a display screen or display unit and serves to display information processed in the electronic device 5 and to present a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.
The identity-guided joint learning clothing-changing pedestrian re-identification program 52 stored in the memory 51 of the electronic device 5 is a combination of instructions that, when executed by the processor 50, may implement: S110, acquiring a pedestrian image to be retrieved, wherein the query image and the pedestrian images in the retrieval library are captured by different cameras; S120, inputting the pedestrian image to be retrieved into a pre-trained clothing-changing pedestrian re-identification model and extracting clothing-irrelevant identity-robust features as retrieval features, wherein the model is trained on a pedestrian image dataset together with pre-acquired clothing region mask images, pedestrian foreground images, and upper-garment occlusion images, adaptively weakens the interference caused by clothing information under the guidance of a clothing attention degradation network, highlights human semantic information using human semantic attention and a jigsaw module, and extracts a more robust identity representation under the guidance of a pedestrian identity enhancement module; and S130, performing similarity matching between the retrieval features and the pedestrian image features in the retrieval library, sorting the retrieval results by similarity score from high to low, and outputting the ranking as the re-identification result.
In particular, for the specific implementation of the above instructions by the processor 50, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here. It should be emphasized that, to further ensure the privacy and security of the identity-guided joint learning clothing-changing pedestrian re-identification program, the program may be stored in a node of the blockchain where the server cluster is located.
Further, if the modules/units integrated in the electronic device 5 are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
Embodiments of the present invention also provide a computer-readable storage medium, which may be non-volatile or volatile, storing a computer program that, when executed by a processor, implements: S110, acquiring a pedestrian image to be retrieved, wherein the query image and the pedestrian images in the retrieval library are captured by different cameras; S120, inputting the pedestrian image to be retrieved into a pre-trained clothing-changing pedestrian re-identification model and extracting clothing-irrelevant identity-robust features as retrieval features, wherein the model is trained on a pedestrian image dataset together with pre-acquired clothing region mask images, pedestrian foreground images, and upper-garment occlusion images, adaptively weakens the interference caused by clothing information under the guidance of a clothing attention degradation network, highlights human semantic information using human semantic attention and a jigsaw module, and extracts a more robust identity representation under the guidance of a pedestrian identity enhancement module; and S130, performing similarity matching between the retrieval features and the pedestrian image features in the retrieval library, sorting the retrieval results by similarity score from high to low, and outputting the ranking as the re-identification result.
Specifically, for the implementation of the computer program when executed by the processor, reference may be made to the description of the relevant steps in the identity-guided joint learning clothing-changing pedestrian re-identification method, which is not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is merely a logical functional division, and other divisions are possible in actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, may each exist alone physically, or two or more of them may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and may be embodied in other specific forms without departing from its spirit or essential characteristics.
The blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic means, each block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like, and may store medical data such as personal health records, inspection reports, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as "first" and "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention; the invention is not limited to the above embodiment, which may be modified or have some of its technical features replaced by equivalents. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. An identity-guided joint learning clothing-changing pedestrian re-identification method, characterized by comprising the following steps:
S110, acquiring a pedestrian image to be retrieved;
S120, inputting the pedestrian image to be retrieved into a pre-trained clothing-changing pedestrian re-identification model and extracting clothing-irrelevant identity-robust features as retrieval features, wherein the clothing-changing pedestrian re-identification model is obtained by training on a pedestrian image dataset together with pre-acquired clothing region mask images, pedestrian foreground images, and upper-garment occlusion images; the model adaptively weakens the interference caused by clothing information under the guidance of a clothing attention degradation network, highlights human semantic information using human semantic attention and a jigsaw module, and extracts a more robust identity representation under the guidance of a pedestrian identity enhancement module;
S130, performing similarity matching between the retrieval features and the pedestrian image features in the retrieval library, sorting the retrieval results by similarity score from high to low, and outputting the ranking as the re-identification result.
2. The identity-guided joint learning clothing-changing pedestrian re-identification method according to claim 1, wherein training the clothing-changing pedestrian re-identification model specifically comprises the following steps:
S1201, acquiring an original pedestrian image to be processed from a pedestrian image dataset, and labeling semantic information of each human body part in the original pedestrian image through a pre-trained human semantic parsing model to obtain a corresponding human semantic segmentation map;
S1202, acquiring a clothing region mask image using the human semantic segmentation map; acquiring a pedestrian foreground image using the human semantic segmentation map and the original pedestrian image; and acquiring an upper-garment occlusion image using the human semantic segmentation map and the pedestrian foreground image;
S1203, acquiring a degradation feature representation from the original pedestrian image and the clothing region mask image through a clothing attention degradation network; acquiring an original feature representation from the original pedestrian image through a backbone network; acquiring a human semantic feature representation from the pedestrian foreground image through a human semantic attention and jigsaw module; and acquiring an identity enhancement feature representation from the upper-garment occlusion image through a pedestrian identity enhancement module;
S1204, performing joint training constraints on the degradation feature representation, the original feature representation, the human semantic feature representation, and the identity enhancement feature representation using a loss function;
S1205, obtaining the trained clothing-changing pedestrian re-identification model.
3. The identity-guided joint learning clothing-changing pedestrian re-identification method according to claim 1, wherein the clothing region mask image, the pedestrian foreground image, and the upper-garment occlusion image are pre-acquired as follows:
acquiring an original pedestrian image to be processed from a pedestrian image dataset;
labeling semantic information of each human body part in the original pedestrian image through a pre-trained human semantic parsing model to obtain a corresponding human semantic segmentation map;
acquiring the clothing region mask image: locating the clothing region in the pedestrian image using the clothing label information of the human semantic segmentation map, and distinguishing the clothing region from other regions by binarization, wherein pixels belonging to the clothing region in the human semantic segmentation map are set to 1 and all other pixels are set to 0, yielding the clothing region mask image;
acquiring the pedestrian foreground image: binarizing the human semantic segmentation map to locate the foreground region where the human body is located, wherein the background portion of the human semantic segmentation map is set to 0 and all other portions containing human body parts and accessories are set to 1, and performing a matrix multiplication operation on the pedestrian image and the binarized human semantic segmentation map to obtain a pedestrian foreground image in which all information except the background is retained as foreground information;
acquiring the upper-garment occlusion image: locating the upper-body clothing region in the pedestrian foreground image through the human semantic segmentation map, setting the upper-body clothing region in the pedestrian foreground image to 1, and performing the occlusion processing to obtain the upper-garment occlusion image.
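A minimal sketch of this preprocessing is given below (illustrative only and not part of the claims; the parsing label IDs and helper names are assumptions, and the multiplication with the binarized map is interpreted as element-wise masking):

```python
import numpy as np

# Illustrative label IDs for the human parsing map (assumed; dataset-dependent).
BACKGROUND = 0
UPPER_CLOTHES = 5
LOWER_CLOTHES = 6

def preprocess(image, parsing):
    """image: (H, W, 3) pedestrian image; parsing: (H, W) semantic label map."""
    # Clothing region mask image: clothing pixels -> 1, all other pixels -> 0.
    clothes_mask = np.isin(parsing, [UPPER_CLOTHES, LOWER_CLOTHES]).astype(np.uint8)

    # Pedestrian foreground image: zero out the background, keep body parts and accessories.
    foreground = image * (parsing != BACKGROUND)[..., None]

    # Upper-garment occlusion image: occlude upper-body clothing pixels by setting
    # them to a constant (1, per the claim wording).
    occluded = foreground.copy()
    occluded[parsing == UPPER_CLOTHES] = 1
    return clothes_mask, foreground, occluded
```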
4. The identity-guided joint learning clothing-changing pedestrian re-identification method according to claim 2, wherein S1203 specifically comprises the following steps:
locating the clothing region of the corresponding original pedestrian image using the clothing region mask image, and down-weighting all pixels belonging to the clothing region to obtain clothing weakening feature maps at three different scales;
inputting the original pedestrian image into a ResNet50 network, outputting intermediate feature maps at three different scales from different stages, obtaining a spatial attention feature map at each stage through convolution operations, performing consistency constraint learning on the spatial attention feature maps under the guidance of the clothing weakening feature maps, applying spatial attention weighting between the spatial attention feature map and the intermediate feature map at each stage to obtain new intermediate feature maps in which the clothing region is weakened, and obtaining the degradation feature representation by a pooling operation on the clothing-region-weakened intermediate feature map of the last stage;
wherein the spatial attention feature map is obtained by the formula
$$A_i = W_2^i * (W_1^i * F_i),$$
and the spatial attention weighting is performed by the formula
$$\tilde{F}_i = A_i \odot F_i,$$
wherein $F_i$ denotes the intermediate feature map of the i-th stage, $W_1^i$ and $W_2^i$ denote two convolution filters, $*$ denotes the convolution operation, $A_i$ denotes the spatial attention feature map, and $\odot$ denotes the Hadamard (element-wise) product.
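A minimal PyTorch-style sketch of this stage-wise attention weighting follows (illustrative only; the module name, the kernel sizes, and the sigmoid squashing are assumptions not fixed by the claim):

```python
import torch
import torch.nn as nn

class ClothingAttentionStage(nn.Module):
    """Spatial attention for one ResNet50 stage; the attention map can be
    supervised by the clothes weakening feature map of the same scale."""
    def __init__(self, channels):
        super().__init__()
        self.w1 = nn.Conv2d(channels, channels, kernel_size=1)  # kernel size assumed
        self.w2 = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                          # feat: (B, C, H, W)
        attn = torch.sigmoid(self.w2(self.w1(feat)))  # (B, 1, H, W) spatial attention map
        return attn * feat, attn                      # Hadamard weighting; attn is also
                                                      # returned for the semantic loss
```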
5. The identity-guided joint learning clothing-changing pedestrian re-identification method according to claim 4, wherein the consistency constraint learning of the spatial attention feature maps under the guidance of the clothing weakening feature maps comprises:
weakening the clothing regions of the spatial attention feature maps of corresponding scales under the guidance of the multi-scale clothing weakening feature maps, implemented by the semantic loss function
$$\mathcal{L}_{sem} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{H_k W_k}\left\| A_k - M_k \right\|_2^2,$$
wherein $K$ denotes the number of feature maps of different scales, $A_k$ and $M_k$ denote the spatial attention feature map and the clothing weakening feature map of the k-th scale, and $H_k$ and $W_k$ denote the height and width of the feature map.
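A sketch of this multi-scale alignment loss under the reconstruction above (the mean-squared form is an assumption consistent with the normalization by feature-map size):

```python
import torch

def semantic_loss(attn_maps, weaken_maps):
    """attn_maps, weaken_maps: lists of K tensors, each of shape (B, 1, H_k, W_k)."""
    loss = 0.0
    for a, m in zip(attn_maps, weaken_maps):
        loss = loss + ((a - m) ** 2).mean()  # averages over batch and H_k * W_k
    return loss / len(attn_maps)             # averages over the K scales
```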
6. The identity-guided joint learning clothing-changing pedestrian re-identification method according to claim 2, wherein the original feature representation is acquired from the original pedestrian image through the backbone network, the human semantic feature representation is acquired from the pedestrian foreground image through the human semantic attention and jigsaw module, and the identity enhancement feature representation is acquired from the upper-garment occlusion image through the pedestrian identity enhancement module, specifically as follows: a Vision Transformer (ViT) pre-trained on ImageNet is used as the backbone network, and the original pedestrian image is input into the ViT to obtain the original feature representation; pedestrian foreground images with the same identity in the same training batch are randomly shuffled, and an intra-identity jigsaw is performed to obtain new pedestrian foreground images; the new pedestrian foreground images are input into a ViT model sharing weight parameters with the backbone network to obtain the human semantic feature representation; the head-neck-shoulder region of the upper-garment occlusion image is located through the localization layer of a pre-trained spatial transformer network to acquire a locally robust image; and the locally robust image is input into the ViT model to obtain the identity enhancement feature representation.
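A minimal sketch of the intra-identity jigsaw follows (illustrative only; that the jigsaw recombines horizontal strips across same-identity images is an assumption, since the claim does not fix the puzzle layout):

```python
import torch

def intra_identity_jigsaw(foregrounds, ids, num_strips=4):
    """Recombine horizontal strips across images of the same identity.

    foregrounds: (B, C, H, W) pedestrian foreground images.
    ids:         (B,) identity labels of the batch.
    """
    strips = foregrounds.chunk(num_strips, dim=2)  # num_strips tensors along height
    out = []
    for s in strips:
        shuffled = s.clone()
        for pid in ids.unique():
            idx = (ids == pid).nonzero(as_tuple=True)[0]
            perm = idx[torch.randperm(len(idx), device=idx.device)]
            shuffled[idx] = s[perm]                # shuffle strips only within one identity
        out.append(shuffled)
    return torch.cat(out, dim=2)                   # new "poses" of the same identities
```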
7. The identity-guided joint learning clothing-changing pedestrian re-identification method according to claim 2, wherein the loss function is implemented by the following formula:
$$\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{tri} + \mathcal{L}_{sem} + \mathcal{L}_{con},$$
wherein $\mathcal{L}_{cls}$ is a classification loss for constraining the feature representations; $\mathcal{L}_{tri}$ is a triplet metric loss for measuring the distance between sample pairs; $\mathcal{L}_{sem}$ is the semantic loss by which the multi-scale clothing weakening feature maps guide the weakening of the clothing regions in the spatial attention feature maps of corresponding scales; and $\mathcal{L}_{con}$ is a semantic consistency constraint loss for measuring the high-level semantic differences among the degradation feature representation, the original feature representation, and the human semantic feature representation;
the high-level semantic consistency constraint loss is implemented by the following formula:
$$\mathcal{L}_{con} = \big\|\mu(f_d)-\mu(f_o)\big\|_2 + \big\|\sigma^2(f_d)-\sigma^2(f_o)\big\|_2 + \big\|\mu(f_s)-\mu(f_o)\big\|_2 + \big\|\sigma^2(f_s)-\sigma^2(f_o)\big\|_2,$$
wherein $\mu(\cdot)$ denotes the mean function, $\sigma^2(\cdot)$ denotes the variance function, $f_d$ is the feature from the clothing attention degradation network, $f_o$ is the feature from the backbone network, $f_s$ is the feature from the human semantic attention and jigsaw module, and $\|\cdot\|_2$ denotes the L2 norm.
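A sketch of this consistency term under the reconstruction above (the pairing of each auxiliary feature against the backbone feature is an assumption consistent with the claim wording):

```python
import torch

def consistency_loss(f_degrade, f_orig, f_sem):
    """Each argument: (B, D) batch of features from one branch."""
    def stat_gap(a, b):
        # L2 distance between per-dimension means and variances of two branches.
        return (torch.norm(a.mean(dim=0) - b.mean(dim=0), p=2)
                + torch.norm(a.var(dim=0) - b.var(dim=0), p=2))
    return stat_gap(f_degrade, f_orig) + stat_gap(f_sem, f_orig)
```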
8. An identity-guided joint learning clothing-changing pedestrian re-identification system, characterized by comprising:
a data acquisition unit that acquires a pedestrian image to be retrieved, wherein the query image and the pedestrian images in the retrieval library are captured by different cameras;
a feature extraction unit that inputs the pedestrian image to be retrieved into a pre-trained clothing-changing pedestrian re-identification model and extracts clothing-irrelevant identity-robust features as retrieval features, wherein the clothing-changing pedestrian re-identification model is obtained by training on a pedestrian image dataset together with pre-acquired clothing region mask images, pedestrian foreground images, and upper-garment occlusion images; the re-identification model adaptively weakens the interference caused by clothing information under the guidance of a clothing attention degradation network, highlights human semantic information using human semantic attention and a jigsaw module, and extracts a more robust identity representation under the guidance of a pedestrian identity enhancement module; and
a result identification unit that performs similarity matching between the retrieval features and the pedestrian image features in the retrieval library, sorts the retrieval results by similarity score from high to low, and outputs the ranking as the re-identification result.
CN202310401773.XA 2023-04-17 2023-04-17 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system Active CN116129473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310401773.XA CN116129473B (en) 2023-04-17 2023-04-17 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310401773.XA CN116129473B (en) 2023-04-17 2023-04-17 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system

Publications (2)

Publication Number Publication Date
CN116129473A true CN116129473A (en) 2023-05-16
CN116129473B CN116129473B (en) 2023-07-14

Family

ID=86297683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310401773.XA Active CN116129473B (en) 2023-04-17 2023-04-17 Identity-guide-based combined learning clothing changing pedestrian re-identification method and system

Country Status (1)

Country Link
CN (1) CN116129473B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140107842A1 (en) * 2012-10-16 2014-04-17 Electronics And Telecommunications Research Institute Human-tracking method and robot apparatus for performing the same
WO2020215915A1 (en) * 2019-04-24 2020-10-29 腾讯科技(深圳)有限公司 Identity verification method and apparatus, computer device and storage medium
WO2022001489A1 (en) * 2020-06-28 2022-01-06 北京交通大学 Unsupervised domain adaptation target re-identification method
CN112418134A (en) * 2020-12-01 2021-02-26 厦门大学 Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis
CN112784728A (en) * 2021-01-18 2021-05-11 山东省人工智能研究院 Multi-granularity clothes changing pedestrian re-identification method based on clothing desensitization network
CN113887448A (en) * 2021-10-09 2022-01-04 之江实验室 Pedestrian re-identification method based on deep reloading
CN114973318A (en) * 2022-05-16 2022-08-30 南京博雅集智智能技术有限公司 Cross-scene multi-camera pedestrian re-identification algorithm based on monitoring scene
CN114758362A (en) * 2022-06-15 2022-07-15 山东省人工智能研究院 Clothing changing pedestrian re-identification method based on semantic perception attention and visual masking
CN114998934A (en) * 2022-06-27 2022-09-02 山东省人工智能研究院 Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN115100684A (en) * 2022-06-29 2022-09-23 江苏大学 Clothes-changing pedestrian re-identification method based on attitude and style normalization
CN115359417A (en) * 2022-07-29 2022-11-18 浙江大学 Pedestrian re-identification method for clothing information separation based on attention mechanism
CN115482508A (en) * 2022-09-26 2022-12-16 天津理工大学 Reloading pedestrian re-identification method, reloading pedestrian re-identification device, reloading pedestrian re-identification equipment and computer-storable medium
CN115620338A (en) * 2022-10-14 2023-01-17 河南大学 Method and device for re-identifying clothes-changing pedestrians guided by black clothes and head images

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
SHIHUA LI 等: "COCAS+: Large-Scale Clothes-Changing Person Re-Identification With Clothes Templates", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》, pages 1839 - 1853 *
YING CHEN 等: "ResT-ReID: Transformer block-based residual learning for person re-identification", 《PATTERN RECOGNITION LETTERS》, vol. 157, pages 90 - 96 *
LIU YUJIE et al.: "Person Re-identification Method Based on an Enhanced Feature Fusion Network", Journal of Computer-Aided Design & Computer Graphics, vol. 33, no. 2, pages 232-240
XU HUI et al.: "Semantic Analysis, Retrieval and Recommendation of Clothing Images Based on Deep Learning", Basic Sciences Journal of Textile Universities, vol. 33, no. 3, pages 64-72
LI YOUJIAO et al.: "A Survey of Person Re-identification", Acta Automatica Sinica, vol. 44, no. 9, pages 1554-1568
ZHONG JIANHUA et al.: "Clothes-changing Person Re-identification Model Based on a Semantic-guided Self-attention Network", Journal of Computer Applications, pages 1-9
CHEN LIN et al.: "Cross-modality Person Re-identification Algorithm Based on Dual Attribute Information", Journal of Beijing University of Aeronautics and Astronautics, vol. 48, no. 4, pages 647-656

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831081A (en) * 2024-03-06 2024-04-05 齐鲁工业大学(山东省科学院) Method and system for re-identifying clothing changing pedestrians based on clothing changing data and residual error network
CN117831081B (en) * 2024-03-06 2024-05-24 齐鲁工业大学(山东省科学院) Method and system for re-identifying clothing changing pedestrians based on clothing changing data and residual error network

Also Published As

Publication number Publication date
CN116129473B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
Song et al. Eyes closeness detection from still images with multi-scale histograms of principal oriented gradients
Polikovsky et al. Facial micro-expression detection in hi-speed video based on facial action coding system (FACS)
CN105138954B (en) A kind of image automatic screening inquiry identifying system
Lin Face detection in complicated backgrounds and different illumination conditions by using YCbCr color space and neural network
CN114998934B (en) Clothes-changing pedestrian re-identification and retrieval method based on multi-mode intelligent perception and fusion
CN114758362B (en) Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN111597870B (en) Human body attribute identification method based on attention mechanism and multi-task learning
CN108229559B (en) Clothing detection method, clothing detection device, electronic device, program, and medium
Alvarez et al. Road geometry classification by adaptive shape models
CN115565238B (en) Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product
CN111259814B (en) Living body detection method and system
CN112699265A (en) Image processing method and device, processor and storage medium
CN113435236A (en) Home old man posture detection method, system, storage medium, equipment and application
CN116129473B (en) Identity-guide-based combined learning clothing changing pedestrian re-identification method and system
Paul et al. Extraction of facial feature points using cumulative histogram
Zhao et al. Generalized symmetric pair model for action classification in still images
Ni et al. Facial expression recognition through cross-modality attention fusion
CN109670517A (en) Object detection method, device, electronic equipment and target detection model
Chen et al. A novel race classification method based on periocular features fusion
Mosayyebi et al. Gender recognition in masked facial images using EfficientNet and transfer learning approach
CN113705469A (en) Face recognition method and device, electronic equipment and computer readable storage medium
Sun et al. General-to-specific learning for facial attribute classification in the wild
Yaseen et al. A novel approach based on multi-level bottleneck attention modules using self-guided dropblock for person re-identification
Sun et al. Deep Facial Attribute Detection in the Wild: From General to Specific.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant