CN116311377A - Method and system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships - Google Patents


Info

Publication number
CN116311377A
Authority
CN
China
Prior art keywords
image
pedestrian
features
identified
images
Prior art date
Legal status
Pending
Application number
CN202310324819.2A
Other languages
Chinese (zh)
Inventor
袁彩虹
苏晨爽
邹明东
周玉洁
许元辰
关志杰
Current Assignee
Henan University
Original Assignee
Henan University
Priority date
Filing date
Publication date
Application filed by Henan University
Priority to CN202310324819.2A
Publication of CN116311377A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships. The method comprises the following steps: acquiring all pedestrian images to be identified and preprocessing them to obtain a series of pedestrian images wearing the same clothes; constructing an intra-image relation mining model and an inter-image relation mining model; dividing the preprocessed pedestrian images to be identified into a plurality of batches; for the current batch, performing intra-image relation modeling with the intra-image relation mining model to obtain N fusion features; for the current batch, constructing inter-image relation features from the N fusion features with the inter-image relation mining model, fusing the inter-image relation features with the N fusion features respectively to obtain the final feature of each of the N pedestrian images to be identified, and judging from the final features whether the pedestrian in each image is the target pedestrian; the two batch-level steps are then repeated for the next batch until all batches have been identified.

Description

Method and system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a method and a system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships.
Background
Pedestrian re-identification can be viewed as a pedestrian retrieval problem: retrieving a specific person from a large number of images taken by different cameras in different scenes. It faces many challenges, such as low image resolution, viewpoint change, posture change, illumination change, occlusion, and clothing change. At present, most pedestrian re-identification methods assume that a pedestrian's clothing does not change over the short term; to bring pedestrian re-identification into practical deployment, the study of clothing-changing pedestrian re-identification is unavoidable.
Pedestrian re-identification research that accounts for clothing change better matches real conditions and has stronger practical significance. For example, a potential criminal may escape tracking by deliberately changing clothes, and the clothing of a lost child or elderly person may change over time, for example by removing a coat or hat. For clothing-changing re-identification, a person's identity is typically determined by physiological characteristics, such as body shape and height, rather than by apparent characteristics, such as clothing, shoes, and hairstyle. The key to the clothing-changing re-identification problem is therefore to force the model to learn physiological characteristics of the pedestrian that are hard to disguise or change, rather than apparent characteristics such as clothing color.
To address the impact and challenges of clothing change on pedestrian re-identification, some studies segment person contours and extract more discriminative body-shape features. Ye et al. use body key-point features and model the relationships between the key points; they also introduce a shape-decomposition module that eliminates clothing, feeding the regularized difference between global features and relation features into a self-attention network so that the network automatically separates clothing features from body-shape features. Jin et al. adopt a dual-stream architecture to learn rich gait information: the gait stream approximately predicts a continuous gait sequence from a single input query image; the ReID stream obtains feature vectors through an off-the-shelf network (such as ResNet-50); a high-level semantic consistency constraint is then imposed on the features of the two streams for the same person, encouraging the ReID stream to learn clothing-independent gait motion features.
However, existing pedestrian re-identification methods still have the following limitations. (1) Most existing methods only extract global features, local features, or contour features from a single image, and rarely exploit the relationships between images; although some works model inter-image relationships with tools such as conditional random fields, they only model the relationships among a few images during training, which limits relation learning. (2) Existing methods inevitably suffer from the influence of apparent characteristics such as color when facing the clothing-change problem. (3) Most existing approaches are CNN-based networks, but CNNs can only exploit local dependencies and lose information through downsampling operations.
Disclosure of Invention
In order to solve at least some of the above problems, the invention provides a method and a system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships.
In one aspect, the invention provides a method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships, comprising the following steps:
step 1: acquiring all pedestrian images to be identified, and preprocessing all the pedestrian images to be identified so that pedestrians contained in all the pedestrian images to be identified wear the same clothes;
step 2: constructing an intra-image relation mining model and an inter-image relation mining model;
step 3: dividing all the preprocessed pedestrian images to be identified into a plurality of batches, wherein each batch contains N pedestrian images to be identified;
step 4: for the current batch, performing intra-image relation modeling on the N pedestrian images to be identified using the intra-image relation mining model to obtain N fusion features; the intra-image relation modeling of each pedestrian image to be identified specifically comprises: extracting the global feature and the local features of the pedestrian image to be identified, and taking the combination of the global feature and the local features as the original intra-image features; constructing the intra-image relation features of the pedestrian image to be identified from the global feature and the local features, and fusing the intra-image relation features with the original intra-image features to obtain the fusion feature of the pedestrian image to be identified;
step 5: for the current batch, constructing the inter-image relation features among the N pedestrian images to be identified from their N corresponding fusion features using the inter-image relation mining model, fusing the inter-image relation features with the N fusion features respectively to obtain the final feature of each of the N pedestrian images to be identified, and judging from these final features whether the pedestrian in each pedestrian image to be identified is the target pedestrian;
step 6: repeating steps 4 to 5 for the next batch until the identification of all batches is completed.
Further, the step 1 specifically includes:
step 1.1: selecting one image from all pedestrian images to be identified as a target pedestrian reference image;
step 1.2: performing semantic segmentation on each input pedestrian image to be identified using a human body parsing model to obtain the pixels belonging to the pedestrian's body in the image; the pedestrian body comprises two body parts, namely the upper garment and the lower garment;
step 1.3: replacing the pixels of each body part of the target pedestrian reference image into the positions of the pixels of the corresponding body part in each of the remaining pedestrian images to be identified, while the pixels at other positions of the remaining pedestrian images to be identified are kept unchanged.
Further, the human body parsing model is an SCHP model.
Further, the intra-image relation mining model comprises a CNN model, a human body pose estimation module and a first Transformer module; correspondingly, the step 4 specifically includes:
extracting the global feature of the input pedestrian image to be identified with the CNN model;
extracting the local key-point heat maps of the input pedestrian image to be identified with the human body pose estimation module;
taking the result of multiplying the global feature by the local key-point heat maps as the local features;
constructing, with the first Transformer module, the intra-image relation features of the pedestrian image to be identified from the global feature and the local features;
and taking the result of multiplying the intra-image relation features by the original intra-image features as the fusion feature of the pedestrian image to be identified.
Further, in step 4, before constructing the intra-image relation features, the method further includes:
optimizing the global feature and the local features with the loss function shown in formula (1):

$$\mathcal{L}_{1}=\sum_{k=1}^{K+1}\beta_{k}\left(\mathcal{L}_{cls}(v_{k})+\mathcal{L}_{tri}(v_{k})\right)\tag{1}$$

with $\mathcal{L}_{cls}(v_{k})=-\log p_{k}$ and $\mathcal{L}_{tri}(v_{k})=\max\left(\alpha+D(v_{ak},v_{pk})-D(v_{ak},v_{nk}),\,0\right)$,

where $K$ is the number of features in the local feature set $V_{l}$; $\beta_{k}=\max(m_{kp}[k])\in[0,1]$ is the confidence of the $k$-th key point, $m_{kp}[k]$ is the heat map of the $k$-th key point, and $\max$ denotes the maximum-value operation; $\beta_{K+1}=1$ is the confidence of the global feature $v_{K+1}$; $\mathcal{L}_{cls}$ is the classification loss function and $\mathcal{L}_{tri}$ the triplet loss function; $p_{k}$ is the probability of the identity ground truth predicted by the classifier for local feature $v_{k}$; $\alpha$ is the margin; $D(v_{ak},v_{pk})$ is the distance between a positive feature pair from the same pedestrian, and $D(v_{ak},v_{nk})$ the distance between a negative feature pair from different pedestrians.
Further, the key points include one or more of the nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, and right ankle.
Further, the intra-image relation features are expressed by the aggregated representation vector $u_{i}$ shown in formula (2):

$$u_{i}=\sum_{j=1}^{K+1}s(A_{ij})\,g_{v}(v_{j})\tag{2}$$

where $s(\cdot)$ is the Softmax function that converts affinities into weights, $g_{v}(\cdot)$ is the linear projection function that maps a vector to the v matrix, and $A\in\mathbb{R}^{(K+1)\times(K+1)}$ is the affinity matrix containing the similarity between any two features $v_{i}$ and $v_{j}$, where $i\neq j$, $i\in[1,2,\ldots,K,K+1]$ and $j\in[1,2,\ldots,K,K+1]$; when $i=K+1$ or $j=K+1$, $v_{i}$ or $v_{j}$ denotes the global feature, and otherwise a local feature;

the affinity matrix $A$ is computed according to formula (3):

$$A_{ij}=k(q_{i},k_{j})=\frac{q_{i}^{T}k_{j}}{\sqrt{d}}\tag{3}$$

where $g_{q}(\cdot)$ and $g_{k}(\cdot)$ denote the linear projection functions that map a vector to the q matrix and the k matrix respectively, $k(\cdot,\cdot)$ is the inner product function, $\sqrt{d}$ is a scale factor, $q_{i}=g_{q}(v_{i})$ is the matrix corresponding to $v_{i}$, $k_{j}=g_{k}(v_{j})$ is the matrix corresponding to $v_{j}$, and $T$ denotes the transpose.
Further, the inter-image relation mining model comprises a second Transformer module; correspondingly, the step 5 specifically includes:
constructing, with the second Transformer module and according to the N fusion features corresponding to the N pedestrian images to be identified, the inter-image relation features among the N pedestrian images to be identified.
Further, the inter-image relation features are expressed by the aggregated representation vector $w_{d}$ shown in formula (4):

$$w_{d}=\sum_{e=1}^{N}s(B_{de})\,g_{v}(f_{e})\tag{4}$$

where $s(\cdot)$ is the Softmax function that converts affinities into weights, $g_{v}(\cdot)$ is the linear projection function that maps a vector to the v matrix, and $B\in\mathbb{R}^{N\times N}$ is the affinity matrix containing the similarity between any two fusion features $f_{d}$ and $f_{e}$, where $d\neq e$, $d\in[1,2,\ldots,N]$ and $e\in[1,2,\ldots,N]$;

the affinity matrix $B$ is computed according to formula (5):

$$B_{de}=k(q_{d},k_{e})=\frac{q_{d}^{T}k_{e}}{\sqrt{d}}\tag{5}$$

where $g_{q}(\cdot)$ and $g_{k}(\cdot)$ denote the linear projection functions that map a vector to the q matrix and the k matrix respectively, $k(\cdot,\cdot)$ is the inner product function, $\sqrt{d}$ is a scale factor, $q_{d}=g_{q}(f_{d})$ is the matrix corresponding to $f_{d}$, $k_{e}=g_{k}(f_{e})$ is the matrix corresponding to $f_{e}$, and $T$ denotes the transpose.
Further, the method further comprises: constructing a loss function $\mathcal{L}_{2}$ to optimize the inter-image relation features:

$$\mathcal{L}_{2}=\lambda_{1}\mathcal{L}_{id}+\lambda_{2}\mathcal{L}_{tri}+\lambda_{3}\mathcal{L}_{C}\tag{6}$$

$$\mathcal{L}_{id}=-\sum_{r=1}^{M}\log p_{f_{r}}\tag{7}$$

$$\mathcal{L}_{tri}=\sum_{r=1}^{M}\max\left(\alpha+D(f_{ar},f_{pr})-D(f_{ar},f_{nr}),\,0\right)\tag{8}$$

$$\mathcal{L}_{C}=\frac{1}{2}\sum_{t=1}^{M}\left\|f_{t}-c_{y_{t}}\right\|_{2}^{2}\tag{9}$$

where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are weights; $\mathcal{L}_{id}$ denotes the Identity Loss, $\mathcal{L}_{tri}$ the Triplet Loss, and $\mathcal{L}_{C}$ the Center Loss; $p_{f_{r}}$ is the probability of the identity ground truth predicted by the classifier for fusion feature $f_{r}$; $\alpha$ is the margin; $D(f_{ar},f_{pr})$ is the distance between a positive feature pair and $D(f_{ar},f_{nr})$ the distance between a negative feature pair; $M$ is the total number of fusion features, and $c_{y_{t}}$ is the feature representation center of the class $y_{t}$ of fusion feature $f_{t}$.
On the other hand, the invention provides a system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships, comprising: an image preprocessing unit, an intra-image relation mining unit, an inter-image relation mining unit and an identification unit;
the image preprocessing unit is used for acquiring all the pedestrian images to be identified and preprocessing them so that the pedestrians contained in all the pedestrian images to be identified wear the same clothes;
the intra-image relation mining unit is used for constructing the intra-image relation mining model, and for performing intra-image relation modeling on each of the N pedestrian images to be identified contained in an input batch using the intra-image relation mining model to obtain N fusion features; the intra-image relation modeling of each pedestrian image to be identified specifically comprises: extracting the global feature and the local features of the pedestrian image to be identified, and taking the combination of the global feature and the local features as the original intra-image features; constructing the intra-image relation features of the pedestrian image to be identified from the global feature and the local features, and fusing the intra-image relation features with the original intra-image features to obtain the fusion feature of the pedestrian image to be identified;
the inter-image relation mining unit is used for constructing the inter-image relation mining model, and for constructing the inter-image relation features among the N pedestrian images to be identified contained in the input batch from their N corresponding fusion features using the inter-image relation mining model, and fusing the inter-image relation features with the N fusion features to obtain the final feature of each of the N pedestrian images to be identified;
the identification unit is used for judging, from the final features of the N pedestrian images to be identified contained in the input batch, whether the pedestrians in the pedestrian images to be identified are target pedestrians.
The invention has the beneficial effects that:
(1) Unlike existing methods that separate clothing features from identity features with a self-attention network, the invention does not separate a pedestrian's clothing features from identity features; instead, all input original pedestrian images to be identified are preprocessed into a series of images with identical clothes, so that in the preprocessed images even pedestrians with different identities wear the same clothes. Clothing features are therefore no longer used to distinguish pedestrian identities, the subsequent model design need not pay special attention to pedestrian clothing, the influence of clothing features on the model is avoided, the model no longer depends on color appearance features to distinguish identities, and the designed model can extract more discriminative shape features;
(2) Compared with existing methods, which apply no extra processing to the original image data set, the invention uses a human body parsing model to segment each image and obtain the pixels of the body parts with the greatest influence on pedestrian identification (namely the upper and lower garments), and replaces these pixels, so that the largest confounding factor is removed before pedestrian feature extraction, a series of pedestrian images with the same clothes is obtained, and the influence of clothing change on pedestrian re-identification is eliminated. Meanwhile, this preprocessing lets the model concentrate on physiological characteristics that are hard to change, without expanding the data set, so that the model extracts more discriminative shape features;
(3) Unlike existing methods that identify pedestrians by extracting and comparing only local information, or only global information, from a single image, the invention constructs intra-image relation features and inter-image relation features, uses the intra-image and inter-image relation information to extract more discriminative identity features, and makes full use of the semantic information contained in the images;
(4) Compared with existing methods that obtain local features by horizontal partitioning, the invention obtains more accurate local key points through human body pose estimation; since the local key points of adjacent joints are connected in the human body topology, exploring the relationships among all local key points extracts more relation information;
(5) Compared with existing clothing-change re-identification methods that process the head region separately, the human body pose estimation adopted by the invention can additionally extract five facial key points, namely the nose, left eye, right eye, left ear and right ear, and exploring the relationships among these five key points extracts more discriminative fine-grained features;
(6) Compared with other CNN-based pedestrian re-identification methods, the invention exploits the Transformer's strong ability to capture long-range dependencies; with multi-head attention, the model attends jointly to different representative elements, so that different parts of a pedestrian receive attention even when all images wear the same clothes, and discriminative shape features can be extracted.
Drawings
Fig. 1 is a schematic flow chart of the method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the intra-image relation mining model and the inter-image relation mining model according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships, including the following steps:
S101: acquiring all pedestrian images to be identified, and preprocessing all the pedestrian images to be identified so that the pedestrians contained in all the pedestrian images to be identified wear the same clothes;
S102: constructing an intra-image relation mining model and an inter-image relation mining model;
S103: dividing all the preprocessed pedestrian images to be identified into a plurality of batches, wherein each batch contains N pedestrian images to be identified;
S104: for the current batch, performing intra-image relation modeling on the N pedestrian images to be identified using the intra-image relation mining model to obtain N fusion features; the intra-image relation modeling of each pedestrian image to be identified specifically comprises: extracting the global feature and the local features of the pedestrian image to be identified, and taking the combination of the global feature and the local features as the original intra-image features; constructing the intra-image relation features of the pedestrian image to be identified from the global feature and the local features, and fusing the intra-image relation features with the original intra-image features to obtain the fusion feature of the pedestrian image to be identified;
S105: for the current batch, constructing the inter-image relation features among the N pedestrian images to be identified from their N corresponding fusion features using the inter-image relation mining model, fusing the inter-image relation features with the N fusion features respectively to obtain the final feature of each of the N pedestrian images to be identified, and judging from these final features whether the pedestrian in each pedestrian image to be identified is the target pedestrian;
In particular, a person's identity is typically determined by physiological characteristics, not apparent characteristics such as clothing. By using the interactions between images to extract inter-image relation features, the salient features that distinguish one pedestrian image from other pedestrian images can be found more effectively.
S106: steps S104 to S105 are repeated for the next batch until the identification of all batches is completed.
Unlike the conventional approach of separating clothing features from identity features with a self-attention network, the preprocessing adopted by the embodiment of the invention does not separate a pedestrian's clothing features from identity features; instead, all input original pedestrian images to be identified are preprocessed into a series of pedestrian images with identical clothes, in which even pedestrians of different identities wear the same clothes. Clothing features are thus no longer used to distinguish pedestrian identities, so the subsequent model design need not focus excessively on pedestrian clothing, the influence of clothing features on the model is avoided, the model does not rely on color appearance features to identify pedestrians, and the designed model can extract more discriminative shape features.
in addition, unlike the existing method that only local information is extracted from an image and is compared, or only global information is extracted and is compared to identify the identity of a pedestrian, the embodiment of the invention provides a method for constructing the intra-image relationship features and the inter-image relationship features, further utilizes the intra-image and inter-image relationship information to extract more discernable identity features, and fully utilizes semantic information contained in the image.
Example 2
On the basis of the above embodiment, the embodiment of the present invention provides an image preprocessing method so that the pedestrians contained in all the pedestrian images to be identified wear the same clothes. The method specifically comprises the following steps:
S201: selecting one image from all the pedestrian images to be identified as the target pedestrian reference image; in this embodiment, the first input pedestrian image to be identified is selected as the target pedestrian reference image by default.
S202: performing semantic segmentation on each input pedestrian image to be identified using a human body parsing model to obtain the pixels belonging to the pedestrian's body in the image; the pedestrian body comprises two body parts, namely the upper garment and the lower garment (also called trousers); in this embodiment, the human body parsing model is the SCHP model.
Specifically, all the pedestrian images to be identified of the current batch are denoted $X=[x_{1},x_{2},\ldots,x_{N}]$, where $N$ is the batch size and $x_{i}$ is the $i$-th pedestrian image to be identified, of size $C\times H\times W$, where $C$, $H$ and $W$ denote the number of channels, the height and the width respectively.
First, the result of semantic segmentation with the human body parsing model is denoted $S=[s_{1},s_{2},\ldots,s_{N}]$, where $s_{i}$ is the segmentation map of image $x_{i}$, of size $1\times H\times W$. To identify the semantic information of the pixels in $s_{i}$, the pixel values of the background, head, upper garment, lower garment, arm and leg positions are set to 0, 1, 2, 3, 4 and 5 respectively.
Then, the pixels belonging to the pedestrian body (i.e., the upper and lower garments) are obtained from the above semantic segmentation result. Each pixel of image $x_{i}$ can be represented as a vector $v_{j}$ of length $C$, so image $x_{i}$ contains $H\times W$ pixel vectors in total. The sets of body-part pixels of image $x_{i}$ are:

$$P_{i}^{upper}=\{v_{j_{1}}\mid s_{i}(j_{1})=2,\ j_{1}=1,\ldots,U_{1}\}$$

$$P_{i}^{pants}=\{v_{j_{2}}\mid s_{i}(j_{2})=3,\ j_{2}=1,\ldots,U_{2}\}$$

where $P_{i}^{upper}$ is the set of pixels of the upper-garment region and $P_{i}^{pants}$ the set of pixels of the lower-garment region; $U_{1}$ is the total number of upper-garment pixel vectors in image $x_{i}$ and $U_{2}$ the total number of lower-garment pixel vectors; $v_{j_{1}}$ denotes the $j_{1}$-th pixel vector and $v_{j_{2}}$ the $j_{2}$-th pixel vector; 2 and 3 are the semantic labels of the upper and lower garments respectively.
It will be appreciated that $U_{1}$ and $U_{2}$ may differ from one $x_{i}$ to another.
S203: when the other pedestrian images to be identified are input, the pixels of each body part of the target pedestrian reference image are replaced into the positions of the pixels of the corresponding body part in the other pedestrian images to be identified, while the pixels at other positions of those images are kept unchanged.
Specifically, the pixel sets of the body parts corresponding to the target pedestrian reference image are stored separately and denoted $G$; the pixel sets of the two body parts (i.e., the upper and lower garments) contained in $G$ are denoted $G_{upper}$ and $G_{pants}$ respectively. Let $M$ be the total number of pixels; for the N pedestrian images to be identified, $M=N\times H\times W$, and all the pixel vectors in $X$ are denoted $V_{X}$:

$$V_{X}=[v_{1},v_{2},\ldots,v_{M}],\qquad V_{X}^{upper}\subset V_{X},\quad V_{X}^{pants}\subset V_{X}$$

where $V_{X}^{upper}$ are the pixel vectors belonging to the upper garments and $V_{X}^{pants}$ the pixel vectors belonging to the trousers.
Next, for the pedestrian images to be identified in $V_{X}$ other than the reference image, the pixel vector sets $V_{X}^{upper}$ and $V_{X}^{pants}$ are replaced by $G_{upper}$ and $G_{pants}$ respectively; the changed pixel vectors can be denoted $V_{X}'$:

$$V_{X}'=\left(V_{X}\setminus\left(V_{X}^{upper}\cup V_{X}^{pants}\right)\right)\cup G_{upper}\cup G_{pants}$$
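For illustration only, the following sketch shows one way the pixel replacement of S203 could be realized, under the simplifying assumption that garment pixels are exchanged at the spatial positions where the reference mask and the target mask agree. The semantic labels follow S202 (2 = upper garment, 3 = lower garment); the function name and this masking strategy are assumptions, not the SCHP model's actual API.

```python
# Hypothetical sketch of the garment pixel replacement of S203.
import torch

UPPER, PANTS = 2, 3   # semantic labels assigned in S202

def replace_clothes(images, seg_maps, ref_idx=0):
    """images: (N, C, H, W) float tensor; seg_maps: (N, H, W) long tensor."""
    ref_img, ref_seg = images[ref_idx], seg_maps[ref_idx]
    out = images.clone()
    for part in (UPPER, PANTS):                    # G_upper, then G_pants
        ref_mask = ref_seg == part
        for i in range(images.shape[0]):
            if i == ref_idx:
                continue
            both = (seg_maps[i] == part) & ref_mask
            # overwrite this image's garment pixels with the reference garment
            out[i][:, both] = ref_img[:, both]
    return out
```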
Other steps are the same as those of embodiment 1, and will not be repeated here.
In the embodiment of the invention, a human body parsing model is used to segment each image and obtain the pixels of the body parts with the greatest influence on pedestrian identification (namely the upper and lower garments), and these pixels are replaced, so that the largest confounding factor is removed before pedestrian feature extraction, a series of pedestrian images with the same clothes is obtained, and the influence of clothing change on pedestrian re-identification is eliminated. Meanwhile, this preprocessing lets the model concentrate on physiological characteristics that are hard to change, without expanding the data set, so that the model extracts more discriminative shape features.
Example 3
On the basis of the above embodiments, as shown in fig. 2, the embodiment of the present invention provides the network architecture of the intra-image relation mining model and of the inter-image relation mining model; based on the two relation mining models provided in this embodiment, the intra-image relation features and the inter-image relation features can be constructed more effectively. In fig. 2, IP denotes the image preprocessing process, INS the intra-image relation mining model, and ITS the inter-image relation mining model. The details are as follows:
the intra-image relation mining model comprises a CNN model, a human body posture estimation module and a first transducer module; correspondingly, the step S104 specifically includes: extracting a feature map f of an input pedestrian image to be identified by adopting a CNN model cnn The method comprises the steps of carrying out a first treatment on the surface of the The human body posture estimation module is adopted to extract the local key point heat map of the input pedestrian image to be identified, and the local key point heat map is recorded as m kp The method comprises the steps of carrying out a first treatment on the surface of the The global feature V is obtained by an average pooling operation (g ()) of the feature map g (for convenience of description, it is denoted as V g =g(f cnn ) A) is provided; the result of multiplying the global feature and the local key point heat map is used as a local feature (for convenience of description, written as
Figure BDA0004152907360000118
Adopting a first transducer module to construct an intra-image relation feature related to the pedestrian image to be identified according to the global feature and the local feature; and taking the result of multiplying the intra-image relation feature and the intra-image feature of the original image as the fusion feature of the pedestrian image to be identified.
Specifically, the key points in the local key-point heat maps comprise one or more of the nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle and right ankle; it will be appreciated that each key point corresponds to one local feature, that is, there may be more than one local feature.
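For illustration, a minimal sketch of the feature extraction just described follows: the CNN feature map is weighted by each key-point heat map and average-pooled, yielding one local feature per key point. The callables backbone and pose_estimator are placeholders, not components named by the patent.

```python
# Illustrative sketch of global/local feature extraction; names are assumptions.
import torch

def extract_features(images, backbone, pose_estimator):
    """images: (B, 3, H, W) tensor of preprocessed pedestrian images."""
    f_cnn = backbone(images)                  # (B, C, h, w) feature map
    v_g = f_cnn.mean(dim=(2, 3))              # global feature V_g = g(f_cnn)
    m_kp = pose_estimator(images)             # (B, K, h, w) key-point heat maps
    # heat-map-weighted pooling: one local feature per key point
    v_l = (f_cnn.unsqueeze(1) * m_kp.unsqueeze(2)).mean(dim=(3, 4))  # (B, K, C)
    beta = m_kp.flatten(2).max(dim=2).values  # key-point confidences beta_k
    return v_g, v_l, beta
```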
Further, to optimize the global feature and the local features, the method also includes, before constructing the intra-image relation features: optimizing the global feature and the local features with the loss function of formula (1):

$$\mathcal{L}_{1}=\sum_{k=1}^{K+1}\beta_{k}\left(\mathcal{L}_{cls}(v_{k})+\mathcal{L}_{tri}(v_{k})\right)\tag{1}$$

with $\mathcal{L}_{cls}(v_{k})=-\log p_{k}$ and $\mathcal{L}_{tri}(v_{k})=\max\left(\alpha+D(v_{ak},v_{pk})-D(v_{ak},v_{nk}),\,0\right)$,

where $K$ is the number of features in the local feature set $V_{l}$; $\beta_{k}=\max(m_{kp}[k])\in[0,1]$ is the confidence of the $k$-th key point, $m_{kp}[k]$ is the heat map of the $k$-th key point, and $\max$ denotes the maximum-value operation; $\beta_{K+1}=1$ is the confidence of the global feature $v_{K+1}$; $\mathcal{L}_{cls}$ is the classification loss function and $\mathcal{L}_{tri}$ the triplet loss function; $p_{k}$ is the probability of the identity ground truth predicted by the classifier for local feature $v_{k}$; $\alpha$ is the margin; $D(v_{ak},v_{pk})$ is the distance between a positive feature pair from the same pedestrian, and $D(v_{ak},v_{nk})$ the distance between a negative feature pair from different pedestrians. It is noted that the classifiers of the different local features are not shared.
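A sketch of this confidence-weighted optimization is given below. It assumes cross-entropy for $\mathcal{L}_{cls}$ and the standard margin triplet loss for $\mathcal{L}_{tri}$, with per-feature pair distances supplied by the caller; all names are illustrative, not from the patent.

```python
# Illustrative sketch of formula (1); names and tensor layouts are assumptions.
import torch
import torch.nn.functional as F

def intra_feature_loss(logits_list, labels, beta, d_ap, d_an, alpha=0.3):
    """logits_list: K+1 per-feature classifier outputs, each (B, num_ids);
    beta: (B, K) key-point confidences; d_ap, d_an: (K+1, B) pair distances."""
    total = 0.0
    K = beta.shape[1]
    for k, logits in enumerate(logits_list):        # k = 0..K; k == K is global
        b_k = beta[:, k] if k < K else torch.ones_like(beta[:, 0])  # beta_{K+1} = 1
        l_cls = F.cross_entropy(logits, labels, reduction="none")   # -log p_k
        l_tri = F.relu(alpha + d_ap[k] - d_an[k])   # max(alpha + D_ap - D_an, 0)
        total = total + (b_k * (l_cls + l_tri)).mean()
    return total
```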
Based on the intra-image relation mining model shown in fig. 2, the global feature $V_{g}$ and the set of local features $V_{l}$ are input together into the first Transformer module for intra-image relation modeling, and the intra-image relation features are expressed by the aggregated representation vector $u_{i}$ shown in formula (2):

$$u_{i}=\sum_{j=1}^{K+1}s(A_{ij})\,g_{v}(v_{j})\tag{2}$$

where $s(\cdot)$ is the Softmax function that converts affinities into weights, $g_{v}(\cdot)$ is the linear projection function that maps a vector to the v matrix, and $A\in\mathbb{R}^{(K+1)\times(K+1)}$ is the affinity matrix containing the similarity between any two features $v_{i}$ and $v_{j}$, where $i\neq j$, $i\in[1,2,\ldots,K,K+1]$ and $j\in[1,2,\ldots,K,K+1]$; when $i=K+1$ or $j=K+1$, $v_{i}$ or $v_{j}$ denotes the global feature, and otherwise a local feature;

the affinity matrix $A$ is computed according to formula (3):

$$A_{ij}=k(q_{i},k_{j})=\frac{q_{i}^{T}k_{j}}{\sqrt{d}}\tag{3}$$

where $g_{q}(\cdot)$ and $g_{k}(\cdot)$ denote the linear projection functions that map a vector to the q matrix and the k matrix respectively, $k(\cdot,\cdot)$ is the inner product function, $\sqrt{d}$ is a scale factor, $q_{i}=g_{q}(v_{i})$ is the matrix corresponding to $v_{i}$, $k_{j}=g_{k}(v_{j})$ is the matrix corresponding to $v_{j}$, and $T$ denotes the transpose.
Finally, the obtained intra-image relation features $u_{i}$ are multiplied by the corresponding original features $v_{i}$ respectively, and all the resulting features are concatenated into one feature, namely the fusion feature of the image; the fusion feature fully exploits the relation information among the local key points within the image, improving robustness.
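For illustration, a minimal sketch of formulas (2)-(3) follows: single-head scaled dot-product attention over the K local features plus the global feature of one image, where $g_{q}$, $g_{k}$ and $g_{v}$ are the linear projections named in the text. The feature dimension and class name are assumptions.

```python
# Illustrative sketch of formulas (2)-(3); dimensions are assumptions.
import math
import torch
import torch.nn as nn

class IntraImageRelation(nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.g_q = nn.Linear(dim, dim)   # maps v_i to q_i
        self.g_k = nn.Linear(dim, dim)   # maps v_j to k_j
        self.g_v = nn.Linear(dim, dim)   # maps v_j to the value in formula (2)
        self.scale = math.sqrt(dim)      # sqrt(d) scale factor

    def forward(self, v):                # v: (B, K+1, dim), local + global features
        q, k, val = self.g_q(v), self.g_k(v), self.g_v(v)
        A = q @ k.transpose(-2, -1) / self.scale     # formula (3)
        u = A.softmax(dim=-1) @ val                  # formula (2)
        # multiply relation features with the originals, then concatenate
        return (u * v).flatten(1)        # (B, (K+1)*dim) fusion feature
```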
The inter-image relation mining model comprises a second Transformer module; correspondingly, step S105 specifically includes: constructing, with the second Transformer module and according to the N fusion features corresponding to the N pedestrian images to be identified, the inter-image relation features among the N pedestrian images to be identified. For convenience of description, the N fusion features corresponding to the N pedestrian images to be identified are denoted $F=[f_{1},f_{2},\ldots,f_{N}]$.
Based on the inter-image relation mining model shown in fig. 2, the N fusion features $F$ are input into the second Transformer module for inter-image relation modeling, and the inter-image relation features are expressed by the aggregated representation vector $w_{d}$ shown in formula (4):

$$w_{d}=\sum_{e=1}^{N}s(B_{de})\,g_{v}(f_{e})\tag{4}$$

where $s(\cdot)$ is the Softmax function that converts affinities into weights, $g_{v}(\cdot)$ is the linear projection function that maps a vector to the v matrix, and $B\in\mathbb{R}^{N\times N}$ is the affinity matrix containing the similarity between any two fusion features $f_{d}$ and $f_{e}$, where $d\neq e$, $d\in[1,2,\ldots,N]$ and $e\in[1,2,\ldots,N]$;

the affinity matrix $B$ is computed according to formula (5):

$$B_{de}=k(q_{d},k_{e})=\frac{q_{d}^{T}k_{e}}{\sqrt{d}}\tag{5}$$

where $g_{q}(\cdot)$ and $g_{k}(\cdot)$ denote the linear projection functions that map a vector to the q matrix and the k matrix respectively, $k(\cdot,\cdot)$ is the inner product function, $\sqrt{d}$ is a scale factor, $q_{d}=g_{q}(f_{d})$ is the matrix corresponding to $f_{d}$, $k_{e}=g_{k}(f_{e})$ is the matrix corresponding to $f_{e}$, and $T$ denotes the transpose.
Finally, the obtained inter-image relation features $w_{d}$ are multiplied by the corresponding original features $f_{d}$ respectively, and the resulting features are taken as the final features of the corresponding pedestrian images to be identified.
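A minimal sketch of formulas (4)-(5) follows: the same scaled dot-product relation, computed by a second, separately parameterized Transformer-style module across the N fusion features of a batch, so that the batch itself is the sequence attended over. The class name is an assumption.

```python
# Illustrative sketch of formulas (4)-(5); names are assumptions.
import math
import torch
import torch.nn as nn

class InterImageRelation(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.g_q = nn.Linear(dim, dim)   # maps f_d to q_d
        self.g_k = nn.Linear(dim, dim)   # maps f_e to k_e
        self.g_v = nn.Linear(dim, dim)   # maps f_e to the value in formula (4)
        self.scale = math.sqrt(dim)

    def forward(self, f):                # f: (N, dim) fusion features of one batch
        q, k, val = self.g_q(f), self.g_k(f), self.g_v(f)
        B_aff = q @ k.t() / self.scale                 # formula (5)
        w = B_aff.softmax(dim=-1) @ val                # formula (4)
        return w * f                     # final features: relation x original
```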
Further, in order to improve the discriminative power of the deep-learning features, keep the features of different classes separable and minimize intra-class variation, this embodiment also constructs a loss function $\mathcal{L}_{2}$ to optimize the inter-image relation features, as follows:

$$\mathcal{L}_{2}=\lambda_{1}\mathcal{L}_{id}+\lambda_{2}\mathcal{L}_{tri}+\lambda_{3}\mathcal{L}_{C}\tag{6}$$

$$\mathcal{L}_{id}=-\sum_{r=1}^{M}\log p_{f_{r}}\tag{7}$$

$$\mathcal{L}_{tri}=\sum_{r=1}^{M}\max\left(\alpha+D(f_{ar},f_{pr})-D(f_{ar},f_{nr}),\,0\right)\tag{8}$$

$$\mathcal{L}_{C}=\frac{1}{2}\sum_{t=1}^{M}\left\|f_{t}-c_{y_{t}}\right\|_{2}^{2}\tag{9}$$

where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are weights; $\mathcal{L}_{id}$ denotes the Identity Loss, $\mathcal{L}_{tri}$ the Triplet Loss, and $\mathcal{L}_{C}$ the Center Loss; $p_{f_{r}}$ is the probability of the identity ground truth predicted by the classifier for fusion feature $f_{r}$; $\alpha$ is the margin; $D(f_{ar},f_{pr})$ is the distance between a positive feature pair and $D(f_{ar},f_{nr})$ the distance between a negative feature pair. Each fusion feature is one sample; to minimize the distance between each sample in a mini-batch and the center of its class, a class center is provided for each class. $M$ is the total number of fusion features (i.e., the number of samples), and $c_{y_{t}}$ is the feature representation center of the class $y_{t}$ of fusion feature $f_{t}$. In this embodiment, $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ take the values 1, 1 and 0.0005 respectively.
It is noted that when updating a class center, only the images of that class in the current batch are used to compute the update amount. The update strategy of the Center Loss is given by formula (10):

$$\Delta c_{j}=\frac{\sum_{t=1}^{M}\delta(y_{t}=j)\,(c_{j}-f_{t})}{1+\sum_{t=1}^{M}\delta(y_{t}=j)}\tag{10}$$

where $\delta(\text{condition})$ takes the value 1 when the condition holds and 0 otherwise.
Compared with existing methods that obtain local features by horizontal partitioning, this embodiment obtains more accurate local key points through human body pose estimation; since the local key points of adjacent joints are connected in the human body topology, exploring the relationships among all local key points extracts more relation information. Meanwhile, compared with clothing-change re-identification methods that process the head region separately, the human body pose estimation adopted here can additionally extract five facial key points, namely the nose, left eye, right eye, left ear and right ear, and exploring the relationships among these five key points extracts more discriminative fine-grained features.
Meanwhile, compared with other CNN-based pedestrian re-identification methods, this embodiment exploits the Transformer's ability to capture long-range dependencies; with multi-head attention, the model attends jointly to different representative elements, so that different parts of a pedestrian receive attention even when all images wear the same clothes, and discriminative shape features can be extracted.
Example 4
Corresponding to the above method, the embodiment of the invention provides a system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships, comprising an image preprocessing unit, an intra-image relation mining unit, an inter-image relation mining unit and an identification unit;
the image preprocessing unit is used for acquiring all the pedestrian images to be identified and preprocessing them so that the pedestrians contained in all the pedestrian images to be identified wear the same clothes;
the intra-image relation mining unit is used for constructing the intra-image relation mining model, and for performing intra-image relation modeling on each of the N pedestrian images to be identified contained in an input batch using the intra-image relation mining model to obtain N fusion features; the intra-image relation modeling of each pedestrian image to be identified specifically comprises: extracting the global feature and the local features of the pedestrian image to be identified, and taking the combination of the global feature and the local features as the original intra-image features; constructing the intra-image relation features of the pedestrian image to be identified from the global feature and the local features, and fusing the intra-image relation features with the original intra-image features to obtain the fusion feature of the pedestrian image to be identified;
the inter-image relation mining unit is used for constructing the inter-image relation mining model, and for constructing the inter-image relation features among the N pedestrian images to be identified contained in the input batch from their N corresponding fusion features using the inter-image relation mining model, and fusing the inter-image relation features with the N fusion features to obtain the final feature of each of the N pedestrian images to be identified;
the identification unit is used for judging, from the final features of the N pedestrian images to be identified contained in the input batch, whether the pedestrians in the pedestrian images to be identified are target pedestrians.
It should be noted that the system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships provided by the embodiment of the present invention is intended to implement the above method embodiments; for its functions, reference may be made to the above method embodiments, which are not repeated here.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships, characterized by comprising the following steps:
step 1: acquiring all pedestrian images to be identified, and preprocessing all the pedestrian images to be identified so that pedestrians contained in all the pedestrian images to be identified wear the same clothes;
step 2: constructing an intra-image relation mining model and an inter-image relation mining model;
step 3: dividing all the preprocessed pedestrian images to be identified into a plurality of batches, wherein each batch contains N pedestrian images to be identified;
step 4: for the current batch, performing intra-image relation modeling on the N pedestrian images to be identified using the intra-image relation mining model to obtain N fusion features; the intra-image relation modeling of each pedestrian image to be identified specifically comprises: extracting the global feature and the local features of the pedestrian image to be identified, and taking the combination of the global feature and the local features as the original intra-image features; constructing the intra-image relation features of the pedestrian image to be identified from the global feature and the local features, and fusing the intra-image relation features with the original intra-image features to obtain the fusion feature of the pedestrian image to be identified;
step 5: for the current batch, constructing the inter-image relation features among the N pedestrian images to be identified from their N corresponding fusion features using the inter-image relation mining model, fusing the inter-image relation features with the N fusion features respectively to obtain the final feature of each of the N pedestrian images to be identified, and judging from these final features whether the pedestrian in each pedestrian image to be identified is the target pedestrian;
step 6: repeating steps 4 to 5 for the next batch until the identification of all batches is completed.
2. The method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships according to claim 1, characterized in that the step 1 specifically comprises:
step 1.1: selecting one image from all pedestrian images to be identified as a target pedestrian reference image;
step 1.2: performing semantic segmentation on each input pedestrian image to be identified using a human body parsing model to obtain the pixels belonging to the pedestrian's body in the image; the pedestrian body comprises two body parts, namely the upper garment and the lower garment;
step 1.3: replacing the pixels of each body part of the target pedestrian reference image into the positions of the pixels of the corresponding body part in each of the remaining pedestrian images to be identified, while the pixels at other positions of the remaining pedestrian images to be identified are kept unchanged.
3. The method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships according to claim 2, characterized in that the human body parsing model is an SCHP model.
4. The method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships according to claim 1, characterized in that the intra-image relation mining model comprises a CNN model, a human body pose estimation module and a first Transformer module; correspondingly, the step 4 specifically comprises:
extracting the global feature of the input pedestrian image to be identified with the CNN model;
extracting the local key-point heat maps of the input pedestrian image to be identified with the human body pose estimation module;
taking the result of multiplying the global feature by the local key-point heat maps as the local features;
constructing, with the first Transformer module, the intra-image relation features of the pedestrian image to be identified from the global feature and the local features;
and taking the result of multiplying the intra-image relation features by the original intra-image features as the fusion feature of the pedestrian image to be identified.
5. The method for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships according to claim 4, characterized in that, in step 4, before constructing the intra-image relation features, the method further comprises:
optimizing the global feature and the local features with the loss function shown in formula (1):

$$\mathcal{L}_{1}=\sum_{k=1}^{K+1}\beta_{k}\left(\mathcal{L}_{cls}(v_{k})+\mathcal{L}_{tri}(v_{k})\right)\tag{1}$$

with $\mathcal{L}_{cls}(v_{k})=-\log p_{k}$ and $\mathcal{L}_{tri}(v_{k})=\max\left(\alpha+D(v_{ak},v_{pk})-D(v_{ak},v_{nk}),\,0\right)$,

where $K$ is the number of features in the local feature set $V_{l}$; $\beta_{k}=\max(m_{kp}[k])\in[0,1]$ is the confidence of the $k$-th key point, $m_{kp}[k]$ is the heat map of the $k$-th key point, and $\max$ denotes the maximum-value operation; $\beta_{K+1}=1$ is the confidence of the global feature $v_{K+1}$; $\mathcal{L}_{cls}$ is the classification loss function and $\mathcal{L}_{tri}$ the triplet loss function; $p_{k}$ is the probability of the identity ground truth predicted by the classifier for local feature $v_{k}$; $\alpha$ is the margin; $D(v_{ak},v_{pk})$ is the distance between a positive feature pair from the same pedestrian, and $D(v_{ak},v_{nk})$ the distance between a negative feature pair from different pedestrians.
6. The method for re-identifying a clothing-changing pedestrian based on intra-image and inter-image relationships according to claim 4, wherein the intra-image relation features are expressed by the aggregated vector $u_i$ shown in formula (2):

$$u_i=\sum_{j\ne i}s(A_{ij})\,g_v(v_j)\tag{2}$$

where $s(\cdot)$ is the Softmax function that converts affinities into weights; $g_v(\cdot)$ is the linear projection function mapping a vector to the v matrix; $A\in\mathbb{R}^{(K+1)\times(K+1)}$ is the affinity matrix containing the similarity between any two features $v_i$ and $v_j$, with $i\ne j$, $i\in[1,2,\ldots,K,K+1]$ and $j\in[1,2,\ldots,K,K+1]$; when $i=K+1$ or $j=K+1$, $v_i$ or $v_j$ denotes the global feature, and otherwise a local feature;

wherein the affinity matrix $A$ is calculated according to formula (3):

$$A_{ij}=\frac{k\left(g_q(v_i),\,g_k(v_j)\right)}{\sqrt{d_k}}=\frac{q_i^{\mathsf{T}}k_j}{\sqrt{d_k}}\tag{3}$$

where $g_q(\cdot)$ and $g_k(\cdot)$ are the linear projection functions mapping a vector to the q matrix and the k matrix, respectively; $k(\cdot,\cdot)$ is the inner product function; $\frac{1}{\sqrt{d_k}}$ is the scale factor; $q_i$ denotes the matrix corresponding to $v_i$; $k_j$ denotes the matrix corresponding to $v_j$; and $\mathsf{T}$ denotes transposition.
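Formulas (2) and (3) amount to scaled dot-product attention in which each feature aggregates all other features but not itself. A minimal sketch, with the shared projection dimension as an assumption:

```python
import torch
import torch.nn as nn

class RelationAggregation(nn.Module):
    """u_i = sum_{j != i} s(A_ij) g_v(v_j), with A_ij = q_i^T k_j / sqrt(d_k)."""
    def __init__(self, dim: int):
        super().__init__()
        self.g_q = nn.Linear(dim, dim)  # g_q: v_i -> q_i
        self.g_k = nn.Linear(dim, dim)  # g_k: v_j -> k_j
        self.g_v = nn.Linear(dim, dim)  # g_v: v_j -> value vector
        self.scale = dim ** -0.5        # 1 / sqrt(d_k)

    def forward(self, tokens):          # tokens: (B, K+1, C)
        q, k, v = self.g_q(tokens), self.g_k(tokens), self.g_v(tokens)
        affinity = q @ k.transpose(1, 2) * self.scale       # (B, K+1, K+1)
        # Exclude self-affinity before the Softmax, since i != j in formula (2).
        eye = torch.eye(tokens.size(1), dtype=torch.bool, device=tokens.device)
        affinity = affinity.masked_fill(eye, float('-inf'))
        weights = affinity.softmax(dim=-1)                  # s(A_ij)
        return weights @ v                                  # aggregated u_i
```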
7. The method for re-identifying a clothing-changing pedestrian based on intra-image and inter-image relationships according to claim 1, wherein the inter-image relationship mining model comprises a second Transformer module; correspondingly, step 5 specifically comprises:
using the second Transformer module to construct the inter-image relation features among the N pedestrian images to be identified from the N fusion features corresponding to the N pedestrian images to be identified.
8. The method for re-identifying a clothing-changing pedestrian based on intra-image and inter-image relationships according to claim 7, wherein the inter-image relation features are expressed by the aggregated vector $w_d$ shown in formula (4):

$$w_d=\sum_{e\ne d}s(B_{de})\,g_v(f_e)\tag{4}$$

where $s(\cdot)$ is the Softmax function that converts affinities into weights; $g_v(\cdot)$ is the linear projection function mapping a vector to the v matrix; $B\in\mathbb{R}^{N\times N}$ is the affinity matrix containing the similarity between any two fusion features $f_d$ and $f_e$, with $d\ne e$, $d\in[1,2,\ldots,N]$ and $e\in[1,2,\ldots,N]$;

wherein the affinity matrix $B$ is calculated according to formula (5):

$$B_{de}=\frac{k\left(g_q(f_d),\,g_k(f_e)\right)}{\sqrt{d_k}}=\frac{q_d^{\mathsf{T}}k_e}{\sqrt{d_k}}\tag{5}$$

where $g_q(\cdot)$ and $g_k(\cdot)$ are the linear projection functions mapping a vector to the q matrix and the k matrix, respectively; $k(\cdot,\cdot)$ is the inner product function; $\frac{1}{\sqrt{d_k}}$ is the scale factor; $q_d$ denotes the matrix corresponding to $f_d$; $k_e$ denotes the matrix corresponding to $f_e$; and $\mathsf{T}$ denotes transposition.
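Formulas (4) and (5) apply the same aggregation across the batch rather than within one image: the N fusion features play the role of the tokens. Reusing the sketch given for formulas (2) and (3), with batch size and feature dimension as assumed values:

```python
import torch

N, C = 8, 2048                            # assumed batch size and feature dimension
fusion = torch.randn(1, N, C)             # N fusion features stacked as one sequence
inter_relation = RelationAggregation(C)   # same module as for formulas (2)-(3)
w = inter_relation(fusion)                # w_d for every image d in the batch
final_features = w * fusion               # fuse with the N fusion features
```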
9. The method for re-identifying a clothing-changing pedestrian based on intra-image and inter-image relationships according to claim 8, further comprising: constructing a loss function $L$ to optimize the inter-image relation features:

$$L=\lambda_1 L_{id}+\lambda_2 L_{tri}+\lambda_3 L_C$$

$$L_{id}=-\frac{1}{M}\sum_{r=1}^{M}\log(p_{f_r})$$

$$L_{tri}=\max\left(\alpha+D(f_{ar},f_{pr})-D(f_{ar},f_{nr}),\,0\right)$$

$$L_C=\frac{1}{2}\sum_{t=1}^{M}\left\|f_t-c_{y_t}\right\|_2^2$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote weights; $L_{id}$ denotes the Identity Loss, $L_{tri}$ denotes the Triplet Loss, and $L_C$ denotes the Center Loss; $p_{f_r}$ is the probability, predicted by the classifier, that the fusion feature $f_r$ belongs to its identity ground truth; $\alpha$ is the margin; $D(f_{ar},f_{pr})$ denotes the distance between a positive feature pair $(f_{ar},f_{pr})$ from the same pedestrian, and $D(f_{ar},f_{nr})$ denotes the distance between a negative feature pair $(f_{ar},f_{nr})$ from different pedestrians; $M$ denotes the total number of fusion features; and $c_{y_t}$ denotes the class representation center of the fusion features with identity label $y_t$.
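A sketch of the combined loss of claim 9, with standard forms assumed for the three components and illustrative default weights:

```python
import torch
import torch.nn.functional as F

def inter_loss(logits, labels, feats, centers, pos_dist, neg_dist,
               alpha=0.3, lambdas=(1.0, 1.0, 5e-4)):
    """L = lambda_1 * L_id + lambda_2 * L_tri + lambda_3 * L_C.

    logits:  (M, num_ids) classifier outputs for the M fusion features
    labels:  (M,) identity ground truth y_t
    feats:   (M, C) fusion features f_t
    centers: (num_ids, C) learnable class representation centers c_{y_t}
    pos_dist / neg_dist: (M,) distances D(f_ar, f_pr) and D(f_ar, f_nr)
    """
    l_id = F.cross_entropy(logits, labels)              # Identity Loss: -log p_{f_r}
    l_tri = F.relu(alpha + pos_dist - neg_dist).mean()  # Triplet Loss, margin alpha
    l_c = 0.5 * ((feats - centers[labels]) ** 2).sum(dim=1).mean()  # Center Loss
    l1, l2, l3 = lambdas
    return l1 * l_id + l2 * l_tri + l3 * l_c
```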
10. A system for re-identifying clothing-changing pedestrians based on intra-image and inter-image relationships, characterized by comprising: an image preprocessing unit, an intra-image relation mining unit, an inter-image relation mining unit and an identification unit;
the image preprocessing unit is used for acquiring all pedestrian images to be identified and preprocessing them so that the pedestrians contained in all the pedestrian images to be identified wear the same clothes;
the intra-image relation mining unit is used for constructing an intra-image relationship mining model, and using it to perform intra-image relationship modeling on each of the N pedestrian images to be identified contained in an input batch to obtain N fusion features; the intra-image relationship modeling of each pedestrian image to be identified specifically comprises: extracting the global features and local features of the pedestrian image to be identified and taking the combination of the global features and the local features as the original intra-image features; constructing the intra-image relation features of the pedestrian image to be identified from the global features and the local features, and fusing the intra-image relation features with the original intra-image features to obtain the fusion features of the pedestrian image to be identified;
the inter-image relation mining unit is used for constructing an inter-image relationship mining model, and using it to construct the inter-image relation features among the N pedestrian images to be identified from the N fusion features corresponding to the N pedestrian images to be identified contained in the input batch, and fusing the inter-image relation features with the N fusion features to obtain the final features of each of the N pedestrian images to be identified;
the identification unit is used for judging, according to the final features of the N pedestrian images to be identified contained in the input batch, whether the pedestrian in each pedestrian image to be identified is the target pedestrian.
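The four units of claim 10 compose a linear pipeline; a schematic sketch in which every name is hypothetical:

```python
def reidentify_batch(batch_images, reference_image, system):
    """End-to-end flow of the claim-10 system (illustrative names throughout)."""
    # 1. Image preprocessing unit: dress every pedestrian in the same clothes.
    images = [system.preprocess(img, reference_image) for img in batch_images]
    # 2. Intra-image relation mining unit: one fusion feature per image.
    fusion = [system.intra_unit(img) for img in images]
    # 3. Inter-image relation mining unit: final features for the whole batch.
    final = system.inter_unit(fusion)
    # 4. Identification unit: compare against the target pedestrian's feature.
    return [system.match(f, system.target_feature) for f in final]
```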
CN202310324819.2A 2023-03-29 2023-03-29 Method and system for re-identifying clothing changing pedestrians based on relationship between images Pending CN116311377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310324819.2A CN116311377A (en) 2023-03-29 2023-03-29 Method and system for re-identifying clothing changing pedestrians based on relationship between images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310324819.2A CN116311377A (en) 2023-03-29 2023-03-29 Method and system for re-identifying clothing changing pedestrians based on relationship between images

Publications (1)

Publication Number Publication Date
CN116311377A true CN116311377A (en) 2023-06-23

Family

ID=86803112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310324819.2A Pending CN116311377A (en) 2023-03-29 2023-03-29 Method and system for re-identifying clothing changing pedestrians based on relationship between images

Country Status (1)

Country Link
CN (1) CN116311377A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671297A (en) * 2024-02-02 2024-03-08 华东交通大学 Pedestrian re-recognition method integrating interaction attributes
CN117831081A (en) * 2024-03-06 2024-04-05 齐鲁工业大学(山东省科学院) Method and system for re-identifying clothing changing pedestrians based on clothing changing data and residual error network
CN117831081B (en) * 2024-03-06 2024-05-24 齐鲁工业大学(山东省科学院) Method and system for re-identifying clothing changing pedestrians based on clothing changing data and residual error network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination