CN117635973B - Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation - Google Patents

Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation

Info

Publication number
CN117635973B
Authority
CN
China
Prior art keywords: layer, pedestrian, image, aggregation, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311661718.0A
Other languages
Chinese (zh)
Other versions
CN117635973A (en)
Inventor
张国庆
周洁琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202311661718.0A
Publication of CN117635973A
Application granted
Publication of CN117635973B
Legal status: Active
Anticipated expiration


Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a clothing-changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation, which comprises the following steps: (1) adding wind-and-rain scenes to the image data set and performing standardized preprocessing and data enhancement operations; (2) constructing the sequence input to a Transformer model; (3) constructing a pedestrian feature extraction network based on a standard Transformer architecture; (4) performing dynamic weight adjustment and fusion processing on the obtained features of each Transformer layer with a multi-layer dynamic focusing module; (5) selectively extracting and fusing features of specific layers in the Transformer network through a local pyramid aggregation module to obtain multi-scale feature information; (6) applying the feature outputs obtained in steps (4)-(5) to a loss function to verify whether the query image and the test image belong to the same category, thereby completing training and optimization of the model. The method can remarkably improve the recognition accuracy and robustness of the algorithm in complex scenes, especially when facing the clothing-changing pedestrian re-identification task.

Description

Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation
Technical Field
The invention relates to the technical field of computer vision image recognition, in particular to a re-recognition method for clothing changing pedestrians based on multilayer dynamic concentration and local pyramid aggregation.
Background
Pedestrian re-identification (ReID) is a key problem in computer vision and public safety research, aiming to confirm and track the identity of individuals across different surveillance cameras. Existing ReID algorithms focus mainly on efficient identification strategies over short time spans, but these strategies often fail to account for the dynamics of pedestrian clothing changes, which limits their application over long time spans. In practical applications, especially law enforcement and criminal investigation scenarios, persons of interest may evade recognition by changing their apparel, which places higher demands on ReID systems. Therefore, research and development of a robust long-term ReID technology, i.e., cloth-changing ReID (CC-ReID), is a necessary path to solving the identification problem caused by garment alterations.
Current research on CC-ReID largely falls into two categories. The first category introduces auxiliary modules (e.g., generating body contour sketches, extracting pose key points, gait analysis, etc.) to identify clothing-independent biometric features. For example, the work of Yang [1] et al. overcomes the effects of garment changes by constructing a body-contour-based network model. Nevertheless, this approach is susceptible to external conditions (e.g., lighting and occlusion) and may ignore other important biometric cues such as facial features and gait patterns. The second category aims to disentangle identity features from clothing features. For example, the adversarial feature disentanglement network (AFD-Net) proposed by Xu et al. uses intra-class reconstruction and inter-class adversarial mechanisms to distinguish identity-related from identity-irrelevant (e.g., clothing) features. However, this approach may face challenges of high computational cost, model stability, and data dependency.
In recent years, models based on the Transformer architecture have benefited from an advanced multi-head attention mechanism and achieved breakthrough results on recognition tasks that require jointly analyzing several key features of an image. Through parallel processing, the multi-head attention mechanism can effectively focus on key features in different regions of the image, enhancing the model's adaptability and discriminative power against viewpoint changes and pedestrian clothing alternation. However, existing methods mainly use the high-level information of the top Transformer layer to extract discriminative features and fail to fully exploit the detailed information of the lower layers of the network, which may limit the model's ability to capture fine-grained features in complex scenes. To solve this problem, we propose an innovative adaptive perceptual attention mechanism and a pyramid-level feature fusion network. The network is designed to efficiently integrate multi-scale information so as to enhance the recognition accuracy and robustness of the clothing-changing pedestrian re-identification algorithm in complex scenes.
Disclosure of Invention
Purpose of the invention: the invention aims to provide a clothing-changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation.
The technical scheme is as follows: the invention discloses a re-identification method for a clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation, which comprises the following steps:
(1) Adding a wind and rain scene to the image data set and performing standardized preprocessing and data enhancement operations;
(2) Dividing the preprocessed image into N blocks of consistent size without overlap, introducing an additional learnable embedding [CLS_TOKEN] as the global feature of the sequence input, and assigning each block a position encoding [POS_TOKEN], to form the sequence input to the Transformer model;
(3) Constructing a pedestrian feature extraction network based on a standard Transformer architecture, inputting the sequence generated in step (2), extracting pedestrian features and recording the features of each Transformer layer;
(4) Performing dynamic weight adjustment and fusion processing on the features of each Transformer layer obtained in step (3) using a multi-layer dynamic focusing module;
(5) Selectively extracting and fusing features of specific layers in the Transformer network through a local pyramid aggregation module to obtain multi-scale feature information, and embedding the multi-scale feature information into a self-attention mechanism based on the fast Fourier transform;
(6) Applying the feature outputs obtained in steps (4)-(5) to a loss function to verify whether the query image and the test image belong to the same category, thereby completing training and optimization of the model.
Further, the step (1) of adding a weather scene to the image data set includes the steps of:
(11) Generating a noise matrix N obeying a uniform distribution over the image width w and height h using the formula N ~ Uniform(0, 255), simulating the random scattering of raindrops at different positions;
(12) Applying a blurring operation to the noise matrix via the formula N' = N ⊛ K to generate a raindrop effect without a specific direction;
where K denotes a predefined blur kernel and ⊛ denotes a two-dimensional convolution operation;
(13) Constructing a diagonal matrix D to represent a straight-line falling path of the raindrops; simulating the inclination of the raindrops by rotating the diagonal matrix D, and then reproducing the falling speed and direction of the raindrops in the air by using Gaussian blur processing, so as to finally obtain a blur kernel M for simulating the raindrops;
(14) The simulated rain effect is fused with the original image channel by channel via the formula I'_C = (1 − β)·I_C + β·N″;
where I_C denotes an image channel, β is the mixing weight, and N″ is the noise matrix after applying the blur kernel M.
Further, the standardized preprocessing and data enhancement operations in step (1) include: horizontal flipping, random cropping and random erasing.
Further, the step (2) specifically includes the following steps:
Let the image x ∈ R^{W×H×C}, where H, W and C denote the height, width and number of channels of image x respectively;
first, the image is divided into N non-overlapping blocks, denoted {x_i | i = 1, 2, ..., N}; second, an additional learnable embedding x_cls is introduced at the beginning of the input sequence as an aggregated feature representation; then, a position encoding P is added to the feature vector of each image block; finally, the input sequence fed to the Transformer layers is formulated as:
Z_0 = [x_cls; F(x_1); F(x_2); ...; F(x_N)] + P
where Z_0 denotes the input sequence embedding; P ∈ R^{(N+1)×D} denotes the position embedding; F is a linear projection function mapping each image block to the D-dimensional space.
Further, the step (3) specifically includes the following steps:
The input sequence Z_0 is fed into the Transformer network for processing; each layer refines the features and integrates context information through a multi-head self-attention mechanism, and the output Z_l of the l-th layer is computed as:
Z_l = TransformerLayer(Z_{l-1}), l = 1, 2, ..., L
where TransformerLayer denotes a layer of the standard Transformer and L denotes the total number of layers;
the outputs of all Transformer layers are recorded as Z_1, Z_2, ..., Z_L.
Further, the step (4) includes the following steps:
(41) Construct a weight vector W = {w_1, w_2, ..., w_L}, where w_i is the importance weight of the features extracted by the i-th layer of the model hierarchy; each layer is weighted using an orthogonality-constrained weighting, computed as follows:
where f_i denotes the feature importance of the i-th layer, initialized to a uniform value across all layers; β and γ are learnable parameters; ⟨F_i, F_j⟩ denotes the inner product between the feature sets of the i-th and j-th layers, used as a measure of their feature correlation; α is a regularization coefficient; L is the total number of layers.
(42) An L2 regularization term is introduced when computing the fused features:
where λ is a non-negative regularization parameter that mitigates overfitting by limiting the magnitude of the weights in the model, and ||W||_F is the Frobenius norm of the weight matrix W, computed from the sum of squares of all layer weights.
Further, the step (5) specifically comprises the following steps:
In the local pyramid aggregation module, the output features f_1, f_2, f_3, f_4 of four different Transformer layers are selected as input, and each is passed through a convolution block:
first, a 1×1 convolution layer is applied; second, batch normalization (BatchNorm) and a ReLU function adjust the feature dimension and introduce nonlinearity; then a self-attention mechanism based on the fast Fourier transform is added, optimizing the features with the global information of all elements in the sequence; finally, the features are concatenated and fed into the same convolution block to obtain the fused feature. The formula is as follows:
f_t = Φ(Concat(Φ(f_m), Φ(f_{m+1})))
where Φ(·) denotes the entire convolution block operation and f_t denotes the feature obtained by fusing f_m and f_{m+1}. As shown in FIG. 2, three outputs are finally obtained from the local pyramid aggregation module.
Further, the loss functions in step (6) include an ID loss and a triplet loss; the ID loss adopts the conventional cross-entropy loss function without label smoothing; the formula is as follows:
L_ID = −Σ_{i=1}^{C} y_i·log(p_i)
where C is the number of classes, y_i is the one-hot encoding of the ground-truth label, and p_i is the probability predicted by the model that the sample belongs to the i-th class.
The triplet loss is formulated as:
L_tri = [ d(a, p) − d(a, n) + m ]_+
where d(a, p) and d(a, n) denote the distances between the anchor sample x_a and the positive sample x_p and the negative sample x_n respectively, computed as d(a, p) = ||f(x_a) − f(x_p)||_2 and d(a, n) = ||f(x_a) − f(x_n)||_2; the hyper-parameter m serves as the margin, i.e. the minimum required gap between the positive-pair and negative-pair distances;
the function f(·) denotes the feature extraction operator mapping an input image into the embedding space; ||·||_2 denotes the L2 norm used to compute the Euclidean distance between two feature vectors; [·]_+ is the hinge function, so the loss is counted only when the value in brackets is positive and is 0 otherwise.
The total loss L is:
L = Σ_{i=0}^{N−1} w_i·L_i
where N denotes the number of outputs produced by the whole training architecture and L_i is the loss of the i-th output; initially the loss of each output is given equal weight w_i (i = 0, 1, 2, 3), and the weights of the individual parts are then adjusted dynamically during training through the back-propagation algorithm.
It is then judged whether the maximum number of iterations has been reached; if so, the final model accuracy is output; otherwise, steps (2)-(5) are repeated.
Further, the method also comprises a step (0): constructing a surveillance network to acquire pedestrian video data; detecting pedestrians with an object detection algorithm, then obtaining pedestrian detection boxes with an object tracking algorithm; the pedestrian video sequences, cropped to 258×128 pixels, form an image gallery.
An electronic device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements any one of the above clothing-changing pedestrian re-identification methods based on multilayer dynamic concentration and local pyramid aggregation.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: by combining the detailed information of the lower layers of the network, fine-grained features in complex scenes can be captured and processed more effectively; the pyramid-level feature fusion network can integrate information from different levels, providing more comprehensive data analysis and processing; in complex scenes, especially when facing the clothing-changing pedestrian re-identification task, the method can remarkably improve the recognition accuracy and robustness of the algorithm; and every level of the Transformer network is exploited more comprehensively, overcoming its limitation in handling complex scenes.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a network structure diagram based on the multi-layer dynamic concentration and local pyramid aggregation framework;
FIG. 3 is a structure diagram of the local pyramid aggregation module within the multi-layer dynamic concentration and local pyramid aggregation framework of the present invention;
FIG. 4 is a schematic diagram of self-attention combined with the fast Fourier transform in the present invention;
FIG. 5 is a schematic view of a pedestrian image with a rain scene added according to the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIGS. 1-5, an embodiment of the present invention provides a clothing-changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation, comprising the following steps:
(0) Constructing a surveillance network to acquire pedestrian video data; detecting pedestrians with an object detection algorithm, then obtaining pedestrian detection boxes with an object tracking algorithm; the pedestrian video sequences, cropped to 258×128 pixels, form an image gallery;
(1) Adding a wind and rain scene to the image data set and performing standardized preprocessing and data enhancement operations; adding a weather scene to an image dataset comprises the steps of:
(11) Generating a noise matrix N obeying a uniform distribution over the image width w and height h using the formula N ~ Uniform(0, 255), simulating the random scattering of raindrops at different positions;
(12) Applying a blurring operation to the noise matrix via the formula N' = N ⊛ K to generate a raindrop effect without a specific direction;
where K denotes a predefined blur kernel and ⊛ denotes a two-dimensional convolution operation;
(13) Constructing a diagonal matrix D to represent a straight-line falling path of the raindrops; simulating the inclination of the raindrops by rotating the diagonal matrix D, and then reproducing the falling speed and direction of the raindrops in the air by using Gaussian blur processing, so as to finally obtain a blur kernel M for simulating the raindrops;
(14) The simulated rain effect is fused with the original image channel by channel via the formula I'_C = (1 − β)·I_C + β·N″;
where I_C denotes an image channel, β is the mixing weight, and N″ is the noise matrix after applying the blur kernel M;
the standardized preprocessing and data enhancement operations include: horizontal flipping, random cropping and random erasing.
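A minimal Python sketch of this rain augmentation follows, assuming OpenCV and NumPy are available; the function name add_rain and the parameters beta, streak_length and angle are illustrative assumptions rather than values fixed by the present disclosure.

```python
import cv2
import numpy as np

def add_rain(img, beta=0.3, streak_length=15, angle=-20, seed=None):
    """Blend a synthetic rain layer into an HxWxC uint8 image (assumed 3 channels)."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]

    # (11) uniform noise simulating randomly scattered raindrops
    noise = rng.uniform(0, 255, size=(h, w)).astype(np.float32)

    # (12) isotropic blur N' = N * K with a small predefined kernel K
    noise = cv2.GaussianBlur(noise, (3, 3), 0)

    # (13) diagonal matrix as a straight falling path, rotated to tilt the streaks,
    # then Gaussian-blurred to mimic falling speed and direction -> blur kernel M
    m = np.eye(streak_length, dtype=np.float32)
    rot = cv2.getRotationMatrix2D((streak_length / 2, streak_length / 2), angle, 1.0)
    m = cv2.warpAffine(m, rot, (streak_length, streak_length))
    m = cv2.GaussianBlur(m, (3, 3), 0)
    m /= m.sum() + 1e-8
    streaks = cv2.filter2D(noise, -1, m)          # N'' = N' convolved with M

    # (14) blend the rain layer with every channel of the original image
    streaks = np.repeat(streaks[:, :, None], img.shape[2], axis=2)
    out = (1 - beta) * img.astype(np.float32) + beta * streaks
    return np.clip(out, 0, 255).astype(np.uint8)
```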
(2) Dividing the preprocessed image into N blocks of consistent size without overlap, introducing an additional learnable embedding [CLS_TOKEN] as the global feature of the sequence input, and assigning each block a position encoding [POS_TOKEN], to form the sequence input to the Transformer model; specifically:
Let the image x ∈ R^{W×H×C}, where H, W and C denote the height, width and number of channels of image x respectively;
first, the image is divided into N non-overlapping blocks, denoted {x_i | i = 1, 2, ..., N}; second, an additional learnable embedding x_cls is introduced at the beginning of the input sequence as an aggregated feature representation; then, a position encoding P is added to the feature vector of each image block; finally, the input sequence fed to the Transformer layers is formulated as:
Z_0 = [x_cls; F(x_1); F(x_2); ...; F(x_N)] + P
where Z_0 denotes the input sequence embedding; P ∈ R^{(N+1)×D} denotes the position embedding; F is a linear projection function mapping each image block to the D-dimensional space.
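As a concrete illustration of step (2), the following PyTorch sketch builds the sequence Z_0; the 256×128 input resolution, 16×16 patch size and 768-dimensional embedding are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_h=256, img_w=128, patch=16, in_ch=3, dim=768):
        super().__init__()
        self.num_patches = (img_h // patch) * (img_w // patch)               # N
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)   # linear projection F
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                # [CLS_TOKEN]
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))  # [POS_TOKEN]

    def forward(self, x):                                  # x: (B, C, H, W)
        z = self.proj(x).flatten(2).transpose(1, 2)        # (B, N, D) patch features
        cls = self.cls_token.expand(x.size(0), -1, -1)     # prepend the global token
        z0 = torch.cat([cls, z], dim=1) + self.pos_embed   # Z_0 = [x_cls; F(x_1); ...] + P
        return z0
```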
(3) Constructing a pedestrian feature extraction network based on a standard Transformer architecture, inputting the sequence generated in step (2), extracting pedestrian features and recording the features of each Transformer layer; specifically:
The input sequence Z_0 is fed into the Transformer network for processing; each layer refines the features and integrates context information through a multi-head self-attention mechanism, and the output Z_l of the l-th layer is computed as:
Z_l = TransformerLayer(Z_{l-1}), l = 1, 2, ..., L
where TransformerLayer denotes a layer of the standard Transformer and L denotes the total number of layers;
the outputs of all Transformer layers are recorded as Z_1, Z_2, ..., Z_L.
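A hedged sketch of step (3): the sequence is passed through L standard Transformer encoder layers and every intermediate output Z_1, ..., Z_L is kept for the later modules; the depth and head count below are assumptions.

```python
import torch.nn as nn

class PedestrianFeatureExtractor(nn.Module):
    def __init__(self, dim=768, depth=12, heads=12):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            for _ in range(depth)
        ])

    def forward(self, z0):
        outputs, z = [], z0
        for layer in self.layers:        # Z_l = TransformerLayer(Z_{l-1})
            z = layer(z)
            outputs.append(z)            # record the features of every layer
        return outputs                   # [Z_1, Z_2, ..., Z_L]
```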
(4) Performing dynamic weight adjustment and fusion processing on the features of each Transformer layer obtained in step (3) using the multi-layer dynamic focusing module; specifically:
(41) Construct a weight vector W = {w_1, w_2, ..., w_L}, where w_i is the importance weight of the features extracted by the i-th layer of the model hierarchy; each layer is weighted using an orthogonality-constrained weighting, computed as follows:
where f_i denotes the feature importance of the i-th layer, initialized to a uniform value across all layers; β and γ are learnable parameters; ⟨F_i, F_j⟩ denotes the inner product between the feature sets of the i-th and j-th layers, used as a measure of their feature correlation; α is a regularization coefficient; L is the total number of layers.
(42) An L2 regularization term is introduced when computing the fused features:
where λ is a non-negative regularization parameter that mitigates overfitting by limiting the magnitude of the weights in the model, and ||W||_F is the Frobenius norm of the weight matrix W, computed from the sum of squares of all layer weights.
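Since the exact weighting formula is not reproduced in this text, the following PyTorch sketch implements only one plausible reading of the multi-layer dynamic focusing module: a softmax over learned per-layer importances penalized by inter-layer feature correlation, with a λ·||W||_F² term returned for the training loss; the use of the [CLS_TOKEN] feature per layer and all hyper-parameter values are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerDynamicFocus(nn.Module):
    def __init__(self, num_layers, alpha=0.1, lam=1e-4):
        super().__init__()
        self.f = nn.Parameter(torch.ones(num_layers) / num_layers)  # f_i, uniform init
        self.beta = nn.Parameter(torch.ones(1))                     # learnable beta
        self.gamma = nn.Parameter(torch.zeros(1))                   # learnable gamma
        self.alpha, self.lam = alpha, lam

    def forward(self, layer_feats):          # list of (B, N, D) tensors, length L
        cls = torch.stack([z[:, 0] for z in layer_feats], dim=1)    # (B, L, D) CLS features
        corr = torch.einsum('bld,bmd->blm', cls, cls).mean(0)       # <F_i, F_j> correlations
        penalty = (corr - torch.diag(torch.diagonal(corr))).sum(-1) # off-diagonal correlation
        w = F.softmax(self.beta * self.f + self.gamma - self.alpha * penalty, dim=0)
        fused = torch.einsum('l,bld->bd', w, cls)                   # weighted fusion of layers
        reg = self.lam * (w ** 2).sum()                             # lambda * ||W||_F^2
        return fused, reg
```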
(5) Selectively extracting and fusing features of specific layers in the Transformer network through the local pyramid aggregation module to obtain multi-scale feature information, and embedding the multi-scale feature information into a self-attention mechanism based on the fast Fourier transform; specifically:
In the local pyramid aggregation module, the output features f_1, f_2, f_3, f_4 of four different Transformer layers are selected as input, and each is passed through a convolution block:
first, a 1×1 convolution layer is applied; second, batch normalization (BatchNorm) and a ReLU function adjust the feature dimension and introduce nonlinearity; then a self-attention mechanism based on the fast Fourier transform is added, optimizing the features with the global information of all elements in the sequence; finally, the features are concatenated and fed into the same convolution block to obtain the fused feature. The formula is as follows:
f_t = Φ(Concat(Φ(f_m), Φ(f_{m+1})))
where Φ(·) denotes the entire convolution block operation and f_t denotes the feature obtained by fusing f_m and f_{m+1}. As shown in FIG. 2, three outputs are finally obtained from the local pyramid aggregation module.
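A hedged sketch of the convolution block and one aggregation step follows; it assumes the Transformer token features are reshaped to 2-D maps so that a 1×1 Conv2d and BatchNorm2d apply, uses a standard nn.MultiheadAttention as a stand-in for the FFT-based self-attention sketched after the next passage, and approximates "the same convolution block" by a fusion block of identical structure with doubled input channels; channel sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """1x1 conv -> BatchNorm2d -> ReLU -> self-attention over the flattened token sequence."""
    def __init__(self, in_ch, out_ch, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.attn = nn.MultiheadAttention(out_ch, heads, batch_first=True)

    def forward(self, x):                           # x: (B, C, H, W)
        y = self.relu(self.bn(self.conv(x)))
        b, c, h, w = y.shape
        seq = y.flatten(2).transpose(1, 2)           # (B, H*W, C) token sequence
        out, _ = self.attn(seq, seq, seq)
        return (out + seq).transpose(1, 2).reshape(b, c, h, w)

class LocalPyramidStep(nn.Module):
    """One aggregation step: f_t = Phi(concat(Phi(f_m), Phi(f_{m+1})))."""
    def __init__(self, ch):
        super().__init__()
        self.branch = ConvBlock(ch, ch)
        self.fuse = ConvBlock(2 * ch, ch)            # fusion block of the same structure

    def forward(self, f_m, f_m1):
        return self.fuse(torch.cat([self.branch(f_m), self.branch(f_m1)], dim=1))
```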
The self-attention mechanism based on the fast Fourier transform works as follows:
first, the self-attention module receives an input X ∈ R^{B×N×C}, where B is the batch size, N is the sequence length and C is the feature dimension. Second, through three linear layers the input X is converted into a query Q, a key K and a value V: Q = X·W_Q, K = X·W_K, V = X·W_V, where W_Q, W_K and W_V are learnable weight matrices. Then, the query, key and value are split into multiple heads; since the fast Fourier transform (FFT) algorithm is most efficient when the input size is an integer power of 2, appropriate padding is applied to Q and K, the FFT is applied to the padded Q_padded and K_padded, and their correlation is estimated in the frequency domain. The output is:
Attn = Softmax( F^{-1}( F(Q_padded) ⊙ F(K_padded) )[:, :, :, :Q.size(1)] )
where F(·) and F^{-1}(·) denote the FFT and inverse FFT respectively. The element-wise product of the FFT results is computed first, then processed by the inverse FFT (IFFT) and truncated to the original size. The result is normalized with a softmax function to obtain the attention weights Attn. The attention weights and the corresponding value vectors are then aggregated through an element-wise product, and the result is added to the input X to obtain the feature-enhanced self-attention output:
Out=Attn⊙V+X
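A simplified PyTorch sketch of the fast-Fourier-transform self-attention described above; head splitting is omitted and the power-of-two padding rule is an assumption, so shapes and names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFTSelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.wq = nn.Linear(dim, dim)   # X -> Q
        self.wk = nn.Linear(dim, dim)   # X -> K
        self.wv = nn.Linear(dim, dim)   # X -> V

    def forward(self, x):               # x: (B, N, C)
        q, k, v = self.wq(x), self.wk(x), self.wv(x)
        n = x.size(1)
        n_pad = 1 << (n - 1).bit_length()            # pad sequence length to a power of 2
        q_p = F.pad(q, (0, 0, 0, n_pad - n))
        k_p = F.pad(k, (0, 0, 0, n_pad - n))
        # correlation estimated in the frequency domain, truncated back to length N
        corr = torch.fft.ifft(torch.fft.fft(q_p, dim=1) * torch.fft.fft(k_p, dim=1),
                              dim=1).real[:, :n]
        attn = torch.softmax(corr, dim=-1)            # attention weights Attn
        return attn * v + x                           # Out = Attn ⊙ V + X
```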
(6) Applying the feature outputs obtained in steps (4)-(5) to a loss function to verify whether the query image and the test image belong to the same category, thereby completing training and optimization of the model. The loss functions include an ID loss and a triplet loss; the ID loss adopts the conventional cross-entropy loss function without label smoothing; the formula is as follows:
L_ID = −Σ_{i=1}^{C} y_i·log(p_i)
where C is the number of classes, y_i is the one-hot encoding of the ground-truth label, and p_i is the probability predicted by the model that the sample belongs to the i-th class.
The triplet loss is formulated as:
L_tri = [ d(a, p) − d(a, n) + m ]_+
where d(a, p) and d(a, n) denote the distances between the anchor sample x_a and the positive sample x_p and the negative sample x_n respectively, computed as d(a, p) = ||f(x_a) − f(x_p)||_2 and d(a, n) = ||f(x_a) − f(x_n)||_2; the hyper-parameter m serves as the margin, i.e. the minimum required gap between the positive-pair and negative-pair distances;
the function f(·) denotes the feature extraction operator mapping an input image into the embedding space; ||·||_2 denotes the L2 norm used to compute the Euclidean distance between two feature vectors; [·]_+ is the hinge function, so the loss is counted only when the value in brackets is positive and is 0 otherwise.
The total loss L is:
L = Σ_{i=0}^{N−1} w_i·L_i
where N denotes the number of outputs produced by the whole training architecture and L_i is the loss of the i-th output; initially the loss of each output is given equal weight w_i (i = 0, 1, 2, 3), and the weights of the individual parts are then adjusted dynamically during training through the back-propagation algorithm.
It is then judged whether the maximum number of iterations has been reached; if so, the final model accuracy is output; otherwise, steps (2)-(5) are repeated.
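A hedged sketch of the losses in step (6): a cross-entropy ID loss without label smoothing, a margin-based triplet loss, and a weighted sum over the architecture's outputs; the margin value, the weight vector and the hard-mining helper mine_triplets are assumptions, not part of the original disclosure.

```python
import torch
import torch.nn.functional as F

def id_loss(logits, labels):
    """Cross-entropy over identity classes, without label smoothing."""
    return F.cross_entropy(logits, labels)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """[ d(a,p) - d(a,n) + m ]_+ with Euclidean distances between embeddings."""
    d_ap = F.pairwise_distance(anchor, positive, p=2)
    d_an = F.pairwise_distance(anchor, negative, p=2)
    return F.relu(d_ap - d_an + margin).mean()

def total_loss(outputs, labels, mine_triplets, weights):
    """Weighted sum of (ID + triplet) losses over the N outputs of the network."""
    loss = torch.zeros((), device=labels.device)
    for (logits, feats), w in zip(outputs, weights):
        a, p, n = mine_triplets(feats, labels)       # assumed hard-mining helper
        loss = loss + w * (id_loss(logits, labels) + triplet_loss(a, p, n))
    return loss
```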
An embodiment of the invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements any one of the above clothing-changing pedestrian re-identification methods based on multilayer dynamic concentration and local pyramid aggregation.

Claims (8)

1. The clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation is characterized by comprising the following steps of:
(1) Adding a wind and rain scene to the image data set and performing standardized preprocessing and data enhancement operations;
(2) Dividing the preprocessed image into Q blocks of consistent size without overlap, introducing an additional learnable embedding [CLS_TOKEN] as the global feature of the sequence input, and assigning each block a position encoding [POS_TOKEN], to form the sequence Z_0 input to the pedestrian feature extraction network;
(3) Constructing a pedestrian feature extraction network based on a standard Transformer architecture, inputting the sequence generated in step (2), extracting pedestrian features, and recording the output features Z_l, l = 1, 2, ..., L of each Transformer layer, where L is the number of Transformer layers contained in the pedestrian feature extraction network;
(4) Performing dynamic weight adjustment and fusion processing on the output features of each Transformer layer obtained in step (3) using a multi-layer dynamic focusing module;
The step (4) comprises the following steps:
(41) Construct a weight vector W = {w_1, w_2, ..., w_L}, where w_i is the weight of the features output by the i-th Transformer layer in the pedestrian feature extraction network; each Transformer layer is weighted using an orthogonality-constrained weighting, computed as follows:
where g_i denotes the feature importance of the i-th layer, initialized to a uniform value across all layers; β and γ are learnable parameters; ⟨Z_i, Z_j⟩ denotes the inner product between the output features of the i-th and j-th Transformer layers, used as a measure of their correlation; α is a regularization coefficient;
(42) An L2 regularization term is introduced when computing the fused features:
where λ is a non-negative regularization parameter that mitigates overfitting by limiting the magnitude of the weights in the model, and ||W||_F is the Frobenius norm of the weight vector W, computed from the sum of squares of the weights of all Transformer layers;
(5) Selectively extracting and fusing the output features of specific Transformer layers in the pedestrian feature extraction network through a local pyramid aggregation module to obtain multi-scale feature information, and embedding the multi-scale feature information into a self-attention mechanism based on the fast Fourier transform;
The step (5) is specifically as follows:
In the local pyramid aggregation module, the output features f_1, f_2, f_3, f_4 of four different Transformer layers are selected as input and a three-level pyramid feature aggregation is performed, where each feature aggregation operation concatenates the self-attention outputs obtained by passing two inputs through the convolution block separately and feeds the result into the same convolution block to obtain the fused feature; three outputs are finally obtained through the local pyramid aggregation module;
The convolution block is computed as follows: first, a 1×1 convolution layer is applied; second, batch normalization (BatchNorm) and a ReLU function adjust the feature dimension and introduce nonlinearity; then a self-attention mechanism based on the fast Fourier transform is added to obtain the feature-enhanced self-attention output;
(6) Applying the feature outputs obtained in steps (4)-(5) to a loss function to verify whether the query image and the test image belong to the same category, thereby completing training and optimization of the model.
2. The method for re-identifying a clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation according to claim 1, wherein the step (1) of adding a weather scene to the image dataset comprises the following steps:
(11) Generating a noise matrix N obeying a uniform distribution over the image width W and height H using the formula N ~ Uniform(0, 255), simulating the random scattering of raindrops at different positions;
(12) Applying a blurring operation to the noise matrix via the formula N' = N ⊛ K to generate a raindrop effect without a specific direction; where K denotes a predefined blur kernel and ⊛ denotes a two-dimensional convolution operation;
(13) Constructing a diagonal matrix D to represent a straight-line falling path of the raindrops; simulating the inclination of the raindrops by rotating the diagonal matrix D, and then reproducing the falling speed and direction of the raindrops in the air by using Gaussian blur processing, so as to finally obtain a blur kernel M for simulating the raindrops;
(14) The simulated rain effect is fused with the original image via the formula I'_C = (1 − β)·I_C + β·N″;
where I_C denotes the original image, β is the mixing weight, and N″ is the noise matrix after applying the blur kernel.
3. The method for re-identifying a clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation according to claim 1, wherein the standardized preprocessing and data enhancement operations in step (1) comprise: horizontal flipping, random cropping and random erasing.
4. The method for re-identifying the clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation according to claim 1, wherein the step (2) is specifically as follows:
Let the image x ∈ R^{W×H×C}, where H, W and C denote the height, width and number of channels of image x respectively;
first, the image is partitioned into Q non-overlapping blocks, denoted {x_i | i = 1, 2, ..., Q}; second, an additional learnable embedding x_cls is introduced at the beginning of the input sequence as an aggregated feature representation; then, a position encoding P is added to the feature vector of each image block; finally, the input sequence fed to the Transformer layers is formulated as:
Z_0 = [x_cls; F(x_1); F(x_2); ...; F(x_Q)] + P
where Z_0 denotes the input sequence; P ∈ R^{(Q+1)×D} denotes the position embedding; F is a linear projection function mapping each image block to the D-dimensional space.
5. The method for re-identifying the clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation according to claim 1, wherein the step (3) is specifically as follows:
The input sequence Z_0 is fed into the pedestrian feature extraction network for processing; each layer refines the features and integrates context information through a multi-head self-attention mechanism, and the output features Z_l of the l-th layer are computed as:
Z_l = TransformerLayer(Z_{l-1}), l = 1, 2, ..., L
where TransformerLayer denotes a layer of the standard Transformer architecture;
the output features of all Transformer layers constitute {Z_1, Z_2, ..., Z_L}.
6. The method for re-identifying a clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation according to claim 1, wherein the loss functions in step (6) comprise: an ID loss and a triplet loss; the ID loss adopts the conventional cross-entropy loss function without label smoothing; the formula is as follows:
L_ID = −Σ_{i=1}^{B} y_i·log(p_i)
where B is the number of classes, y_i is the one-hot encoding of the ground-truth label, and p_i is the probability predicted by the model that the sample belongs to the i-th class;
the triplet loss is formulated as:
L_tri = [ d(a, p) − d(a, n) + m ]_+
where d(a, p) and d(a, n) denote the distances between the anchor sample x_a and the positive sample x_p and the negative sample x_n respectively, computed as d(a, p) = ||f(x_a) − f(x_p)||_2 and d(a, n) = ||f(x_a) − f(x_n)||_2; the hyper-parameter m serves as a lower limit for the gap between the positive-pair and negative-pair distances;
the function f(·) denotes the feature extraction operator mapping an input image into the embedding space; ||·||_2 denotes the L2 norm used to compute the Euclidean distance between two feature vectors; [·]_+ is the hinge function, so the loss is counted only when the value in brackets is positive and is 0 otherwise;
the total loss L is:
L = Σ_i u_i·L_i
where L_i is the loss of the i-th output; initially the loss of each output is given equal weight, denoted u_i, where i = 0, 1, 2, 3; the weights of the individual parts are then adjusted dynamically during training through the back-propagation algorithm;
it is then judged whether the maximum number of iterations has been reached; if so, the final model accuracy is output; otherwise, steps (2)-(5) are repeated.
7. The method for re-identifying a clothing changing pedestrian based on multilayer dynamic concentration and local pyramid aggregation according to claim 1, further comprising a step (0): constructing a surveillance network to acquire pedestrian video data; detecting pedestrians with an object detection algorithm, then obtaining pedestrian detection boxes with an object tracking algorithm; the pedestrian video sequences, cropped to 258×128 pixels, constitute an image dataset.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when loaded into the processor implements a method for re-identification of a clothing change pedestrian based on multi-layer dynamic concentration and local pyramid aggregation according to any one of claims 1-7.
CN202311661718.0A 2023-12-06 2023-12-06 Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation Active CN117635973B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311661718.0A CN117635973B (en) 2023-12-06 2023-12-06 Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311661718.0A CN117635973B (en) 2023-12-06 2023-12-06 Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation

Publications (2)

Publication Number Publication Date
CN117635973A CN117635973A (en) 2024-03-01
CN117635973B true CN117635973B (en) 2024-05-10

Family

ID=90023146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311661718.0A Active CN117635973B (en) 2023-12-06 2023-12-06 Clothing changing pedestrian re-identification method based on multilayer dynamic concentration and local pyramid aggregation

Country Status (1)

Country Link
CN (1) CN117635973B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023523502A (en) * 2021-04-07 2023-06-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Model training methods, pedestrian re-identification methods, devices and electronics
CN113627266A (en) * 2021-07-15 2021-11-09 武汉大学 Video pedestrian re-identification method based on Transformer space-time modeling
CN115482508A (en) * 2022-09-26 2022-12-16 天津理工大学 Reloading pedestrian re-identification method, reloading pedestrian re-identification device, reloading pedestrian re-identification equipment and computer-storable medium
CN115631513A (en) * 2022-11-10 2023-01-20 杭州电子科技大学 Multi-scale pedestrian re-identification method based on Transformer
CN116486433A (en) * 2023-04-10 2023-07-25 浙江大学 Re-identification method based on cross self-distillation converter re-identification network
CN116977817A (en) * 2023-04-28 2023-10-31 浙江工商大学 Pedestrian re-recognition method based on multi-scale feature learning
CN117011883A (en) * 2023-05-16 2023-11-07 沈阳化工大学 Pedestrian re-recognition method based on pyramid convolution and transducer double branches

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization; Xin Jin et al.; https://arxiv.org/pdf/2103.15537.pdf; 2022-03-31 *
Multi-Biometric Unified Network for Cloth-Changing Person Re-Identification; Guoqing Zhang et al.; 2022 IEEE International Conference on Multimedia and Expo (ICME); 2022-08-26 *
Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval; Xianghao Zang et al.; https://arxiv.org/pdf/2202.06014.pdf; 2022-04-06 *
Specialized Re-Ranking: A Novel Retrieval-Verification Framework for Cloth Changing Person Re-Identification; Renjie Zhang et al.; https://arxiv.org/pdf/2210.03592.pdf; 2022-10-07 *
TransReID: Transformer-based Object Re-Identification; Shuting He et al.; https://arxiv.org/pdf/2102.04378.pdf; 2021-03-26 *
Multi-scale learning pedestrian re-identification method based on CNN and Transformer; Chen Ying et al.; Journal of Electronics & Information Technology; 2023-06-30; Vol. 45, No. 6 *

Also Published As

Publication number Publication date
CN117635973A (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant