CN113392786B - Cross-domain pedestrian re-identification method based on normalization and feature enhancement - Google Patents

Cross-domain pedestrian re-identification method based on normalization and feature enhancement

Info

Publication number
CN113392786B
CN113392786B (application CN202110689585.2A)
Authority
CN
China
Prior art keywords
feature
pedestrian
normalization
unit
nem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110689585.2A
Other languages
Chinese (zh)
Other versions
CN113392786A (en)
Inventor
殷光强
贾召钱
王文超
曾宇昊
王春雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110689585.2A
Publication of CN113392786A
Application granted
Publication of CN113392786B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention belongs to the technical field of pedestrian re-identification, and particularly relates to a cross-domain pedestrian re-identification method based on normalization and feature enhancement. Without using any target-domain data, the technical scheme effectively suppresses domain gaps and enhances discriminative pedestrian features, thereby strengthening the generalization ability of the recognition network model; by borrowing the idea of residual connections, instance normalization suppresses style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative; the attention unit CAB fuses spatial information into the channels and adaptively adjusts the feature weight of each channel by modeling the dependencies among channels, effectively enhancing the pedestrian features.

Description

Cross-domain pedestrian re-identification method based on normalization and feature enhancement
Technical Field
The invention belongs to the technical field of pedestrian re-identification, and particularly relates to a cross-domain pedestrian re-identification method based on normalization and feature enhancement.
Background
Cross-domain pedestrian re-identification refers to retrieving a target pedestrian from large-scale image or video data across different domains using computer vision techniques. An ideal cross-domain re-identification model could be trained once and tested anywhere: the model is trained only on collected source-domain data, and the trained model then achieves a good re-identification effect on any other target domain. In practice, however, huge domain gaps exist between data sets; these seriously hinder the generalization of the model from the source domain to the target domain and are a main reason why cross-domain re-identification performance is hard to improve.
Because of the inevitable domain differences between data domains, many advanced re-identification algorithms perform well when tested on a single data set but generalize poorly to another data domain. To improve model generalization as much as possible, many cross-domain pedestrian re-identification methods have appeared in recent years, most of which require adapting the model to the target domain. The usual approach is to collect part of the target-domain data, cluster the extracted features with some clustering algorithm to generate pseudo-labels, train the model with the generated pseudo-labels, update the model parameters, and iterate these steps until convergence. Although many such methods do effectively improve model generalization, collecting target-domain data is time- and labor-consuming, and in practical applications target-domain data often cannot be collected at all.
Specifically, for cross-domain pedestrian re-identification models, the domain gap between data sets is mainly introduced during data collection: differences in collection time cause differences in image brightness, and differences in collection location cause differences in image background. Such style differences make the data distributions of different domains diverge, which in turn complicates the re-identification task. Transfer learning is currently one of the mainstream means of addressing model generalization; it applies knowledge or patterns learned in one field or task to a different but related field or problem. Image style transfer is a transfer-learning method for images that can effectively mitigate the generalization problem caused by style differences, and researchers have widely applied it to cross-domain pedestrian re-identification. However, style-transfer methods based on generative adversarial networks (GANs) require target-domain data during model training, adding extra collection and training cost. Instance Normalization (IN) offers an alternative: it performs a form of style normalization by adjusting the feature statistics of the network. Yet IN dilutes the information carried by the global statistics of the feature response; introducing IN into the pedestrian re-identification task normalizes image style and suppresses inter-domain differences, but the process also loses some discriminative information.
To improve model generalization, enhancing pedestrian features with an attention mechanism is another effective means: attention lets the model focus on regions of interest, and is generally divided into spatial attention and channel attention. Spatial attention exploits the spatial relationships among features to generate a spatial attention weight that localizes the pedestrian information of interest along the spatial dimensions; channel attention improves the representational capacity of the network by modeling the dependencies of each channel. Different tasks usually call for different kinds of attention, so researchers must match the mechanism to the specific task; simply stacking both kinds causes redundancy and wastes computing resources.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a cross-domain pedestrian re-identification method based on normalization and feature enhancement, which can effectively suppress domain gaps and enhance discriminative pedestrian features without using target-domain data, thereby strengthening the generalization ability of the model.
The method is realized by the following technical scheme:
the cross-domain pedestrian re-identification method based on normalization and feature enhancement is characterized by comprising the steps of establishing an identification network model, image feature normalization, image feature recovery and image feature output;
establishing a recognition network model, which comprises establishing a normalization enhancement module NEM with an instance normalization unit IN, a residual weight training unit CMS and an attention unit CAB, and, taking a ResNet50 model as the backbone network, inserting the normalization enhancement module NEM into the ResNet50 model to form the recognition network model;
the image feature normalization comprises the following steps:
S11, extract the pedestrian image feature x ∈ R^(c×h×w) with the ResNet50 model (note: x, x1 and x2 in this embodiment all denote image features), where x is the input feature of the normalization enhancement module NEM, c is the number of channels of the image feature, h its height and w its width; R^(c×h×w) denotes the real-number space of dimension c×h×w, and x ∈ R^(c×h×w) means the input feature x is a vector in that space;
S12, use the instance normalization unit IN to obtain the mean μ(x) and variance σ(x) of the input feature x ∈ R^(c×h×w) within each channel, and compute the normalized feature x1 from them:

x1 = γ × (x − μ(x)) / σ(x) + β

where γ and β are learnable parameter vectors with γ ∈ R^c and β ∈ R^c, i.e., both are vectors in the c-dimensional real-number space; the elements of γ and β are initialized to 1 and 0 respectively and are then updated automatically during training;
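As a concrete illustration, a minimal PyTorch sketch of the instance normalization unit IN of steps S11 and S12 might look as follows; the class name INUnit and the eps stabilizer are assumptions, not names from the patent.

```python
# Minimal sketch of the instance normalization unit IN (steps S11-S12).
import torch
import torch.nn as nn

class INUnit(nn.Module):
    def __init__(self, c: int, eps: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(c))   # learnable, initialized to 1
        self.beta = nn.Parameter(torch.zeros(c))   # learnable, initialized to 0
        self.eps = eps                             # assumed stabilizer, not in the patent

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n, c, h, w); statistics are taken per sample and per channel over h x w
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = x.std(dim=(2, 3), keepdim=True, unbiased=False)
        x1 = (x - mu) / (sigma + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x1 + self.beta.view(1, -1, 1, 1)
```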
the image feature recovery comprises the following steps:
S21, use the residual weight training unit CMS to learn a residual weight Wr from the normalized feature x1, namely:

Wr = sigmoid(mean(conv(x1)))

where conv(·) denotes convolution, mean(·) the global mean, and sigmoid(·) the activation function;
S22, based on the residual weight Wr, fuse the input feature x with the normalized feature x1 to recover the discriminative information that style normalization removed from the image feature; the fusion formula is:

x2 = Wr × x1 + (1 − Wr) × x

where x2 is the recovered image feature, named the recovered feature, and x2 ∈ R^(c×h×w) means the recovered feature x2 is a vector in the real-number space of dimension c×h×w;
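A hedged sketch of the residual weight unit CMS (S21) together with the fusion step (S22); the 3×3 kernel, stride 2 and single output channel follow the detailed description in Embodiment 1, while the padding and the class name CMSUnit are assumptions.

```python
import torch
import torch.nn as nn

class CMSUnit(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        # 3x3 conv, stride 2, one output channel: compresses x1 in space and channels
        self.conv = nn.Conv2d(c, 1, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
        # Wr = sigmoid(mean(conv(x1))): one scalar weight in (0, 1) per sample
        wr = torch.sigmoid(self.conv(x1).mean(dim=(1, 2, 3), keepdim=True))
        # x2 = Wr * x1 + (1 - Wr) * x: recovers information lost to style normalization
        return wr * x1 + (1.0 - wr) * x
```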
the image feature output comprises the steps of:
S31, use the attention unit CAB to explore the correlations among the different channels of the recovered feature x2 and adaptively extract the channel attention weight Wc, namely:

Wc = ca(x2)

where ca(·) is the attention unit CAB, and the channel attention weight Wc measures the importance of each channel's information in the recovered feature x2;

S32, filter the recovered feature x2 with the channel attention weight Wc to enhance the representational power of the pedestrian features, namely:

f = (Wc + 1) × x2

where f is the output feature of the normalization enhancement module NEM.
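The three units then compose into the NEM forward pass; the following is a sketch under the assumption that the sub-units are implemented as in the sketches accompanying the individual steps (the CAB sketch appears further below with steps S311-S313).

```python
import torch
import torch.nn as nn

class NEM(nn.Module):
    """Normalization enhancement module: IN -> CMS fusion -> channel attention."""
    def __init__(self, in_unit: nn.Module, cms_unit: nn.Module, cab_unit: nn.Module):
        super().__init__()
        self.in_unit, self.cms, self.cab = in_unit, cms_unit, cab_unit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1 = self.in_unit(x)        # S12: style normalization
        x2 = self.cms(x, x1)        # S22: residual fusion
        wc = self.cab(x2)           # S31: channel attention weight
        return (wc + 1.0) * x2      # S32: f = (Wc + 1) * x2
```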
Further, the ResNet50 model comprises a Res1 unit, a Res2 unit, a Res3 unit, a Res4 unit, a Res5 unit and a Head unit connected in sequence, and a normalization enhancement module NEM is inserted at the output of each of the Res2, Res3, Res4 and Res5 units (a sketch of this insertion follows below).
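In torchvision's resnet50, Res2 through Res5 correspond to layer1 through layer4; the sketch below shows one way the NEM modules could be attached at those stage outputs. The torchvision names are real; build_recognition_backbone and make_nem are illustrative helpers, not names from the patent, and the Head unit is omitted.

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_recognition_backbone(make_nem):
    # make_nem(c) is assumed to return an NEM module for a stage with c channels
    net = resnet50(weights=None)
    stages = [net.layer1, net.layer2, net.layer3, net.layer4]   # Res2..Res5
    channels = [256, 512, 1024, 2048]                           # resnet50 stage widths
    body = []
    for stage, c in zip(stages, channels):
        body += [stage, make_nem(c)]                            # NEM2..NEM5 at stage outputs
    stem = [net.conv1, net.bn1, net.relu, net.maxpool]          # Res1 (no NEM here)
    return nn.Sequential(*stem, *body)                          # Head unit omitted
```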
Further, the method also includes introducing an NEM loss function into the normalization enhancement module NEM at the output of the Res5 unit; that is, the image feature output of that normalization enhancement module NEM further comprises the following steps:

S33, compute the center loss Cx of the input feature x and the center loss Cf of the output feature f respectively, to measure the intra-class dispersion of x and f in feature space; the formulas are:

Cx = (1/n) Σ_{j=1..n} (1/m) Σ_{i=1..m} ||x_ji − c_x^j||^2

Cf = (1/n) Σ_{j=1..n} (1/m) Σ_{i=1..m} ||f_ji − c_f^j||^2

where c_x^j ∈ R^d denotes the class center of the j-th pedestrian in the input feature x; c_f^j ∈ R^d denotes the class center of the j-th pedestrian in the output feature f; n is the total number of pedestrians in the data set, m the total number of features of the j-th pedestrian, x_ji the i-th feature of the j-th pedestrian in x, f_ji the i-th feature of the j-th pedestrian in f, and d the dimension of each feature; R^d denotes the d-dimensional real-number space, i.e., c_x^j and c_f^j are both d-dimensional vectors;

S34, establish the NEM loss function from the center losses Cf and Cx:

L_NEM = g(Cf, Cx) (the exact expression survives only as an equation image in the source and is not recoverable here)

where L_NEM is the loss value computed from the input feature x and the output feature f of NEM5.
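A sketch of the center losses of S33, assuming features are flattened to a (num_samples, d) matrix with integer pedestrian identity labels; since the combination of Cf and Cx into L_NEM is not recoverable from the source, only the center-loss computation itself is shown.

```python
import torch

def center_loss(feats: torch.Tensor, ids: torch.Tensor) -> torch.Tensor:
    # feats: (N, d) feature matrix; ids: (N,) pedestrian identity labels
    loss = feats.new_zeros(())
    classes = ids.unique()
    for j in classes:
        class_feats = feats[ids == j]            # the m features of pedestrian j
        center = class_feats.mean(dim=0)         # class center c^j
        loss = loss + (class_feats - center).pow(2).sum(dim=1).mean()
    return loss / classes.numel()                # intra-class dispersion, averaged over ids

# Cx = center_loss(x_feats, ids); Cf = center_loss(f_feats, ids)
```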
Further, in step S11, the feature information carried by x ∈ R^(c×h×w) includes style and shape; the style comprises the imaging style of the image and the clothing style of the pedestrian, and the shape is the contour shape of the pedestrian in the image.
Further, in step S31, obtaining the channel attention weight Wc comprises the following steps (a sketch follows below):

S311, perform maximum pooling and average pooling along the channel dimension of the recovered feature x2 to obtain two 1×h×w two-dimensional matrices, and multiply x2 element-wise with each of them, thereby injecting the spatial information carried by each 1×h×w matrix into the channels of x2;

S312, perform maximum pooling and average pooling along the spatial dimensions of the features carrying the injected spatial information, generating two spatial aggregation masks F1 and F2 with F1 ∈ R^(c×1×1) and F2 ∈ R^(c×1×1), where R^(c×1×1) denotes the real-number space of dimension c×1×1, i.e., F1 and F2 are both vectors in that space;

S313, apply a concat operation to the two spatial aggregation masks, then pass the result through a convolution and a sigmoid in turn and fuse them to obtain the final channel attention weight Wc.

Further, the spatial information includes the global information and the saliency information of the space corresponding to the 1×h×w two-dimensional matrices.
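A hedged PyTorch sketch of the attention unit CAB following S311-S313; the 1×1 fusing convolution is an assumption (the text says only "convolution"), and pairing F1 with the max-mask branch and F2 with the avg-mask branch is one interpretation of the description.

```python
import torch
import torch.nn as nn

class CABUnit(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c, c, kernel_size=1)   # fuses the concatenated masks

    def forward(self, x2: torch.Tensor) -> torch.Tensor:
        # S311: 1 x h x w masks via max/avg pooling over the channel dimension,
        # multiplied element-wise into x2 to inject spatial information
        feat_max = x2 * x2.amax(dim=1, keepdim=True)     # saliency information
        feat_avg = x2 * x2.mean(dim=1, keepdim=True)     # global information
        # S312: spatial pooling -> two c x 1 x 1 aggregation masks F1, F2
        f1 = feat_max.amax(dim=(2, 3), keepdim=True)
        f2 = feat_avg.mean(dim=(2, 3), keepdim=True)
        # S313: concat -> conv -> sigmoid -> channel attention weight Wc
        return torch.sigmoid(self.fuse(torch.cat([f1, f2], dim=1)))
```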
The beneficial effects brought by this technical scheme:

1) Without using any target-domain data, the technical scheme effectively suppresses domain gaps and enhances discriminative pedestrian features, thereby strengthening the generalization ability of the recognition network model; by borrowing the idea of residual connections, instance normalization suppresses style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative; the attention unit CAB fuses spatial information into the channels and adaptively adjusts the feature weight of each channel by modeling the dependencies among channels, effectively enhancing the pedestrian features.

2) The technical scheme introduces the NEM loss constraint so that the recognition network model learns domain-invariant features, reducing intra-class feature distances and optimizing the feature distribution.
Drawings
The foregoing and the following detailed description of the invention will be better understood when read in conjunction with the drawings, in which:
FIG. 1 is a block diagram of the overall structure of a pedestrian re-identification model as described herein;
fig. 2 is a block diagram of the structure of the normalization enhancement module NEM;
FIG. 3 is a block diagram of the attention unit CAB;
FIG. 4 is a comparison graph of the effect of different insertion combinations of the normalization enhancement module NEM in ResNet50.
Detailed Description
The technical solutions for achieving the objects of the present invention are further illustrated by the following specific examples, and it should be noted that the technical solutions claimed in the present invention include, but are not limited to, the following examples.
Example 1
The embodiment discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement, and as a basic implementation scheme of the invention, the method comprises the steps of establishing an identification network model, normalizing image features, recovering image features and outputting the image features.
Establishing the recognition network model includes establishing, as shown in FIG. 2, a normalization enhancement module NEM with an instance normalization unit IN (IN in FIG. 2), a residual weight training unit CMS (CMS in FIG. 2) and an attention unit CAB (CA in FIG. 2), and, taking a ResNet50 model as the backbone network, inserting the normalization enhancement module NEM into the ResNet50 model to form the recognition network model.
Normalizing the image features, that is, normalizing the style of the features by computing the mean and variance within each channel of the image features, suppresses style differences between domains; it specifically comprises the following steps:
S11, extract the pedestrian image feature x ∈ R^(c×h×w) and the feature information it carries with the ResNet50 model, where x is the input feature of the normalization enhancement module NEM, c is the number of channels of the image feature, h its height and w its width; R^(c×h×w) denotes the real-number space of dimension c×h×w, and x ∈ R^(c×h×w) means the input feature x is a vector in that space. The feature information carried by x includes style and shape: the style comprises the imaging style of the image and the clothing style of the pedestrian, and the shape is the contour shape of the pedestrian in the image;
S12, use the instance normalization unit IN to obtain the mean μ(x) and variance σ(x) of the input feature x ∈ R^(c×h×w) within each channel, and compute the normalized feature x1 from them:

x1 = γ × (x − μ(x)) / σ(x) + β

where μ(x) and σ(x) denote the mean and variance computed over the spatial dimensions (h×w) of the image feature; γ and β are learnable parameter vectors with γ ∈ R^c and β ∈ R^c, i.e., both are c-dimensional real vectors. The elements of γ and β are initialized to 1 and 0 respectively and are then updated automatically during training: γ is initialized as a vector of ones and β as a vector of zeros, and their values change with the back-propagated gradients. Their role is to ensure that the normalized data still retain the originally learned features while completing the normalization operation and accelerating training.
Although image feature normalization helps reduce the style differences that produce inter-domain gaps, it can also cause significant information loss while eliminating those differences if the style itself contains discriminative information for re-identification. For example, the clothing of a pedestrian is important discriminative information, and the texture of the clothing fabric clearly belongs to the style; when the style is suppressed, the discriminability of the feature is weakened. Therefore, by borrowing the idea of residual connections, image feature normalization can suppress style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative. This is realized by image feature recovery, which comprises the following steps:
S21, use the residual weight training unit CMS to learn a residual weight Wr from the normalized feature x1, namely:

Wr = sigmoid(mean(conv(x1)))

where conv(·) denotes convolution, mean(·) the global mean and sigmoid(·) the activation function. That is, the normalized feature x1 first passes through a convolution layer with kernel size 3×3×c, stride 2 and one output channel, which compresses the information contained in x1 along both the spatial and channel dimensions; the mean is then computed within each channel, further compressing the spatial information; finally, after sigmoid mapping, a residual weight Wr between 0 and 1 is obtained, i.e., Wr ∈ R^1.
S22, based on the residual weight Wr, fuse the input feature x with the normalized feature x1 to recover the discriminative information that style normalization removed from the image feature; the fusion formula is:

x2 = Wr × x1 + (1 − Wr) × x

where x2 is the recovered image feature, named the recovered feature, and x2 ∈ R^(c×h×w) means the recovered feature x2 is a vector in the real-number space of dimension c×h×w.
Since the ResNet50 backbone gradually compresses spatial information during feature extraction (meaning the overall multi-stage extraction process, not a single stage) and pedestrian-related information gradually shifts into the channel dimension, pedestrian features need to be enhanced by channel attention; that is, the image feature output comprises the following steps:
S31, use the attention unit CAB to explore the correlations among the different channels of the recovered feature x2, so that attention focuses on the most meaningful parts of the pedestrian image, and adaptively extract the channel attention weight Wc, namely:

Wc = ca(x2)

where ca(·) is the attention unit CAB, and the channel attention weight Wc measures the importance of each channel's information in x2;

S32, filter the recovered feature x2 with the channel attention weight Wc to enhance the representational power of the pedestrian features, namely:

f = (Wc + 1) × x2

where f is the output feature of the normalization enhancement module NEM.
Without using any target-domain data, the technical scheme effectively suppresses domain gaps and enhances discriminative pedestrian features, thereby strengthening the generalization ability of the recognition network model; by borrowing the idea of residual connections, instance normalization suppresses style differences while preventing information loss, so that the extracted features are domain-invariant yet remain discriminative; the attention unit CAB fuses spatial information into the channels and adaptively adjusts the feature weight of each channel by modeling the dependencies among channels, effectively enhancing the pedestrian features.
Example 2
This embodiment discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement as a preferred implementation of the invention. That is, in Embodiment 1, the ResNet50 model comprises a Res1 unit, a Res2 unit, a Res3 unit, a Res4 unit, a Res5 unit and a Head unit connected in sequence, and a normalization enhancement module NEM is inserted after each Res unit or after some of the Res units of the ResNet50 model; each NEM then enhances the features at its stage, so the overall effect is good. In the ResNet50 model, the features produced by the Res1 unit are too shallow and contain essentially no semantic information such as style, so inserting an NEM after Res1 can hardly enhance the features and only increases model complexity; therefore no NEM is inserted after the Res1 unit when designing the recognition network model.
Specifically, after which Res units the NEM works best can be verified experimentally. As shown in FIG. 4: NEM23 denotes inserting NEMs at the outputs of the Res2 and Res3 units; NEM234 at the outputs of Res2, Res3 and Res4; NEM2345 at the outputs of Res2, Res3, Res4 and Res5; NEM345 at the outputs of Res3, Res4 and Res5; NEM45 at the outputs of Res4 and Res5. In addition, M, D and MS on the abscissa denote the three common pedestrian re-identification data sets Market1501, DukeMTMC-reID and MSMT17 respectively; M-D means training the model on Market1501 and then testing re-identification on DukeMTMC-reID, and likewise for D-M, MS-M and MS-D; the ordinate is mAP accuracy. As FIG. 4 shows, NEM2345 performs best and effectively strengthens the cross-domain re-identification performance of the model, so normalization enhancement modules NEM, namely NEM2, NEM3, NEM4 and NEM5 shown in FIG. 1, are inserted at the outputs of the Res2, Res3, Res4 and Res5 units respectively. Thus the recognition network model works as follows: the Res2 unit of ResNet50 extracts image features from the original image, and NEM2 performs image feature normalization, recovery and output on them; the result is fed to the Res3 unit of ResNet50, which extracts deeper pedestrian features, and NEM3 continues the normalization, recovery and output processing on the Res3 features, and so on for Res4 with NEM4 and Res5 with NEM5, until the Head unit of ResNet50 finally produces the output. A usage sketch follows below.
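Putting the earlier sketches together, a hypothetical forward pass through the assembled NEM2345 configuration might look as follows; all names come from the sketches above, not from the patent.

```python
import torch

# Assemble: ResNet50 stem + Res2..Res5, each followed by its NEM (sketches above)
model = build_recognition_backbone(
    lambda c: NEM(INUnit(c), CMSUnit(c), CABUnit(c)))
feats = model(torch.randn(4, 3, 256, 128))   # a batch of 4 pedestrian crops
print(feats.shape)                           # (4, 2048, 8, 4) before the Head unit
```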
Example 3
This embodiment discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement as a preferred implementation of the invention. That is, in Embodiment 2, to give the features better clustering behavior, the method further introduces an NEM loss function into the normalization enhancement module NEM at the output of the Res5 unit (i.e., NEM5) as a constraint, the expectation being that the features extracted by NEM5 have better domain invariance and discriminability. Therefore the image feature output of that normalization enhancement module NEM further comprises the following steps:
S33, compute the center loss Cx of the input feature x and the center loss Cf of the output feature f respectively, to measure the intra-class dispersion of x and f in feature space; the formulas are:

Cx = (1/n) Σ_{j=1..n} (1/m) Σ_{i=1..m} ||x_ji − c_x^j||^2

Cf = (1/n) Σ_{j=1..n} (1/m) Σ_{i=1..m} ||f_ji − c_f^j||^2

where c_x^j ∈ R^d denotes the class center of the j-th pedestrian in the input feature x; c_f^j ∈ R^d denotes the class center of the j-th pedestrian in the output feature f; n is the total number of pedestrians in the data set, m the total number of features of the j-th pedestrian, x_ji the i-th feature of the j-th pedestrian in x, f_ji the i-th feature of the j-th pedestrian in f, and d the dimension of each feature; R^d denotes the d-dimensional real-number space, i.e., c_x^j and c_f^j are both d-dimensional vectors;

S34, establish the NEM loss function from the center losses Cf and Cx:

L_NEM = g(Cf, Cx) (the exact expression survives only as an equation image in the source and is not recoverable here)

where L_NEM is the loss value computed from the input feature x and the output feature f of NEM5.
The technical scheme introduces the NEM loss constraint so that the recognition network model learns domain-invariant features, reducing intra-class feature distances and optimizing the feature distribution.
Example 4
This embodiment discloses a cross-domain pedestrian re-identification method based on normalization and feature enhancement as a preferred implementation of the invention. That is, in step S31 of Embodiment 1, as shown in FIG. 3, obtaining the channel attention weight Wc comprises the following steps:
S311, perform maximum pooling and average pooling along the channel dimension of the recovered feature x2 to obtain two 1×h×w two-dimensional matrices, and multiply x2 element-wise with each of them, thereby injecting the spatial information carried by each 1×h×w matrix into the channels of x2;
S312, to compute channel attention efficiently, the spatial dimensions of the features must be compressed. Average pooling is generally used to aggregate spatial information and attend to global information; maximum pooling, however, can also capture distinctive pedestrian cues and thus infer finer detail on the channels. Therefore, perform maximum pooling and average pooling along the spatial dimensions of the features carrying the injected spatial information, generating two spatial aggregation masks F1 and F2 with F1 ∈ R^(c×1×1) and F2 ∈ R^(c×1×1); the two masks attend respectively to the global information and to the distinctive pedestrian information in the feature map. Here R^(c×1×1) denotes the real-number space of dimension c×1×1, i.e., F1 and F2 are both vectors in that space;
S313, apply a concat (vector concatenation) operation to the two spatial aggregation masks, then pass the result through a convolution and a sigmoid in turn and fuse them to obtain the final channel attention weight Wc.

Claims (4)

1. The cross-domain pedestrian re-identification method based on normalization and feature enhancement is characterized by comprising the steps of establishing an identification network model, image feature normalization, image feature recovery and image feature output;
the establishing of the recognition network model comprises establishing a normalization enhancement module NEM with an instance normalization unit IN, a residual weight training unit CMS and an attention unit CAB, taking a ResNet50 model as the backbone network and inserting the normalization enhancement module NEM into the ResNet50 model to form the recognition network model; the ResNet50 model comprises a Res1 unit, a Res2 unit, a Res3 unit, a Res4 unit, a Res5 unit and a Head unit connected in sequence, and a normalization enhancement module NEM is inserted at the output of each of the Res2, Res3, Res4 and Res5 units;
the image feature normalization comprises the following steps:
S11, extract the pedestrian image feature x ∈ R^(c×h×w) and the feature information it carries with the ResNet50 model, where x is the input feature of the normalization enhancement module NEM, c is the number of channels of the image feature, h its height and w its width; R^(c×h×w) denotes the real-number space of dimension c×h×w, and x ∈ R^(c×h×w) means the input feature x is a vector in that space;

S12, use the instance normalization unit IN to obtain the mean μ(x) and variance σ(x) of the input feature x ∈ R^(c×h×w) within each channel, and compute the normalized feature x1 from them:

x1 = γ × (x − μ(x)) / σ(x) + β

where γ and β are learnable parameter vectors with γ ∈ R^c and β ∈ R^c, i.e., both are vectors in the c-dimensional real-number space; the elements of γ and β are initialized to 1 and 0 respectively and are then updated automatically during training;
the image feature recovery comprises the following steps:
S21, use the residual weight training unit CMS to learn a residual weight Wr from the normalized feature x1, namely:

Wr = sigmoid(mean(conv(x1)))

where conv(·) denotes convolution, mean(·) the global mean, and sigmoid(·) the activation function;

S22, based on the residual weight Wr, fuse the input feature x with the normalized feature x1 to recover the discriminative information that style normalization removed from the image feature; the fusion formula is:

x2 = Wr × x1 + (1 − Wr) × x

where x2 is the recovered image feature, named the recovered feature, and x2 ∈ R^(c×h×w) means the recovered feature x2 is a vector in the real-number space of dimension c×h×w;
the image feature output comprises the steps of:
S31, use the attention unit CAB to explore the correlations among the different channels of the recovered feature x2 and adaptively extract the channel attention weight Wc, namely:

Wc = ca(x2)

where ca(·) is the attention unit CAB, and the channel attention weight Wc measures the importance of each channel's information in x2;

S32, filter the recovered feature x2 with the channel attention weight Wc to enhance the representational power of the pedestrian features, namely:

f = (Wc + 1) × x2

where f is the output feature of the normalization enhancement module NEM;
S33, compute the center loss Cx of the input feature x and the center loss Cf of the output feature f respectively, to measure the intra-class dispersion of x and f in feature space; the formulas are:

Cx = (1/n) Σ_{j=1..n} (1/m) Σ_{i=1..m} ||x_ji − c_x^j||^2

Cf = (1/n) Σ_{j=1..n} (1/m) Σ_{i=1..m} ||f_ji − c_f^j||^2

where c_x^j ∈ R^d denotes the class center of the j-th pedestrian in the input feature x; c_f^j ∈ R^d denotes the class center of the j-th pedestrian in the output feature f; n is the total number of pedestrians in the data set, m the total number of features of the j-th pedestrian, x_ji the i-th feature of the j-th pedestrian in x, f_ji the i-th feature of the j-th pedestrian in f, and d the dimension of each feature; R^d denotes the d-dimensional real-number space, i.e., c_x^j and c_f^j are both d-dimensional vectors;

S34, establish the NEM loss function from the center losses Cf and Cx:

L_NEM = g(Cf, Cx) (the exact expression survives only as an equation image in the source and is not recoverable here)

where L_NEM is the loss value computed from the input feature x and the output feature f of NEM5.
2. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 1, wherein: in step S11, the feature information carried by x ∈ R^(c×h×w) includes style and shape; the style comprises the imaging style of the image and the clothing style of the pedestrian, and the shape is the contour shape of the pedestrian in the image.
3. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 1, wherein in step S31, obtaining the channel attention weight Wc comprises the following steps:

S311, perform maximum pooling and average pooling along the channel dimension of the recovered feature x2 to obtain two 1×h×w two-dimensional matrices, and multiply x2 element-wise with each of them, thereby injecting the spatial information carried by each 1×h×w matrix into the channels of x2;

S312, perform maximum pooling and average pooling along the spatial dimensions of the features carrying the injected spatial information, generating two spatial aggregation masks F1 and F2 with F1 ∈ R^(c×1×1) and F2 ∈ R^(c×1×1), where R^(c×1×1) denotes the real-number space of dimension c×1×1, i.e., F1 and F2 are both vectors in that space;

S313, apply a concat operation to the two spatial aggregation masks, then pass the result through a convolution and a sigmoid in turn and fuse them to obtain the final channel attention weight Wc.
4. The cross-domain pedestrian re-identification method based on normalization and feature enhancement as claimed in claim 3, wherein the spatial information includes the global information and the saliency information of the space corresponding to the 1×h×w two-dimensional matrices.
CN202110689585.2A 2021-06-21 2021-06-21 Cross-domain pedestrian re-identification method based on normalization and feature enhancement Active CN113392786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110689585.2A CN113392786B (en) 2021-06-21 2021-06-21 Cross-domain pedestrian re-identification method based on normalization and feature enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110689585.2A CN113392786B (en) 2021-06-21 2021-06-21 Cross-domain pedestrian re-identification method based on normalization and feature enhancement

Publications (2)

Publication Number Publication Date
CN113392786A CN113392786A (en) 2021-09-14
CN113392786B true CN113392786B (en) 2022-04-12

Family

ID=77623278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110689585.2A Active CN113392786B (en) 2021-06-21 2021-06-21 Cross-domain pedestrian re-identification method based on normalization and feature enhancement

Country Status (1)

Country Link
CN (1) CN113392786B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117994822A (en) * 2024-04-07 2024-05-07 南京信息工程大学 Cross-mode pedestrian re-identification method based on auxiliary mode enhancement and multi-scale feature fusion

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815838A (en) * 2017-01-22 2017-06-09 晶科电力有限公司 A kind of method and system of the detection of photovoltaic module hot spot
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth
CN111832514B (en) * 2020-07-21 2023-02-28 内蒙古科技大学 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
CN111739036B (en) * 2020-07-22 2022-09-09 吉林大学 Hyperspectrum-based file handwriting counterfeiting detection method
CN112069920B (en) * 2020-08-18 2022-03-15 武汉大学 Cross-domain pedestrian re-identification method based on attribute feature driven clustering
CN112200764B (en) * 2020-09-02 2022-05-03 重庆邮电大学 Photovoltaic power station hot spot detection and positioning method based on thermal infrared image

Also Published As

Publication number Publication date
CN113392786A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
CN107766850B (en) Face recognition method based on combination of face attribute information
Sun et al. Robust co-training
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN107578007A (en) A kind of deep learning face identification method based on multi-feature fusion
CN111625667A (en) Three-dimensional model cross-domain retrieval method and system based on complex background image
CN112990316B (en) Hyperspectral remote sensing image classification method and system based on multi-saliency feature fusion
CN109034035A (en) Pedestrian's recognition methods again based on conspicuousness detection and Fusion Features
CN110175248B (en) Face image retrieval method and device based on deep learning and Hash coding
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN111339818A (en) Face multi-attribute recognition system
CN105139000A (en) Face recognition method and device enabling glasses trace removal
CN112085055A (en) Black box attack method based on migration model Jacobian array feature vector disturbance
CN110633624A (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN112801019B (en) Method and system for eliminating re-identification deviation of unsupervised vehicle based on synthetic data
CN111402156B (en) Restoration method and device for smear image, storage medium and terminal equipment
Song et al. A joint siamese attention-aware network for vehicle object tracking in satellite videos
CN114973418A (en) Behavior identification method of cross-modal three-dimensional point cloud sequence space-time characteristic network
CN113392786B (en) Cross-domain pedestrian re-identification method based on normalization and feature enhancement
CN112507778A (en) Loop detection method of improved bag-of-words model based on line characteristics
CN114596589A (en) Domain-adaptive pedestrian re-identification method based on interactive cascade lightweight transformations
Hua et al. Polarimetric SAR image classification based on ensemble dual-branch CNN and superpixel algorithm
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN111291705A (en) Cross-multi-target-domain pedestrian re-identification method
CN113052017B (en) Unsupervised pedestrian re-identification method based on multi-granularity feature representation and domain self-adaptive learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant