CN116994332A - Cross-modal pedestrian re-identification method and system based on contour map guidance - Google Patents


Info

Publication number
CN116994332A
Authority
CN
China
Prior art keywords
pedestrian
features
image
contour
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310721195.8A
Other languages
Chinese (zh)
Inventor
赵秀阳 (Zhao Xiuyang)
徐启龙 (Xu Qilong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN202310721195.8A priority Critical patent/CN116994332A/en
Publication of CN116994332A publication Critical patent/CN116994332A/en
Pending legal-status Critical Current

Classifications

    • G (PHYSICS) > G06 (COMPUTING; CALCULATING OR COUNTING) > G06V (IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING)
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 10/143: Sensing or illuminating at different wavelengths
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a cross-modal pedestrian re-identification method and system based on contour map guidance, belonging to the technical field of pedestrian re-identification. The invention weights the image features onto the corresponding contour features in an add-multiply cooperation manner; this cooperation combines the respective advantages of the image features and the contour features while preserving the information of the original image. A feature energy function is defined to re-weight the fused features: the degree of dispersion of each neuron in the fused features is calculated through the mean square error, so that features at contour edge positions receive larger weights after energy-function weighting and thus gain more attention in subsequent computation. The resulting contour guiding features and the original image features are jointly used for loss calculation, improving the recognition accuracy of cross-modal pedestrian re-identification. This solves the problem in the prior art that differences between multi-modal images reduce pedestrian re-identification accuracy.

Description

Cross-modal pedestrian re-identification method and system based on contour map guidance
Technical Field
The invention relates to the technical field of pedestrian re-identification, in particular to a cross-modal pedestrian re-identification method and system based on contour map guidance.
Background
The statements in this section merely provide background related to the present disclosure and do not necessarily constitute prior art.
Pedestrian re-identification (ReID) is a vital task in video surveillance and smart cities. Given a pedestrian image from a query set, the goal of ReID is to retrieve images of the same identity from a pedestrian image library captured across cameras. Most research today focuses on matching RGB pedestrian images captured by visible light cameras. However, under weak light, visible light cameras image poorly: captured images contain large amounts of noise, and sometimes a complete pedestrian image cannot be captured at all, so the model's pedestrian-matching accuracy drops sharply. In recent years, many surveillance systems automatically switch cameras to near-infrared mode in low-light environments so as to capture complete pedestrian appearance information at night. The accumulation of such cross-modal data has prompted researchers to turn their attention to visible-to-near-infrared pedestrian re-identification (VI-ReID). By matching pedestrian images from two different modalities, VI-ReID can match the target pedestrian even under poor lighting conditions. Compared with the traditional ReID task, VI-ReID also faces a new challenge: the channel-level color difference between the two modalities' pedestrian images makes it difficult for the model to mine identity-discriminative pedestrian features. Improving the recognition accuracy of cross-modal pedestrian re-identification models is therefore of great significance for intelligent urban security.
How to mitigate the influence of the modality difference and obtain a better pedestrian recognition effect is a problem urgently to be solved at present. Current mainstream methods fall into two main categories:
The first category designs a reasonable network architecture to extract identity-discriminative features and directly maps them into a unified feature space for similarity comparison, so as to match pedestrians with the same identity. Fu et al. introduce an NAS method to automatically search for the optimal placement scheme of BN layers in the feature extraction network; Park et al. align features using the similarity between dense cross-modal correspondences, mitigating the difference between modalities and further enhancing the discriminability of person representations. However, the difference between the two modality images is nonlinear, and simply extracting pedestrian image features through a designed network structure and comparing their similarity can hardly minimize the influence of the modality difference.
The second category is image-based methods, which aim to mitigate modality differences by generating related images and inputting them into the feature extraction network together. Wang et al. propose generating modality-complementary images with a CycleGAN network, that is, generating a corresponding near-infrared pedestrian image for each visible light pedestrian image and a corresponding visible light pedestrian image for each near-infrared pedestrian image, thereby treating the cross-modal problem as a single-modality pedestrian matching problem. Zhang et al. propose encoding identity-paired images of the two modalities into a unified high-dimensional space with a pair of encoders, pulling their feature distances closer, generating intermediate-modality images with a decoder, and inputting these intermediate-modality images into the feature extraction network together to mitigate the effect of modality differences. However, since the two modalities' pedestrian images in the cross-modal re-identification task are not aligned at the pixel level, the quality of images generated by GAN- or encoder-based methods is often hard to guarantee. Ma et al. propose introducing pixel-level-aligned pedestrian contour maps into the cross-modal pedestrian re-identification task, but they merely use simple element-wise addition to fuse image features and contour features at different feature extraction stages, so the pedestrian contour is not fully expressed in the fused features.
Disclosure of Invention
To remedy the defects of the prior art, the invention provides a cross-modal pedestrian re-identification method, system, electronic device and computer-readable storage medium based on contour map guidance. A contour detection model extracts the corresponding contour maps of the two modalities' pedestrian images; the contour maps and the original pedestrian images are fed into the feature extraction network together; at different stages of feature extraction the contour maps and the original images are effectively fused with an add-multiply cooperation method; and an energy function of the fused features is defined to re-weight them and strengthen the representation of contour information. By fusing contour information in this way, the influence of modality differences on pedestrian re-identification accuracy is mitigated.
In a first aspect, the invention provides a cross-modal pedestrian re-identification method based on contour map guidance;
the cross-modal pedestrian re-identification method based on contour map guidance comprises the following steps:
a visible light mode pedestrian image and a near infrared mode pedestrian image are obtained, and a pedestrian contour map is generated according to the visible light mode pedestrian image and the near infrared mode pedestrian image;
extracting visible light image features and near infrared image features according to the visible light mode pedestrian images and the near infrared mode pedestrian images;
acquiring contour guiding features through an add-multiply cooperation method according to the pedestrian contour map, the visible light image features and the near infrared image features; defining a feature energy function, and updating the contour guiding features;
performing horizontal segmentation on the visible light image characteristics to obtain visible light image local characteristics; performing horizontal segmentation on the near infrared image characteristics to obtain near infrared image local characteristics; horizontally dividing the updated contour guiding features to obtain contour guiding local features;
and carrying out identity prediction on the pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features, and obtaining a pedestrian re-identification result.
Further, generating the pedestrian contour map according to the visible light mode pedestrian image and the near infrared mode pedestrian image includes:
inputting the visible light mode pedestrian image into a contour detector to obtain a visible light mode pedestrian contour map;
inputting the near infrared mode pedestrian image into a contour detector to obtain a near infrared mode pedestrian contour map;
and concatenating the visible light mode pedestrian contour map with the corresponding near infrared mode pedestrian contour map to obtain the pedestrian contour map.
Further, extracting the visible light image features and the near infrared image features according to the visible light mode pedestrian image and the near infrared mode pedestrian image includes:
inputting the visible light mode pedestrian image and the near infrared mode pedestrian image into a dual-stream feature extraction network;
the first branch of the dual-stream feature extraction network processes the visible light mode pedestrian image to obtain visible light image features; the second branch of the dual-stream feature extraction network processes the near infrared mode pedestrian image to obtain near infrared image features;
the backbone network of the dual-stream feature extraction network is ResNet-50; the first branch and the second branch do not share the parameters of the first convolution block, so as to extract modality-private features, and share the residual block parameters, so as to extract modality-common features.
Further, acquiring the contour guiding features according to the pedestrian contour map, the visible light image features and the near infrared image features includes:
inputting the pedestrian contour map into a third branch for processing to obtain contour features, wherein the backbone network of the third branch is ResNet-50;
and weighting the visible light image features and the near infrared image features onto the corresponding contour features through an add-multiply cooperation method to obtain the contour guiding features.
Further, defining a feature energy function and updating the contour guiding features specifically includes:
defining a characteristic energy function, determining the weights of the contour guiding features according to the characteristic energy function and the degree of dispersion of the neurons, and updating the contour guiding features according to the weights.
Further, the characteristic energy function is expressed as:
where F is the feature vector, α, β and γ are hyperparameters, and μ and φ² denote the degree of dispersion of the neurons in the contour guiding feature.
Further, performing identity prediction on pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features to obtain the pedestrian re-identification result includes:
re-weighting the visible light image local features and the near infrared image local features through non-local attention, and inputting the re-weighted visible light image local features and near infrared image local features into a global pooling layer to obtain visible light image local feature vectors and near infrared image local feature vectors;
inputting the contour guiding local features into a global pooling layer to obtain contour guiding local feature vectors;
and inputting the contour guiding local feature vectors, the visible light image local feature vectors and the near infrared image local feature vectors into a fully connected layer for identity prediction to obtain the pedestrian re-identification result.
In a second aspect, the invention provides a cross-modal pedestrian re-identification system based on contour map guidance;
the cross-modal pedestrian re-identification system based on contour map guidance comprises:
a profile generation module configured to: obtain a visible light mode pedestrian image and a near infrared mode pedestrian image, and generate a pedestrian contour map according to the visible light mode pedestrian image and the near infrared mode pedestrian image;
a feature fusion module configured to: extract visible light image features and near infrared image features according to the visible light mode pedestrian image and the near infrared mode pedestrian image; acquire contour guiding features through an add-multiply cooperation method according to the pedestrian contour map, the visible light image features and the near infrared image features; and define a feature energy function and update the contour guiding features;
a pedestrian re-identification module configured to: horizontally segment the visible light image features to obtain visible light image local features; horizontally segment the near infrared image features to obtain near infrared image local features; horizontally segment the updated contour guiding features to obtain contour guiding local features; and perform identity prediction on pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features to obtain a pedestrian re-identification result.
In a third aspect, the present invention provides an electronic device;
an electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the contour map-guided cross-modality pedestrian re-recognition method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium;
a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the cross-modal pedestrian re-recognition method based on profile guidance described above.
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the problem of the nonlinear difference between the visible light and near-infrared modalities in cross-modal pedestrian re-identification, the technical scheme of the invention proposes an add-multiply cooperation method to introduce pedestrian contour maps, effectively fusing pedestrian image features and contour features and mitigating the modality difference.
2. To further improve the expression of contour information in the fused features, the technical scheme of the invention designs an energy function that weights the fused features according to the degree of dispersion among neurons, strengthening the expression of neurons in contour regions; the fused features and the original image features jointly predict pedestrian identity, improving the accuracy of cross-modal pedestrian re-identification.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic flow chart provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic comparison of the effect of the contour-map-guided cross-modal pedestrian re-identification method with other methods on the RegDB dataset according to an embodiment of the invention;
FIG. 4 is a schematic comparison of the effect of the contour-map-guided cross-modal pedestrian re-identification method with other methods on the SYSU-MM01 dataset according to an embodiment of the invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit exemplary embodiments of the invention. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms. Furthermore, the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusion: processes, methods, systems, products or devices that comprise a series of steps or units are not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
Because identity-paired pedestrian images of the visible light and near-infrared modalities do not correspond one-to-one at the pixel level, and the image generation process is uncontrollable, GAN-based methods in the prior art can hardly generate images that meet practical requirements and cannot effectively mitigate the modality difference. The invention therefore provides a cross-modal pedestrian re-identification method based on contour map guidance, which effectively fuses the contour maps and the original images at different stages of feature extraction with an add-multiply cooperation method, defines an energy function of the fused features to re-weight them and strengthen the representation of contour information, and mitigates the influence of modality differences on cross-modal pedestrian re-identification accuracy by fusing contour information.
Next, the cross-modal pedestrian re-identification method based on contour map guidance disclosed in this embodiment is described in detail with reference to FIGS. 1 to 4. The method includes the following steps:
step 1, a visible light mode pedestrian image and a near infrared mode pedestrian image are obtained, and a pedestrian contour map is generated according to the visible light mode pedestrian image and the near infrared mode pedestrian image.
Specifically, the visible light mode pedestrian image and the near infrared mode pedestrian image are input into a contour detector to obtain the corresponding pedestrian contour maps.
Taking reading data from the cross-modal pedestrian re-identification dataset SYSU-MM01 as an example, step 1 is described in detail as follows:
s101, reading a cross-mode pedestrian re-identification data set SYSU-MM01, dividing the data set into uniform training batches, and defining visible light mode pedestrian images in each training batch asNear-infrared modality pedestrian image is defined as +.>Where N x P represents N pedestrian identities included in each training batch, and for each pedestrian identity, P pedestrian images are randomly selected in the visible light mode pedestrian image set and the near infrared mode pedestrian image set, respectively, and all training images are scaled to 384 x 192 x 3.
In this embodiment, N is set to 4 and P is set to 4, so the size of each training batch is [32, 384, 192, 3].
S102. Before the pedestrian images of each training batch are fed into the feature extraction network, contour detection is first performed on the two modality images using the contour detection model RCF (pre-trained on the BSDS500 dataset, with VGG-16 as its backbone) to generate the corresponding visible light modality pedestrian contour maps X_vis2c = RCF(X_vis) and near-infrared modality pedestrian contour maps X_ir2c = RCF(X_ir). X_vis2c and X_ir2c are also of size 384 × 192 × 3, and each contour map is assigned the same pedestrian identity label as its original image. X_vis2c and X_ir2c are concatenated along the batch dimension to obtain (X_vis2c, X_ir2c), whose dimensions are the same as each batch, i.e. [32, 384, 192, 3].
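For illustration only, the following minimal PyTorch-style sketch shows how S101-S102 could be realised in code. The helper `rcf` is a hypothetical wrapper around an RCF contour detector pre-trained on BSDS500 (the embodiment does not specify a code-level API), and the tensor layout is PyTorch's NCHW rather than the NHWC shape [32, 384, 192, 3] quoted above.

```python
import torch

def build_contour_batch(x_vis: torch.Tensor, x_ir: torch.Tensor, rcf) -> torch.Tensor:
    """x_vis, x_ir: identity-paired batches of shape [N*P, 3, 384, 192].

    `rcf` is assumed to map an image batch to contour maps of the same shape;
    contour maps inherit the identity labels of their source images (handled
    by the data loader, not shown here).
    """
    x_vis2c = rcf(x_vis)   # visible light modality pedestrian contour maps
    x_ir2c = rcf(x_ir)     # near-infrared modality pedestrian contour maps
    # Concatenate along the batch dimension, as described above.
    return torch.cat([x_vis2c, x_ir2c], dim=0)
```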
Step 2, extracting visible light image features and near infrared image features according to the visible light mode pedestrian images and the near infrared mode pedestrian images; and extracting contour features according to the pedestrian contour map.
Specifically, the identity-paired visible light modality pedestrian images and near-infrared modality pedestrian images are sent into the first and second branches of a dual-stream feature extraction network, and the pedestrian contour maps generated in step 1 are input into an additional third branch. The dual-stream feature extraction network takes ResNet-50 as its backbone, and the third branch also takes ResNet-50 as its backbone.
The specific flow is as follows:
s201, visible light model pedestrian images X in each training batch are processed vis And near infrared modality pedestrian image X ir Inputting into a feature extraction network with a backbone network of ResNet-50 double flow, and extracting X vis The branch of the feature is called a first branch, and the visible light image feature is acquired through the first branch; extracting X ir The branch of the feature is called a second branch, through which near infrared image features are acquired.
The two branches do not share the parameters of the first convolution block of ResNet-50, so as to extract modality-private features, while the remaining residual blocks share parameters, so as to extract modality-common features.
S202. The pedestrian contour maps (X_vis2c, X_ir2c) are fed into an additional feature extraction branch, called the third branch, to extract the corresponding contour features.
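As a sketch of the branch layout described in S201-S202 (assuming "first convolution block" denotes the ResNet-50 stem, i.e. conv1/bn1/relu/maxpool, and that the four residual stages layer1-layer4 carry the shared parameters), one possible PyTorch construction is:

```python
import copy
import torch.nn as nn
from torchvision.models import resnet50

class ThreeBranchBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        base = resnet50(weights="IMAGENET1K_V1")
        stem = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        self.stem_vis = stem                    # modality-private parameters
        self.stem_ir = copy.deepcopy(stem)      # not shared with stem_vis
        self.shared_stages = nn.ModuleList(
            [base.layer1, base.layer2, base.layer3, base.layer4]
        )                                       # modality-common parameters
        # Third branch: an independent ResNet-50 for the contour maps.
        contour = resnet50(weights="IMAGENET1K_V1")
        self.contour_stem = nn.Sequential(
            contour.conv1, contour.bn1, contour.relu, contour.maxpool
        )
        self.contour_stages = nn.ModuleList(
            [contour.layer1, contour.layer2, contour.layer3, contour.layer4]
        )
```

Keeping the residual stages in a ModuleList allows the stage-wise fusion of step 3 to be interleaved between stages.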
Step 3, acquiring contour guiding features according to the contour features, the visible light image features and the near infrared image features. Specifically, at each residual stage of ResNet-50 feature extraction, the image features extracted by the first and second branches are fused into and weighted onto the corresponding contour features extracted by the third branch to obtain the contour guiding features.
The specific flow is as follows:
s301, the backbone networks of the first branch, the second branch and the third branch are ResNet-50, so that the feature sizes extracted in the residual error stage are the same after each visible light mode pedestrian image, the near infrared mode pedestrian image and the pedestrian profile pass through each convolution layer (ResNet-50 comprises a convolution operation and four subsequent residual error stages). The images of the three branches are in ResNet-50 four residual stages, and the visible light image characteristics and the near infrared image characteristics of the single pedestrian extracted by the first branch and the second branch in each residual stage are respectively defined asAnd->Defining the contour features extracted by the third branch at each residual stage as +.>
S302. At each residual stage, the visible light image features F_vis extracted by the first branch and the near-infrared image features F_ir extracted by the second branch are weighted onto the corresponding contour features F_vis2c and F_ir2c extracted by the third branch through the add-multiply cooperation method, expressed by the following formula:

F'_m2c = F_m2c ⊕ F_m ⊕ (conv(F_m2c) ⊙ F_m), m ∈ {vis, ir}

where ⊕ represents element-wise addition, ⊙ represents element-wise multiplication, and conv represents a convolution operation with a 3 × 3 kernel.

The contour features are first smoothed at the contour edges by the 3 × 3 convolution and then element-wise multiplied with the image features to obtain the primarily enhanced features; the final fusion result is obtained by element-wise addition of the primarily enhanced features, the original image features and the original contour features.
In this embodiment, the ideas of add-multiply cooperation and residual connection are used to fuse the image features and the contour features.
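The fusion just described can be sketched as a small PyTorch module; the channel count `c` depends on the residual stage, and the module is applied once per stage and per modality:

```python
import torch
import torch.nn as nn

class AddMultiplyFusion(nn.Module):
    """Add-multiply cooperation: F_fused = F_c + F_img + conv3x3(F_c) * F_img."""

    def __init__(self, c: int):
        super().__init__()
        self.smooth = nn.Conv2d(c, c, kernel_size=3, padding=1)  # smooths contour edges

    def forward(self, f_img: torch.Tensor, f_contour: torch.Tensor) -> torch.Tensor:
        enhanced = self.smooth(f_contour) * f_img  # primarily enhanced features
        # Residual-style element-wise addition preserves the original
        # image and contour information.
        return enhanced + f_img + f_contour
```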
And step 4, defining a characteristic energy function, and updating the contour guiding characteristics.
Specifically, an energy function is defined for the contour guiding features obtained by fusion at each residual stage in step 3, and the features are re-weighted to highlight the representation of contour information in the contour guiding features. The characteristic energy function is expressed as follows:
wherein:
F ∈ R^{H×W×C} represents the contour guiding feature of a single pedestrian image after fusion at each residual stage; H, W and C represent the height, width and number of channels of the contour guiding feature respectively; F_ijk is the value of the feature tensor F at position (i, j, k); the corresponding hyperparameters α, β and γ are set to 5, 0.0001 and 0.5 respectively; and μ and φ² are computed from the contour guiding feature F ∈ R^{H×W×C} by mean square error to measure how discrete each neuron is. For the fused contour guiding feature, the neuron values at contour positions differ significantly from those not at contour positions, and are therefore assigned a larger energy function value E.
Based on the energy function, the formula F = σ(E) ⊙ F is used to assign new weights to the contour guiding feature,
where ⊙ denotes the Hadamard product and σ(·) denotes the ReLU activation function.
Neurons at contour positions are thereby given higher weights and receive more attention. The fused feature weighted by the energy function, with its representation of contour information strengthened, is sent to the next residual stage of the third-branch feature extraction network.
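The following sketch illustrates the re-weighting F = σ(E) ⊙ F. The exact closed form of the energy function is the one defined by the specification's formula; the sketch assumes, as one plausible variant consistent with the description above, an energy proportional to each neuron's squared deviation from the feature mean, stabilised by β and scaled and shifted by α and γ:

```python
import torch
import torch.nn.functional as F

def energy_reweight(feat: torch.Tensor,
                    alpha: float = 5.0, beta: float = 1e-4, gamma: float = 0.5):
    """feat: contour guiding feature of shape [B, C, H, W].

    Assumed energy form: E = alpha * (F - mu)^2 / (phi^2 + beta) + gamma,
    where mu and phi^2 are the per-channel mean and mean squared deviation.
    """
    mu = feat.mean(dim=(2, 3), keepdim=True)
    phi2 = ((feat - mu) ** 2).mean(dim=(2, 3), keepdim=True)  # dispersion via MSE
    energy = alpha * (feat - mu) ** 2 / (phi2 + beta) + gamma
    return F.relu(energy) * feat  # sigma(E) ⊙ F, with sigma = ReLU
```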
Step 5, horizontally dividing the visible light image features to obtain visible light image local features; performing horizontal segmentation on the near infrared image characteristics to obtain near infrared image local characteristics; and horizontally dividing the updated contour guiding features to obtain contour guiding local features.
Specifically, the features extracted by the feature extraction network are sent to a dual-granularity segmentation module, in which the contour guiding features extracted by the third branch are horizontally segmented into 2 local features by the coarse-granularity branch, and the image features extracted by the first and second branches are horizontally segmented into 4 local features by the fine-granularity branch, representing the head, torso, legs and feet of the pedestrian respectively.
Here, the visible light image features and near-infrared image features extracted by the first and second branches are horizontally divided into 4 local features, hence called fine-granularity segmentation; the contour guiding features extracted by the third branch are horizontally divided into 2 local features, hence called coarse-granularity segmentation. The two segmentation modes together constitute the dual-granularity segmentation module.
Specifically, the PCB (Part-based Convolutional Baseline) horizontal segmentation method is adopted for the horizontal segmentation of features.
The contour guiding features finally extracted by the third branch are input into the coarse-granularity branch and horizontally bisected along the height dimension to obtain the upper-body and lower-body local features of the pedestrian, defined as the contour guiding local features. The visible light image features extracted by the first branch are input into the fine-granularity branch and horizontally quartered along the height dimension into features of the pedestrian's head, torso, legs and feet, the four local features being defined as the visible light image local features. The near-infrared image features extracted by the second branch are input into the fine-granularity branch and likewise horizontally quartered into features of the head, torso, legs and feet, defined as the near-infrared image local features.
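A minimal sketch of the dual-granularity segmentation (PCB-style splits along the height dimension of the feature map):

```python
import torch

def horizontal_split(feat: torch.Tensor, parts: int):
    """feat: [B, C, H, W] -> list of `parts` tensors of shape [B, C, H/parts, W]."""
    return list(torch.chunk(feat, parts, dim=2))

# Coarse granularity: contour guiding features -> upper body / lower body.
# contour_parts = horizontal_split(f_contour_guided, 2)
# Fine granularity: image features -> head / torso / legs / feet.
# vis_parts = horizontal_split(f_vis, 4)
# ir_parts = horizontal_split(f_ir, 4)
```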
Step 6. Identity prediction is performed on pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features, and the pedestrian re-identification result is obtained. This specifically includes the following steps:
step 501, re-weighting the visible light image local features and the near infrared image local features using non-local attention to preserve high-order semantics.
Specifically, the obtained visible light image local features and near-infrared image local features are re-weighted by non-local attention as follows:
the obtained local features are each passed through three 1 × 1 convolution operations u(·), v(·) and z(·);
the similarity between local features is then computed from their self-similarity to obtain a local feature similarity matrix,
where each entry, representing the similarity between two local features, is obtained by applying the convolution operations u(·) and v(·) to the local features and multiplying one result by the transpose of the other;
finally, the similarity matrix is multiplied by the local features obtained from the convolution operation z(·), which completes the non-local attention re-weighting.
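A sketch of this non-local re-weighting over part features is given below; it assumes the part features of one image are stacked as a [B, P, C] tensor, that the 1 × 1 convolutions u, v and z act as linear maps over the channel dimension, and that the similarity matrix is softmax-normalised (a common choice the text does not make explicit):

```python
import torch
import torch.nn as nn

class PartNonLocal(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.u = nn.Linear(c, c)  # stands in for the 1x1 convolution u(.)
        self.v = nn.Linear(c, c)  # stands in for v(.)
        self.z = nn.Linear(c, c)  # stands in for z(.)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        """parts: [B, P, C] local features -> re-weighted [B, P, C]."""
        # Similarity matrix: u(parts) times the transpose of v(parts).
        sim = self.u(parts) @ self.v(parts).transpose(1, 2)   # [B, P, P]
        sim = torch.softmax(sim, dim=-1)                      # assumed normalisation
        return sim @ self.z(parts)                            # re-weighted parts
```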
Step 602. The re-weighted visible light image local features and near infrared image local features are input into a global pooling layer to obtain the visible light image local feature vectors and near infrared image local feature vectors; the contour guiding local features are input into a global pooling layer to obtain the contour guiding local feature vectors.
Specifically, the contour guiding local features are passed through a global pooling layer to obtain the contour guiding local feature vectors, and the re-weighted visible light and near-infrared image local features are passed through a global pooling layer to obtain the visible light image local feature vectors and the near-infrared image local feature vectors.
Step 603. The visible light image local feature vectors, near-infrared image local feature vectors and contour guiding local feature vectors are input into the fully connected layer to obtain the final pedestrian re-identification result, optimized with an identity loss function and a triplet loss function.
The identity loss function is as follows:
where C represents the number of pedestrian identities in the training set, p_i is the model's predicted probability for pedestrian identity i, q_i is the (smoothed) ground-truth probability of that class, and α is a constant.
The triplet loss function is as follows:
where J is a margin hyperparameter, D(F_i, F_j) represents the Euclidean distance between local features F_i and F_j, and [z]_+ represents max(z, 0).
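Since the two loss formulas are given above only in words, the following hedged sketch shows one standard realisation consistent with those descriptions: label-smoothed cross-entropy for the identity loss (with smoothing constant α) and a batch-hard triplet loss with margin J. The exact forms used by the invention are those of the specification's formulas.

```python
import torch
import torch.nn.functional as F

def identity_loss(logits: torch.Tensor, labels: torch.Tensor, alpha: float = 0.1):
    """Label-smoothed cross-entropy over C identities; alpha is the smoothing constant."""
    c = logits.size(1)
    log_p = F.log_softmax(logits, dim=1)
    q = torch.full_like(log_p, alpha / c)                  # smoothed class targets
    q.scatter_(1, labels.unsqueeze(1), 1.0 - alpha + alpha / c)
    return -(q * log_p).sum(dim=1).mean()

def triplet_loss(feats: torch.Tensor, labels: torch.Tensor, margin_j: float = 0.3):
    """Batch-hard triplet loss; [z]_+ = max(z, 0) realised via relu."""
    d = torch.cdist(feats, feats)                          # Euclidean D(F_i, F_j)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (d * same.float()).max(dim=1).values     # furthest same-identity
    hardest_neg = d.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin_j).mean()
```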
Next, to verify the advancement of the method described in this embodiment, experiments were performed on the RegDB dataset and the SYSU-MM01 dataset.
Experimental platform: all experiments were performed on a server with a Titan V GPU, implemented in the Python programming language in PyCharm, with PyTorch as the deep learning framework.
Experiment 1: introduction to data set
The SYSU-MM01 dataset contains images of 491 different pedestrians captured by four visible light cameras and two near-infrared cameras in indoor and outdoor environments. Each pedestrian is photographed by at least one visible light camera and one near-infrared camera. The training set contains 19,659 visible light pedestrian images and 12,792 near-infrared pedestrian images covering 395 pedestrian identities, while the test set contains 96 pedestrian identities, of which 3,803 near-infrared pedestrian images constitute the query set. The gallery set is determined by the test mode, which includes an all-search mode and an indoor-search mode. In the all-search mode, all pedestrian images captured by the visible light cameras are used as the gallery set; in the indoor-search mode, only images captured by the two indoor visible light cameras are used as the gallery set.
The RegDB dataset is a small dataset containing both visible light and infrared modality pedestrian images. It contains 412 pedestrian IDs (identity labels) in total, where each ID has 10 visible light images and 10 infrared images. In the experiments, the visible light images and the infrared images are each used as the query in turn, with the images of the other modality used as the gallery; the two cases correspond to visible-to-thermal and thermal-to-visible respectively. During the experiments, 206 IDs are randomly selected for training and the remaining 206 IDs are used for testing.
Experiment 2: experimental details implementation
To verify the effectiveness of this embodiment, it is compared with current mainstream methods on the SYSU-MM01 and RegDB datasets; the comparison metrics are Rank-k and mAP, and the comparison results are shown in FIGS. 3-4.
Example 2
This embodiment discloses a cross-modal pedestrian re-identification system based on contour map guidance, which comprises:
a profile generation module configured to: obtain a visible light mode pedestrian image and a near infrared mode pedestrian image, and generate a pedestrian contour map according to the visible light mode pedestrian image and the near infrared mode pedestrian image;
a feature fusion module configured to: extract visible light image features and near infrared image features according to the visible light mode pedestrian image and the near infrared mode pedestrian image; acquire contour guiding features through an add-multiply cooperation method according to the pedestrian contour map, the visible light image features and the near infrared image features; and define a feature energy function and update the contour guiding features;
a pedestrian re-identification module configured to: horizontally segment the visible light image features to obtain visible light image local features; horizontally segment the near infrared image features to obtain near infrared image local features; horizontally segment the updated contour guiding features to obtain contour guiding local features; and perform identity prediction on pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features to obtain a pedestrian re-identification result.
It should be noted that the profile generation module, the feature fusion module and the pedestrian re-identification module correspond to the steps in Example 1; the modules are identical to the examples and application scenarios realised by the corresponding steps, but are not limited to the disclosure of Example 1. It should also be noted that the modules described above may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
Example 3
The third embodiment of the invention provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and runnable on the processor, wherein the computer instructions, when run by the processor, complete the steps of the above cross-modal pedestrian re-identification method based on contour map guidance.
Example 4
The fourth embodiment of the invention provides a computer-readable storage medium storing computer instructions which, when executed by a processor, complete the steps of the above cross-modal pedestrian re-identification method based on contour map guidance.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to the related descriptions of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A cross-modal pedestrian re-identification method based on contour map guidance, characterized by comprising the following steps:
a visible light mode pedestrian image and a near infrared mode pedestrian image are obtained, and a pedestrian contour map is generated according to the visible light mode pedestrian image and the near infrared mode pedestrian image;
extracting visible light image features and near infrared image features according to the visible light mode pedestrian images and the near infrared mode pedestrian images;
acquiring contour guiding features through an add-multiply cooperation method according to the pedestrian contour map, the visible light image features and the near infrared image features; defining a feature energy function, and updating the contour guiding features;
performing horizontal segmentation on the visible light image characteristics to obtain visible light image local characteristics; performing horizontal segmentation on the near infrared image characteristics to obtain near infrared image local characteristics; horizontally dividing the updated contour guiding features to obtain contour guiding local features;
and carrying out identity prediction on the pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features, and obtaining a pedestrian re-identification result.
2. The cross-modal pedestrian re-identification method based on contour map guidance of claim 1, wherein generating the pedestrian contour map according to the visible light mode pedestrian image and the near infrared mode pedestrian image comprises:
inputting the visible light mode pedestrian image into a contour detector to obtain a visible light mode pedestrian contour map;
inputting the near infrared mode pedestrian image into a contour detector to obtain a near infrared mode pedestrian contour map;
and concatenating the visible light mode pedestrian contour map with the corresponding near infrared mode pedestrian contour map to obtain the pedestrian contour map.
3. The cross-modal pedestrian re-identification method based on contour map guidance of claim 1, wherein extracting the visible light image features and the near infrared image features according to the visible light mode pedestrian image and the near infrared mode pedestrian image comprises:
inputting the visible light mode pedestrian image and the near infrared mode pedestrian image into a dual-stream feature extraction network;
the first branch of the dual-stream feature extraction network processes the visible light mode pedestrian image to obtain visible light image features; the second branch of the dual-stream feature extraction network processes the near infrared mode pedestrian image to obtain near infrared image features;
the backbone network of the dual-stream feature extraction network is ResNet-50; the first branch and the second branch do not share the parameters of the first convolution block, so as to extract modality-private features, and share the residual block parameters, so as to extract modality-common features.
4. The cross-modal pedestrian re-identification method based on contour map guidance of claim 1, wherein acquiring the contour guiding features according to the pedestrian contour map, the visible light image features and the near infrared image features comprises:
inputting the pedestrian contour map into a third branch for processing to obtain contour features, wherein the backbone network of the third branch is ResNet-50;
and weighting the visible light image features and the near infrared image features onto the corresponding contour features through an add-multiply cooperation method to obtain the contour guiding features.
5. The cross-modal pedestrian re-identification method based on contour map guidance of claim 1, wherein defining a feature energy function and updating the contour guiding features is specifically:
defining a characteristic energy function, determining the weights of the contour guiding features according to the characteristic energy function and the degree of dispersion of the neurons, and updating the contour guiding features according to the weights.
6. The cross-modal pedestrian re-identification method based on contour map guidance of claim 1, wherein the characteristic energy function is expressed as:
where F is the feature vector, α, β and γ are hyperparameters, and μ and φ² denote the degree of dispersion of the neurons in the contour guiding feature.
7. The cross-modal pedestrian re-identification method based on contour map guidance of claim 1, wherein performing identity prediction on pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features to obtain the pedestrian re-identification result comprises:
re-weighting the visible light image local features and the near infrared image local features through non-local attention, and inputting the re-weighted visible light image local features and near infrared image local features into a global pooling layer to obtain visible light image local feature vectors and near infrared image local feature vectors;
inputting the contour guiding local features into a global pooling layer to obtain contour guiding local feature vectors;
and inputting the contour guiding local feature vectors, the visible light image local feature vectors and the near infrared image local feature vectors into a fully connected layer for identity prediction to obtain the pedestrian re-identification result.
8. A cross-modal pedestrian re-identification system based on contour map guidance, characterized by comprising:
a profile generation module configured to: obtain a visible light mode pedestrian image and a near infrared mode pedestrian image, and generate a pedestrian contour map according to the visible light mode pedestrian image and the near infrared mode pedestrian image;
a feature fusion module configured to: extract visible light image features and near infrared image features according to the visible light mode pedestrian image and the near infrared mode pedestrian image; acquire contour guiding features through an add-multiply cooperation method according to the pedestrian contour map, the visible light image features and the near infrared image features; and define a feature energy function and update the contour guiding features;
a pedestrian re-identification module configured to: horizontally segment the visible light image features to obtain visible light image local features; horizontally segment the near infrared image features to obtain near infrared image local features; horizontally segment the updated contour guiding features to obtain contour guiding local features; and perform identity prediction on pedestrians according to the contour guiding local features, the visible light image local features and the near infrared image local features to obtain a pedestrian re-identification result.
9. An electronic device, comprising a memory, a processor, and computer instructions stored on the memory and runnable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1-7.
CN202310721195.8A 2023-06-16 2023-06-16 Cross-mode pedestrian re-identification method and system based on contour map guidance Pending CN116994332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310721195.8A CN116994332A (en) 2023-06-16 2023-06-16 Cross-mode pedestrian re-identification method and system based on contour map guidance

Publications (1)

Publication Number Publication Date
CN116994332A true CN116994332A (en) 2023-11-03

Family

ID=88532890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310721195.8A Pending CN116994332A (en) 2023-06-16 2023-06-16 Cross-mode pedestrian re-identification method and system based on contour map guidance

Country Status (1)

Country Link
CN (1) CN116994332A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220309794A1 (en) * 2021-03-26 2022-09-29 Yandex Self Driving Group Llc Methods and electronic devices for detecting objects in surroundings of a self-driving car



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination