CN115497121A - Cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss - Google Patents

Cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss

Info

Publication number
CN115497121A
Authority
CN
China
Prior art keywords
pedestrian
cross
loss
modal
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211169495.1A
Other languages
Chinese (zh)
Inventor
张强 (Zhang Qiang)
苏鹏 (Su Peng)
刘瑞 (Liu Rui)
周东生 (Zhou Dongsheng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN202211169495.1A
Publication of CN115497121A
Legal status: Pending

Classifications

    • G06V 40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06N 3/08 — Learning methods (computing arrangements based on biological models; neural networks)
    • G06V 10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/761 — Proximity, similarity or dissimilarity measures in feature spaces (image or video pattern matching)
    • G06V 10/82 — Image or video recognition or understanding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss. The invention extracts channel features and spatial features simultaneously, making the pedestrian representation more discriminative, and uses the hetero-center sample loss to make the feature distribution of each pedestrian more compact.

Description

Cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss
Technical Field
The invention belongs to the field of image retrieval algorithms, relates to pedestrian re-identification based on cross-modal images, and in particular relates to a cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss.
Background
Pedestrian re-identification is an important branch of image retrieval and is widely applied in safeguarding public safety. As more and more surveillance devices integrate infrared acquisition hardware, cross-modal pedestrian re-identification between visible-light images and infrared images has become an important research direction.
Compared with traditional single-modality pedestrian re-identification, cross-modal pedestrian re-identification is more challenging. In addition to factors such as viewpoint changes, occlusion and posture changes, it also faces huge differences between modalities. To reduce the influence of these modality differences, some methods use modality conversion to translate infrared images and visible-light images into the same modality; other methods use feature learning to align the image features of the two modalities in a unified feature space.
Existing cross-modal pedestrian re-identification methods attach a modality alignment module to a single-stream network to extract detailed image features, and then train the network parameters with cross-entropy loss and center cluster loss. However, while the center cluster loss handles the differences between modalities well, it does not tightly constrain the differences within a modality; and the modality alignment module extracts detailed features at the channel level, while key detailed features are also available at the spatial level. If both kinds of information can be exploited simultaneously, the performance of cross-modal pedestrian re-identification models can be further improved.
Disclosure of Invention
To overcome the defects in the prior art, the invention provides a cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss: channel features and spatial features are extracted simultaneously so that the pedestrian representation is more discriminative, and the hetero-center sample loss is used so that the feature distribution of each pedestrian is more compact.
The technical scheme adopted by the invention to solve this problem is as follows: a cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss, in which a cross-modal pedestrian re-identification model extracts pedestrian image features at the spatial, channel and global scales, and is trained with a loss function consisting of identity loss, hetero-center sample loss, center cluster loss and total-feature loss, thereby achieving cross-modal re-identification of pedestrian images.
As a further embodiment of the invention, the method specifically comprises the following steps:
S1: constructing a cross-modal pedestrian re-identification model comprising a modality alignment module and a spatial feature extraction module connected in parallel;
S2: performing data enhancement on the pedestrian images input into the cross-modal pedestrian re-identification model;
S3: extracting global features, spatial local features and channel local features of the pedestrian images with the cross-modal pedestrian re-identification model;
S4: calculating the hetero-center sample loss and the identity loss of the extracted spatial local features, and calculating the center cluster loss and the identity loss of the extracted channel local features;
S5: concatenating the global features, spatial local features and channel local features as the total features, and calculating the total-feature loss;
S6: adding all losses of steps S4 and S5 to form the loss function of the whole cross-modal pedestrian re-identification model, and training and optimizing the model parameters according to this loss function;
S7: after training is complete, inputting the pedestrian image to be queried and the test-set images into the cross-modal pedestrian re-identification model, calculating the similarity between the query image and the test-set images, and returning the M images with the highest similarity as the cross-modal re-identification result (sketched below).
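For concreteness, the retrieval step S7 can be sketched as follows. This is a minimal illustration, assuming L2-normalized features and cosine similarity as the similarity measure (the patent does not fix the measure); the function name is illustrative.

```python
import torch

def retrieve_top_m(query_feats: torch.Tensor, gallery_feats: torch.Tensor, m: int):
    """Rank test-set (gallery) images for each query by cosine similarity
    and return the M most similar ones, as in step S7."""
    # L2-normalize so that the dot product equals cosine similarity
    q = torch.nn.functional.normalize(query_feats, dim=1)
    g = torch.nn.functional.normalize(gallery_feats, dim=1)
    sim = q @ g.t()                          # (num_query, num_gallery)
    top_sim, top_idx = sim.topk(m, dim=1)    # M highest similarities per query
    return top_sim, top_idx
```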
As a further embodiment of the present invention, the data enhancement of the pedestrian images in step S2 specifically consists of mixing visible-light images and infrared images to form mini-batches.
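One common way to form such mixed batches is an identity-balanced sampler that draws P identities with K visible-light and K infrared images each, matching the P and K used in the loss below. The patent only states that the two modalities are mixed, so the sampling details in this sketch are an assumption.

```python
import random

def sample_mixed_batch(vis_by_id, ir_by_id, p: int, k: int):
    """Draw a mini-batch of P identities with K visible and K infrared images each.

    vis_by_id / ir_by_id: dict mapping person id -> list of image paths per modality.
    Returns a list of (path, person_id, modality) tuples of length P * 2K.
    """
    ids = random.sample(list(vis_by_id.keys() & ir_by_id.keys()), p)
    batch = []
    for pid in ids:
        # sample with replacement in case an identity has fewer than K images
        vis = random.choices(vis_by_id[pid], k=k)
        ir = random.choices(ir_by_id[pid], k=k)
        batch += [(path, pid, "visible") for path in vis]
        batch += [(path, pid, "infrared") for path in ir]
    return batch
```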
As a further embodiment of the present invention, the hetero-center sample loss of the extracted spatial local features is calculated in step S4 by the following formula:

$$L_{HCS} = \sum_{i=1}^{P}\left[\rho + \left\|c_v^i - c_t^i\right\|_2 - \min_{\substack{n\in\{v,t\}\\ j\neq i}}\left\|c_v^i - c_n^j\right\|_2\right]_+ + \sum_{i=1}^{P}\left[\rho + \left\|c_t^i - c_v^i\right\|_2 - \min_{\substack{n\in\{v,t\}\\ j\neq i}}\left\|c_t^i - c_n^j\right\|_2\right]_+ + \delta\sum_{i=1}^{P}\sum_{j=1}^{K}\left(\left\|x_{i,j}^v - c_v^i\right\|_2 + \left\|x_{i,j}^t - c_t^i\right\|_2\right)$$

wherein $L_{HCS}$ denotes the hetero-center sample loss, $\rho$ is the margin parameter, $\delta$ is a balance coefficient, $[x]_+ = \max(x, 0)$ is the standard hinge loss, and $\|x_a - x_b\|_2$ denotes the two-norm between $x_a$ and $x_b$; $P$ denotes the total number of distinct classes in a mini-batch; $x_{i,j}^v$ and $x_{i,j}^t$ denote the feature representations of the $j$-th visible-light image and the $j$-th infrared image of class $i$; $c_v^i$ and $c_t^i$ denote the center features of class $i$ in the visible-light and infrared modalities within a mini-batch, obtained as the mean of all class-$i$ samples in each modality:

$$c_v^i = \frac{1}{K}\sum_{j=1}^{K} x_{i,j}^v, \qquad c_t^i = \frac{1}{K}\sum_{j=1}^{K} x_{i,j}^t$$

wherein $K$ denotes that each class in a mini-batch contains $K$ visible-light images and $K$ infrared images.
The beneficial effects of the invention include: a spatial feature extraction module is connected in parallel with the original modality alignment module, and channel features and spatial features are extracted simultaneously, making the pedestrian representation more discriminative; meanwhile, the hetero-center sample loss handles cross-modality and intra-modality variations simultaneously, making the feature distribution of each pedestrian more compact. The method offers a useful reference for exploring pedestrian feature representations and improving the accuracy of pedestrian re-identification.
Drawings
FIG. 1 is the model framework diagram of the present invention;
FIG. 2 is an illustration of the hetero-center sample loss of the present invention;
FIG. 3 is a t-SNE visualization of the effect of the hetero-center sample loss.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Furthermore, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
A cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss, in which a spatial feature extraction structure is connected in parallel to an original cross-modal pedestrian re-identification model to build a new model. FIG. 1 is the model framework diagram of the invention.
First, data enhancement is applied to the infrared and visible-light images of the dataset. They are then fed into a single-stream network composed of ResNet-50 and MAM, yielding an embedded feature map X. Next, three branches respectively extract spatial, channel and global feature representations from X. Finally, the identity loss and the corresponding cross-modality metric losses are applied to the three feature representations, and the network parameters are optimized to obtain the cross-modal pedestrian re-identification model.
The method specifically comprises the following technical steps:
1. and constructing a network model and initializing network parameters.
The backbone of the invention is a single-stream network composed of ResNet-50 and MAM, whose main purpose is to extract a coarse-grained pedestrian feature map $X \in \mathbb{R}^{C \times H \times W}$, where C, H and W denote the number of channels, the height and the width of the feature map, respectively. ResNet-50 is initialized with parameters pre-trained on ImageNet, and the stride of the last downsampling layer (layer4) is changed to 1 so that the final downsampling is removed. MAM is essentially an instance-normalization structure (see: Qiang Wu, Pingyang Dai, Jie Chen, Chia-Wen Lin, Yongjian Wu, Feiyue Huang, Bineng Zhong, and Rongrong Ji. Discover cross-modality nuances for visible-infrared person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4330-4339, 2021) embedded after the layer3 and layer4 stages of ResNet-50. Three branches follow the backbone: a spatial feature extraction branch, a channel feature extraction branch, and a global feature extraction branch.
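A sketch of this backbone construction is given below, using torchvision's resnet50. Placing an instance-normalization block after layer3 and layer4 follows the description above, but the stand-in module here is a simplification under that assumption, not the exact MAM of the cited paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class Backbone(nn.Module):
    """Single-stream ResNet-50 backbone: the stride of layer4 is set to 1 so the
    last downsampling is removed, and an instance-normalization block (a simple
    stand-in for MAM) is applied after layer3 and layer4."""

    def __init__(self):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V1")        # ImageNet pre-training
        # remove the last downsampling: stride 1 in layer4's first bottleneck
        net.layer4[0].conv2.stride = (1, 1)
        net.layer4[0].downsample[0].stride = (1, 1)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                                  net.layer1, net.layer2)
        self.layer3, self.layer4 = net.layer3, net.layer4
        self.mam3 = nn.InstanceNorm2d(1024, affine=True)   # stand-in for MAM
        self.mam4 = nn.InstanceNorm2d(2048, affine=True)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = self.stem(img)
        x = self.mam3(self.layer3(x))
        return self.mam4(self.layer4(x))   # coarse-grained X of shape (B, C, H, W)
```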
The spatial feature branch evenly divides the coarse-grained pedestrian feature map X into p blocks and extracts a spatial local feature from each block through pooling, 1 × 1 convolution and reshaping operations. Specifically, this can be expressed as:

S_i = Reshape(Conv(Pool(X_i)))

where X_i denotes the feature map of the i-th block. Finally, all spatial local features are concatenated as the final feature representation of the spatial feature extraction module.
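A minimal sketch of this branch follows, under the assumption that the p blocks are horizontal strips of X; the strip count and output dimension are illustrative.

```python
import torch
import torch.nn as nn

class SpatialBranch(nn.Module):
    """Split X into p horizontal strips and extract one local feature per strip:
    S_i = Reshape(Conv(Pool(X_i)))."""

    def __init__(self, in_channels: int = 2048, out_dim: int = 512, p: int = 6):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((p, 1))          # one vector per strip
        self.conv = nn.Conv2d(in_channels, out_dim, 1)    # 1x1 convolution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> pooled: (B, C, p, 1) -> conv: (B, out_dim, p, 1)
        s = self.conv(self.pool(x))
        # reshape: concatenate the p local features into the branch output
        return s.flatten(1)                               # (B, out_dim * p)
```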
The channel feature branch consists mainly of a modality alignment module and a 1 × 1 convolution downsampling layer.
The global feature extraction branch applies pooling and reshaping operations to the coarse-grained pedestrian feature map X. Finally, the pedestrian features of the three branches are concatenated as the final pedestrian feature representation.
2. Image pre-processing
Images in the dataset are randomly cropped to 288 × 144, and random horizontal flipping and random grayscale conversion are then used to increase data diversity. For visible-light images, a local random channel enhancement is additionally applied: a region of the image is selected at random, and its pixel values are randomly replaced with one of the gray value, the R-channel values, the G-channel values or the B-channel values. This forces the model to be less sensitive to color changes and facilitates the learning of color-independent features.
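The local random channel enhancement can be sketched as follows; the region-size bounds and the gray-conversion weights are assumptions, since the patent does not specify them.

```python
import random
import torch

def local_random_channel_enhancement(img: torch.Tensor) -> torch.Tensor:
    """Replace a random region of a visible-light image (C=3, H, W) with its
    gray value or one of its R, G, B channels, replicated across all channels."""
    _, h, w = img.shape
    # pick a random rectangular region (size bounds are illustrative)
    rh, rw = random.randint(h // 8, h // 2), random.randint(w // 8, w // 2)
    top, left = random.randint(0, h - rh), random.randint(0, w - rw)
    region = img[:, top:top + rh, left:left + rw]
    choice = random.randrange(4)
    if choice == 0:   # gray value (ITU-R 601 luma weights, assumed)
        rep = 0.299 * region[0] + 0.587 * region[1] + 0.114 * region[2]
    else:             # R, G or B channel
        rep = region[choice - 1]
    img[:, top:top + rh, left:left + rw] = rep.expand_as(region)
    return img
```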
3. Loss function
The invention uses several loss functions in training the network model: the hetero-center sample loss, the center cluster loss, the identity loss, and other losses.
The hetero-center sample loss improves on the conventional triplet loss by optimizing both the distance between class centers and the distance between each sample and its class center. As shown in FIG. 2, boxes and circles represent different classes; hollow markers represent individual sample features and filled markers represent the center features computed from those samples. Within a mini-batch, the hetero-center sample loss on the one hand pulls the center features of the same class in different modalities close together while pushing the center features of different classes apart; on the other hand, it draws each sample as close as possible to its own class center. Its formula is as follows:
$$L_{HCS} = \sum_{i=1}^{P}\left[\rho + \left\|c_v^i - c_t^i\right\|_2 - \min_{\substack{n\in\{v,t\}\\ j\neq i}}\left\|c_v^i - c_n^j\right\|_2\right]_+ + \sum_{i=1}^{P}\left[\rho + \left\|c_t^i - c_v^i\right\|_2 - \min_{\substack{n\in\{v,t\}\\ j\neq i}}\left\|c_t^i - c_n^j\right\|_2\right]_+ + \delta\sum_{i=1}^{P}\sum_{j=1}^{K}\left(\left\|x_{i,j}^v - c_v^i\right\|_2 + \left\|x_{i,j}^t - c_t^i\right\|_2\right)$$

where $\rho$ is the margin parameter and $\delta$ is a balance coefficient; $[x]_+ = \max(x, 0)$ is the standard hinge loss, and $\|x_a - x_b\|_2$ denotes the two-norm between $x_a$ and $x_b$. $P$ denotes the total number of distinct classes in a mini-batch. $x_{i,j}^v$ and $x_{i,j}^t$ denote the features of the $j$-th visible-light image and the $j$-th infrared image of class $i$. $c_v^i$ and $c_t^i$ denote the center features of class $i$ in the visible-light and infrared modalities within a mini-batch, obtained by averaging all class-$i$ samples in the respective modality:

$$c_v^i = \frac{1}{K}\sum_{j=1}^{K} x_{i,j}^v, \qquad c_t^i = \frac{1}{K}\sum_{j=1}^{K} x_{i,j}^t$$

where $K$ denotes that each class in a mini-batch contains $K$ visible-light images and $K$ infrared images. The hetero-center sample loss makes the sample distribution more compact; FIG. 3 compares its effect with that of the conventional triplet loss.
The identity loss is essentially a cross-entropy loss: cross-modal pedestrian re-identification is treated as a multi-class classification problem in which each pedestrian identity is regarded as one class.
The center cluster loss and the other losses are those used in the literature cited above; the total loss of that work, with the center cluster loss removed, is denoted here as the other losses.
For the spatial feature extraction branch, the invention uses the identity loss and the hetero-center sample loss; for the channel feature extraction branch, the identity loss and the center cluster loss; and for the global feature extraction branch, the other losses, as sketched below.
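Combining the branch losses of step S6 could then look like the following sketch, which reuses the hetero_center_sample_loss above. The patent does not specify weighting coefficients between terms, so they are simply summed here, and the center cluster and other losses are passed in precomputed because they are defined in the cited literature.

```python
import torch.nn as nn

id_loss_fn = nn.CrossEntropyLoss()  # identity loss: one class per pedestrian identity

def total_loss(spatial_logits, channel_logits, spatial_feats,
               labels, is_visible, l_center_cluster, l_other):
    """Total training loss of step S6 (a sketch; terms are summed unweighted)."""
    l_spatial = (id_loss_fn(spatial_logits, labels)
                 + hetero_center_sample_loss(spatial_feats, labels, is_visible))
    l_channel = id_loss_fn(channel_logits, labels) + l_center_cluster
    return l_spatial + l_channel + l_other
```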
Following the requirements of the cross-modal pedestrian re-identification task, the method extracts local pedestrian features from the spatial, channel and global perspectives, which strengthens the pedestrian feature representation, shortens the distance between features of different modalities, and improves the matching accuracy of cross-modal pedestrian re-identification.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (4)

1. A cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss, characterized in that a cross-modal pedestrian re-identification model extracts pedestrian image features at the spatial, channel and global scales, and a loss function consisting of identity loss, hetero-center sample loss, center cluster loss and total-feature loss is adopted to train the cross-modal pedestrian re-identification model, thereby achieving cross-modal re-identification of pedestrian images.
2. The cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss according to claim 1, characterized by comprising the following steps:
S1: constructing a cross-modal pedestrian re-identification model comprising a modality alignment module and a spatial feature extraction module connected in parallel;
S2: performing data enhancement on the pedestrian images input into the cross-modal pedestrian re-identification model;
S3: extracting global features, spatial local features and channel local features of the pedestrian images with the cross-modal pedestrian re-identification model;
S4: calculating the hetero-center sample loss and the identity loss of the extracted spatial local features, and calculating the center cluster loss and the identity loss of the extracted channel local features;
S5: concatenating the global features, spatial local features and channel local features as the total features, and calculating the total-feature loss;
S6: adding all losses of steps S4 and S5 to form the loss function of the whole cross-modal pedestrian re-identification model, and training and optimizing the model parameters according to this loss function;
S7: after training is complete, inputting the pedestrian image to be queried and the test-set images into the cross-modal pedestrian re-identification model, calculating the similarity between the query image and the test-set images, and returning the M images with the highest similarity as the cross-modal re-identification result.
3. The cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss according to claim 2, wherein the data enhancement of the pedestrian images in step S2 specifically consists of mixing visible-light images and infrared images to form mini-batches.
4. The cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss according to claim 3, wherein the hetero-center sample loss of the extracted spatial local features is calculated in step S4 by the following formula:
$$L_{HCS} = \sum_{i=1}^{P}\left[\rho + \left\|c_v^i - c_t^i\right\|_2 - \min_{\substack{n\in\{v,t\}\\ j\neq i}}\left\|c_v^i - c_n^j\right\|_2\right]_+ + \sum_{i=1}^{P}\left[\rho + \left\|c_t^i - c_v^i\right\|_2 - \min_{\substack{n\in\{v,t\}\\ j\neq i}}\left\|c_t^i - c_n^j\right\|_2\right]_+ + \delta\sum_{i=1}^{P}\sum_{j=1}^{K}\left(\left\|x_{i,j}^v - c_v^i\right\|_2 + \left\|x_{i,j}^t - c_t^i\right\|_2\right)$$

wherein $L_{HCS}$ denotes the hetero-center sample loss, $\rho$ is the margin parameter, $\delta$ is a balance coefficient, $[x]_+ = \max(x, 0)$ is the standard hinge loss, and $\|x_a - x_b\|_2$ denotes the two-norm between $x_a$ and $x_b$; $P$ denotes the total number of distinct classes in a mini-batch; $x_{i,j}^v$ and $x_{i,j}^t$ denote the features of the $j$-th visible-light image and the $j$-th infrared image of class $i$; $c_v^i$ and $c_t^i$ denote the center features of class $i$ in the visible-light and infrared modalities within a mini-batch, obtained as the mean of all class-$i$ samples in each modality:

$$c_v^i = \frac{1}{K}\sum_{j=1}^{K} x_{i,j}^v, \qquad c_t^i = \frac{1}{K}\sum_{j=1}^{K} x_{i,j}^t$$

wherein $K$ denotes that each class in a mini-batch contains $K$ visible-light images and $K$ infrared images.
CN202211169495.1A 2022-09-22 2022-09-22 Cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss Pending CN115497121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211169495.1A CN115497121A (en) 2022-09-22 2022-09-22 Cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss


Publications (1)

Publication Number Publication Date
CN115497121A (en) 2022-12-20

Family

ID=84469865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211169495.1A Pending CN115497121A (en) 2022-09-22 2022-09-22 Cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss

Country Status (1)

Country Link
CN (1) CN115497121A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765572A (en) * 2024-02-22 2024-03-26 东北大学 Pedestrian re-recognition method based on federal learning
CN117765572B (en) * 2024-02-22 2024-08-09 东北大学 Pedestrian re-recognition method based on federal learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination