CN115497121A - Cross-modal pedestrian re-identification method based on space-time characteristics and different center loss - Google Patents
- Publication number
- CN115497121A
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- cross
- loss
- modal
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06N3/08 — Learning methods (neural networks)
- G06V10/42 — Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/82 — Image or video recognition or understanding using neural networks
Abstract
The invention provides a cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss. The method extracts channel features and spatial features simultaneously, so that the pedestrian representation is more discriminative, and uses a hetero-center sample loss so that the feature distribution of each pedestrian identity is more compact.
Description
Technical Field
The invention belongs to the field of image retrieval algorithms, relates to pedestrian re-identification over cross-modal images, and in particular to a cross-modal pedestrian re-identification method based on space-time characteristics and different center loss.
Background
Pedestrian re-identification is an important branch of image retrieval and is widely applied to ensuring public safety. With the growing number of surveillance devices that integrate infrared sensors, cross-modal pedestrian re-identification between visible-light images and infrared images has become an important research direction.
Compared with conventional single-modality pedestrian re-identification, cross-modal pedestrian re-identification is more challenging. Besides factors such as viewpoint changes, occlusion, and pose changes, it also faces the challenge of a large modality gap. To reduce the influence of this gap, some methods convert infrared and visible-light images into a common modality; other methods use feature learning to align the image features of the two modalities in a unified feature space.
An existing cross-modal pedestrian re-identification method connects a modality alignment module to a single-stream network to extract detailed image features, and then trains the network parameters with a cross-entropy loss and a center cluster loss. However, although the center cluster loss handles inter-modality differences well, it does not tightly constrain intra-modality differences; and the modality alignment module extracts detailed features at the channel level, while key detailed features also exist at the spatial level. If both kinds of information could be exploited simultaneously, the performance of the cross-modal pedestrian re-identification model could be further improved.
Disclosure of Invention
To overcome these defects of the prior art, the invention provides a cross-modal pedestrian re-identification method based on space-time characteristics and different center loss: channel features and spatial features are extracted simultaneously, making the pedestrian representation more discriminative, and a hetero-center sample loss is used, making the feature distribution of each pedestrian identity more compact.
The technical solution adopted by the invention to solve the technical problem is as follows: a cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center losses, in which a cross-modal pedestrian re-identification model extracts pedestrian image features along the spatial, channel, and global dimensions, and the model is trained with a loss function composed of an identity loss, a hetero-center sample loss, a center cluster loss, and a total-feature loss, thereby realizing cross-modal re-identification of pedestrian images.
As a further embodiment of the invention, the method specifically comprises the following steps:
s1: constructing a cross-modal pedestrian re-identification model, wherein the cross-modal pedestrian re-identification model comprises a modal alignment module and a spatial feature extraction module which are connected in parallel;
s2: performing data enhancement on the pedestrian image input into the cross-mode pedestrian re-identification model;
s3: extracting global features, spatial local features and channel local features of the pedestrian image by using the cross-modal pedestrian re-identification model;
s4: calculating the hetero-center sample loss and the identity loss of the extracted spatial local features, and calculating the center cluster loss and the identity loss of the extracted channel local features;
s5: splicing the global features, the spatial local features and the channel local features to be used as total features, and calculating total feature loss;
s6: adding all losses in the steps S4 and S5 to form a loss function of the whole cross-modal pedestrian re-recognition model, and training and optimizing parameters in the cross-modal pedestrian re-recognition model according to the loss function;
s7: after training of the cross-modal pedestrian re-identification model is completed, inputting the pedestrian image to be queried and the images of the test set into the model, calculating the similarity between the query image and each test-set image, and returning the M images with the highest similarity as the result of cross-modal pedestrian re-identification.
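Step s7 reduces to a nearest-neighbor ranking over extracted features. A minimal pure-Python sketch, assuming cosine similarity as the similarity measure (the patent does not fix the metric) and plain lists as stand-ins for the model's feature vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_m_matches(query, gallery, m):
    """Return indices of the m gallery features most similar to the query."""
    scores = [(cosine_similarity(query, g), i) for i, g in enumerate(gallery)]
    scores.sort(key=lambda s: -s[0])
    return [i for _, i in scores[:m]]
```

In use, `query` would be the feature of the image to be queried and `gallery` the features of the test set, both produced by the trained model.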
As a further embodiment of the present invention, the data enhancement on the pedestrian image in step S2 specifically includes: the visible light images and the infrared images are mixed to form a plurality of batches.
As a further embodiment of the present invention, the hetero-center sample loss of the extracted spatial local features is calculated in step S4 according to the following formula:

L_HCS = Σ_{i=1}^{P} [ ρ + ‖c_i^v − c_i^r‖_2 − min_{j≠i, m,n∈{v,r}} ‖c_i^m − c_j^n‖_2 ]_+ + δ Σ_{i=1}^{P} Σ_{j=1}^{K} ( ‖x_{i,j}^v − c_i^v‖_2 + ‖x_{i,j}^r − c_i^r‖_2 )

where L_HCS denotes the hetero-center sample loss, ρ is the margin parameter, δ is a balance coefficient, [x]_+ = max(x, 0) is the standard hinge loss, and ‖x_a − x_b‖_2 denotes the 2-norm between x_a and x_b; P is the number of distinct classes in a mini-batch, x_{i,j}^v and x_{i,j}^r denote the feature representations of the j-th visible-light image and the j-th infrared image of class i, and c_i^v and c_i^r denote the center features of class i in the visible-light and infrared modalities within the mini-batch;
c_i^v and c_i^r are the means of all class-i samples in the respective modality:

c_i^v = (1/K) Σ_{j=1}^{K} x_{i,j}^v,    c_i^r = (1/K) Σ_{j=1}^{K} x_{i,j}^r

where K is the number of visible-light images and also of infrared images of each class in a mini-batch.
The beneficial effects of the invention include: a spatial feature extraction module is connected in parallel with the original modality alignment module, so that channel features and spatial features are extracted simultaneously and the pedestrian representation becomes more discriminative; meanwhile, the hetero-center sample loss handles cross-modality and intra-modality variation at the same time, so that the feature distribution of each pedestrian identity becomes more compact. The method is of reference value for exploring pedestrian feature representations and improving the accuracy of pedestrian re-identification.
Drawings
FIG. 1 is a diagram of a model framework of the present invention;
FIG. 2 is an illustration of the hetero-center sample loss of the present invention;
FIG. 3 is a t-SNE visualization comparing the effect of the hetero-center sample loss with that of the conventional triplet loss.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Furthermore, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
A cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center losses, in which a spatial feature extraction branch is connected in parallel to an existing cross-modal pedestrian re-identification model to construct a new model; FIG. 1 is the model framework diagram of the invention.
First, data enhancement is performed on the infrared and visible-light images in the data set. They are then fed into a single-stream network composed of ResNet-50 and the modality alignment module (MAM), producing an embedded feature map X. Next, three branches extract the spatial, channel, and global feature representations of X, respectively. Finally, the identity loss and the related cross-modal triplet-style losses are applied to the three feature representations, and the network parameters are optimized to obtain the cross-modal pedestrian re-identification model.
The method specifically comprises the following technical steps:
1. Construct the network model and initialize the network parameters.
The backbone of the invention is a single-stream network composed of ResNet-50 and the modality alignment module (MAM); its main purpose is to extract coarse-grained pedestrian features X ∈ R^{C×H×W}, where C, H, and W denote the number of channels, the height, and the width of the feature map, respectively. ResNet-50 is initialized with parameters pre-trained on ImageNet, and the stride of its last downsampling layer, i.e. layer4, is changed to 1 to remove that downsampling. MAM is essentially an instance-normalization structure (see: Qiang Wu, Pingyang Dai, Jie Chen, Chia-Wen Lin, Yongjian Wu, Feiyue Huang, Bineng Zhong, and Rongrong Ji. Discover cross-modality nuances for visible-infrared person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4330-4339, 2021) embedded after the layer3 and layer4 stages of ResNet-50. Three branches follow the backbone: a spatial feature extraction branch, a channel feature extraction branch, and a global feature extraction branch.
The spatial feature branch evenly divides the coarse-grained pedestrian feature map X into p patches and extracts a spatial local feature from each patch through pooling, 1 × 1 convolution, and reshaping. Specifically:
S_i = Reshape(Conv(pool(X_i)))
where X_i denotes the feature map of the i-th patch. Finally, all the spatial local features are concatenated as the final feature representation of the spatial feature extraction module.
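The patch-splitting step can be sketched in pure Python as follows. This illustrative version only performs the horizontal split and average pooling, omitting the 1 × 1 convolution and reshape; the `strip_pool` name and nested-list layout are assumptions, not the invention's implementation:

```python
def strip_pool(feature_map, p):
    """Split a C x H x W feature map (nested lists) into p equal horizontal
    strips and average-pool each strip into one C-dimensional local feature.
    The 1x1 convolution and reshape of the full branch are omitted."""
    C = len(feature_map)
    H = len(feature_map[0])
    W = len(feature_map[0][0])
    assert H % p == 0, "H must divide evenly into p strips"
    strip_h = H // p
    local_feats = []
    for s in range(p):
        rows = range(s * strip_h, (s + 1) * strip_h)
        vec = []
        for c in range(C):
            total = sum(feature_map[c][h][w] for h in rows for w in range(W))
            vec.append(total / (strip_h * W))  # average pooling over the strip
        local_feats.append(vec)
    return local_feats
```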
The channel feature branch is mainly composed of a mode alignment module and a 1 × 1 convolution down-sampling layer.
The global feature extraction branch applies pooling and reshaping to the coarse-grained pedestrian features X. Finally, the pedestrian features of the three branches are concatenated as the final pedestrian feature representation.
2. Image pre-processing
Images in the data set are randomly cropped to 288 × 144, and the diversity of the data is then increased by random horizontal flipping and random graying. For visible-light images, a local random channel enhancement is additionally applied: a region of the image is randomly selected, and its pixel values are randomly replaced with one of the gray value, the R-channel value, the G-channel value, or the B-channel value. This makes the model less sensitive to color changes and encourages it to learn color-independent features.
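A pure-Python sketch of the local random channel enhancement; the region-sampling policy and the luma weights used for the gray value are illustrative assumptions, since the text does not specify them:

```python
import random

def local_channel_enhance(img, rng=random):
    """Pick a random rectangular region of an RGB image (H x W x 3 nested
    lists) and replace its pixels with one of: the gray value, or the R, G,
    or B channel value, chosen at random."""
    H, W = len(img), len(img[0])
    h0, w0 = rng.randrange(H), rng.randrange(W)
    h1, w1 = rng.randrange(h0, H) + 1, rng.randrange(w0, W) + 1
    mode = rng.choice(["gray", 0, 1, 2])  # gray value or one of R/G/B
    out = [[list(px) for px in row] for row in img]
    for h in range(h0, h1):
        for w in range(w0, w1):
            r, g, b = out[h][w]
            if mode == "gray":
                v = 0.299 * r + 0.587 * g + 0.114 * b  # assumed luma weights
            else:
                v = out[h][w][mode]
            out[h][w] = [v, v, v]  # broadcast the chosen value to all channels
    return out
```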
3. Loss function
The invention uses several loss functions while training the network model: the hetero-center sample loss, the center cluster loss, the identity loss, and other losses.
The hetero-center sample loss is an improvement over the conventional triplet loss: it optimizes both the distance between class centers and the distance from each sample to its class center. As shown in FIG. 2, boxes and circles represent different classes, hollow markers represent individual sample features, and filled markers represent the center features computed from the samples. Within a mini-batch, the hetero-center sample loss on the one hand pulls the center features of the same class in different modalities toward each other and pushes the center features of different classes apart; on the other hand, it pulls each sample as close as possible to its own class center. Its formula is:

L_HCS = Σ_{i=1}^{P} [ ρ + ‖c_i^v − c_i^r‖_2 − min_{j≠i, m,n∈{v,r}} ‖c_i^m − c_j^n‖_2 ]_+ + δ Σ_{i=1}^{P} Σ_{j=1}^{K} ( ‖x_{i,j}^v − c_i^v‖_2 + ‖x_{i,j}^r − c_i^r‖_2 )

where ρ is the margin parameter, δ is a balance coefficient, [x]_+ = max(x, 0) is the standard hinge loss, and ‖x_a − x_b‖_2 denotes the 2-norm between x_a and x_b. P is the number of distinct classes in a mini-batch; x_{i,j}^v and x_{i,j}^r denote the features of the j-th visible-light image and the j-th infrared image of class i; c_i^v and c_i^r denote the center features of class i in the visible-light and infrared modalities within the mini-batch, obtained by averaging all class-i samples in the respective modality:

c_i^v = (1/K) Σ_{j=1}^{K} x_{i,j}^v,    c_i^r = (1/K) Σ_{j=1}^{K} x_{i,j}^r

where K is the number of visible-light images and also of infrared images of each class in a mini-batch. The hetero-center sample loss makes the sample distribution more compact; FIG. 3 compares its effect with that of the conventional triplet loss.
The identity loss is essentially a cross-entropy loss: cross-modal pedestrian re-identification is treated as a multi-class classification problem in which each pedestrian identity is one class.
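In other words, the identity loss is ordinary softmax cross-entropy over identity classes; a minimal sketch:

```python
import math

def identity_loss(logits, label):
    """Identity loss: softmax cross-entropy, treating each pedestrian
    identity as one class. logits: raw scores, one per identity;
    label: index of the true identity."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    log_prob = (logits[label] - m) - math.log(sum(exps))
    return -log_prob
```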
The center cluster loss and the other losses are those used in the reference cited above; the total loss of that work with the center cluster loss removed is denoted here as the other losses.
For the spatial feature extraction branch, the invention uses the identity loss and the hetero-center sample loss; for the channel feature extraction branch, the identity loss and the center cluster loss; and for the global feature extraction branch, the other losses.
According to the requirements of the cross-modal pedestrian re-identification task, the method extracts local pedestrian features from the spatial, channel, and global perspectives, which enhances the pedestrian feature representation, shortens the distances between features of different modalities, and improves the matching accuracy of the task.
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments; they are neither required to be, nor can they be, exhaustive of all embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description, and obvious variations or modifications derived therefrom are intended to fall within the protection scope of the invention.
Claims (4)
1. A cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center losses, characterized in that a cross-modal pedestrian re-identification model extracts pedestrian image features along the spatial, channel, and global dimensions, and the model is trained with a loss function composed of an identity loss, a hetero-center sample loss, a center cluster loss, and a total-feature loss, thereby realizing cross-modal re-identification of pedestrian images.
2. The cross-modal pedestrian re-identification method based on spatiotemporal features and different center losses according to claim 1, characterized by comprising the following steps:
s1: constructing a cross-modal pedestrian re-identification model, wherein the cross-modal pedestrian re-identification model comprises a modal alignment module and a spatial feature extraction module which are connected in parallel;
s2: performing data enhancement on the pedestrian image input into the cross-modal pedestrian re-identification model;
s3: extracting global features, spatial local features and channel local features of the pedestrian image by using the cross-modal pedestrian re-identification model;
s4: calculating the hetero-center sample loss and the identity loss of the extracted spatial local features, and calculating the center cluster loss and the identity loss of the extracted channel local features;
s5: splicing the global features, the spatial local features and the channel local features to be used as total features, and calculating total feature loss;
s6: adding all losses in the steps S4 and S5 to form a loss function of the whole cross-modal pedestrian re-recognition model, and training and optimizing parameters in the cross-modal pedestrian re-recognition model according to the loss function;
s7: after training of the cross-modal pedestrian re-identification model is completed, inputting the pedestrian image to be queried and the images of the test set into the model, calculating the similarity between the query image and each test-set image, and returning the M images with the highest similarity as the result of cross-modal pedestrian re-identification.
3. The cross-modal pedestrian re-identification method based on spatio-temporal features and different center loss according to claim 2, wherein the data enhancement is performed on the pedestrian image in step S2, specifically: the visible light images and the infrared images are mixed to form a plurality of batches.
4. The cross-modal pedestrian re-identification method based on spatio-temporal features and hetero-center loss according to claim 3, wherein the hetero-center sample loss of the extracted spatial local features is calculated in step S4 according to the following formula:

L_HCS = Σ_{i=1}^{P} [ ρ + ‖c_i^v − c_i^r‖_2 − min_{j≠i, m,n∈{v,r}} ‖c_i^m − c_j^n‖_2 ]_+ + δ Σ_{i=1}^{P} Σ_{j=1}^{K} ( ‖x_{i,j}^v − c_i^v‖_2 + ‖x_{i,j}^r − c_i^r‖_2 )

where L_HCS denotes the hetero-center sample loss, ρ is the margin parameter, δ is a balance coefficient, [x]_+ = max(x, 0) is the standard hinge loss, and ‖x_a − x_b‖_2 denotes the 2-norm between x_a and x_b; P is the number of distinct classes in a mini-batch, x_{i,j}^v and x_{i,j}^r denote the features of the j-th visible-light image and the j-th infrared image of class i, and c_i^v and c_i^r denote the center features of class i in the visible-light and infrared modalities within the mini-batch;
c_i^v and c_i^r are the means of all class-i samples in the respective modality:

c_i^v = (1/K) Σ_{j=1}^{K} x_{i,j}^v,    c_i^r = (1/K) Σ_{j=1}^{K} x_{i,j}^r

where K is the number of visible-light images and also of infrared images of each class in a mini-batch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211169495.1A CN115497121A (en) | 2022-09-22 | 2022-09-22 | Cross-modal pedestrian re-identification method based on space-time characteristics and different center loss |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211169495.1A CN115497121A (en) | 2022-09-22 | 2022-09-22 | Cross-modal pedestrian re-identification method based on space-time characteristics and different center loss |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115497121A true CN115497121A (en) | 2022-12-20 |
Family
ID=84469865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211169495.1A Pending CN115497121A (en) | 2022-09-22 | 2022-09-22 | Cross-modal pedestrian re-identification method based on space-time characteristics and different center loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115497121A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117765572A (en) * | 2024-02-22 | 2024-03-26 | 东北大学 | Pedestrian re-recognition method based on federal learning |
CN117765572B (en) * | 2024-02-22 | 2024-08-09 | 东北大学 | Pedestrian re-recognition method based on federal learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |