CN113761995A - Cross-mode pedestrian re-identification method based on double-transformation alignment and blocking - Google Patents
Cross-mode pedestrian re-identification method based on double-transformation alignment and blocking
- Publication number
- CN113761995A CN202010814790.2A CN202010814790A
- Authority
- CN
- China
- Prior art keywords
- image
- visible light
- infrared
- pedestrian
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 22
- 230000000903 blocking effect Effects 0.000 title claims abstract description 12
- 230000009466 transformation Effects 0.000 claims abstract description 29
- 238000005070 sampling Methods 0.000 claims abstract description 28
- 238000012549 training Methods 0.000 claims abstract description 12
- 238000006243 chemical reaction Methods 0.000 claims description 8
- 238000000605 extraction Methods 0.000 claims description 7
- 230000011218 segmentation Effects 0.000 claims description 7
- 238000004364 calculation method Methods 0.000 claims description 4
- 230000004927 fusion Effects 0.000 claims description 4
- 238000013507 mapping Methods 0.000 claims description 4
- 239000013598 vector Substances 0.000 claims description 4
- 230000003044 adaptive effect Effects 0.000 claims description 3
- 239000011159 matrix material Substances 0.000 claims description 2
- 238000012544 monitoring process Methods 0.000 abstract description 3
- 239000000523 sample Substances 0.000 description 12
- 238000011160 research Methods 0.000 description 4
- 238000002679 ablation Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000001228 spectrum Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Traffic Control Systems (AREA)
- Image Processing (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The invention provides a cross-modal pedestrian re-identification method based on double-transformation alignment and blocking. First, a base branch network extracts features from the input infrared and visible light pedestrian images, a group of affine transformation parameters is linearly regressed from the high-level features of each image, and these parameters are used to generate an aligned image, which effectively relieves the modal differences caused by misalignment. The aligned image is then horizontally divided into three blocks, the features of the three blocks are extracted and fused with the aligned global features and the original image features to obtain the total features of the visible light and infrared images. Next, the total features of the infrared and visible light images are mapped into the same embedding space. Finally, joint training with the identity loss and the weighted most difficult batch sampling loss function improves the recognition accuracy. The invention is mainly applied to intelligent video surveillance analysis systems and has broad application prospects in fields such as image retrieval and intelligent security.
Description
Technical Field
The invention relates to a cross-modal pedestrian re-identification method based on double-transformation alignment and blocking and to a new network model, DTASN (Dual Transformation Alignment and Segmentation Network). It addresses the cross-modal pedestrian re-identification problem in the field of intelligent video surveillance and belongs to the fields of computer vision and intelligent information processing.
Background
Pedestrian Re-Identification (ReID) is a computer vision technique that aims to retrieve a person of interest across multiple non-overlapping cameras, and is generally regarded as a sub-problem of image retrieval. An efficient ReID algorithm relieves the burden of manually reviewing video and accelerates investigations. Pedestrian re-identification has broad application prospects in fields such as video surveillance and intelligent security, has attracted extensive attention from both academia and industry, and has become a research hotspot in computer vision that is both of high research value and highly challenging.
Currently, most research focuses on the RGB-RGB (single-modality) pedestrian re-identification problem, in which both the probe and gallery pedestrian images are captured by visible light cameras. However, visible light cameras may fail to capture appearance information under changing illumination, especially when lighting is insufficient (e.g., at night or in dark environments). Thanks to technological development, most new-generation cameras can automatically switch between visible light mode and infrared mode according to the lighting conditions. It is therefore necessary to develop methods that solve the visible-infrared cross-modality ReID problem. Unlike traditional pedestrian re-identification, visible-infrared cross-modal pedestrian re-identification, VI-ReID (Visible-Infrared person re-identification), matches visible light pedestrian images with pedestrian images of a different spectrum captured by infrared cameras, i.e., it mainly solves cross-modal image matching. VI-ReID generally uses a visible light (or infrared) pedestrian image to search for the corresponding infrared (or visible light) pedestrian image across the whole camera network.
Pedestrian images (cropped pedestrians) are typically obtained by an automatic detector or tracker. However, because human detection/tracking results are imperfect, image misalignment is usually unavoidable, i.e., there are spatial semantic misalignment errors such as partial occlusion, missing parts (only part of the body), and excessive background. To address the semantic misalignment problem in ReID, some works have attempted to improve the accuracy of pedestrian matching by reducing the cross-modal differences of heterogeneous data. Other methods focus on solving the pedestrian misalignment problem to improve matching accuracy, which also reduces modal differences to some extent. Beyond these difficulties, the appearance of pedestrians also changes greatly with pose and viewpoint. Many practical factors cause spatial semantic misalignment between images, that is, the contents at the same spatial positions of two matching images have different semantics, which limits the robustness and effectiveness of pedestrian re-identification. It is therefore important to develop a model with strong discriminative capability that simultaneously handles cross-modal variation: it should reduce the cross-modal differences of heterogeneous data and also alleviate the intra-modality image differences caused by misalignment, thereby improving the accuracy of cross-modal pedestrian re-identification.
Disclosure of Invention
The invention provides a cross-modal pedestrian re-identification method based on double-transformation alignment and blocking, and designs a multipath double-transformation alignment and segmentation network structure, DTASN. The sampling strategy for each training batch is as follows: randomly select P pedestrians from the training data set, then randomly select K visible light pedestrian images and K infrared pedestrian images for each pedestrian to form a batch containing 2PK pedestrian images, and finally feed the 2PK images into the network for training. Under the supervision of label information, the self-learning capability of a convolutional neural network is used to adaptively align and correct the severely misaligned visible light and infrared images, and the aligned and corrected images are horizontally segmented into local blocks, so as to improve the accuracy of cross-modal pedestrian re-identification.
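A minimal sketch of this P × K cross-modal batch sampling strategy is given below, assuming the training data are organised as dictionaries that map each pedestrian identity to its lists of visible light and infrared images (the function name and data layout are illustrative, not taken from the patent):

```python
import random

def sample_batch(visible_by_id, infrared_by_id, P=8, K=4):
    """Form one training batch of 2*P*K images: for P randomly chosen
    identities, draw K visible-light and K infrared images each (sampling
    with replacement if an identity has fewer than K images in a modality)."""
    ids = random.sample(list(visible_by_id.keys()), P)
    batch_visible, batch_infrared, labels = [], [], []
    for pid in ids:
        batch_visible += random.choices(visible_by_id[pid], k=K)
        batch_infrared += random.choices(infrared_by_id[pid], k=K)
        labels += [pid] * K          # identity label shared by both modalities
    return batch_visible, batch_infrared, labels
```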
A cross-mode pedestrian re-identification method based on double-transformation alignment and blocking comprises the following steps:
(1) method for extracting visible light pedestrian image by using visible light-based branch networkIs characterized by obtainingInfrared pedestrian image extraction method using infrared-based branch networkIs characterized by obtaining
(2) Taking out the characteristics of a fifth residual block (conv _5x) from the visible light base branch network, inputting the characteristics into a grid network of a visible light image space transformation module, and linearly regressing a group of affine transformation parametersAnd generating a visible light image transformation grid, and then generating a new visible light pedestrian image through a bilinear samplerThen toCarrying out feature extraction to obtain the global features of the visible light pedestrians after transformation
(3) Taking out the characteristics of a fifth residual block (conv _5x) from the infrared base branch network, inputting the characteristics into a grid network of an infrared image space transformation module, and linearly regressing a group of affine transformation parametersAnd generating an infrared image transformation grid, and then generating a new infrared pedestrian alignment image through a bilinear samplerThen toTo carry outExtracting the features to obtain global features
(4) New visible light pedestrian imageHorizontally cutting into an upper non-overlapping block, a middle non-overlapping block and a lower non-overlapping block; then extracting the characteristics of the three blocks respectively to obtain the characteristicsAndfinally, the global features of the image are alignedSumming the three image characteristics to obtain the total characteristics of the visible light conversion alignment and segmentation network
(5) New infrared pedestrian imageHorizontally cutting into an upper non-overlapping block, a middle non-overlapping block and a lower non-overlapping block; then extracting the characteristics of the three blocks respectively to obtain the characteristicsAndfinally, the global features of the image are alignedSumming the three image characteristics to obtain the total characteristics of the infrared conversion alignment and segmentation network
(6) Will be provided withFeatures extracted from visible light basic branch networkPerforming weighted addition fusion to obtain the total characteristics of visible light branchWill be provided withFeatures extracted from infrared basic branch networkCarrying out weighted addition fusion to obtain the total characteristics of the infrared branchesThen the characteristics of the visible light imageAnd features of infrared imagesAnd mapping the data to the same characteristic embedding space, and training by combining an identity loss function and a most difficult batch sampling loss function with weight, thereby finally improving the cross-modal pedestrian re-identification precision.
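Written with the notation of the steps above, the feature aggregation of steps (4)-(6) can be summarised as follows; the direct summation in the first line follows the description, while the exact λ/(1−λ) form of the weighted addition in step (6) is an assumption consistent with λ being a trade-off parameter in the interval 0 to 1:

```latex
f_{v}^{tas} = f_{v}^{g} + f_{v}^{p_1} + f_{v}^{p_2} + f_{v}^{p_3}, \qquad
f_{r}^{tas} = f_{r}^{g} + f_{r}^{p_1} + f_{r}^{p_2} + f_{r}^{p_3}

F_{v} = \lambda\, f_{v}^{tas} + (1-\lambda)\, \phi(I_{v}), \qquad
F_{r} = \lambda\, f_{r}^{tas} + (1-\lambda)\, \phi(I_{r})
```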
Drawings
FIG. 1 is a block diagram of a cross-mode pedestrian re-identification method based on double-transformation alignment and blocking according to the present invention;
fig. 2 is a diagram of a visible light transform alignment and blocking branch according to the present invention.
Fig. 3 is a block diagram of the infrared conversion alignment and blocking branch circuit of the present invention.
Detailed Description
The invention will be further described with reference to figures 1, 2 and 3:
the network structure and the principle of the DTASN model are as follows:
the network model framework learns feature representations and distance metrics in an end-to-end manner through a multipath double-aligned and block network while maintaining high resolvability. The frame includes three components: (1) the device comprises a feature extraction module, (2) a feature embedding module and (3) a loss calculation module. The backbone network junctions of all paths are the adopted deep residual network ResNet 50. Due to the lack of available data, the present invention initializes the network using a pre-trained ResNet50 model in order to speed up the convergence of the training process. To enhance the attention to the local features, the present invention applies a location attention module on each path.
For visible-infrared cross-modal pedestrian re-identification, the similarity between modalities lies in achromatic information such as pedestrian contours and textures, while the significant difference lies in the imaging spectrum. The invention therefore designs a twin (two-stream) network model to extract the visual features of infrared and visible light pedestrian images. As shown in fig. 1, the invention uses two networks with the same structure to extract the feature representations of the visible light and infrared images; note that weights are not shared between them. The feature extraction module mainly comprises two networks for processing visible light and infrared data: a base branch network and an alignment and segmentation network.
(1) A base branch network:
The base branch network consists of two identical sub-networks whose weights are not shared; the backbone of each is ResNet50. The input images are all three-channel images with height and width 288 × 144. Let the input images of the visible light and infrared base branch networks be denoted by I_v and I_r respectively, and let the base branch feature extractor be denoted by φ(·). Then φ(I_v) represents the depth features of the visible light image I_v extracted by the visible light base branch network, and φ(I_r) represents the depth features of the infrared image I_r extracted by the infrared base branch network; all output feature vectors have length 2048.
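A minimal sketch of the two-stream base branch network, assuming a PyTorch/torchvision implementation (the class name and the use of torchvision's ResNet50 are illustrative assumptions):

```python
import torch.nn as nn
from torchvision import models

class BaseBranches(nn.Module):
    """Two structurally identical ResNet50 streams with unshared weights:
    one for visible-light images, one for infrared images; each outputs a
    2048-dimensional pooled feature vector."""
    def __init__(self, pretrained=True):
        super().__init__()
        def backbone():
            net = models.resnet50(pretrained=pretrained)
            net.fc = nn.Identity()      # drop the classifier, keep 2048-d features
            return net
        self.visible = backbone()
        self.infrared = backbone()

    def forward(self, x_v, x_r):
        # x_v, x_r: (B, 3, 288, 144) visible-light / infrared batches
        return self.visible(x_v), self.infrared(x_r)
```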
(2) Spatial transformation module
Alignment principle of the visible light and infrared transformations: the fifth residual block features conv_5x of the visible light and infrared base branches are used to linearly regress groups of affine transformation parameters θ_v and θ_r respectively, and the coordinate correspondence between the images before and after the affine transformation is then established by formula (1):

(x_i^s, y_i^s)^T = A_θ · (x_i^t, y_i^t, 1)^T,  A_θ = [θ_11, θ_12, θ_13; θ_21, θ_22, θ_23]   (1)

wherein (x_i^t, y_i^t) is the i-th target coordinate in the regular grid of the target image, (x_i^s, y_i^s) is the source coordinate of the sampling point in the input image, and A_θ is the affine transformation matrix, in which θ_13 and θ_23 control the translation of the transformed image and θ_11, θ_12, θ_21 and θ_22 control its scaling and rotation; the image grid is sampled by bilinear sampling during the affine transformation. With I_v and I_r the input images of the bilinear sampler, the new visible light and infrared images output by the spatial transformation are I_v^a and I_r^a, and the correspondence between them is given by formula (2):

I_c^a(m, n) = Σ_{n'=1..H} Σ_{m'=1..W} I_c(m', n') · max(0, 1 − |x_(m,n)^s − m'|) · max(0, 1 − |y_(m,n)^s − n'|)   (2)

wherein I_c^a(m, n) denotes the pixel value at position (m, n) in channel c of the target (transformed) image, I_c(m', n') denotes the pixel value at position (m', n') in channel c of the source image, and H and W denote the height and width of the target image (and of the source image). Bilinear sampling is continuously differentiable, so the above expression is differentiable and allows gradient back-propagation, which enables adaptive pedestrian alignment. The global features of the aligned images are denoted by f_v^g and f_r^g. In addition, in order to learn more discriminative features, the invention horizontally divides the transformed image into three non-overlapping fixed blocks.
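A minimal sketch of such a spatial transformation, assuming a PyTorch implementation built on affine_grid and grid_sample; the pooling of the conv_5x features, the single linear localisation layer and its identity initialisation are illustrative assumptions rather than the patent's exact grid network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransform(nn.Module):
    """Regress six affine parameters from conv_5x features, build a sampling
    grid, and bilinearly resample the input image into an aligned image."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.loc = nn.Linear(feat_dim, 6)
        # start from the identity transform so early training stays stable
        self.loc.weight.data.zero_()
        self.loc.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, conv5_feat, image):
        # conv5_feat: (B, C, h, w) conv_5x features; image: (B, 3, 288, 144)
        pooled = F.adaptive_avg_pool2d(conv5_feat, 1).flatten(1)     # (B, C)
        theta = self.loc(pooled).view(-1, 2, 3)                      # affine parameters
        grid = F.affine_grid(theta, image.size(), align_corners=False)
        return F.grid_sample(image, grid, align_corners=False)       # aligned image
```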
(3) Visible light transformation alignment and blocking branch
As shown in fig. 2, the visible light image aligned by the transformation is first horizontally divided into three non-overlapping blocks, i.e., an upper, a middle and a lower block; the first block covers height pixels 1-96, the second block covers height pixels 97-192, the third block covers height pixels 193-288, and all three blocks are 144 pixels wide. The three block images are then copied to the corresponding positions of three newly defined sub-images of height and width 288 × 144 whose pixel values are all 0. Next, the transformed global features and the features of the three block sub-images are extracted through four residual networks respectively; the extracted features are denoted f_v^g, f_v^p1, f_v^p2 and f_v^p3. The invention directly sums the global features and the features of the three block sub-images to obtain the total features of the transformed image, f_v^tas = f_v^g + f_v^p1 + f_v^p2 + f_v^p3.
Finally, f_v^tas is fused with the features φ(I_v) of the visible light base branch network by weighted addition to obtain the final features F_v of the visible light image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
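A minimal sketch of the horizontal three-block segmentation with zero-padded sub-images, assuming PyTorch tensors of shape (B, 3, 288, 144) (the function name is illustrative):

```python
import torch

def split_into_padded_blocks(aligned_img):
    """Cut an aligned image into upper / middle / lower 96-row blocks and paste
    each block into an all-zero canvas of the original 288 x 144 size, so every
    block keeps its original vertical position."""
    B, C, H, W = aligned_img.shape
    block_h = H // 3
    padded_blocks = []
    for k in range(3):
        canvas = torch.zeros_like(aligned_img)
        rows = slice(k * block_h, (k + 1) * block_h)
        canvas[:, :, rows, :] = aligned_img[:, :, rows, :]
        padded_blocks.append(canvas)
    return padded_blocks        # three tensors, each (B, 3, 288, 144)
```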
(4) Infrared transformation alignment and blocking branch
As shown in fig. 3, the infrared image aligned by the transformation is first horizontally divided into three non-overlapping blocks, i.e., an upper, a middle and a lower block; the first block covers height pixels 1-96, the second block covers height pixels 97-192, the third block covers height pixels 193-288, and all three blocks are 144 pixels wide. The three block images are then copied to the corresponding positions of three newly defined sub-images of height and width 288 × 144 whose pixel values are all 0. Next, the transformed global features and the features of the three block sub-images are extracted through four residual networks respectively; the extracted features are denoted f_r^g, f_r^p1, f_r^p2 and f_r^p3. The invention directly sums the global features and the features of the three block sub-images to obtain the total features of the transformed image, f_r^tas = f_r^g + f_r^p1 + f_r^p2 + f_r^p3.
Finally, f_r^tas is fused with the features φ(I_r) of the infrared base branch network by weighted addition to obtain the final features F_r of the infrared image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
(5) Feature embedding and loss computation
In order to reduce the cross-modal difference between the infrared and visible light images, the same nesting (embedding) function f_θ is used for both modalities; f_θ is essentially a fully connected layer with parameters θ. The visible light image features F_v and the infrared image features F_r are mapped into the same feature space to obtain the nested features e_v = f_θ(F_v) and e_r = f_θ(F_r), each of which is a one-dimensional feature vector of length 512. For simplicity of presentation, x_{i,j}^v denotes the j-th visible light image of the i-th person in a batch, and x_{i,j}^r is defined analogously for the infrared images of the batch.
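A minimal sketch of the shared nesting function f_θ as a single fully connected layer mapping both modalities' 2048-dimensional branch features into a common 512-dimensional space (the toy batch and variable names are illustrative):

```python
import torch
import torch.nn as nn

embed = nn.Linear(2048, 512)          # shared nesting function f_theta

F_v = torch.randn(8, 2048)            # total visible-light branch features (toy batch)
F_r = torch.randn(8, 2048)            # total infrared branch features (toy batch)
e_v, e_r = embed(F_v), embed(F_r)     # nested features in the same 512-d space
```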
Identity loss function:
Let p(k | x_{i,j}^v) and p(k | x_{i,j}^r) denote the probability that the input pedestrian images x_{i,j}^v and x_{i,j}^r are predicted to have identity k; for example, p(k | x_{i,j}^v) is the probability that the input visible light image x_{i,j}^v is predicted to belong to identity k, so that p(i | x_{i,j}^v) and p(i | x_{i,j}^r) correspond to the true identity i of the input images. The identity loss function L_id then predicts identities using the cross-entropy loss over the batch (a plausible explicit form is sketched below).
weighted most difficult batch sampling loss function:
due to LidOnly the identity of each input sample is considered, and whether the input visible light and infrared belong to the same identity is not emphasized; in order to further relieve the cross-modal difference between the infrared image and the visible light image, the invention uses a single-batch self-adaptive weighted most difficult triple sampling loss function, which is different from TriHard loss, because TriHard loss only considers the information of extreme samples, thus causing extremely large local gradient and network collapseID identities andsame positive sampleFor the positive sample pair, the larger the Euclidean distance in the nested feature space is, the larger the weight distribution is; in the same way, forThe ID and ID can also be calculated in all visible light images of the batch respectivelyDifferent negative examplesFor the negative sample pair, the larger the Euclidean distance in the nested feature space is, the smaller the weight distribution is; it can therefore be seen that different distances (i.e., different degrees of difficulty) are assigned different weights; therefore, the most difficult triple sampling loss function with the weight inherits the advantage of optimizing the relative distance between the positive sample pair and the negative sample pair, avoids introducing any redundant parameter, and enables the triple sampling loss function to be more flexible and strong in adaptability; thus, the anchor point samples are for each visible light image in each batchWeighted least difficult triple sampling loss functionIs calculated as
Where p is the corresponding positive sample set and n isNegative set, Wi pIs a positive sample distance weight, Wi nRepresenting the distance weight of the negative sample; similarly, for each infrared image anchor point sample in each batchWeighted least difficult triple sampling loss functionThe calculation is as follows:
thus, the overall most difficult triplet sampling loss function with weights is:
finally, the total loss function is defined as:
L_wrt = L_id + λ·L_c_wrt   (14)
where λ is a predefined parameter that balances the contributions of the ID identity loss L_id and the weighted most difficult triplet sampling loss L_c_wrt.
The invention performs network structure ablation studies on the RegDB and SYSU-MM01 data sets, where Baseline denotes the reference network, L_id denotes the identity loss, L_c_wrt denotes the weighted most difficult triplet sampling loss function, RE denotes random erasing, PA denotes the position attention module PAM, ST denotes the STN spatial transformation network, and HDB denotes horizontal blocking. In addition, the method is compared with several mainstream algorithms; evaluation uses the single-query setting, with Rank-1, Rank-5, Rank-10 and mAP (mean average precision) as evaluation metrics. The experimental results are shown in Tables 1, 2, 3 and 4; the accuracy is greatly improved compared with the reference network and the other compared algorithms.
TABLE 1 ablation study on network Structure RegDB data
TABLE 2 ablation study of network architecture on SYSU-MM01 data
Table 3 comparison with mainstream algorithm results on RegDB dataset
Table 4 compares the results of the mainstream algorithm against the SYSU-MM01 dataset
Claims (6)
1. A cross-mode pedestrian re-identification method based on double-transformation alignment and blocking is characterized by comprising the following steps:
(1) method for extracting visible light pedestrian image by using visible light-based branch networkIs characterized by obtainingInfrared pedestrian image extraction method using infrared-based branch networkIs characterized by obtaining
(2) Taking out the characteristics of a fifth residual block (conv _5x) from the visible light base branch network, inputting the characteristics into a grid network of a visible light image space transformation module, and linearly regressing a group of affine transformation parametersAnd generating a visible light image transformation grid, and then generating a new visible light pedestrian alignment image through a bilinear samplerThen toCarrying out feature extraction to obtain global features
(3) Taking out the characteristics of a fifth residual block (conv _5x) from the infrared base branch network, inputting the characteristics into a grid network of an infrared image space transformation module, and linearly regressing a group of affine transformation parametersAnd generating an infrared image transformation grid, and then generating a new infrared pedestrian alignment image through a bilinear samplerThen toCarrying out feature extraction to obtain global features
(4) New visible light pedestrian imageHorizontally cutting into an upper non-overlapping block, a middle non-overlapping block and a lower non-overlapping block; then extracting the characteristics of the three blocks respectively to obtain the characteristicsAndfinally, the global features of the image are alignedSumming the three image characteristics to obtain the total characteristics of the visible light conversion alignment and segmentation network
(5) New infrared pedestrian imageHorizontally cutting into an upper non-overlapping block, a middle non-overlapping block and a lower non-overlapping block; then extracting the characteristics of the three blocks respectively to obtain the characteristicsAndfinally, the global features of the image are alignedSumming the three image characteristics to obtain the total characteristics of the infrared conversion alignment and segmentation network
(6) Will be provided withFeatures extracted from visible light basic branch networkPerforming weighted addition fusion to obtain the total characteristics of visible light branchWill be provided withFeatures extracted from infrared basic branch networkCarrying out weighted addition fusion to obtain the total characteristics of the infrared branchesThen the characteristics of the visible light imageAnd features of infrared imagesAnd mapping the data to the same characteristic embedding space, and training by combining an identity loss function and a most difficult batch sampling loss function with weight, thereby finally improving the cross-modal pedestrian re-identification precision.
2. The method of claim 1, wherein the sampling strategy for each training batch in step (1) is: randomly selecting P pedestrians from the training data set, then randomly selecting K visible light pedestrian images and K infrared pedestrian images for each pedestrian to form batch training data containing 2PK pedestrian images, and finally sending the 2PK pedestrian images into the network for training; φ(I_v) represents the depth features of the visible light image I_v extracted by the visible light base branch network, and φ(I_r) represents the depth features of the infrared image I_r extracted by the infrared base branch network; all output feature vectors have length 2048.
3. The method according to claim 1, wherein the transformation alignment in steps (2) and (3) uses the fifth residual block features conv_5x extracted from the visible light base branch and the infrared base branch to linearly regress groups of affine transformation parameters θ_v and θ_r respectively, and then establishes the coordinate correspondence between the images before and after the affine transformation through formula (1):

(x_i^s, y_i^s)^T = A_θ · (x_i^t, y_i^t, 1)^T,  A_θ = [θ_11, θ_12, θ_13; θ_21, θ_22, θ_23]   (1)

wherein (x_i^t, y_i^t) is the i-th target coordinate in the regular grid of the target image, (x_i^s, y_i^s) is the source coordinate of the sampling point in the input image, and A_θ is the affine transformation matrix, in which θ_13 and θ_23 control the translation of the transformed image and θ_11, θ_12, θ_21 and θ_22 control its scaling and rotation; the image grid is sampled by bilinear sampling during the affine transformation; with I_v and I_r the input images of the bilinear sampler, the new visible light and infrared images output by the spatial transformation are I_v^a and I_r^a, and the correspondence between them is given by formula (2):

I_c^a(m, n) = Σ_{n'=1..H} Σ_{m'=1..W} I_c(m', n') · max(0, 1 − |x_(m,n)^s − m'|) · max(0, 1 − |y_(m,n)^s − n'|)   (2)

wherein I_c^a(m, n) denotes the pixel value at position (m, n) in channel c of the target (transformed) image, I_c(m', n') denotes the pixel value at position (m', n') in channel c of the source image, and H and W denote the height and width of the target image (and of the source image); bilinear sampling is continuously differentiable, so the above expression is differentiable and allows gradient back-propagation, thereby enabling adaptive pedestrian alignment; the global features of the aligned images are denoted by f_v^g and f_r^g; in addition, in order to learn more discriminative features, the transformed image is horizontally divided into three non-overlapping fixed blocks.
4. The method according to claim 1, wherein in step (4) the transformed and aligned visible light image is first horizontally divided into an upper, a middle and a lower block; the pixels in the first height range are 1-96, the pixels in the second height range are 97-192, the pixels in the third height range are 193-288, and all three blocks are 144 pixels wide; then the three block images are respectively copied to the corresponding positions of 3 newly defined sub-images of height and width 288 × 144 whose pixel values are all 0; next, the transformed global features and the features of the 3 block sub-images are extracted through 4 ResNet50 residual networks respectively; the obtained features are f_v^g, f_v^p1, f_v^p2 and f_v^p3, and the global features and the 3 block sub-image features are directly summed to obtain the total features of the transformed image, f_v^tas = f_v^g + f_v^p1 + f_v^p2 + f_v^p3;

finally, f_v^tas and the original image features φ(I_v) of step (1) are fused by weighted addition to obtain the final features F_v of the visible light image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
5. The method according to claim 1, wherein in step (5) the transformed and aligned infrared image is first horizontally divided into an upper, a middle and a lower block; the pixels in the first height range are 1-96, the pixels in the second height range are 97-192, the pixels in the third height range are 193-288, and all three blocks are 144 pixels wide; then the three block images are respectively copied to the corresponding positions of 3 newly defined sub-images of height and width 288 × 144 whose pixel values are all 0; next, the transformed global features and the features of the 3 block sub-images are extracted through 4 ResNet50 residual networks respectively; the obtained features are f_r^g, f_r^p1, f_r^p2 and f_r^p3, and the global features and the 3 block sub-image features are directly summed to obtain the total features of the transformed image, f_r^tas = f_r^g + f_r^p1 + f_r^p2 + f_r^p3;

finally, f_r^tas and the original image features φ(I_r) of step (1) are fused by weighted addition to obtain the final features F_r of the infrared image, where λ is a predefined trade-off parameter in the interval 0 to 1 that balances the contributions of the two features.
6. The method according to claim 1, wherein in step (6), in order to reduce the cross-modal difference between the infrared and visible light images, the same nesting function f_θ is used for both modalities, f_θ being essentially a fully connected layer with parameters θ; the visible light image features F_v and the infrared image features F_r are mapped into the same feature space to obtain the nested features e_v = f_θ(F_v) and e_r = f_θ(F_r), each a one-dimensional feature vector of length 512; for simplicity of presentation, x_{i,j}^v denotes the j-th visible light image of the i-th person in a batch, and x_{i,j}^r is defined analogously for the infrared images of the batch; let p(k | x_{i,j}^v) and p(k | x_{i,j}^r) denote the probability that the input pedestrian images x_{i,j}^v and x_{i,j}^r are predicted to have identity k, so that p(i | x_{i,j}^v) and p(i | x_{i,j}^r) correspond to the true identity i of the input images; the identity loss function L_id then predicts identities with the cross-entropy loss over the batch;

because L_id only considers the identity of each input sample and does not emphasize whether the input visible light and infrared images belong to the same identity, and in order to further relieve the cross-modal difference between the infrared and visible light images, a single-batch adaptively weighted most difficult triplet sampling loss function is used; it differs from the TriHard loss (the most difficult triplet sampling loss), which considers only the information of the extreme samples and can therefore produce extremely large local gradients and cause the network to collapse; the core idea is that, for each infrared image sample x_{i,j}^r in a batch, the positive samples are the visible light images of the batch with the same identity, and for a positive sample pair the larger the Euclidean distance in the nested feature space, the larger the assigned weight; likewise, for x_{i,j}^r the negative samples are the visible light images of the batch with a different identity, and for a negative sample pair the larger the Euclidean distance in the nested feature space, the smaller the assigned weight; different distances (different degrees of difficulty) are therefore assigned different weights; the weighted most difficult triplet sampling loss function thus inherits the advantage of optimizing the relative distance between positive and negative sample pairs while avoiding the introduction of any extra parameters, making it more flexible and adaptable; accordingly, for each visible light image anchor sample x_{i,j}^v in each batch, the weighted most difficult triplet sampling loss function L_wrt^v is calculated over the batch, where p is the corresponding positive sample set, n is the corresponding negative sample set, W_i^p is the distance weight of a positive sample and W_i^n is the distance weight of a negative sample; similarly, for each infrared image anchor sample x_{i,j}^r in each batch, the weighted most difficult triplet sampling loss function L_wrt^r is calculated in the same way;

the overall weighted most difficult triplet sampling loss function L_c_wrt is obtained by combining the two anchor terms above.
finally, the total loss function is defined as:
L_wrt = L_id + λ·L_c_wrt   (11)
where λ is a predefined parameter that balances the contributions of the ID identity loss L_id and the weighted most difficult triplet sampling loss L_c_wrt.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010814790.2A CN113761995A (en) | 2020-08-13 | 2020-08-13 | Cross-mode pedestrian re-identification method based on double-transformation alignment and blocking |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010814790.2A CN113761995A (en) | 2020-08-13 | 2020-08-13 | Cross-mode pedestrian re-identification method based on double-transformation alignment and blocking |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113761995A true CN113761995A (en) | 2021-12-07 |
Family
ID=78785620
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010814790.2A Pending CN113761995A (en) | 2020-08-13 | 2020-08-13 | Cross-mode pedestrian re-identification method based on double-transformation alignment and blocking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113761995A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612937A (en) * | 2022-03-15 | 2022-06-10 | 西安电子科技大学 | Single-mode enhancement-based infrared and visible light fusion pedestrian detection method |
CN116071369A (en) * | 2022-12-13 | 2023-05-05 | 哈尔滨理工大学 | Infrared image processing method and device |
WO2023231233A1 (en) * | 2022-05-31 | 2023-12-07 | 浪潮电子信息产业股份有限公司 | Cross-modal target re-identification method and apparatus, device, and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480178A (en) * | 2017-07-01 | 2017-12-15 | 广州深域信息科技有限公司 | A kind of pedestrian's recognition methods again compared based on image and video cross-module state |
US10176405B1 (en) * | 2018-06-18 | 2019-01-08 | Inception Institute Of Artificial Intelligence | Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations |
CN111325115A (en) * | 2020-02-05 | 2020-06-23 | 山东师范大学 | Countermeasures cross-modal pedestrian re-identification method and system with triple constraint loss |
-
2020
- 2020-08-13 CN CN202010814790.2A patent/CN113761995A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480178A (en) * | 2017-07-01 | 2017-12-15 | 广州深域信息科技有限公司 | A kind of pedestrian's recognition methods again compared based on image and video cross-module state |
US10176405B1 (en) * | 2018-06-18 | 2019-01-08 | Inception Institute Of Artificial Intelligence | Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations |
CN111325115A (en) * | 2020-02-05 | 2020-06-23 | 山东师范大学 | Countermeasures cross-modal pedestrian re-identification method and system with triple constraint loss |
Non-Patent Citations (4)
Title |
---|
BO LI ET AL.: "Visible Infrared Cross-Modality Person Re-Identification Network Based on Adaptive Pedestrian Alignment" * |
MANG YE ET AL.: "Deep Learning for Person Re-identification: A Survey and Outlook" * |
MANG YE ET AL.: "Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking" * |
LUO HAO (罗浩) ET AL.: "基于深度学习的行人重识别研究进展" (Research progress on person re-identification based on deep learning) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612937A (en) * | 2022-03-15 | 2022-06-10 | 西安电子科技大学 | Single-mode enhancement-based infrared and visible light fusion pedestrian detection method |
WO2023231233A1 (en) * | 2022-05-31 | 2023-12-07 | 浪潮电子信息产业股份有限公司 | Cross-modal target re-identification method and apparatus, device, and medium |
CN116071369A (en) * | 2022-12-13 | 2023-05-05 | 哈尔滨理工大学 | Infrared image processing method and device |
CN116071369B (en) * | 2022-12-13 | 2023-07-14 | 哈尔滨理工大学 | Infrared image processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111259786B (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
CN108960140B (en) | Pedestrian re-identification method based on multi-region feature extraction and fusion | |
CN110728263B (en) | Pedestrian re-recognition method based on strong discrimination feature learning of distance selection | |
CN107832672B (en) | Pedestrian re-identification method for designing multi-loss function by utilizing attitude information | |
CN112651262B (en) | Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment | |
CN113761995A (en) | Cross-mode pedestrian re-identification method based on double-transformation alignment and blocking | |
WO2023087636A1 (en) | Anomaly detection method and apparatus, and electronic device, storage medium and computer program product | |
CN114861761B (en) | Loop detection method based on twin network characteristics and geometric verification | |
CN111709317A (en) | Pedestrian re-identification method based on multi-scale features under saliency model | |
CN115841683A (en) | Light-weight pedestrian re-identification method combining multi-level features | |
Rong et al. | Picking point recognition for ripe tomatoes using semantic segmentation and morphological processing | |
CN116311384A (en) | Cross-modal pedestrian re-recognition method and device based on intermediate mode and characterization learning | |
CN117274627A (en) | Multi-temporal snow remote sensing image matching method and system based on image conversion | |
Chen et al. | Self-supervised feature learning for long-term metric visual localization | |
Zhang et al. | Fine-grained-based multi-feature fusion for occluded person re-identification | |
CN118038494A (en) | Cross-modal pedestrian re-identification method for damage scene robustness | |
Zhang et al. | Two-stage domain adaptation for infrared ship target segmentation | |
Gao et al. | Occluded person re-identification based on feature fusion and sparse reconstruction | |
CN113011359A (en) | Method for simultaneously detecting plane structure and generating plane description based on image and application | |
CN116597267B (en) | Image recognition method, device, computer equipment and storage medium | |
CN116597177A (en) | Multi-source image block matching method based on dual-branch parallel depth interaction cooperation | |
Zhang et al. | Depth image based object Localization using binocular camera and dual-stream convolutional neural network | |
Xi et al. | EMA‐GAN: A Generative Adversarial Network for Infrared and Visible Image Fusion with Multiscale Attention Network and Expectation Maximization Algorithm | |
CN114154576B (en) | Feature selection model training method and system based on hybrid supervision | |
CN112784674B (en) | Cross-domain identification method of key personnel search system based on class center self-adaption |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20211207 |
WD01 | Invention patent application deemed withdrawn after publication |