CN114495281A - Cross-modal pedestrian re-identification method based on integral and partial constraints - Google Patents

Cross-modal pedestrian re-identification method based on integral and partial constraints Download PDF

Info

Publication number
CN114495281A
CN114495281A CN202210124910.5A CN202210124910A CN114495281A CN 114495281 A CN114495281 A CN 114495281A CN 202210124910 A CN202210124910 A CN 202210124910A CN 114495281 A CN114495281 A CN 114495281A
Authority
CN
China
Prior art keywords
pedestrian
loss
infrared
image
rgb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210124910.5A
Other languages
Chinese (zh)
Inventor
吕址函
朱松豪
梁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202210124910.5A priority Critical patent/CN114495281A/en
Publication of CN114495281A publication Critical patent/CN114495281A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

According to the overall and partial constraint-based cross-modal pedestrian re-identification method, local pedestrian features are extracted deeply from two different modalities by using a mixed crossed dual-path feature learning network, and then the extracted features are horizontally cut into p components and then are mapped to a public space for horizontally cutting the local features and the global features of an image, so that the pedestrian feature characterization capability is improved; finally, through the common cooperation of the modal specific identity loss, the cross entropy loss and the proposed loss function, the difference between the modalities is reduced, and the overall performance is improved; during training data, random horizontal flipping and random erasure of the enhancement data are used to expand the training data.

Description

Cross-modal pedestrian re-identification method based on integral and partial constraints
Technical Field
The invention belongs to the technical field of pedestrian re-identification, and particularly relates to a cross-modal pedestrian re-identification method based on integral and partial constraints.
Background
Pedestrian re-identification is a specific pedestrian retrieval task that uses computer vision techniques to determine whether a particular pedestrian is present in an image or video. In recent years, with the continuous development of society, people pay more and more attention to public safety problems, and the pedestrian re-identification arouses great research interest. At present, most of researches mainly process human images shot by a visible camera, however, the methods have many limitations. For example, many criminal events occur at night and conventional video cameras cannot capture clear images. Thus, these methods are not effective in the case of insufficient light.
Most of the existing research focuses on pedestrian re-identification in a visual modality, and compared with a visual image, an infrared image lacks rich color information, so that a common pedestrian re-identification method based on a visible light image is not feasible in the infrared image. It was found from a search of prior art documents that Wu et al presented a large scale cross modal pedestrian re-identification dataset named SYSY-MM01, while they evaluated three common neural network structures: single stream, dual stream and asymmetric fully connected layer, and deep zero padding is proposed for training single stream networks. Zheng et al introduced a joint learning framework that coupled end-to-end pedestrian re-identification learning and data generation to solve the infrared-visible light pedestrian re-identification problem. Mang et al propose a dynamic double attention clustering learning framework that avoids the learning of models from being easily disturbed by noise and becoming unstable. Li et al adds an X modality in the network to account for the modality difference. Many existing methods focus on reducing the difference between infrared and visual modalities, and recognition accuracy is less than ideal in the case of cable sharing. The method solves the problem of cross-modal pedestrian re-identification to a certain extent, but still has the defects.
Therefore, the cross-modal pedestrian re-identification has the following problems to be solved urgently, (1) the difference between the visible mode and the infrared mode is large, and the existing method has a certain effect but still has a large rising space; (2) and the cross-mode pedestrian re-identification data set is less, so that the training data is insufficient. The pedestrian identification method is not only a problem in cross-modal pedestrian identification, but also a common problem in pedestrian identification, the academic world does not have a data set with complex scene and large scale for research, and the industrial world has a large amount of data but cannot be sourced due to privacy problems.
Disclosure of Invention
In order to solve the problems, the invention provides a hybrid cross dual-path feature learning network (HCDFL), which deeply extracts local pedestrian features from two different modes; the method has the advantages that a novel overall constraint function and a partial triple-center loss function are utilized, inter-class and intra-class differences are improved from two aspects of different modes and the same mode respectively, local features of pedestrians are better represented, and overall recognition performance is improved; meanwhile, random horizontal flipping and random erasure of the enhancement data are used to expand the training data.
The invention relates to a cross-modal pedestrian re-identification method based on integral and partial constraints, which comprises the following steps:
s1, extracting pedestrian information characteristics under different modes from the RGB image and the infrared image under the same scene by using two independent and same branch networks respectively;
s2, uniformly dividing the extracted features into P horizontal components from top to bottom, projecting the P horizontal components to a public space, and outputting a joint representation of the modal specific features and the modal common features;
s3, constructing a multi-loss function, wherein the multi-loss function comprises mode specific identity loss, cross entropy loss, proposed overall constraint and partial triple-center loss, mixing and crossing the combined features by using the multi-loss function, and reducing image difference between the infrared mode and the RGB mode through mode distance constraint so as to obtain the optimal recognition performance.
Further, the multiple loss function is:
Figure BDA0003499988860000021
therein
Figure BDA0003499988860000022
And
Figure BDA0003499988860000023
respectively, representing the loss of identity, L, of the particular modality softmax of the RGB branch and the infrared branchCERepresents the cross entropy loss, LWCPTLRepresenting the overall constraint and the partial triplet-center loss function; λ represents the pre-training coefficients used to balance the overall loss function.
Further, the overall constraint process in the loss function includes two steps: firstly, the distance between different pedestrian samples in the same mode is enlarged, and the distance between the same pedestrian sample in the RGB mode and the same pedestrian sample in the infrared mode is reduced; then, the distance between the same pedestrian samples in the two modes is continuously reduced, the similarity of the identity of the pedestrian is improved, and the difference between different pedestrian samples in the modes is reduced; given the depth characteristics of pedestrians in different modes
Figure BDA0003499988860000024
Wherein i is more than or equal to 1 and less than or equal to N,
Figure BDA0003499988860000025
and
Figure BDA0003499988860000026
respectively representing the identity of the ith pedestrian in RGB and infrared modalities,
Figure BDA0003499988860000027
and
Figure BDA0003499988860000028
respectively representing the p-th identity and the q-th identity of the ith pedestrian under RGB and infrared modes, and the specific formula of the integral constraint LW is as follows:
Figure BDA0003499988860000029
the partial-center triplet loss formula is as follows:
Figure BDA0003499988860000031
where LP represents a partial triplet-center loss, xiAnd ziRespectively representing RGB image features and infrared image features, c1yiAnd c2yiRespectively representing the ith class center, y in RGB and infrared modalitiesiAn identity tag representing the ith sample, a represents an offset, N represents a batch number,
Figure BDA0003499988860000032
represents the Euclidean distance, [ x ]]+=max(0,x);
In summary, the overall constraint and partial triplet-center loss function can be expressed as:
LWCPTL=LW+LP
further, modality specific identity loss: because the pedestrian characteristics in the RGB image and the infrared image are very different, different networks are used to obtain the characteristic representation in different modalities, and the Softmax loss is used to predict the pedestrian identity in each modality, and the formula can be expressed as follows:
Figure BDA0003499988860000033
Figure BDA0003499988860000034
in the formula
Figure BDA0003499988860000035
And
Figure BDA0003499988860000036
respectively represent belonging to
Figure BDA0003499988860000037
And
Figure BDA0003499988860000038
the ith RGB image feature and the infrared image feature of the class,
Figure BDA0003499988860000039
and
Figure BDA00034999888600000310
respectively represent the weight W in the last full connection layerVAnd WIJ (th) column of (b)VAnd bIRespectively representing RGB and infrared modal bias, M representing head of the line, NVAnd NIRespectively representing the number of RGB image and infrared image training samples in the same batch,
Figure BDA00034999888600000311
and
Figure BDA00034999888600000312
respectively representing the loss of identity functions of the RGB image and the infrared image.
Further, in order to make the feature characterization of the same pedestrian have similarity under different modes, a cross entropy loss function shown as follows is introduced:
Figure BDA00034999888600000313
wherein yiThe true label representing the ith input image, i.e., p part features of each input image, share the label information of the image.
The invention has the beneficial effects that: the invention discloses a cross-modal pedestrian re-identification method based on integral and partial constraints, and provides a hybrid cross dual-path feature learning network, which is provided with a modal sharing parameter layer and a modal unique parameter layer for carrying out feature extraction on pedestrian pictures of different modalities; secondly, the network horizontally cuts the pedestrian features, so that local features and global features of the image can be better learned, and the pedestrian feature characterization capability is improved; meanwhile, aiming at the model embedding layer, the network forms a plurality of different batch combinations by cross-combining the features, which is beneficial to feature matching and modal distance constraint; when a loss function is designed, consistency constraint of feature distribution in different modal data classes and a correlation constraint criterion between the classes are fully considered, and a novel overall constraint and partial triple-center loss function are provided for improving modal differences and enabling samples of the same class to be closer to class centers and far away from other class centers; during training data, random horizontal flipping and random erasure of the enhancement data are used to expand the training data.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is an end-to-end block diagram illustration of cross-modal pedestrian re-identification based on global constraints and partial triplet-center loss in accordance with the present invention;
FIG. 3 is a schematic diagram of the combination of triple loss, center loss, and softmax loss, respectively;
FIG. 4 is a schematic diagram of the overall constraint of the present invention;
FIG. 5 is a schematic of the partial-triplet center loss of the present invention;
FIG. 6 is a diagram illustrating the recognition effect of the present invention on the SYSU-MM01 and RegDB data sets.
Detailed Description
In order that the present invention may be more readily and clearly understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
With reference to fig. 1 and fig. 2, the method provided by the present invention first extracts pedestrian information in different modalities by using RGB branches and infrared branches of which the main networks are ResNet50, and uniformly divides the extracted features into p horizontal components from top to bottom by using an average pooling layer; then, projecting the horizontal cutting features to a common space, and outputting a joint representation of the modality specific features and the modality common features; and finally, mixing and crossing the joint features by using the modal specific identity loss, the cross entropy loss, the proposed overall constraint and the proposed partial triple-center loss, and obtaining the optimal recognition performance through modal distance constraint.
The overall constraint and partial triple-center loss proposed by the invention firstly integrally constrain the distances between different modes, thereby reducing the difference between RGB and infrared modes; secondly, the loss function learns the centers of the RGB mode and the infrared mode respectively by combining the triple loss and the center loss, so that the same class sample is closer to the class center and is far away from other class centers, and the intra-mode class difference is improved.
1. Hybrid cross dual path feature learning network
In visible single-mode-based pedestrian re-identification, a common method is to horizontally segment a pedestrian image, extract local features, and then perform feature matching on the pedestrian image, but in infrared-visible light pedestrian re-identification, an image shot by an infrared camera is greatly different from a visible image; the infrared image retains only inherent features such as the overall appearance of the pedestrian and the posture information of the pedestrian, but loses important information such as color and illumination. Therefore, the conventional method cannot be used to solve the infrared-visible light pedestrian re-recognition problem.
As shown in fig. 2, a dual-stream structure is used as a basic structure, mainly because the single-stream structure uses a common feature extraction network, and features of RGB and infrared images cannot be accurately extracted in the process; in addition, the single-flow structure shares global parameters, so that the local characteristics of pedestrians are seriously ignored. In the double-flow structure, the shallow network parameters are individually specific to each mode, and the deep network parameters are shared, so that the local characteristics and the global characteristics are considered, and the identification performance is improved. Therefore, the invention adopts the traditional two-way local feature network, and the network consists of a feature extractor and a feature embedding part.
The infrared-visible pedestrian re-identification dataset may be represented as D ═ { V, I }, where V represents an RGB image and I represents an infrared image.
In the feature extraction stage, after the features of the backbone network ResNet50 are extracted, the corresponding pedestrian features are respectively obtained, and the final average pooling layer and the network with the subsequent structure are removed, so that the purposes of expanding the area of the receiving domain and enriching the feature granularity are achieved. Particularly, the two branches use the same network structure, and the design can enable the high-level feature output to express high-level semantics better and enable the identity discrimination capability of the feature to be stronger.
In the characteristic embedding stage, firstly, horizontally dividing the pedestrian characteristics into p identical components for learning a low-dimensional embedding space between two heterogeneous modes; then, using the global pooling layer on each part, p 2048-dimensional features are obtained. In order to further reduce the feature dimension, dimension reduction operation is carried out on each 2048-dimensional component feature by adopting a 1 × 1 convolution layer, and finally 256-dimensional feature expression is obtained;
meanwhile, in order to avoid gradient extinction and calculate internal covariant offset, a batch standardization layer is added behind each full-connection layer; finally, the shared layer is used as a projection function to project the characteristics of two different modes to a common embedding space so as to close the difference between the two modes.
In the training stage, the network model is trained by combining the modal specific identity loss, the cross entropy loss, the proposed overall constraint and the proposed partial triplet-center loss so as to improve the accuracy of recognition. And dividing the joint representation characteristics of the RGB branches and the infrared branches into three groups by using mixed cross training, wherein the three groups are respectively part constraint, integral constraint and cross entropy loss, and the part constraint and the integral constraint form the proposed integral constraint and part triplet-center loss function. In the testing stage, the characteristics of the detection image and the gallery image are respectively extracted and then connected with the characteristics of the high-dimensional image to form a final characteristic descriptor.
After pedestrian features are respectively extracted through a backbone network by a traditional double-path feature learning network, the features are fused through a weight sharing module and are directly output, and cross-modal information is tried to be directly learned from two original modes. The results of the relevant experiments show that these methods are not sufficient to narrow the gap between the two modalities. The network provided by the invention combines pedestrian characteristics in a cross way to form a plurality of different batch combinations and combines multiple loss functions to cooperate together. The characteristic cross combination mode is beneficial to balancing the expression learning capacity of the model aiming at the characteristic characteristics and the shared characteristics of different modal data, and the matching capacity among the multimodal data is effectively improved.
The mode specific loss function directly utilizes mode information and reserves the most original pedestrian characteristics; the cross entropy loss is used for identifying the identity of the pedestrian, and RGB and infrared modal characteristics are extracted to form a batch; within the same batch, the characteristics of the RGB image and the infrared image have consistency, so that pairs of batchs are respectively constructed by using partial constraint and integral constraint.
Constructing a multi-loss function by joint cooperation as shown in formula (1), wherein the multi-loss function comprises modal specific identity loss, cross entropy loss, overall constraint loss and partial triple-center loss; the overall loss function of the proposed framework can be expressed as:
Figure BDA0003499988860000061
wherein the content of the first and second substances,
Figure BDA0003499988860000062
and
Figure BDA0003499988860000063
respectively, representing the loss of identity, L, of the particular modality softmax of the RGB branch and the infrared branchceRepresents the cross entropy loss LWCPTRepresenting the overall constraint and the partial triplet-center loss function. λ represents the pre-training coefficients used to balance the overall loss function.
Modality specific identity loss: due to the fact that pedestrian characteristics in the RGB image and the infrared image are different greatly, different networks are used for obtaining characteristic representations in different modes. The Softmax loss is used to predict pedestrian identity in each modality, and the formula can be expressed as:
Figure BDA0003499988860000064
wherein the content of the first and second substances,
Figure BDA0003499988860000065
and
Figure BDA0003499988860000066
respectively represent belonging to
Figure BDA0003499988860000067
And
Figure BDA0003499988860000068
the ith RGB image feature and the infrared image feature of the class,
Figure BDA0003499988860000069
and
Figure BDA00034999888600000610
respectively represent the weight W in the last full connection layerVAnd WIJ (th) column of (b)VAnd bIRespectively representing RGB and infrared modal bias, M representing head of the line, NVAnd NIRespectively representing the number of RGB image and infrared image training samples in the same batch.
Cross entropy loss: in order to make the feature characterization of the same pedestrian have similarity under different modes, a cross entropy loss function shown as follows is introduced:
Figure BDA0003499988860000071
i.e. p part features of each input image share the label information of the image.
2. Global constraint and partial triplet-centric loss
The invention provides a novel overall constraint and partial triple-center loss, and the function improves the difference between classes and in classes from two aspects of different modes and the same mode respectively and improves the overall identification performance.
The triple loss function is often applied to the fields of face recognition, pedestrian re-recognition and the like, and not only has the characteristic of shortening the intra-class distance, but also has the characteristic of increasing the inter-class distance; for the infrared-visible light pedestrian re-identification task, the pedestrian images have the inter-class distance in the same mode and have the inter-class distances in different modes.
The triplet loss function is formulated as follows:
Figure BDA0003499988860000072
wherein the content of the first and second substances,
Figure BDA0003499988860000073
respectively representing the feature representations of the anchor point, the positive sample image and the negative sample image,
Figure BDA0003499988860000074
and
Figure BDA0003499988860000075
is the same as the identity information of (2), and
Figure BDA0003499988860000076
and
Figure BDA0003499988860000077
is different, alpha represents an offset, N represents a batch number,
Figure BDA0003499988860000078
represents the Euclidean distance, [ x ]]+=max(0,x)。
As can be seen from fig. 3(a), although the two loss functions are combined to achieve a good effect, the data distribution is not uniform, and the model performance is not stable. The center loss is firstly applied to the field of face recognition, is used for constraining the distance between a sample and the center of the class of the sample, and learns a center for each class. The central loss function is formulated as follows:
Figure BDA0003499988860000079
wherein x isiFor the characteristic representation, yiTo correspond to xiClass (c) ofyiRepresents a category yiM, represents the minimum batch size,
Figure BDA00034999888600000710
representing the euclidean distance.
As can be seen in connection with fig. 3(b), the key to the overall constraint loss learning of features between modalities is to narrow the cross-modality differences. Due to the drastic visual change, the cross-modal differences may be large, which will greatly reduce the pedestrian re-recognition performance, and thus the cross-modal differences need to be reduced as a whole.
With reference to fig. 4, the overall constraint process in the loss function proposed by the present invention includes two steps: firstly, expanding the distance between different pedestrian samples in the same mode, and simultaneously reducing the distance between the same pedestrian sample in the RGB mode and the infrared mode; then, the distance between the same pedestrian samples in the two modes is continuously reduced, the similarity of the identity of the pedestrian is improved, and the difference between different pedestrian samples in the modes is reduced. Given the depth characteristics of pedestrians in different modes
Figure BDA0003499988860000081
Wherein i is more than or equal to 1 and less than or equal to N,
Figure BDA0003499988860000082
and
Figure BDA0003499988860000083
respectively representing the identity of the ith pedestrian in RGB and infrared modalities,
Figure BDA0003499988860000084
and
Figure BDA0003499988860000085
respectively representing the p-th identity and the q-th identity of the ith pedestrian under RGB and infrared modes, and the specific formula is as follows:
Figure BDA0003499988860000086
with reference to fig. 5, by combining two loss functions, samples in two modalities can be considered at the same time, which is beneficial to reduce intra-class differences in the same modality, reduce differences between modalities, and improve recognition accuracy.
The partial-center triplet loss formula is as follows:
Figure BDA0003499988860000087
wherein x isiAnd ziRepresenting RGB image features and infrared image features, respectively, c1yiAnd c2yiRespectively representing the ith class center, y in RGB and infrared modalitiesiAn identity tag representing the ith sample, a represents an offset, N represents a batch number,
Figure BDA0003499988860000088
represents the Euclidean distance, [ x ]]+=max(0,x)。
In summary, the overall constraint and partial triplet-center loss function can be expressed as:
LWCPTL=LW+LP (8)
the result shown in fig. 6 is a schematic diagram of the recognition effect of the method of the present invention on the SYSU-MM01 and RegDB data sets, and it can be seen from the diagram that the recognition effect of the method is ideal and the recognition accuracy is high.
In summary, the present invention addresses the problems in cross-modal pedestrian re-identification, and on one hand, proposes a hybrid cross-over dual-path feature learning network (HCDFL) that deeply extracts local pedestrian features from two different modalities. The network model firstly extracts pedestrian features under different modes, then horizontally cuts the extracted features into p components and maps the components to a public space, and the components are used for horizontally cutting local features and global features of an image, so that the pedestrian feature characterization capability is improved; finally, the overall performance is improved through the common cooperation of the modal specific identity loss, the cross entropy loss and the proposed loss function; on the other hand, the invention provides a novel overall constraint and partial triple-center loss, and the function improves the difference between classes and in classes from two aspects of different modes and the same mode respectively, so as to better represent the local characteristics of pedestrians and improve the overall identification performance. The loss function provided by the invention firstly utilizes integral constraint to reduce the difference of different modes; then, by fusing the triple loss and the center loss, the difference between different types in the same modality is expanded, so that the samples in the same type are closer to the center of the sample and are far away from the centers of other types. In addition, due to the limited amount of image data, random horizontal flipping and random erasure of enhancement data are used to augment the training data during training. The experimental results on the two common data sets SYSU-MM01 and RegDB show that the method proposed herein achieves excellent recognition performance.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations made by using the contents of the present specification and the drawings are within the protection scope of the present invention.

Claims (5)

1. A cross-modal pedestrian re-identification method based on integral and partial constraints is characterized by comprising the following steps:
s1, extracting pedestrian information characteristics under different modes from the RGB image and the infrared image under the same scene by using two independent and same branch networks respectively;
s2, uniformly dividing the extracted features into P horizontal components from top to bottom, projecting the P horizontal components to a common space, and outputting a joint representation of the modal specific features and the modal common features;
s3, constructing a multi-loss function, mixing and crossing the combined features by using the multi-loss function, and reducing the image difference between the infrared mode and the RGB mode through mode distance constraint so as to obtain the optimal recognition performance.
2. The method of claim 1, wherein the pedestrian cross-modal re-identification method based on whole and partial constraints is characterized in that,
the multiple loss function is:
Figure FDA0003499988850000011
therein
Figure FDA0003499988850000012
And
Figure FDA0003499988850000013
respectively, representing the loss of identity, L, of the particular modality softmax of the RGB branch and the infrared branchCERepresents the cross entropy loss, LWCPTLRepresenting the overall constraint and the partial triplet-center loss function; λ represents the pre-training coefficients used to balance the overall loss function.
3. The method of claim 2, wherein the pedestrian cross-modal re-identification method based on whole and partial constraints is characterized in that,
the overall constraint process comprises two steps: firstly, expanding the distance between different pedestrian samples in the same mode, and simultaneously reducing the distance between the same pedestrian sample in the RGB mode and the infrared mode; then, the distance between the same pedestrian samples in the two modes is continuously reduced, the similarity of the identity of the pedestrian is improved, and the difference between different pedestrian samples in the modes is reduced; given the depth characteristics of pedestrians in different modes
Figure FDA0003499988850000014
Wherein i is more than or equal to 1 and less than or equal to N,
Figure FDA0003499988850000015
Figure FDA0003499988850000016
Figure FDA0003499988850000017
and
Figure FDA0003499988850000018
respectively representing the identity of the ith pedestrian in RGB and infrared modalities,
Figure FDA0003499988850000019
and
Figure FDA00034999888500000110
respectively representing the p-th and q-th identities of the ith pedestrian under RGB and infrared modes, and integrally constraining LWThe specific formula of (2) is as follows:
Figure FDA00034999888500000111
the partial-center triplet loss formula is as follows:
Figure FDA0003499988850000021
wherein L isPRepresents the partial triplet-center loss, xiAnd ziRespectively representing RGB image features and infrared image features, c1yiAnd c2yiRespectively representing the ith class center, y in RGB and infrared modalitiesiAn identity tag representing the ith sample, a represents an offset, N represents a batch number,
Figure FDA0003499988850000022
represents the Euclidean distance, [ x ]]+=max(0,x);
In summary, the overall constraint and partial triplet-center loss function can be expressed as:
LWCPTL=LW+LP
4. the method according to claim 2, wherein the method comprises the following steps:
modality specific identity loss: because the pedestrian characteristics in the RGB image and the infrared image are very different, different networks are used to obtain the characteristic representation in different modalities, and the Softmax loss is used to predict the pedestrian identity in each modality, and the formula can be expressed as follows:
Figure FDA0003499988850000023
Figure FDA0003499988850000024
in the formula
Figure FDA0003499988850000025
And
Figure FDA0003499988850000026
respectively represent belonging to
Figure FDA0003499988850000027
And
Figure FDA0003499988850000028
the ith RGB image feature and the infrared image feature of the class,
Figure FDA0003499988850000029
and
Figure FDA00034999888500000210
respectively represent the weight W in the last full connection layerVAnd WIJ (th) column of (b)VAnd bIRespectively representing RGB and infrared modal bias, M representing head of the line, NVAnd NIRespectively representing RGB image and infrared image training samples in the same batchThe number of the (c) component(s),
Figure FDA00034999888500000211
and
Figure FDA00034999888500000212
respectively representing the loss of identity functions of the RGB image and the infrared image.
5. The method of claim 2, wherein the pedestrian cross-modal re-identification method based on whole and partial constraints is characterized in that,
the cross entropy loss function is:
Figure FDA00034999888500000213
wherein y isiThe true label representing the ith input image, i.e., p part features of each input image, share the label information of the image.
CN202210124910.5A 2022-02-10 2022-02-10 Cross-modal pedestrian re-identification method based on integral and partial constraints Pending CN114495281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210124910.5A CN114495281A (en) 2022-02-10 2022-02-10 Cross-modal pedestrian re-identification method based on integral and partial constraints

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210124910.5A CN114495281A (en) 2022-02-10 2022-02-10 Cross-modal pedestrian re-identification method based on integral and partial constraints

Publications (1)

Publication Number Publication Date
CN114495281A true CN114495281A (en) 2022-05-13

Family

ID=81477698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210124910.5A Pending CN114495281A (en) 2022-02-10 2022-02-10 Cross-modal pedestrian re-identification method based on integral and partial constraints

Country Status (1)

Country Link
CN (1) CN114495281A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019148898A1 (en) * 2018-02-01 2019-08-08 北京大学深圳研究生院 Adversarial cross-media retrieving method based on restricted text space
CN111597876A (en) * 2020-04-01 2020-08-28 浙江工业大学 Cross-modal pedestrian re-identification method based on difficult quintuple
US20200285896A1 (en) * 2019-03-09 2020-09-10 Tongji University Method for person re-identification based on deep model with multi-loss fusion training strategy
CN113569639A (en) * 2021-06-25 2021-10-29 湖南大学 Cross-modal pedestrian re-identification method based on sample center loss function

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019148898A1 (en) * 2018-02-01 2019-08-08 北京大学深圳研究生院 Adversarial cross-media retrieving method based on restricted text space
US20200285896A1 (en) * 2019-03-09 2020-09-10 Tongji University Method for person re-identification based on deep model with multi-loss fusion training strategy
CN111597876A (en) * 2020-04-01 2020-08-28 浙江工业大学 Cross-modal pedestrian re-identification method based on difficult quintuple
CN113569639A (en) * 2021-06-25 2021-10-29 湖南大学 Cross-modal pedestrian re-identification method based on sample center loss function

Similar Documents

Publication Publication Date Title
Jiang et al. CmSalGAN: RGB-D salient object detection with cross-view generative adversarial networks
Liu et al. Enhancing the discriminative feature learning for visible-thermal cross-modality person re-identification
Agnese et al. A survey and taxonomy of adversarial neural networks for text‐to‐image synthesis
CN108537136A (en) The pedestrian's recognition methods again generated based on posture normalized image
CN110580302B (en) Sketch image retrieval method based on semi-heterogeneous joint embedded network
CN111160264B (en) Cartoon character identity recognition method based on generation countermeasure network
CN106960182B (en) A kind of pedestrian's recognition methods again integrated based on multiple features
CN104504362A (en) Face detection method based on convolutional neural network
CN110598018B (en) Sketch image retrieval method based on cooperative attention
CN111539255A (en) Cross-modal pedestrian re-identification method based on multi-modal image style conversion
CN111832511A (en) Unsupervised pedestrian re-identification method for enhancing sample data
CN109492528A (en) A kind of recognition methods again of the pedestrian based on gaussian sum depth characteristic
CN114662497A (en) False news detection method based on cooperative neural network
CN113361474B (en) Double-current network image counterfeiting detection method and system based on image block feature extraction
CN112001279A (en) Cross-modal pedestrian re-identification method based on dual attribute information
CN115690669A (en) Cross-modal re-identification method based on feature separation and causal comparison loss
CN108986103A (en) A kind of image partition method merged based on super-pixel and more hypergraphs
CN113722528B (en) Method and system for rapidly retrieving photos for sketch
CN116778530A (en) Cross-appearance pedestrian re-identification detection method based on generation model
CN114495281A (en) Cross-modal pedestrian re-identification method based on integral and partial constraints
Gong et al. Person re-identification based on two-stream network with attention and pose features
Li et al. AR-CNN: an attention ranking network for learning urban perception
Yang et al. SSRR: Structural Semantic Representation Reconstruction for Visible-Infrared Person Re-Identification
CN114519897A (en) Human face in-vivo detection method based on color space fusion and recurrent neural network
Zhang et al. Triplet interactive attention network for cross-modality person re-identification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination