CN114495281A - Cross-modal pedestrian re-identification method based on integral and partial constraints - Google Patents
- Publication number
- CN114495281A (application CN202210124910.5A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- loss
- infrared
- image
- rgb
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
In the cross-modal pedestrian re-identification method based on overall and partial constraints, a hybrid cross dual-path feature learning network is used to deeply extract local pedestrian features from two different modalities; the extracted features are then horizontally partitioned into p components and mapped to a common space, so that both the local and global features of the image are learned and the pedestrian feature characterization capability is improved. Finally, through the joint cooperation of the modality-specific identity loss, the cross-entropy loss, and the proposed loss function, the difference between the modalities is reduced and overall performance is improved. During training, random horizontal flipping and random erasing are used as data augmentation to expand the training data.
Description
Technical Field
The invention belongs to the technical field of pedestrian re-identification, and particularly relates to a cross-modal pedestrian re-identification method based on integral and partial constraints.
Background
Pedestrian re-identification is a pedestrian retrieval task that uses computer vision techniques to determine whether a particular pedestrian is present in an image or video. In recent years, with the continuous development of society, public safety has drawn increasing attention, and pedestrian re-identification has aroused great research interest. At present, most research processes pedestrian images shot by visible-light cameras; however, such methods have many limitations. For example, many criminal events occur at night, when conventional video cameras cannot capture clear images, so these methods are not effective under insufficient lighting.
Most existing research focuses on pedestrian re-identification in the visible modality. Compared with a visible-light image, an infrared image lacks rich color information, so common pedestrian re-identification methods based on visible-light images are not feasible on infrared images. A search of prior-art documents shows that Wu et al. presented a large-scale cross-modal pedestrian re-identification dataset named SYSU-MM01 and evaluated three common neural network structures (single-stream, dual-stream, and asymmetric fully connected layers), proposing deep zero padding for training single-stream networks. Zheng et al. introduced a joint learning framework that couples end-to-end pedestrian re-identification learning with data generation to solve the infrared-visible pedestrian re-identification problem. Mang et al. proposed a dynamic dual-attention aggregation learning framework that keeps model learning from being easily disturbed by noise and becoming unstable. Li et al. added an auxiliary X modality to the network to account for the modality difference. Many existing methods focus on reducing the difference between the infrared and visible modalities, yet recognition accuracy remains less than ideal. These methods solve the cross-modal pedestrian re-identification problem to a certain extent, but shortcomings remain.
Therefore, cross-modal pedestrian re-identification has the following problems to be solved urgently: (1) the difference between the visible and infrared modalities is large, and although existing methods have a certain effect, there is still considerable room for improvement; (2) cross-modal pedestrian re-identification datasets are scarce, so training data is insufficient. Insufficient data is not only a problem in cross-modal pedestrian re-identification but a common problem in pedestrian re-identification generally: academia lacks a large-scale dataset with complex scenes for research, and industry holds large amounts of data that cannot be released because of privacy concerns.
Disclosure of Invention
In order to solve the above problems, the invention provides a hybrid cross dual-path feature learning network (HCDFL) that deeply extracts local pedestrian features from two different modalities. Using a novel overall constraint function and a partial triplet-center loss function, inter-class and intra-class differences are improved from the two aspects of different modalities and the same modality respectively, local pedestrian features are better characterized, and overall recognition performance is improved. Meanwhile, random horizontal flipping and random erasing are used as data augmentation to expand the training data.
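The two augmentation operations named above, random horizontal flipping and random erasing, can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the function names, the erased-area range, and the aspect-ratio bounds are illustrative assumptions, and in practice a deep-learning framework's built-in transforms would typically be used.

```python
import numpy as np

def random_horizontal_flip(img, p=0.5, rng=None):
    """Flip an H x W x C image left-right with probability p."""
    rng = rng or np.random.default_rng()
    return img[:, ::-1, :].copy() if rng.random() < p else img

def random_erasing(img, p=0.5, area_range=(0.02, 0.2), rng=None):
    """With probability p, erase a random rectangle of the image by
    filling it with random values (assumed area/aspect bounds)."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return img
    h, w, c = img.shape
    area = rng.uniform(*area_range) * h * w
    aspect = rng.uniform(0.3, 3.3)
    eh = min(max(1, int(round(np.sqrt(area * aspect)))), h)
    ew = min(max(1, int(round(np.sqrt(area / aspect)))), w)
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[top:top + eh, left:left + ew, :] = rng.random((eh, ew, c))
    return out
```

Both transforms keep the image shape unchanged, so they can be applied to RGB and infrared samples alike before the images enter their respective branches.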
The invention relates to a cross-modal pedestrian re-identification method based on integral and partial constraints, which comprises the following steps:
s1, extracting pedestrian information features under different modalities from the RGB image and the infrared image of the same scene by using two independent branch networks with identical structures;
s2, uniformly dividing the extracted features into p horizontal components from top to bottom, projecting the p horizontal components to a common space, and outputting a joint representation of the modality-specific features and the modality-shared features;
s3, constructing a multi-loss function comprising the modality-specific identity loss, the cross-entropy loss, and the proposed overall constraint and partial triplet-center loss, mixing and crossing the joint features using the multi-loss function, and reducing the image difference between the infrared and RGB modalities through a modality distance constraint, so as to obtain optimal recognition performance.
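The horizontal partition in step s2, splitting a convolutional feature map into p stripes and pooling each, can be sketched as below. This is a NumPy illustration of the partition-and-pool operation only; the branch networks of step s1 are omitted, and the function name is chosen here for illustration.

```python
import numpy as np

def horizontal_partition(feat, p):
    """Split a C x H x W feature map into p horizontal stripes and
    average-pool each stripe, giving p vectors of dimension C."""
    c, h, w = feat.shape
    assert h % p == 0, "p must divide the feature-map height"
    stripe_h = h // p
    parts = []
    for k in range(p):
        stripe = feat[:, k * stripe_h:(k + 1) * stripe_h, :]
        parts.append(stripe.mean(axis=(1, 2)))  # average pool over the stripe
    return np.stack(parts)  # shape (p, C)
```

For a ResNet50-style output with C = 2048, this yields the p 2048-dimensional part features that are later projected into the common space.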
Further, the multi-loss function is:

L = L_{id}^{V} + L_{id}^{I} + L_{CE} + \lambda L_{WCPTL}

where L_{id}^{V} and L_{id}^{I} respectively denote the modality-specific softmax identity losses of the RGB branch and the infrared branch, L_{CE} denotes the cross-entropy loss, and L_{WCPTL} denotes the overall-constraint and partial triplet-center loss function; \lambda denotes a coefficient, set before training, that balances the overall loss function.
Further, the overall constraint process in the loss function includes two steps: first, the distance between different pedestrian samples in the same modality is enlarged while the distance between the same pedestrian's samples in the RGB and infrared modalities is reduced; then, the distance between the same pedestrian's samples in the two modalities is further reduced, improving the similarity of the pedestrian identity and reducing the difference between different pedestrian samples across the modalities. Given the deep pedestrian features \{(v_i, t_i)\}_{i=1}^{N} in the two modalities, where 1 \le i \le N, v_i and t_i respectively denote the features of the i-th pedestrian in the RGB and infrared modalities, and v_i^p and t_i^q respectively denote the p-th RGB sample and the q-th infrared sample of the i-th pedestrian; writing v_i, t_i for a matched sample pair, the specific formula of the overall constraint L_W is:

L_W = \sum_{i=1}^{N} \Big[ D(v_i, t_i) + \alpha - \min_{j \neq i} D(v_i, v_j) \Big]_+ + \sum_{i=1}^{N} D(v_i, t_i)

where D(\cdot,\cdot) denotes the Euclidean distance, \alpha denotes an offset, and [x]_+ = \max(0, x).
the partial-center triplet loss formula is as follows:
where LP represents a partial triplet-center loss, xiAnd ziRespectively representing RGB image features and infrared image features, c1yiAnd c2yiRespectively representing the ith class center, y in RGB and infrared modalitiesiAn identity tag representing the ith sample, a represents an offset, N represents a batch number,represents the Euclidean distance, [ x ]]+=max(0,x);
In summary, the overall-constraint and partial triplet-center loss function can be expressed as:

L_{WCPTL} = L_W + L_P.
Further, modality-specific identity loss: because the pedestrian characteristics in RGB and infrared images differ greatly, different networks are used to obtain the feature representations in the two modalities, and a softmax loss is used to predict the pedestrian identity in each modality:

L_{id}^{V} = -\frac{1}{N_V} \sum_{i=1}^{N_V} \log \frac{\exp\big((w^{V}_{y_i})^{\top} x^{V}_i + b_V\big)}{\sum_{j=1}^{M} \exp\big((w^{V}_{j})^{\top} x^{V}_i + b_V\big)}, \qquad L_{id}^{I} = -\frac{1}{N_I} \sum_{i=1}^{N_I} \log \frac{\exp\big((w^{I}_{y_i})^{\top} x^{I}_i + b_I\big)}{\sum_{j=1}^{M} \exp\big((w^{I}_{j})^{\top} x^{I}_i + b_I\big)}

where x^{V}_i and x^{I}_i respectively denote the i-th RGB and infrared image features of class y_i, w^{V}_j and w^{I}_j respectively denote the j-th column of the weights W_V and W_I of the last fully connected layer, b_V and b_I respectively denote the RGB and infrared modality biases, M denotes the number of pedestrian identity classes, N_V and N_I respectively denote the numbers of RGB and infrared training samples in the same batch, and L_{id}^{V} and L_{id}^{I} respectively denote the identity loss functions of the RGB and infrared images.
Further, in order to make the feature characterization of the same pedestrian similar across modalities, the following cross-entropy loss function is introduced:

L_{CE} = -\sum_{i=1}^{N} \sum_{k=1}^{p} \log \hat{q}_k(y_i)

where y_i denotes the true label of the i-th input image and \hat{q}_k(y_i) denotes the predicted probability of label y_i from the k-th part feature; that is, the p part features of each input image share the label information of the image.
The invention has the following beneficial effects. The invention discloses a cross-modal pedestrian re-identification method based on overall and partial constraints and provides a hybrid cross dual-path feature learning network, which uses a modality-shared parameter layer and modality-specific parameter layers to extract features from pedestrian images of different modalities. Second, the network horizontally partitions the pedestrian features, so the local and global features of the image are better learned and the pedestrian feature characterization capability is improved. Meanwhile, at the feature embedding layer, the network cross-combines the features into several different batch combinations, which benefits feature matching and the modality distance constraint. When designing the loss function, the consistency constraint on intra-class feature distributions across modalities and the inter-class correlation constraint criterion are fully considered, and a novel overall constraint and partial triplet-center loss function is proposed to improve modality differences and draw samples of the same class closer to their class center and away from other class centers. During training, random horizontal flipping and random erasing are used as data augmentation to expand the training data.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is an end-to-end block diagram illustration of cross-modal pedestrian re-identification based on global constraints and partial triplet-center loss in accordance with the present invention;
FIG. 3 is a schematic diagram of the combination of triple loss, center loss, and softmax loss, respectively;
FIG. 4 is a schematic diagram of the overall constraint of the present invention;
FIG. 5 is a schematic of the partial-triplet center loss of the present invention;
FIG. 6 is a diagram illustrating the recognition effect of the present invention on the SYSU-MM01 and RegDB data sets.
Detailed Description
In order that the present invention may be more readily and clearly understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
With reference to fig. 1 and fig. 2, the method provided by the present invention first extracts pedestrian information in the different modalities using RGB and infrared branches whose backbone networks are ResNet50, and uniformly divides the extracted features into p horizontal components from top to bottom using an average pooling layer; then the horizontally partitioned features are projected into a common space, and a joint representation of the modality-specific and modality-shared features is output; finally, the joint features are mixed and crossed using the modality-specific identity loss, the cross-entropy loss, and the proposed overall constraint and partial triplet-center loss, and optimal recognition performance is obtained through the modality distance constraint.
The overall constraint and partial triplet-center loss proposed by the invention first constrains the distances between the modalities as a whole, reducing the difference between the RGB and infrared modalities; second, by combining the triplet loss and the center loss, the loss function learns centers for the RGB and infrared modalities separately, so that samples of the same class move closer to their class center and away from other class centers, improving intra-modality class differences.
1. Hybrid cross dual path feature learning network
In single-modality, visible-light pedestrian re-identification, a common method is to horizontally segment the pedestrian image, extract local features, and then perform feature matching. In infrared-visible pedestrian re-identification, however, an image shot by an infrared camera differs greatly from a visible-light image: the infrared image retains only inherent features such as the pedestrian's overall appearance and posture, while losing important information such as color and illumination. Therefore, conventional methods cannot be used directly to solve the infrared-visible pedestrian re-identification problem.
As shown in fig. 2, a dual-stream structure is used as the basic structure, mainly because a single-stream structure uses a common feature extraction network and cannot accurately extract the features of both RGB and infrared images; in addition, a single-stream structure shares global parameters, so local pedestrian characteristics are severely neglected. In the dual-stream structure, the shallow network parameters are specific to each modality while the deep network parameters are shared, so both local and global characteristics are taken into account and recognition performance is improved. Therefore, the invention adopts a two-path local feature network consisting of a feature extractor and a feature embedding part.
The infrared-visible pedestrian re-identification dataset may be represented as D = {V, I}, where V represents the RGB images and I represents the infrared images.
In the feature extraction stage, the backbone network ResNet50 extracts the corresponding pedestrian features for each branch; the final average pooling layer and the subsequent layers are removed so as to enlarge the receptive field and enrich the feature granularity. In particular, the two branches use the same network structure; this design lets the high-level feature output better express high-level semantics and strengthens the identity discrimination capability of the features.
In the feature embedding stage, the pedestrian features are first horizontally divided into p identical components for learning a low-dimensional embedding space between the two heterogeneous modalities; then a global pooling layer is applied to each part, yielding p 2048-dimensional features. To further reduce the feature dimension, a 1×1 convolution layer performs dimension reduction on each 2048-dimensional component feature, finally giving 256-dimensional feature expressions.
Meanwhile, to avoid vanishing gradients and mitigate internal covariate shift, a batch normalization layer is added after each fully connected layer; finally, the shared layer is used as a projection function to project the features of the two modalities into a common embedding space so as to close the gap between them.
In the training stage, the network model is trained by combining the modality-specific identity loss, the cross-entropy loss, and the proposed overall constraint and partial triplet-center loss so as to improve recognition accuracy. Using mixed cross training, the joint representation features of the RGB and infrared branches are divided into three groups: the partial constraint, the overall constraint, and the cross-entropy loss, where the partial and overall constraints form the proposed overall-constraint and partial triplet-center loss function. In the testing stage, features of the query image and the gallery images are extracted separately and then concatenated into a final high-dimensional feature descriptor.
A traditional two-path feature learning network extracts pedestrian features through its backbone networks, fuses them through a weight-sharing module, and outputs them directly, attempting to learn cross-modal information straight from the two original modalities. Related experimental results show that such methods are not sufficient to narrow the gap between the two modalities. The network provided by the invention instead cross-combines pedestrian features into several different batch combinations and makes multiple loss functions cooperate. This feature cross-combination helps balance the model's representation learning capacity for the specific and shared characteristics of the different modalities' data, effectively improving matching across multimodal data.
The modality-specific loss function directly utilizes modality information and preserves the most original pedestrian characteristics; the cross-entropy loss is used to identify pedestrian identity, with RGB and infrared modality features extracted to form a batch; within the same batch, the features of the RGB and infrared images are consistent, so pairs of batches are constructed using the partial constraint and the overall constraint respectively.
A multi-loss function is constructed by joint cooperation as shown in formula (1), comprising the modality-specific identity loss, cross-entropy loss, overall constraint loss, and partial triplet-center loss; the overall loss function of the proposed framework can be expressed as:

L = L_{id}^{V} + L_{id}^{I} + L_{CE} + \lambda L_{WCPTL} (1)

where L_{id}^{V} and L_{id}^{I} respectively denote the modality-specific softmax identity losses of the RGB and infrared branches, L_{CE} denotes the cross-entropy loss, and L_{WCPTL} denotes the overall-constraint and partial triplet-center loss function; \lambda denotes a coefficient, set before training, that balances the overall loss function.
Modality-specific identity loss: because the pedestrian characteristics in RGB and infrared images differ greatly, different networks are used to obtain the feature representations in the two modalities. A softmax loss is used to predict the pedestrian identity in each modality:

L_{id}^{V} = -\frac{1}{N_V} \sum_{i=1}^{N_V} \log \frac{\exp\big((w^{V}_{y_i})^{\top} x^{V}_i + b_V\big)}{\sum_{j=1}^{M} \exp\big((w^{V}_{j})^{\top} x^{V}_i + b_V\big)}, \qquad L_{id}^{I} = -\frac{1}{N_I} \sum_{i=1}^{N_I} \log \frac{\exp\big((w^{I}_{y_i})^{\top} x^{I}_i + b_I\big)}{\sum_{j=1}^{M} \exp\big((w^{I}_{j})^{\top} x^{I}_i + b_I\big)}

where x^{V}_i and x^{I}_i respectively denote the i-th RGB and infrared image features of class y_i, w^{V}_j and w^{I}_j respectively denote the j-th column of the weights W_V and W_I of the last fully connected layer, b_V and b_I respectively denote the RGB and infrared modality biases, M denotes the number of pedestrian identity classes, and N_V and N_I respectively denote the numbers of RGB and infrared training samples in the same batch.
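The modality-specific softmax identity loss for one branch can be computed numerically as in the sketch below. This is a NumPy stand-in for either branch; the function name is illustrative, and in practice a framework's built-in softmax cross-entropy over the last fully connected layer would be used.

```python
import numpy as np

def identity_loss(X, labels, W, b):
    """Softmax identity loss for one modality branch.
    X: (n, d) features; W: (M, d) last-FC weights; b: (M,) bias;
    labels: (n,) identity indices in [0, M)."""
    logits = X @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With all-zero features and weights the prediction is uniform over M classes, so the loss reduces to log(M), a quick sanity check on the implementation.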
Cross-entropy loss: in order to make the feature characterization of the same pedestrian similar across modalities, the following cross-entropy loss function is introduced:

L_{CE} = -\sum_{i=1}^{N} \sum_{k=1}^{p} \log \hat{q}_k(y_i)

where y_i denotes the true label of the i-th input image and \hat{q}_k(y_i) denotes the predicted probability of label y_i from the k-th part feature; that is, the p part features of each input image share the label information of the image.
2. Overall constraint and partial triplet-center loss
The invention provides a novel overall constraint and partial triplet-center loss; this function improves inter-class and intra-class differences from the two aspects of different modalities and the same modality respectively, and improves overall recognition performance.
The triplet loss function is often applied in fields such as face recognition and pedestrian re-identification; it not only shortens the intra-class distance but also enlarges the inter-class distance. For the infrared-visible pedestrian re-identification task, pedestrian images exhibit inter-class distances both within the same modality and across different modalities.
The triplet loss function is formulated as follows:

L_T = \sum_{i=1}^{N} \Big[ D\big(f^{a}_i, f^{p}_i\big) - D\big(f^{a}_i, f^{n}_i\big) + \alpha \Big]_+

where f^{a}_i, f^{p}_i, and f^{n}_i respectively denote the feature representations of the anchor, positive-sample, and negative-sample images; the identity information of f^{a}_i is the same as that of f^{p}_i and different from that of f^{n}_i; \alpha denotes an offset, N denotes the batch size, D(\cdot,\cdot) denotes the Euclidean distance, and [x]_+ = \max(0, x).
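A direct numerical sketch of this triplet loss, assuming an (N, d) batch of anchor, positive, and negative features, might look like:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.3):
    """Batch triplet loss with Euclidean distance and margin alpha.
    Each argument is an (N, d) array of feature vectors; row i of the
    three arrays forms one (anchor, positive, negative) triplet."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distances
    d_an = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distances
    return np.maximum(d_ap - d_an + alpha, 0.0).sum()
```

When positives coincide with their anchors and negatives are farther away than the margin, every hinge term is inactive and the loss is zero.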
As can be seen from fig. 3(a), although combining the two loss functions achieves a good effect, the data distribution is not uniform and model performance is unstable. The center loss was first applied in the field of face recognition; it constrains the distance between a sample and the center of its class, learning one center per class. The center loss function is formulated as follows:

L_C = \frac{1}{2} \sum_{i=1}^{m} \big\| x_i - c_{y_i} \big\|_2^2

where x_i is the feature representation, y_i the class corresponding to x_i, c_{y_i} the center of class y_i, m the mini-batch size, and \|\cdot\|_2 the Euclidean distance.
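The center loss can likewise be sketched in a few lines of NumPy; the 1/2 factor and squared Euclidean distance follow the standard center-loss definition, and the center-update step used during training is omitted here.

```python
import numpy as np

def center_loss(X, labels, centers):
    """Half the summed squared Euclidean distance between each feature
    and the center of its class.
    X: (m, d) features; labels: (m,) class indices; centers: (K, d)."""
    diff = X - centers[labels]          # per-sample offset from own center
    return 0.5 * np.einsum('ij,ij->', diff, diff)
```

The loss vanishes exactly when every sample sits on its class center, which makes it a useful compactness term alongside the triplet loss.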
As can be seen in connection with fig. 3(b), the key for the overall constraint loss to learn features across modalities is narrowing the cross-modal difference. Owing to drastic visual changes, the cross-modal difference may be large, which greatly reduces pedestrian re-identification performance; the cross-modal difference therefore needs to be reduced as a whole.
With reference to fig. 4, the overall constraint process in the loss function proposed by the invention includes two steps: first, the distance between different pedestrian samples in the same modality is enlarged while the distance between the same pedestrian's samples in the RGB and infrared modalities is reduced; then, the distance between the same pedestrian's samples in the two modalities is further reduced, improving the similarity of the pedestrian identity and reducing the difference between different pedestrian samples across the modalities. Given the deep pedestrian features \{(v_i, t_i)\}_{i=1}^{N} in the two modalities, where 1 \le i \le N, v_i and t_i respectively denote the features of the i-th pedestrian in the RGB and infrared modalities, and v_i^p and t_i^q respectively denote the p-th RGB sample and the q-th infrared sample of the i-th pedestrian; writing v_i, t_i for a matched sample pair, the specific formula is:

L_W = \sum_{i=1}^{N} \Big[ D(v_i, t_i) + \alpha - \min_{j \neq i} D(v_i, v_j) \Big]_+ + \sum_{i=1}^{N} D(v_i, t_i)

where D(\cdot,\cdot) denotes the Euclidean distance, \alpha denotes an offset, and [x]_+ = \max(0, x).
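One plausible numerical reading of this two-step overall constraint is sketched below, assuming row i of the RGB and infrared feature matrices belongs to the same pedestrian. The exact form is an interpretation of the two steps described in the text, so the margin placement and the minimum over same-modality negatives are illustrative assumptions.

```python
import numpy as np

def overall_constraint(V, T, alpha=0.3):
    """V, T: (N, d) same-identity RGB / infrared feature pairs (row i of
    V and row i of T belong to pedestrian i).
    Step 1: push apart different pedestrians within a modality while
    pulling cross-modal same-identity pairs together (hinge term).
    Step 2: keep shrinking the cross-modal same-identity distance."""
    n = len(V)
    d_cross = np.linalg.norm(V - T, axis=1)   # same identity, across modalities
    loss = d_cross.sum()                      # step-2 term
    for i in range(n):
        d_intra = min(np.linalg.norm(V[i] - V[j]) for j in range(n) if j != i)
        loss += max(d_cross[i] + alpha - d_intra, 0.0)  # step-1 hinge
    return loss
```

With well-separated identities and perfectly aligned modalities, both terms vanish; misaligned modalities contribute through the residual cross-modal distances.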
with reference to fig. 5, by combining two loss functions, samples in two modalities can be considered at the same time, which is beneficial to reduce intra-class differences in the same modality, reduce differences between modalities, and improve recognition accuracy.
The partial triplet-center loss is formulated as:

L_P = \sum_{i=1}^{N} \Big[ D(x_i, c^{1}_{y_i}) + D(z_i, c^{2}_{y_i}) + \alpha - \min_{j \neq y_i} \big( D(x_i, c^{1}_{j}) + D(z_i, c^{2}_{j}) \big) \Big]_+

where x_i and z_i respectively denote the RGB and infrared image features, c^{1}_{y_i} and c^{2}_{y_i} respectively denote the center of class y_i in the RGB and infrared modalities, y_i denotes the identity label of the i-th sample, \alpha denotes an offset, N denotes the batch size, D(\cdot,\cdot) denotes the Euclidean distance, and [x]_+ = \max(0, x).
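A numerical sketch of a partial triplet-center loss built from the variables described above (RGB features x_i, infrared features z_i, per-modality class centers, margin alpha) follows. Since the loss combines triplet and center terms, the exact pull/push arrangement here is an illustrative assumption rather than the patent's definitive formula.

```python
import numpy as np

def partial_triplet_center_loss(X, Z, labels, c1, c2, alpha=0.3):
    """X, Z: (N, d) RGB / infrared features; c1, c2: (K, d) per-modality
    class centers; labels: (N,) identity indices.
    Pulls each cross-modal feature pair toward its own class centers and
    pushes it away from the nearest other class's centers by margin alpha."""
    total = 0.0
    K = len(c1)
    for x, z, y in zip(X, Z, labels):
        pos = np.linalg.norm(x - c1[y]) + np.linalg.norm(z - c2[y])
        neg = min(np.linalg.norm(x - c1[j]) + np.linalg.norm(z - c2[j])
                  for j in range(K) if j != y)
        total += max(pos + alpha - neg, 0.0)
    return total
```

If every feature sits exactly on its own class center and other centers lie beyond the margin, the loss is zero; collapsed centers leave only the margin penalty.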
In summary, the overall constraint and partial triplet-center loss function can be expressed as:
L_{WCPTL} = L_W + L_P (8)
the result shown in fig. 6 is a schematic diagram of the recognition effect of the method of the present invention on the SYSU-MM01 and RegDB data sets, and it can be seen from the diagram that the recognition effect of the method is ideal and the recognition accuracy is high.
In summary, the present invention addresses the problems in cross-modal pedestrian re-identification. On one hand, it proposes a hybrid cross dual-path feature learning network (HCDFL) that deeply extracts local pedestrian features from two different modalities: the network model first extracts pedestrian features in each modality, then horizontally partitions the extracted features into p components and maps them to a common space, so that both the local and global features of the image are learned and the pedestrian feature characterization capability is improved; finally, overall performance is improved through the joint cooperation of the modality-specific identity loss, the cross-entropy loss, and the proposed loss function. On the other hand, the invention provides a novel overall constraint and partial triplet-center loss, which improves inter-class and intra-class differences from the two aspects of different modalities and the same modality respectively, better characterizing local pedestrian features and improving overall recognition performance. The proposed loss function first uses the overall constraint to reduce the difference between modalities; then, by fusing the triplet loss and the center loss, it enlarges the difference between classes within the same modality, so that samples of the same class move closer to their own center and away from other class centers. In addition, because the amount of image data is limited, random horizontal flipping and random erasing are used as data augmentation during training. Experimental results on the two common datasets SYSU-MM01 and RegDB show that the proposed method achieves excellent recognition performance.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations made by using the contents of the present specification and the drawings are within the protection scope of the present invention.
Claims (5)
1. A cross-modal pedestrian re-identification method based on integral and partial constraints is characterized by comprising the following steps:
s1, extracting pedestrian information features under different modalities from the RGB image and the infrared image of the same scene by using two independent branch networks with identical structures;
s2, uniformly dividing the extracted features into p horizontal components from top to bottom, projecting the p horizontal components to a common space, and outputting a joint representation of the modality-specific features and the modality-shared features;
s3, constructing a multi-loss function, mixing and crossing the joint features using the multi-loss function, and reducing the image difference between the infrared and RGB modalities through a modality distance constraint, so as to obtain optimal recognition performance.
2. The cross-modal pedestrian re-identification method based on integral and partial constraints of claim 1, characterized in that
the multi-loss function is:

L = L_{id}^{V} + L_{id}^{I} + L_{CE} + \lambda L_{WCPTL}

where L_{id}^{V} and L_{id}^{I} respectively denote the modality-specific softmax identity losses of the RGB branch and the infrared branch, L_{CE} denotes the cross-entropy loss, and L_{WCPTL} denotes the overall-constraint and partial triplet-center loss function; \lambda denotes a coefficient, set before training, that balances the overall loss function.
3. The cross-modal pedestrian re-identification method based on integral and partial constraints of claim 2, characterized in that
the overall constraint process comprises two steps: first, the distance between different pedestrian samples in the same modality is enlarged while the distance between the same pedestrian's samples in the RGB and infrared modalities is reduced; then, the distance between the same pedestrian's samples in the two modalities is further reduced, improving the similarity of the pedestrian identity and reducing the difference between different pedestrian samples across the modalities; given the deep pedestrian features \{(v_i, t_i)\}_{i=1}^{N} in the two modalities, where 1 \le i \le N, v_i and t_i respectively denote the features of the i-th pedestrian in the RGB and infrared modalities, and v_i^p and t_i^q respectively denote the p-th RGB sample and the q-th infrared sample of the i-th pedestrian; writing v_i, t_i for a matched sample pair, the overall constraint L_W is specifically formulated as:

L_W = \sum_{i=1}^{N} \Big[ D(v_i, t_i) + \alpha - \min_{j \neq i} D(v_i, v_j) \Big]_+ + \sum_{i=1}^{N} D(v_i, t_i)
the partial triplet-center loss is:

L_P = Σ_{i=1}^{N} ([D(x_i, c1_{y_i}) − min_{j≠y_i} D(x_i, c1_j) + α]_+ + [D(z_i, c2_{y_i}) − min_{j≠y_i} D(z_i, c2_j) + α]_+)

where L_P represents the partial triplet-center loss, x_i and z_i respectively represent the RGB image features and the infrared image features, c1_{y_i} and c2_{y_i} respectively represent the class centers of identity y_i in the RGB and infrared modalities, y_i represents the identity label of the ith sample, α represents the margin (offset), N represents the batch size, D(·, ·) represents the Euclidean distance, and [x]_+ = max(0, x);
In summary, the whole-constraint and partial triplet-center loss function can be expressed as:

L_WCPTL = L_W + L_P.
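A sketch of the partial triplet-center term, assuming the standard triplet-center form in which each feature is pulled toward its own class center and pushed from the nearest other-class center; the function name, the assignment of features to centers, and the margin default are illustrative assumptions:

```python
import numpy as np

def partial_triplet_center(x, z, labels, c1, c2, alpha=0.3):
    """Sketch of L_P: x/z are RGB/infrared features, c1/c2 the per-class
    centers in the RGB and infrared modalities, alpha the margin."""
    def term(feats, centers):
        loss = 0.0
        for f, y in zip(feats, labels):
            d_pos = np.linalg.norm(f - centers[y])
            d_neg = min(np.linalg.norm(f - centers[k])
                        for k in range(len(centers)) if k != y)
            loss += max(0.0, d_pos - d_neg + alpha)  # [.]_+ = max(0, .)
        return loss
    return (term(x, c1) + term(z, c2)) / len(labels)

centers_rgb = np.array([[0.0, 0.0], [10.0, 10.0]])
centers_ir = np.array([[0.0, 1.0], [10.0, 9.0]])
# features sitting exactly on their own class centers incur zero loss
loss = partial_triplet_center(centers_rgb, centers_ir, [0, 1],
                              centers_rgb, centers_ir)
```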
4. The cross-modal pedestrian re-identification method based on whole and partial constraints according to claim 2, wherein
the modality-specific identity loss is constructed as follows: because the pedestrian characteristics in RGB images and infrared images differ greatly, different networks are used to obtain the feature representations in the two modalities, and a softmax loss is used to predict the pedestrian identity in each modality:

L_V = −(1/N_V) Σ_{i=1}^{N_V} log(exp(W_V^{y_i}·x_i + b_V) / Σ_{j=1}^{M} exp(W_V^j·x_i + b_V))

L_I = −(1/N_I) Σ_{i=1}^{N_I} log(exp(W_I^{y_i}·z_i + b_I) / Σ_{j=1}^{M} exp(W_I^j·z_i + b_I))

where x_i and z_i respectively represent the ith RGB image feature and infrared image feature of their identity classes, W_V^j and W_I^j respectively represent the jth columns of the weights W_V and W_I in the last fully connected layer, b_V and b_I respectively represent the RGB and infrared modality biases, M represents the number of pedestrian identity classes, N_V and N_I respectively represent the numbers of RGB image and infrared image training samples in the same batch, and L_V and L_I respectively represent the identity loss functions of the RGB image and the infrared image.
5. The cross-modal pedestrian re-identification method based on whole and partial constraints according to claim 2, wherein
the cross-entropy loss function is:

L_CE = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{p} log ŷ_i^k(y_i)

where y_i represents the true label of the ith input image, i.e., the p part features of each input image share the label information of the image, and ŷ_i^k(y_i) represents the predicted probability that the kth part feature of the ith image belongs to class y_i.
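Since all p part features of an image share its label y_i, the cross-entropy can be averaged over the p part classifiers; a minimal sketch under that assumption (whether the patent sums or averages over parts is not specified here):

```python
import numpy as np

def parts_cross_entropy(part_logits, label):
    """Average cross-entropy over the p part classifiers of one image.

    part_logits: (p, M) array, one identity-logit vector per part;
    label: the shared ground-truth identity y_i of the image.
    """
    total = 0.0
    for logits in part_logits:
        logits = logits - logits.max()               # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum())
        total += -log_prob[label]
    return total / len(part_logits)

# two parts, both strongly predicting class 0, true label 0 -> small loss
loss = parts_cross_entropy(np.array([[10.0, 0.0, 0.0],
                                     [10.0, 0.0, 0.0]]), 0)
```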
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210124910.5A CN114495281A (en) | 2022-02-10 | 2022-02-10 | Cross-modal pedestrian re-identification method based on integral and partial constraints |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114495281A true CN114495281A (en) | 2022-05-13 |
Family
ID=81477698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210124910.5A Pending CN114495281A (en) | 2022-02-10 | 2022-02-10 | Cross-modal pedestrian re-identification method based on integral and partial constraints |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114495281A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019148898A1 (en) * | 2018-02-01 | 2019-08-08 | 北京大学深圳研究生院 | Adversarial cross-media retrieving method based on restricted text space |
CN111597876A (en) * | 2020-04-01 | 2020-08-28 | 浙江工业大学 | Cross-modal pedestrian re-identification method based on difficult quintuple |
US20200285896A1 (en) * | 2019-03-09 | 2020-09-10 | Tongji University | Method for person re-identification based on deep model with multi-loss fusion training strategy |
CN113569639A (en) * | 2021-06-25 | 2021-10-29 | 湖南大学 | Cross-modal pedestrian re-identification method based on sample center loss function |
- 2022-02-10 CN CN202210124910.5A patent/CN114495281A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jiang et al. | CmSalGAN: RGB-D salient object detection with cross-view generative adversarial networks | |
Liu et al. | Enhancing the discriminative feature learning for visible-thermal cross-modality person re-identification | |
Agnese et al. | A survey and taxonomy of adversarial neural networks for text‐to‐image synthesis | |
CN108537136A (en) | The pedestrian's recognition methods again generated based on posture normalized image | |
CN110580302B (en) | Sketch image retrieval method based on semi-heterogeneous joint embedded network | |
CN111160264B (en) | Cartoon character identity recognition method based on generation countermeasure network | |
CN106960182B (en) | A kind of pedestrian's recognition methods again integrated based on multiple features | |
CN104504362A (en) | Face detection method based on convolutional neural network | |
CN110598018B (en) | Sketch image retrieval method based on cooperative attention | |
CN111539255A (en) | Cross-modal pedestrian re-identification method based on multi-modal image style conversion | |
CN111832511A (en) | Unsupervised pedestrian re-identification method for enhancing sample data | |
CN109492528A (en) | A kind of recognition methods again of the pedestrian based on gaussian sum depth characteristic | |
CN114662497A (en) | False news detection method based on cooperative neural network | |
CN113361474B (en) | Double-current network image counterfeiting detection method and system based on image block feature extraction | |
CN112001279A (en) | Cross-modal pedestrian re-identification method based on dual attribute information | |
CN115690669A (en) | Cross-modal re-identification method based on feature separation and causal comparison loss | |
CN108986103A (en) | A kind of image partition method merged based on super-pixel and more hypergraphs | |
CN113722528B (en) | Method and system for rapidly retrieving photos for sketch | |
CN116778530A (en) | Cross-appearance pedestrian re-identification detection method based on generation model | |
CN114495281A (en) | Cross-modal pedestrian re-identification method based on integral and partial constraints | |
Gong et al. | Person re-identification based on two-stream network with attention and pose features | |
Li et al. | AR-CNN: an attention ranking network for learning urban perception | |
Yang et al. | SSRR: Structural Semantic Representation Reconstruction for Visible-Infrared Person Re-Identification | |
CN114519897A (en) | Human face in-vivo detection method based on color space fusion and recurrent neural network | |
Zhang et al. | Triplet interactive attention network for cross-modality person re-identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||