CN116434143A - Cross-modal pedestrian re-identification method and system based on feature reconstruction - Google Patents
Cross-modal pedestrian re-identification method and system based on feature reconstruction
- Publication number
- CN116434143A (application number CN202310406803.6A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- feature
- cross
- modal
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/457—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a cross-modal pedestrian re-identification method based on feature reconstruction, which comprises the following steps: 1) extracting paired visible-light and infrared pictures of a plurality of pedestrians from a dataset to form a visible-light training set and an infrared training set; 2) constructing a cross-modal pedestrian re-identification network model based on feature reconstruction, which mainly comprises a modality-specific feature extraction module, a multi-scale feature extraction module, a Token-aware multi-scale feature fusion module and a cross-modal feature reconstruction module, and training the model on the visible-light and infrared training sets to obtain generalizable model parameters; 3) using the trained cross-modal pedestrian re-identification network model for cross-modal retrieval to realize cross-modal pedestrian re-identification. The method and the system help obtain more stable, robust and accurate cross-modal pedestrian re-identification results.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a cross-modal pedestrian re-identification method and system based on feature reconstruction.
Background
Pedestrian re-identification is one of the key technologies of intelligent video surveillance systems, and aims to retrieve the same pedestrian across cameras with non-overlapping views. Pedestrian re-identification has very wide application scenarios, such as large-scale venues like airports, shopping malls and campuses. Previously, many efforts focused on the pedestrian re-identification task in visible-light scenes without considering illumination changes in real scenes. Current camera systems can automatically switch to a visible-light or infrared mode according to real-time illumination conditions to ensure all-weather surveillance. Less work has focused on the cross-modal pedestrian re-identification task, which refers to finding infrared (visible-light) images of the same pedestrian in an infrared (visible-light) gallery according to a visible-light (infrared) query image. Cross-modal pedestrian re-identification must not only address the problems encountered by ordinary pedestrian re-identification, such as variable pedestrian poses, occlusion, camera-angle differences, background clutter and illumination changes, but also the modality difference between images.
At present, cross-modal pedestrian re-identification methods can be divided into two types: image-based methods and feature-based methods. Image-based methods aim to transfer images of one modality to the other modality. Li et al. (D. Li, X. Wei, X. Hong, Y. Gong, Infrared-visible cross-modal person re-identification with an X modality, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 4610-4617.) designed a lightweight shared network to learn modality cues in visible-light pictures and then used these cues to generate intermediate-modality images. Wang et al. (Z. Wang, Z. Wang, Y. Zheng, Y.-Y. Chuang, S. Satoh, Learning to reduce dual-level discrepancy for infrared-visible person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 618-626.) proposed the D2RL model, which extracts the modality information of pedestrians under different modalities with an adversarial learning method and then mutually transfers the learned modality information through a generative network to form intermediate-modality pedestrian pictures, which are provided to the network as additional modality pictures for learning, thereby reducing the modality difference. However, with adversarial learning, not only do the generated pictures suffer from discontinuity and loss of semantic information, but the network is also difficult to converge.
Feature-learning-based methods aim to learn features shared by pedestrians across modalities, thereby reducing the negative influence of the modality difference. Currently, to obtain robust shared pedestrian features, many methods employ convolutional networks or Transformer networks as the underlying backbone. For example, Zhu et al. (Y. Zhu, Z. Yang, L. Wang, S. Zhao, X. Hu, D. Tao, Hetero-center loss for cross-modality person re-identification, Neurocomputing 386 (2020) 97-109.) designed a simple but high-performance CNN-based network with a hetero-center loss to reduce intra-class cross-modal differences and obtain discriminative pedestrian features. Furthermore, Liang et al. (T. Liang, Y. Jin, Y. Gao, W. Liu, S. Feng, T. Wang, Y. Li, CMTR: Cross-modality Transformer for visible-infrared person re-identification, arXiv preprint arXiv:2110.08994 (2021).) introduced a pure Transformer structure into cross-modal pedestrian re-identification to discover discriminative pedestrian features. Hybrid convolution-Transformer models also compensate for the lack of long-range modeling capability of convolutional networks and the insensitivity of Transformers to local features. Chen et al. (C. Chen, M. Ye, M. Qi, J. Wu, J. Jiang, C. Lin, Structure-aware positional Transformer for visible-infrared person re-identification, IEEE Trans. Image Process. 31 (2022) 2352-2364.) proposed SPOT, a structure-aware positional Transformer model combined with a CNN, to explore the structural features of the human body under different modalities and obtain modality-invariant features. However, existing cross-modal pedestrian re-identification methods are still deficient in multi-scale feature extraction; in addition, the relationship between pedestrian features in different modalities has not been well explored.
Disclosure of Invention
The invention aims to provide a cross-modal pedestrian re-identification method and system based on feature reconstruction, which help obtain more stable, robust and accurate cross-modal pedestrian re-identification results.
In order to achieve the above purpose, the invention adopts the following technical scheme: a cross-modal pedestrian re-identification method based on feature reconstruction comprises the following steps:
1) Extracting paired visible-light and infrared pictures of a plurality of pedestrians from a dataset to form a visible-light training set and an infrared training set;
2) Constructing a cross-modal pedestrian re-identification network model based on feature reconstruction, which mainly comprises a modality-specific feature extraction module, a multi-scale feature extraction module, a Token-aware multi-scale feature fusion module and a cross-modal feature reconstruction module; training the cross-modal pedestrian re-identification network model on the visible-light and infrared training sets to obtain generalizable model parameters;
3) Using the trained cross-modal pedestrian re-identification network model for cross-modal retrieval to realize cross-modal pedestrian re-identification.
Further, in step 1), the dataset is the RegDB cross-modal pedestrian re-identification dataset, and M visible-light pictures and M infrared pictures of N pedestrians are extracted from it in pairs.
Further, in step 2), the cross-modal pedestrian re-identification network model is implemented as follows:
A) Pedestrian features are extracted from the input visible-light picture and infrared picture by independent modality-specific feature extraction modules, and the extracted pedestrian features are then fed simultaneously into the multi-scale feature extraction module;
B) The multi-scale feature extraction module extracts the multi-scale pedestrian features of the visible-light and infrared pictures through several feature extraction modules of different scales;
C) The multi-scale pedestrian features are sent to the Token-aware multi-scale feature fusion module, which uses a learnable Token sequence to model the relationship between the multi-scale pedestrian features through bidirectional interaction between local and global views, reducing the interference of pedestrian-irrelevant features at different scales; the bidirectional local-global interaction is repeated several times to obtain the final visible-light and infrared multi-scale feature relation maps and the visible-light and infrared Token sequences containing multi-scale information;
D) The obtained multi-scale feature relation maps are combined with the original pedestrian features and sent to the last feature extraction module of the multi-scale feature extraction module for further feature learning, followed by pooling and horizontal segmentation to obtain the visible-light and infrared global and local features of pedestrians;
E) The visible-light and infrared global and local features of pedestrians, together with the visible-light and infrared Token sequences containing multi-scale information, are input into the cross-modal feature reconstruction module to reconstruct cross-modal features and discover the relationship between pedestrian features under different modalities;
F) To reduce the noise introduced into pedestrian features during reconstruction, a feature reconstruction loss is constructed: the loss between the reconstructed features and the target-modality features is computed, and their error is minimized by an optimizer to strengthen the relationship between the features of the two modalities.
Further, in step B), the multi-scale feature extraction module comprises four feature extraction modules, Stage-1, Stage-2, Stage-3 and Stage-4; the pedestrian feature extracted by the modality-specific feature extraction module has size 3×288×144; after the first feature extraction module Stage-1 the feature map size is 256×72×36, after the second feature extraction module Stage-2 it is 512×36×18, and after the third feature extraction module Stage-3 it is 1024×18×9.
Further, in step C), adaptive pooling is used to unify the sizes of the pedestrian features at different scales before splicing them, and a bidirectional hybrid convolution-Transformer structure is used to model the multi-scale pedestrian features, reducing the interference of pedestrian-irrelevant features at different scales; the learnable Token sequence is used to mine the relationships of the multi-scale pedestrian features under local and global views. For the visible-light multi-scale pedestrian feature M_vis, the process of turning from the local view to the global view is expressed as:
T′_vis = LN(FFN(MHA(T, FL(M_vis), FL(M_vis))) + T)
where T denotes the learnable Token sequence, whose length is set to 6, FL denotes the operation of flattening a three-dimensional pedestrian feature into a two-dimensional sequence, MHA denotes the multi-head attention mechanism, FFN denotes a feed-forward operation, and LN denotes layer normalization;
the process of turning from the global view to the local view is expressed as:
M′_vis = Conv(RS(MHA(FL(M_vis), T′_vis, T′_vis)) + M_vis)
where Conv denotes a convolution operation and RS denotes an operation of converting a two-dimensional feature into a three-dimensional feature.
Further, in step E), the cross-modal feature reconstruction module is implemented as follows:
using the visible-light and infrared Token sequences T′_vis and T′_ir containing pedestrian multi-scale information obtained in step C), the visible-light and infrared global feature g and local features l^p of pedestrians are reconstructed to strengthen the relationship between the two modal features; the cross-modal reconstruction of the visible-light global feature g_vis yields the feature ĝ_vis→ir, expressed as:
ĝ_vis→ir = Attn(W_Q g_vis, W_K T′_ir[0], W_V T′_ir[0])
where Attn denotes the attention mechanism, T′_ir[0] denotes the infrared Token sequence of the first pedestrian, and W_Q, W_K and W_V convert the corresponding features into the Query, Key and Value matrices; similarly, replacing g_vis in the above formula with the local feature l^p_vis yields the reconstructed local feature l̂^p_vis→ir.
Further, in step F), the feature reconstruction loss is constructed as follows: the difference between the reconstructed pedestrian features and the target-modality features is computed to obtain the feature reconstruction loss L_rec, and the network model is updated with an optimizer; the loss is expressed as:
L_rec = L1(ĝ_vis→ir, g_ir) + (1/N_p) Σ_{p=1}^{N_p} L1(l̂^p_vis→ir, l^p_ir)
where L1 denotes the Manhattan distance and N_p denotes the number of horizontal slices of the pedestrian feature.
The invention also provides a cross-modal pedestrian re-identification system based on feature reconstruction, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the processor executes the computer program instructions, the steps of the above method are implemented.
Compared with the prior art, the invention has the following beneficial effects: the method and the system effectively utilize multi-scale feature learning and cross-modal feature reconstruction to obtain generalizable and robust pedestrian features, can effectively cope with pose variation and occlusion, and alleviate the performance degradation caused by the modality difference.
Drawings
FIG. 1 is a schematic diagram of the cross-modal pedestrian re-identification network model based on feature reconstruction in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiment provides a cross-modal pedestrian re-identification method based on feature reconstruction, which comprises the following steps:
1) Paired visible-light and infrared pictures of a plurality of pedestrians are extracted from the dataset to form a visible-light training set and an infrared training set.
2) A cross-modal pedestrian re-identification network model based on feature reconstruction is constructed, which mainly comprises a modality-specific feature extraction module, a multi-scale feature extraction module, a Token-aware multi-scale feature fusion module and a cross-modal feature reconstruction module; the architecture of the network model is shown in FIG. 1. The model is trained on the visible-light and infrared training sets to obtain generalizable model parameters.
3) The trained cross-modal pedestrian re-identification network model is used for cross-modal retrieval to realize cross-modal pedestrian re-identification.
In step 1), the dataset is the RegDB cross-modal pedestrian re-identification dataset, and M visible-light pictures and M infrared pictures of N pedestrians are extracted from it in pairs.
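As an illustration of this paired sampling step, the sketch below builds the two training sets from a toy dataset. The dictionary layout, field names and the `build_paired_training_sets` helper are assumptions made for the example, not part of the patent.

```python
import random

def build_paired_training_sets(dataset, num_ids, imgs_per_modality, seed=0):
    """Sample M visible and M infrared pictures for each of N pedestrian IDs.

    `dataset` maps pedestrian_id -> {"visible": [...], "infrared": [...]};
    this layout is illustrative only.
    """
    rng = random.Random(seed)
    visible_set, infrared_set = [], []
    for pid in sorted(dataset)[:num_ids]:
        vis = rng.sample(dataset[pid]["visible"], imgs_per_modality)
        ir = rng.sample(dataset[pid]["infrared"], imgs_per_modality)
        # keep the sets index-aligned so each visible sample has an
        # infrared counterpart with the same identity
        visible_set += [(pid, p) for p in vis]
        infrared_set += [(pid, p) for p in ir]
    return visible_set, infrared_set
```

Identity labels travel with each picture so that later losses can match pairs across modalities.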
In step 2), the cross-modal pedestrian re-identification network model is implemented as follows:
A) Pedestrian features are extracted from the input visible-light and infrared pictures by independent modality-specific feature extraction modules, and the extracted pedestrian features are then fed simultaneously into the multi-scale feature extraction module.
B) The multi-scale feature extraction module extracts the multi-scale pedestrian features of the visible-light and infrared pictures through several feature extraction modules of different scales.
C) The multi-scale pedestrian features are sent to the Token-aware multi-scale feature fusion module, which uses a small set of learnable Tokens to model the relationship between the multi-scale pedestrian features through bidirectional interaction between local and global views, reducing the interference of pedestrian-irrelevant features at different scales; the bidirectional local-global interaction is repeated several times to obtain the final visible-light and infrared multi-scale feature relation maps and the visible-light and infrared Token sequences containing multi-scale information.
D) The obtained multi-scale feature relation maps are combined with the original pedestrian features and sent to the last feature extraction module of the multi-scale feature extraction module for further feature learning, followed by pooling and horizontal segmentation to obtain the visible-light and infrared global and local features of pedestrians.
E) The visible-light and infrared global and local features of pedestrians, together with the visible-light and infrared Token sequences containing multi-scale information, are input into the cross-modal feature reconstruction module to reconstruct cross-modal features and discover the relationship between pedestrian features under different modalities.
F) To reduce the noise introduced into pedestrian features during reconstruction, a feature reconstruction loss is constructed: the loss between the reconstructed features and the target-modality features is computed, and their error is minimized by an optimizer to strengthen the relationship between the features of the two modalities.
In step B), the multi-scale feature extraction module comprises four feature extraction modules, Stage-1, Stage-2, Stage-3 and Stage-4; the pedestrian feature extracted by the modality-specific feature extraction module has size 3×288×144; after the first feature extraction module Stage-1 the feature map size is 256×72×36, after the second feature extraction module Stage-2 it is 512×36×18, and after the third feature extraction module Stage-3 it is 1024×18×9.
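The feature-map sizes listed above follow a ResNet-50-style downsampling schedule (overall stride 4 in Stage-1, stride 2 in each later stage — an assumption inferred from the stated sizes, since the patent does not name the backbone). A minimal shape-arithmetic check:

```python
def stage_output_shape(in_shape, out_channels, stride):
    """One stage changes the channel count and divides H and W by its stride."""
    c, h, w = in_shape
    return (out_channels, h // stride, w // stride)

shape = (3, 288, 144)                    # input pedestrian picture
s1 = stage_output_shape(shape, 256, 4)   # Stage-1 -> (256, 72, 36)
s2 = stage_output_shape(s1, 512, 2)      # Stage-2 -> (512, 36, 18)
s3 = stage_output_shape(s2, 1024, 2)     # Stage-3 -> (1024, 18, 9)
```

The three computed shapes reproduce exactly the sizes stated in the paragraph above.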
In step C), adaptive pooling is used to unify the sizes of the pedestrian features at different scales before splicing them; a bidirectional hybrid convolution-Transformer structure is used to model the multi-scale pedestrian features, reducing the interference of pedestrian-irrelevant features at different scales. The learnable Token sequence is used to discover the relationships of the multi-scale pedestrian features under local and global views. Taking the visible-light multi-scale pedestrian feature M_vis as an example, the process of turning from the local view to the global view is expressed as:
T′_vis = LN(FFN(MHA(T, FL(M_vis), FL(M_vis))) + T)
where T denotes the learnable Token sequence, whose length is set to 6, FL denotes the operation of flattening a three-dimensional pedestrian feature into a two-dimensional sequence, MHA denotes the multi-head attention mechanism, FFN denotes a feed-forward operation, and LN denotes layer normalization.
The process of turning from the global view to the local view is expressed as:
M′_vis = Conv(RS(MHA(FL(M_vis), T′_vis, T′_vis)) + M_vis)
where Conv denotes a convolution operation and RS denotes an operation of converting a two-dimensional feature into a three-dimensional feature.
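The two interaction directions above can be sketched in NumPy with single-head attention standing in for MHA, and with the FFN and the final Conv omitted for brevity — a simplified illustration of the local-to-global and global-to-local passes, not the patent's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention (single head)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def local_to_global(tokens, feat_map):
    """Tokens query the flattened feature map: T' = LN(attn(T, FL(M), FL(M)) + T)."""
    c, h, w = feat_map.shape
    seq = feat_map.reshape(c, h * w).T          # FL(.): (C,H,W) -> (H*W, C)
    return layer_norm(attention(tokens, seq, seq) + tokens)

def global_to_local(tokens, feat_map):
    """Feature map queries the Tokens: M' = RS(attn(FL(M), T', T')) + M (Conv omitted)."""
    c, h, w = feat_map.shape
    seq = feat_map.reshape(c, h * w).T
    out = attention(seq, tokens, tokens) + seq  # residual as in the formula
    return out.T.reshape(c, h, w)               # RS(.): back to 3-D
```

Repeating these two passes corresponds to the "repeated several times" bidirectional interaction described above.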
In step E), the cross-modal feature reconstruction module is implemented as follows:
using the visible-light and infrared Token sequences T′_vis and T′_ir containing pedestrian multi-scale information obtained in step C), the visible-light and infrared global feature g and local features l^p of pedestrians are reconstructed to strengthen the relationship between the two modal features; the cross-modal reconstruction of the visible-light global feature g_vis yields the feature ĝ_vis→ir, expressed as:
ĝ_vis→ir = Attn(W_Q g_vis, W_K T′_ir[0], W_V T′_ir[0])
where Attn denotes the attention mechanism, T′_ir[0] denotes the infrared Token sequence of the first pedestrian, and W_Q, W_K and W_V convert the corresponding features into the Query, Key and Value matrices; similarly, replacing g_vis in the above formula with the local feature l^p_vis yields the reconstructed local feature l̂^p_vis→ir.
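A toy version of this cross-modal reconstruction, with random matrices standing in for the learned W_Q, W_K and W_V projections (the feature dimension and token count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # feature dimension, chosen for the example

# random stand-ins for the learned Query/Key/Value projections
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct(feature, token_seq):
    """Cross-modal reconstruction: the query is one modality's pedestrian
    feature, the keys/values are the other modality's Token sequence."""
    q = feature[None, :] @ W_q           # (1, d)
    k = token_seq @ W_k                  # (n_tokens, d)
    v = token_seq @ W_v
    attn = softmax(q @ k.T / np.sqrt(d))
    return (attn @ v)[0]                 # reconstructed feature, shape (d,)

g_vis = rng.standard_normal(d)           # visible-light global feature
t_ir = rng.standard_normal((6, d))       # infrared Token sequence (6 tokens)
g_vis2ir = reconstruct(g_vis, t_ir)      # visible feature reconstructed toward infrared
```

The same `reconstruct` call applies unchanged to each horizontal local feature.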
In step F), the feature reconstruction loss is constructed as follows: the difference between the reconstructed pedestrian features and the target-modality features is computed to obtain the feature reconstruction loss L_rec, and the network model is updated with an optimizer; the loss is expressed as:
L_rec = L1(ĝ_vis→ir, g_ir) + (1/N_p) Σ_{p=1}^{N_p} L1(l̂^p_vis→ir, l^p_ir)
where L1 denotes the Manhattan distance and N_p denotes the number of horizontal slices of the pedestrian feature.
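The feature reconstruction loss can be sketched as a Manhattan (L1) distance between reconstructed and target-modality features; averaging over the N_p local parts, as done here, is an assumption about the exact weighting, which the patent text does not pin down:

```python
import numpy as np

def l1(a, b):
    # Manhattan distance between two feature vectors
    return np.abs(a - b).sum()

def reconstruction_loss(rec_global, tgt_global, rec_locals, tgt_locals):
    """L1 between reconstructed features and target-modality features:
    one global term plus the mean over the N_p horizontal parts."""
    n_p = len(rec_locals)
    loss = l1(rec_global, tgt_global)
    loss += sum(l1(r, t) for r, t in zip(rec_locals, tgt_locals)) / n_p
    return loss
```

Minimizing this loss with any gradient-based optimizer pulls the reconstructed features toward the target modality, as step F) describes.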
The embodiment also provides a cross-modal pedestrian re-identification system based on feature reconstruction, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor; when the processor executes the computer program instructions, the steps of the above method are implemented.
In this embodiment, the RegDB dataset is used for comparative evaluation under the setting of retrieving infrared pictures with visible-light pictures of pedestrians. Table 1 shows the comparison between the method proposed by the invention and other cross-modal pedestrian re-identification methods on the RegDB dataset. As can be seen from Table 1, compared with the other cross-modal pedestrian re-identification methods, the proposed method has higher accuracy and robustness, achieving the best Rank-1 and mAP.
TABLE 1
In Table 1, MAUM corresponds to the method proposed by J. Liu et al. (J. Liu, Y. Sun, F. Zhu, H. Pei, Y. Yang, W. Li, Learning memory-augmented unidirectional metrics for cross-modality person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 19366-19375.)
MPANet corresponds to the method proposed by Q. Wu et al. (Q. Wu, P. Dai, J. Chen, C. Lin, Y. Wu, F. Huang, B. Zhong, R. Ji, Discover cross-modality nuances for visible-infrared person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 4330-4339.)
NFS corresponds to the method proposed by Y. Chen et al. (Y. Chen, L. Wan, Z. Li, Q. Jing, Z. Sun, Neural feature search for RGB-infrared person re-identification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 587-597.)
SPOT corresponds to the method proposed by C. Chen et al. (C. Chen, M. Ye, M. Qi, J. Wu, J. Jiang, C. Lin, Structure-aware positional Transformer for visible-infrared person re-identification, IEEE Trans. Image Process. 31 (2022) 2352-2364.)
AGW corresponds to the method proposed by M. Ye et al. (M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, S. C. Hoi, Deep learning for person re-identification: A survey and outlook, IEEE Trans. Pattern Anal. Mach. Intell. 44 (6) (2022) 2872-2893.)
DDAG corresponds to the method proposed by M. Ye et al. (M. Ye, J. Shen, D. J. Crandall, L. Shao, J. Luo, Dynamic dual-attentive aggregation learning for visible-infrared person re-identification, in: Proceedings of the European Conference on Computer Vision, 2020, pp. 229-247.)
D-HSME corresponds to the method proposed by Y. Hao et al. (Y. Hao, N. Wang, J. Li, X. Gao, HSME: Hypersphere manifold embedding for visible thermal person re-identification, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 8385-8392.)
MSPAC corresponds to the method proposed by C. Zhang et al. (C. Zhang, H. Liu, W. Guo, M. Ye, Multi-scale cascading network with compact feature learning for RGB-infrared person re-identification, in: Proceedings of the IEEE International Conference on Pattern Recognition, 2021, pp. 8679-8686.)
CMGN corresponds to the method proposed by J. Jiang et al. (J. Jiang, K. Jin, M. Qi, Q. Wang, J. Wu, C. Chen, A cross-modal multi-granularity attention network for RGB-IR person re-identification, Neurocomputing 406 (2020) 59-67.)
SDL corresponds to the method proposed by K. Kansal et al. (K. Kansal, A. V. Subramanyam, Z. Wang, S. Satoh, SDL: Spectrum disentangled representation learning for visible-infrared person re-identification, IEEE Trans. Circuits Syst. Video Technol. 30 (10) (2020) 3422-3432.)
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any person skilled in the art may modify or alter the disclosed technical content to obtain equivalent embodiments. Any simple modification, equivalent variation or alteration of the above embodiments made according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.
Claims (8)
1. A cross-modal pedestrian re-identification method based on feature reconstruction, characterized by comprising the following steps:
1) Extracting visible light pictures and infrared pictures of a plurality of pedestrians in pairs from the data set to form a visible light training data set and an infrared training data set;
2) A cross-modal pedestrian re-recognition network model based on feature reconstruction is constructed, mainly comprising a specific feature extraction module, a multi-scale feature extraction module, a Token-aware multi-scale feature fusion module and a cross-modal feature reconstruction module; the cross-modal pedestrian re-recognition network model is trained with the visible light training dataset and the infrared training dataset to obtain generalizable model parameters;
3) And using the trained cross-modal pedestrian re-recognition network model for cross-modal retrieval to realize cross-modal pedestrian re-recognition.
2. The cross-modal pedestrian re-identification method based on feature reconstruction according to claim 1, wherein in step 1), the dataset is the RegDB cross-modal pedestrian re-identification dataset, and M visible light pictures and M infrared pictures of N pedestrians are extracted in pairs from the RegDB cross-modal pedestrian re-identification dataset.
3. The cross-modal pedestrian re-identification method based on feature reconstruction according to claim 1, wherein in step 2), the cross-modal pedestrian re-recognition network model is implemented as follows:
a) Pedestrian features are extracted from the input visible light picture and the input infrared picture through independent specific feature extraction modules respectively, and the extracted pedestrian features are then input simultaneously into a multi-scale feature extraction module;
b) The multi-scale feature extraction module extracts multi-scale pedestrian features of the visible light picture and the infrared picture through a plurality of feature extraction modules of different scales;
c) The multi-scale pedestrian features are sent to the Token-aware multi-scale feature fusion module, which models the relationships between the multi-scale pedestrian features through bidirectional interaction between the local and global views of a learnable Token sequence, reducing the interference of pedestrian-irrelevant features at different scales; the bidirectional local-global interaction process is repeated several times to obtain the final visible light and infrared multi-scale feature relationship diagrams and visible light and infrared Token sequences containing multi-scale information;
d) The obtained multi-scale feature relationship diagram is combined with the original pedestrian features and sent to the last feature extraction module of the multi-scale feature extraction module for further feature learning, after which pooling and horizontal segmentation are performed to obtain visible light and infrared global features and local features of pedestrians;
e) Inputting visible light and infrared global features and local features of pedestrians and visible light and infrared Token sequences containing multi-scale information into a cross-modal feature reconstruction module to reconstruct cross-modal features and discover the connection of the features of pedestrians under different modes;
f) In order to reduce noise generated by pedestrian features in the reconstruction process, feature reconstruction loss is constructed, loss calculation is performed on the reconstructed features and target modal features, and errors of the reconstructed features and the target modal features are minimized through an optimizer so as to enhance the connection of the features between the two modalities.
4. The cross-modal pedestrian re-identification method based on feature reconstruction according to claim 3, wherein in step B), the multi-scale feature extraction module comprises four feature extraction modules Stage-1, Stage-2, Stage-3 and Stage-4; the pedestrian feature extracted by the specific feature extraction module has a size of 3×288×144, the feature map size becomes 256×72×36 after the first feature extraction module Stage-1, 512×36×18 after the second feature extraction module Stage-2, and 1024×18×9 after the third feature extraction module Stage-3.
5. The cross-modal pedestrian re-identification method based on feature reconstruction according to claim 3, wherein in step C), adaptive pooling is used to unify and then concatenate the different-scale features of pedestrians, a two-way hybrid structure of convolution and Transformer is used to model the multi-scale pedestrian features, and the interference of pedestrian-irrelevant features at different scales is reduced; a learnable Token sequence is used for relation mining on the multi-scale pedestrian features under the local and global views; for the visible light pedestrian multi-scale feature M_vis, the process of turning from the local view to the global view is expressed as:
T′_vis = LN(FFN(MHA(T, FL(M_vis), FL(M_vis))) + T)
wherein T represents the learnable Token sequence, whose number is set to 6, FL represents the operation of flattening the three-dimensional pedestrian feature into a two-dimensional feature, MHA represents the multi-head attention mechanism, FFN represents the feed-forward operation, and LN represents the layer normalization operation;
the process of turning from the global view to the local view is expressed as:
M′_vis = Conv(RS(MHA(FL(M_vis), T′_vis, T′_vis)) + M_vis)
wherein Conv represents a convolution operation and RS represents the operation of converting a two-dimensional feature into a three-dimensional feature.
6. The cross-modal pedestrian re-identification method based on feature reconstruction according to claim 3, wherein in step E), the cross-modal feature reconstruction module is implemented as follows:
the visible light and infrared Token sequences T′_vis and T′_ir containing pedestrian multi-scale information obtained in step C) are utilized to reconstruct the visible light and infrared global features f^g_vis, f^g_ir and local features f^l_vis, f^l_ir of pedestrians, so as to enhance the relationship between the features of the two modalities; the cross-modal reconstruction of the global feature yields the reconstructed feature, expressed as:
f̂^g_vis = Attn(W_Q f^g_vis, W_K T′_ir[0], W_V T′_ir[0])
wherein Attn represents the attention mechanism, T′_ir[0] denotes the infrared Token sequence of the first pedestrian, and W_Q, W_K and W_V represent the matrices converting the corresponding features into the Query, Key and Value; similarly, replacing f^g_vis in the above formula with the local feature f^l_vis yields the reconstructed local feature f̂^l_vis.
7. The cross-modal pedestrian re-identification method based on feature reconstruction according to claim 3, wherein in step F), the specific method for constructing the feature reconstruction loss is as follows: the difference between the reconstructed pedestrian features and the target-modality features is calculated to obtain the feature reconstruction loss L_rec, and the network model is updated with an optimizer, expressed as:
L_rec = L1(f̂^g_vis, f^g_ir) + (1/N_p) Σ_{i=1}^{N_p} L1(f̂^l_vis,i, f^l_ir,i)
wherein L1 represents the Manhattan distance and N_p represents the number of horizontal splits of the pedestrian features.
8. A cross-modal pedestrian re-identification system based on feature reconstruction, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor, wherein the method steps of any one of claims 1 to 7 are implemented when the processor executes the computer program instructions.
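The bidirectional local-global Token interaction formulas in claim 5 can be sketched in PyTorch as below. This is a minimal sketch under stated assumptions: the embedding size, head count, FFN expansion ratio, and the class name `TokenAwareFusion` are illustrative and not specified by the patent.

```python
import torch
import torch.nn as nn


class TokenAwareFusion(nn.Module):
    """Sketch of one round of bidirectional Token/feature interaction.

    Local -> global: T' = LN(FFN(MHA(T, FL(M), FL(M))) + T), where the
    learnable Token sequence queries the flattened feature map.
    Global -> local: M' = Conv(RS(MHA(FL(M), T', T')) + M), projecting
    Token information back onto the feature map. All hyperparameters
    here are illustrative assumptions.
    """

    def __init__(self, dim: int, num_tokens: int = 6, heads: int = 4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim))  # learnable T, 6 tokens
        self.mha_l2g = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mha_g2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, m: torch.Tensor):
        # m: (B, D, H, W) multi-scale pedestrian feature map
        b, d, h, w = m.shape
        flat = m.flatten(2).transpose(1, 2)             # FL: (B, H*W, D)
        t = self.tokens.unsqueeze(0).expand(b, -1, -1)  # (B, num_tokens, D)
        # local -> global: Tokens attend over the flattened feature map
        t_out, _ = self.mha_l2g(t, flat, flat)
        t_new = self.norm(self.ffn(t_out) + t)          # T' = LN(FFN(MHA(...)) + T)
        # global -> local: feature map attends over the updated Tokens
        m_out, _ = self.mha_g2l(flat, t_new, t_new)
        m_new = m_out.transpose(1, 2).reshape(b, d, h, w)  # RS: back to 3D
        return self.conv(m_new + m), t_new              # M' = Conv(RS(...) + M)
```

In the described network this round would be repeated several times, once per modality, before the relationship diagrams and Token sequences are passed on to Stage-4 and the cross-modal reconstruction module.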
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310406803.6A CN116434143A (en) | 2023-04-17 | 2023-04-17 | Cross-modal pedestrian re-identification method and system based on feature reconstruction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116434143A true CN116434143A (en) | 2023-07-14 |
Family
ID=87085064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310406803.6A Pending CN116434143A (en) | 2023-04-17 | 2023-04-17 | Cross-modal pedestrian re-identification method and system based on feature reconstruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116434143A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118072252A (en) * | 2024-04-17 | 2024-05-24 | 武汉大学 | Pedestrian re-recognition model training method suitable for arbitrary multi-mode data combination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||