WO2022027986A1 - Cross-modality-based pedestrian re-identification method and device - Google Patents

Cross-modality-based pedestrian re-identification method and device Download PDF

Info

Publication number
WO2022027986A1
WO2022027986A1 PCT/CN2021/084753 CN2021084753W
Authority
WO
WIPO (PCT)
Prior art keywords
modal
generalization
cross
features
feature
Prior art date
Application number
PCT/CN2021/084753
Other languages
English (en)
French (fr)
Inventor
王金鹏
王金桥
胡建国
唐明
林格
招继恩
朱贵波
Original Assignee
杰创智能科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杰创智能科技股份有限公司 filed Critical 杰创智能科技股份有限公司
Publication of WO2022027986A1 publication Critical patent/WO2022027986A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present invention relates to the field of computer technology, and in particular, to a method and device for pedestrian re-identification based on cross-modalities.
  • Pedestrian re-identification is a very important part in intelligent video surveillance systems.
  • Traditional pedestrian re-identification (RGB-RGB) aims to retrieve images of the same pedestrian from an image database given a query pedestrian image, which requires overcoming cross-camera viewpoint, pose, and scale changes, among other factors.
  • Current pedestrian re-identification systems are only suitable for well-lit conditions and essentially fail under low light or no light at night. Introducing infrared cameras to capture infrared images of pedestrians for cross-modal pedestrian re-identification is therefore a practical route to all-day intelligent video surveillance.
  • The cross-modal person re-identification method based on different sub-networks uses different sub-networks to extract features from images of different modalities during training.
  • the infrared sub-network is responsible for processing the input of infrared images
  • the RGB sub-network is responsible for processing the input of RGB images.
  • The branches of the two sub-networks are fused in the later layers of the network to obtain a shared network forming a third branch, and the three branches are jointly trained with an identity embedding loss and a triplet metric loss to obtain a unified representation of cross-modal pedestrian images.
  • Because this approach extracts features through two sub-networks before embedding learning, the learned features do not transfer well across modalities: different sub-networks tend to learn modality-specific features, joint learning in the later layers alone can hardly capture modality-shared feature information, cross-modal distribution alignment is poor, and performance is low. Moreover, optimizing a network with two sub-networks is more complicated and prone to intra-modality overfitting.
  • embodiments of the present invention provide a method and device for pedestrian re-identification based on cross-modalities.
  • an embodiment of the present invention provides a cross-modal pedestrian re-identification method, including:
  • The cross-modal feature extraction model is obtained by training on cross-modal pedestrian re-identification sample images and includes a feature extraction module, a modality batch normalized identity embedding module, and a single-modal identity embedding module based on a mutual learning strategy;
  • the feature extraction module is used for extracting infrared image features and RGB image features of the sample image;
  • The modality batch normalization identity embedding module is used for normalizing the infrared image features and RGB image features to obtain cross-modal generalization features;
  • The single-modal identity embedding module based on the mutual learning strategy is used for normalizing the infrared image features to obtain infrared single-modal generalization features, and for normalizing the RGB image features to obtain RGB single-modal generalization features;
  • The losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images are calculated to optimize the cross-modal feature extraction model until a preset convergence condition is met.
  • the modal batch normalization identity embedding module is used to normalize infrared image features and RGB image features to obtain cross-modal generalization features, including:
  • the modality batch normalization identity embedding module is used for inputting infrared image features and RGB image features into a normalization function to obtain cross-modal generalization features.
  • The single-modal identity embedding module based on the mutual learning strategy being used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
  • the single-modal identity embedding module based on mutual learning strategy is used for:
  • the cross-modal feature extraction model is optimized by calculating the loss of the cross-modal generalization feature, the infrared single-modal generalization feature and the RGB single-modal generalization feature corresponding to the sample image, including:
  • According to a cross-entropy loss function, a first loss result for the cross-modal generalization feature, a second loss result for the infrared single-modal generalization feature, and a third loss result for the RGB single-modal generalization feature are obtained;
  • According to a loss function that minimizes the distribution distance, a fourth loss result for the infrared single-modal generalization feature and the RGB single-modal generalization feature is obtained;
  • According to the sum of the first, second, third, and fourth loss results, the parameters of the cross-modal feature extraction model are optimized.
  • the cross-modal feature extraction model is optimized by calculating the loss of the cross-modal generalization feature, the infrared single-modal generalization feature and the RGB single-modal generalization feature corresponding to the sample image, further comprising:
  • A fifth loss result for the cross-modal generalization feature is obtained according to a triplet loss function, and the parameters of the cross-modal feature extraction model are optimized according to the sum of the first, second, third, fourth, and fifth loss results.
  • an embodiment of the present invention provides a cross-modal pedestrian re-identification device, including:
  • The first acquiring unit is used to acquire a pedestrian image with an identity label, input it into a cross-modal feature extraction model, and determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image;
  • A second acquiring unit is configured to acquire an image to be re-identified across modalities and to determine its image features;
  • The recognition unit is used to calculate the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images, so as to perform pedestrian re-identification;
  • the feature extraction module is used for extracting infrared image features and RGB image features of the sample image;
  • The modality batch normalization identity embedding module is used for normalizing the infrared image features and RGB image features to obtain cross-modal generalization features;
  • The single-modal identity embedding module based on the mutual learning strategy is used for normalizing the infrared image features to obtain infrared single-modal generalization features, and for normalizing the RGB image features to obtain RGB single-modal generalization features;
  • The losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images are calculated to optimize the cross-modal feature extraction model until a preset convergence condition is met.
  • the modal batch normalization identity embedding module is used to normalize infrared image features and RGB image features to obtain cross-modal generalization features, including:
  • the modality batch normalization identity embedding module is used for inputting infrared image features and RGB image features into a normalization function to obtain cross-modal generalization features.
  • The single-modal identity embedding module based on the mutual learning strategy being used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
  • the single-modal identity embedding module based on mutual learning strategy is used for:
  • An embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the steps of the cross-modal pedestrian re-identification method described in the first aspect are implemented.
  • An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the cross-modal pedestrian re-identification method described in the first aspect are implemented.
  • The cross-modal pedestrian re-identification method and device input a pedestrian image carrying an identity label into the cross-modal feature extraction model to determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled image, and compute the similarity between the features of the image to be re-identified across modalities and those generalization features to perform pedestrian re-identification. The cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module for extracting the infrared and RGB image features of the sample images, a modality batch normalization identity embedding module for obtaining cross-modal generalization features, and a single-modal identity embedding module for obtaining infrared and RGB single-modal generalization features. Because the infrared and RGB image features are extracted through the same network, namely the feature extraction module, the extracted features generalize better; at the same time, the modality batch normalized identity embedding and the single-modal identity embedding strengthen the extraction of the single-modal generalization features.
  • FIG. 1 is a schematic flowchart of a cross-modal pedestrian re-identification method provided by a first embodiment of the present invention
  • FIG. 2 is a schematic flowchart of network training provided by the first embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a cross-modal pedestrian re-identification device provided by a second embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by a third embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of the cross-modality-based pedestrian re-identification method provided by the first embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step 110: Acquire a pedestrian image with an identity label, input it into the cross-modal feature extraction model, and determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image.
  • the cross-modal feature extraction model is trained based on cross-modal pedestrian re-identification sample images, including a feature extraction module, a modality batch normalized identity embedding module, and a single-modality identity embedding module based on a mutual learning strategy.
  • Pedestrian re-identification, also known as person re-identification, is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. For example, given a surveillance image of a pedestrian with an identity label under camera A, it retrieves whether that pedestrian appears under another device such as camera B.
  • Pedestrian re-identification can make up for the visual limitations of fixed cameras, and can be combined with pedestrian detection/pedestrian tracking technology, which can be widely used in intelligent video surveillance, intelligent security and other fields.
  • Traditional RGB-RGB person re-identification methods are only suitable for well-lit conditions, matching pedestrian appearances RGB-to-RGB in images captured by single-modality cameras.
  • cross-modal person re-identification focuses on matching images under cross-modalities. It can use infrared-RGB images of pedestrians to search for infrared-RGB images of pedestrians under cross-devices.
  • IR-RGB images increase the modal differences relative to single-modal RGB images, making images of different pedestrians within a modality more similar than images of the same pedestrian across modalities.
  • The three-channel information of RGB images and the single-channel information of infrared images differ in information capacity and representation, and the same sharpness and lighting conditions may produce very different effects on the two types of images. For example, applying the same lighting to both is likely to increase sharpness for an RGB image, while for an infrared image the brightness may become so high that the image blurs.
  • To solve the infrared-RGB cross-modal person re-identification problem, traditional deep-learning-based methods use different sub-networks to extract features from images of different modalities: the infrared sub-network performs feature extraction on infrared images, and the RGB sub-network performs feature extraction on RGB images. The branches of the two sub-networks are then fused in the later layers of the network to obtain a shared network forming a third branch, and the three branches are jointly trained to optimize the cross-modal person re-identification model.
  • However, the traditional method uses two sub-networks to extract infrared features and RGB features separately; since different sub-networks tend to learn modality-specific features, joint learning in the later layers alone cannot learn features common across modalities, so cross-modal distribution alignment is poor and performance is low.
  • When the capacity of the network is too large, the model learns not only the patterns of the training data but also additional observation errors, so it performs well on the training set but poorly on the test set; the generalization ability of the model is weak, and intra-modality overfitting easily occurs.
  • The acquired pedestrian image with an identity label is input into the cross-modal feature extraction model, and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image are determined. The cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module, a modality batch normalization identity embedding module, and a single-modal identity embedding module based on a mutual learning strategy.
  • the acquired image of a pedestrian with an identity mark may be one image or multiple images of the same pedestrian.
  • If camera B also captured pedestrian A at the same time, the images of pedestrian A from camera B can likewise be input into the cross-modal feature extraction model. The cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features generated from the multiple images of pedestrian A form a comparison database, which is compared one by one with the images to be re-identified to determine the re-identification result.
  • FIG. 2 is a schematic flowchart of network training provided by the first embodiment of the present invention.
  • The feature extraction module is used to extract the infrared image features and RGB image features of the sample images. Since the feature extraction module is a single shared network, it learns features common to the infrared and RGB image features of the sample images; compared with using two separate sub-networks, this makes the extracted image features generalize better.
  • identity embedding loss is the basic method for person re-identification, which is learned from different images of each pedestrian as a classification task.
  • Directly applying the identity embedding loss to cross-modal person re-identification causes gradient vanishing due to the influence of images from different modalities, so the cross-modal feature extraction model cannot learn cross-modal generalization features well. Therefore, in this embodiment, the modality batch normalization identity embedding module normalizes the extracted infrared and RGB image features before the cross-modal generalization features are obtained, so that the module can better learn generalization features across modalities.
  • this embodiment uses a separate identity embedding for each single-modal branch (infrared single-modality and RGB single-modality) for optimization.
  • By adopting a mutual learning strategy, each normalized single-modal feature can be regarded as a probability distribution over different pedestrians: the larger the probability, the higher the similarity between the normalized single-modal generalization feature and other images of that pedestrian.
  • The infrared image features are normalized to obtain infrared single-modal generalization features, and the RGB image features are normalized to obtain RGB single-modal generalization features, thereby strengthening feature extraction for both infrared and RGB single-modal images.
  • During training, the cross-modal feature extraction model is optimized by computing the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, until a preset convergence condition is met; for example, model training stops when the number of training iterations reaches a threshold and the convergence condition is satisfied.
  • the cross-modal generalization feature, the infrared single-modal generalization feature and the RGB single-modal generalization feature of the pedestrian image with the identity mark can be accurately extracted by optimizing the training of the cross-modal feature extraction model.
  • Step 120 Acquire an image to be re-identified across modalities, and determine image features to be re-identified across modalities.
  • Since the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images were obtained through the cross-modal feature extraction model in step 110, this embodiment acquires an image to be re-identified across modalities and determines its image features. For example, a histogram of oriented gradients (HOG) can be used, extracting image features by computing and accumulating gradient orientation histograms over local regions of the image.
  • A grayscale difference statistics method can also be used to determine the features of the image to be re-identified across modalities, and the feature extraction method can be chosen according to the actual situation; this is not specifically limited here.
  • The features of the image to be re-identified across modalities are then compared with the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images, so it can be determined whether the pedestrian to be re-identified is a labeled pedestrian.
  • Step 130: Calculate the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images, and perform pedestrian re-identification.
  • The similarity between the image features obtained in step 120 and each feature obtained in step 110 is calculated and the results are sorted; the pedestrian identity corresponding to the cross-modal generalization feature, infrared single-modal generalization feature, or RGB single-modal generalization feature with the highest similarity to the features of the image to be re-identified is the identity to be determined.
  • For example, if image feature A to be re-identified across modalities has 100% similarity with the cross-modal generalization feature B1 of a labeled pedestrian image, 50% similarity with the infrared single-modal generalization feature B2, and 0% similarity with the RGB single-modal generalization feature B3, the features sorted by descending similarity are B1 > B2 > B3, so the pedestrian identity in the image corresponding to B1 is the identity of the pedestrian in the image to be re-identified.
  • The cross-modality-based pedestrian re-identification method provided by the embodiment of the present invention inputs a labeled pedestrian image into the cross-modal feature extraction model to determine its cross-modal, infrared single-modal, and RGB single-modal generalization features, and performs pedestrian re-identification based on the similarity between the features of the image to be re-identified and those generalization features. The cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module for extracting the infrared and RGB image features of the sample images, a modality batch normalization identity embedding module for obtaining cross-modal generalization features, and a single-modal identity embedding module for obtaining infrared and RGB single-modal generalization features. Extracting the infrared and RGB image features through the same network makes the extracted features generalize better, while the modality batch normalized identity embedding and the single-modal identity embedding strengthen single-modal feature extraction.
  • the modal batch normalization identity embedding module is used to normalize infrared image features and RGB image features to obtain cross-modal generalization features, including:
  • the modality batch normalization identity embedding module is used to input infrared image features and RGB image features into a normalization function to obtain cross-modal generalization features.
  • The modality batch normalized identity embedding module inputs the infrared image features and RGB image features into the normalization function to obtain cross-modal generalization features, so that the module can better learn generalization features across modalities.
  • the normalization function is a normalization operation with a mean value of 0 and a standard deviation of 1.
  • The infrared image features and RGB image features are input into the normalization function through the modality batch normalization identity embedding module to obtain cross-modal generalization features, so that the module can better learn cross-modal generalization features and improve the similarity of the same pedestrian's image features in the cross-modal feature extraction model; the modal distribution alignment is thus better, which facilitates accurate pedestrian re-identification.
  • The single-modal identity embedding module based on the mutual learning strategy being used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
  • a unimodal identity embedding module based on a mutual learning strategy is used to:
  • this embodiment uses a separate identity embedding for each single-modal branch (infrared single-modal and RGB single-modal) to optimize.
  • By adopting a mutual learning strategy, each normalized single-modal feature can be regarded as a probability distribution over different pedestrians: the larger the probability, the higher the similarity with other images of that pedestrian. Therefore, in this embodiment, the single-modal identity embedding module based on the mutual learning strategy inputs the infrared image features into the normalization function to obtain infrared single-modal generalization features, and inputs the RGB image features into the normalization function to obtain RGB single-modal generalization features, thereby strengthening feature extraction for both infrared and RGB single-modal images.
  • the normalization function is a normalization operation with a mean value of 0 and a standard deviation of 1.
  • The infrared image features are input into a normalization function to obtain infrared single-modal generalization features, and the RGB image features are input into a normalization function to obtain RGB single-modal generalization features, which strengthens single-modal feature extraction and improves the similarity of the same pedestrian's image features in the cross-modal feature extraction model; the modal distribution alignment is thus better, which facilitates accurate pedestrian re-identification.
  • Optimizing the cross-modal feature extraction model by computing the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images includes:
  • According to a cross-entropy loss function, a first loss result for the cross-modal generalization feature, a second loss result for the infrared single-modal generalization feature, and a third loss result for the RGB single-modal generalization feature are obtained;
  • According to a loss function that minimizes the distribution distance, a fourth loss result for the infrared single-modal generalization feature and the RGB single-modal generalization feature is obtained;
  • According to the sum of the first, second, third, and fourth loss results, the parameters of the cross-modal feature extraction model are optimized.
  • Since the cross-modal feature extraction model is responsible for extracting the cross-modal, infrared single-modal, and RGB single-modal generalization features of labeled pedestrian images, its parameters need to be optimized to ensure accurate feature extraction.
  • The first loss result for the cross-modal generalization feature is obtained according to the cross-entropy loss function, i.e.,
$$\mathcal{L}_{1}=-\sum_{f\in F_{i}\cup F_{v}}\log\frac{\exp(W_{a}^{\top}f)}{\sum_{j=1}^{N_{p}}\exp(W_{j}^{\top}f)}$$
where $\mathcal{L}_{1}$ denotes the first loss result, f denotes a feature, $W_{a}$ the parameter for pedestrian a, $W_{j}$ the parameter for pedestrian j, $N_{p}$ the number of pedestrians, $F_{i}$ the infrared single-modal generalization features, and $F_{v}$ the RGB single-modal generalization features. The second loss result $\mathcal{L}_{2}$ is obtained in the same way over $F_{i}$ alone.
  • The third loss result for the RGB single-modal generalization feature is obtained according to the cross-entropy loss function, i.e.,
$$\mathcal{L}_{3}=-\sum_{f\in F_{v}}\log\frac{\exp(W_{a}^{\top}f)}{\sum_{j=1}^{N_{p}}\exp(W_{j}^{\top}f)}$$
where $\mathcal{L}_{3}$ denotes the third loss result and $F_{v}$ the RGB single-modal generalization features.
  • The fourth loss result for the infrared single-modal generalization feature and the RGB single-modal generalization feature is obtained according to the loss function that minimizes the distribution distance, i.e.,
$$\mathcal{L}_{4}=\mathrm{KL}\left(p(y\mid f_{i};W_{i})\,\|\,p(y\mid f_{v};W_{v})\right)+\mathrm{KL}\left(p(y\mid f_{v};W_{v})\,\|\,p(y\mid f_{i};W_{i})\right)$$
where $\mathcal{L}_{4}$ denotes the fourth loss result, KL denotes the KL distance between distributions, y denotes the pedestrian probability, $f_{i}$ the infrared image features, $f_{v}$ the RGB image features, $W_{i}$ the infrared branch parameters, and $W_{v}$ the RGB branch parameters.
  • the parameters of the cross-modal feature extraction model are optimized, so that the cross-modal feature extraction model can accurately extract features.
  • The cross-modal pedestrian re-identification method optimizes the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, enabling the model to accurately extract the cross-modal, infrared single-modal, and RGB single-modal generalization features of labeled pedestrian images and improving the accuracy of re-identification results.
  • Optimizing the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images further includes:
  • A fifth loss result for the cross-modal generalization feature is obtained according to a triplet loss function, and the parameters of the cross-modal feature extraction model are optimized according to the sum of the first, second, third, fourth, and fifth loss results.
  • To further reduce the feature differences among images of the same pedestrian, so that same-pedestrian similarity exceeds different-pedestrian similarity, this embodiment obtains the fifth loss result for the cross-modal generalization feature according to the triplet loss function:
$$\mathcal{L}_{5}=\max\left(\lVert f_{a}-f_{p}\rVert_{2}-\lVert f_{a}-f_{n}\rVert_{2}+\xi,\;0\right)$$
where $\mathcal{L}_{5}$ denotes the fifth loss result, a denotes the center (anchor) feature, p a positive-example feature, n a negative-example feature, and ξ the margin.
  • The cross-modal pedestrian re-identification method provided by the embodiment of the present invention combines the fifth loss result obtained from the triplet loss function to optimize the cross-modal feature extraction model, so that the model can accurately extract the cross-modal, infrared single-modal, and RGB single-modal generalization features of labeled pedestrian images, improving the accuracy of re-identification results.
  • FIG. 3 is a schematic structural diagram of the cross-modality-based pedestrian re-identification device provided by the second embodiment of the present invention.
  • The cross-modality-based pedestrian re-identification device provided by the second embodiment of the present invention includes:
  • The first obtaining unit 310 is used to obtain a pedestrian image with an identity label, input it into the cross-modal feature extraction model, and determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image;
  • The second acquiring unit 320 is configured to acquire an image to be re-identified across modalities and determine its image features;
  • The identification unit 330 is used to calculate the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images, so as to perform pedestrian re-identification;
  • The feature extraction module is used to extract the infrared image features and RGB image features of the sample images;
  • The modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features;
  • The single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features;
  • During training, the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images are calculated to optimize the cross-modal feature extraction model until a preset convergence condition is met.
  • cross-modality-based pedestrian re-identification device described in this embodiment can be used to execute the cross-modality-based pedestrian re-identification method described in the first embodiment, and its principles and technical effects are similar, and will not be described in detail here.
  • the modal batch normalization identity embedding module is used to normalize infrared image features and RGB image features to obtain cross-modal generalization features, including:
  • the modality batch normalization identity embedding module is used to input infrared image features and RGB image features into a normalization function to obtain cross-modal generalization features.
  • cross-modality-based pedestrian re-identification device described in this embodiment can be used to execute the cross-modality-based pedestrian re-identification method described in the first embodiment, and its principles and technical effects are similar, and will not be described in detail here.
  • The single-modal identity embedding module based on the mutual learning strategy being used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
  • a unimodal identity embedding module based on a mutual learning strategy is used to:
  • cross-modality-based pedestrian re-identification device described in this embodiment can be used to execute the cross-modality-based pedestrian re-identification method described in the first embodiment, and its principles and technical effects are similar, and will not be described in detail here.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by a third embodiment of the present invention.
  • the electronic device may include: a processor (processor) 410, a communications interface (Communications Interface) 420, and a memory (memory) 430 and a communication bus 440 , wherein the processor 410 , the communication interface 420 , and the memory 430 communicate with each other through the communication bus 440 .
  • The processor 410 can invoke the logic instructions in the memory 430 to execute the cross-modality-based pedestrian re-identification method, the method comprising: acquiring a pedestrian image with an identity label, inputting it into the cross-modal feature extraction model, and determining the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image; acquiring an image to be re-identified across modalities and determining its image features; and calculating the similarity between the features of the image to be re-identified and the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images, so as to perform pedestrian re-identification.
  • the above-mentioned logic instructions in the memory 430 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
  • Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • An embodiment of the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to execute the cross-modal pedestrian re-identification method provided by the above method embodiments, the method comprising: acquiring a pedestrian image with an identity label, inputting it into the cross-modal feature extraction model, and determining its cross-modal, infrared single-modal, and RGB single-modal generalization features; acquiring an image to be re-identified across modalities and determining its image features; and calculating the similarity between those features and the generalization features of the labeled pedestrian images to perform pedestrian re-identification.
  • Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the cross-modality-based pedestrian re-identification method provided by the above embodiments, the method comprising: acquiring a pedestrian image with an identity label, inputting it into the cross-modal feature extraction model, and determining the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image; acquiring an image to be re-identified across modalities and determining its image features; and calculating the similarity between those features and the cross-modal, infrared single-modal, and RGB single-modal generalization features of the labeled pedestrian images to perform pedestrian re-identification.
  • The device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
  • each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware.
  • Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, can be embodied in the form of software products; the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments or parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present invention provides a cross-modality-based pedestrian re-identification method. The method includes: inputting a pedestrian image with an identity label into a cross-modal feature extraction model to determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image; and calculating the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature, so as to perform pedestrian re-identification. The cross-modal feature extraction model includes a feature extraction module, a modality batch normalization identity embedding module, and a single-modal identity embedding module, so that the extracted image features generalize better; the model can accurately extract the cross-modal, infrared single-modal, and RGB single-modal generalization features of an image, determine their similarity to the features of the image to be re-identified across modalities, and accurately obtain the recognition result.

Description

Cross-modality-based pedestrian re-identification method and device
Technical Field
The present invention relates to the field of computer technology, and in particular to a cross-modality-based pedestrian re-identification method and device.
Background Art
Pedestrian re-identification is a very important part of intelligent video surveillance systems. Traditional pedestrian re-identification (RGB-RGB) aims to retrieve images of the same pedestrian from an image database given a query pedestrian image, which requires overcoming cross-camera viewpoint, pose, and scale changes, among other factors. However, current pedestrian re-identification tasks are only suitable for well-lit conditions; once low light or even no light is encountered at night, these systems essentially fail. Introducing infrared cameras to capture infrared images of pedestrians for cross-modal pedestrian re-identification is therefore a practical solution for achieving all-day intelligent video surveillance.
To solve the cross-modal (RGB-Thermal) pedestrian re-identification problem, many deep-learning-based methods use different sub-networks to extract features from data of different modalities and then learn a common feature representation through feature embedding. Some studies also exploit the image generation capability of generative adversarial networks to convert images between modalities and obtain a multi-modal representation of a single image.
However, cross-modal pedestrian re-identification methods based on different sub-networks use separate sub-networks to process images of different modalities during training: the infrared sub-network handles infrared image inputs, and the RGB sub-network handles RGB image inputs. The branches of the two sub-networks are then fused in the later layers of the network to obtain a shared network forming a third branch, and the three branches are jointly trained with an identity embedding loss and a triplet metric loss to obtain a unified representation of cross-modal pedestrian images.
Because this approach extracts features through two sub-networks before embedding learning, the learned features do not transfer well across modalities. Different sub-networks tend to learn modality-specific features, joint learning in the later layers alone can hardly capture modality-shared feature information, cross-modal distribution alignment is poor, and performance is low; moreover, optimizing a network with two sub-networks is more complicated and prone to intra-modality overfitting.
Summary of the Invention
In view of the problems in the prior art, embodiments of the present invention provide a cross-modality-based pedestrian re-identification method and device.
Specifically, embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a cross-modality-based pedestrian re-identification method, including:
acquiring a pedestrian image with an identity label, inputting the labeled pedestrian image into a cross-modal feature extraction model, and determining the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image;
acquiring an image to be re-identified across modalities, and determining the features of the image to be re-identified across modalities;
calculating the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and performing pedestrian re-identification;
wherein the cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module, a modality batch normalization identity embedding module, and a single-modal identity embedding module based on a mutual learning strategy;
wherein the feature extraction module is used to extract the infrared image features and RGB image features of the sample images; the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features; and the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features;
wherein, when the cross-modal feature extraction model is trained, the model is optimized by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, until a preset convergence condition is met.
Further, that the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features includes:
the modality batch normalization identity embedding module is used to input the infrared image features and RGB image features into a normalization function to obtain the cross-modal generalization features.
Further, that the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
the single-modal identity embedding module based on the mutual learning strategy is used to:
input the infrared image features into a normalization function to obtain infrared single-modal generalization features; and input the RGB image features into a normalization function to obtain RGB single-modal generalization features.
Further, optimizing the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images includes:
obtaining, according to a cross-entropy loss function, a first loss result for the cross-modal generalization feature, a second loss result for the infrared single-modal generalization feature, and a third loss result for the RGB single-modal generalization feature;
obtaining, according to a loss function that minimizes the distribution distance, a fourth loss result for the infrared single-modal generalization feature and the RGB single-modal generalization feature;
optimizing the parameters of the cross-modal feature extraction model according to the sum of the first, second, third, and fourth loss results.
Further, optimizing the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images further includes:
obtaining a fifth loss result for the cross-modal generalization feature according to a triplet loss function;
optimizing the parameters of the cross-modal feature extraction model according to the sum of the first, second, third, fourth, and fifth loss results.
In a second aspect, an embodiment of the present invention provides a cross-modality-based pedestrian re-identification device, including:
a first acquiring unit, used to acquire a pedestrian image with an identity label, input the labeled pedestrian image into a cross-modal feature extraction model, and determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image;
a second acquiring unit, used to acquire an image to be re-identified across modalities and determine the features of the image to be re-identified across modalities;
a recognition unit, used to calculate the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and perform pedestrian re-identification;
wherein the feature extraction module is used to extract the infrared image features and RGB image features of the sample images; the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features; and the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features;
wherein, when the cross-modal feature extraction model is trained, the model is optimized by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, until a preset convergence condition is met.
Further, that the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features includes:
the modality batch normalization identity embedding module is used to input the infrared image features and RGB image features into a normalization function to obtain the cross-modal generalization features.
Further, that the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
the single-modal identity embedding module based on the mutual learning strategy is used to:
input the infrared image features into a normalization function to obtain infrared single-modal generalization features; and input the RGB image features into a normalization function to obtain RGB single-modal generalization features.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the steps of the cross-modality-based pedestrian re-identification method described in the first aspect are implemented.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the cross-modality-based pedestrian re-identification method described in the first aspect are implemented.
The cross-modality-based pedestrian re-identification method and device provided by the embodiments of the present invention input a pedestrian image with an identity label into the cross-modal feature extraction model to determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and calculate the similarity between the features of the image to be re-identified across modalities and those generalization features to perform pedestrian re-identification. The cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module for extracting the infrared and RGB image features of the sample images, a modality batch normalization identity embedding module for obtaining cross-modal generalization features, and a single-modal identity embedding module for obtaining infrared and RGB single-modal generalization features. In the embodiments of the present invention, the infrared and RGB image features of the sample images are extracted through the same network, i.e., the feature extraction module, so the extracted features generalize better; at the same time, the modality batch normalized identity embedding and the single-modal identity embedding strengthen the extraction of infrared and RGB single-modal generalization features and improve the similarity of the same pedestrian's image features in the cross-modal feature extraction model, yielding better modal distribution alignment. The parameters of the cross-modal feature extraction model are optimized according to the losses corresponding to the cross-modal, infrared single-modal, and RGB single-modal generalization features, so the model can accurately extract these generalization features from labeled pedestrian images, determine their similarity to the features of the image to be re-identified across modalities, and accurately obtain the recognition result.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the cross-modality-based pedestrian re-identification method provided by the first embodiment of the present invention;
FIG. 2 is a schematic flowchart of the network training provided by the first embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the cross-modality-based pedestrian re-identification device provided by the second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the electronic device provided by the third embodiment of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a schematic flowchart of the cross-modality-based pedestrian re-identification method provided by the first embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step 110: Acquire a pedestrian image with an identity label, input the labeled pedestrian image into the cross-modal feature extraction model, and determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image.
The cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module, a modality batch normalization identity embedding module, and a single-modal identity embedding module based on a mutual learning strategy.
In this step, pedestrian re-identification, also known as person re-identification, is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. For example, given a surveillance image of a pedestrian with an identity label under camera A, it retrieves whether that pedestrian appears under another device such as camera B. Pedestrian re-identification can make up for the visual limitations of fixed cameras, can be combined with pedestrian detection and tracking technologies, and can be widely applied in intelligent video surveillance, intelligent security, and other fields. However, traditional RGB-RGB pedestrian re-identification methods, which match RGB appearances in images captured by single-modality cameras, are only suitable for well-lit conditions. Today most cameras combine infrared and visible-light capabilities; infrared cameras can capture infrared images of pedestrians by day or night, which provides favorable conditions for cross-modal pedestrian re-identification. Unlike traditional methods, cross-modal pedestrian re-identification focuses on matching images across modalities: it can use infrared-RGB images of a pedestrian to search for that pedestrian's infrared-RGB images across devices.
However, compared with single-modal RGB images, infrared-RGB images introduce modality differences, making images of different pedestrians within a modality more similar than images of the same pedestrian across modalities. The three-channel information of RGB images and the single-channel information of infrared images differ in information capacity and representation, and the same sharpness and lighting conditions may produce very different effects on the two types of images. For example, applying the same lighting to both is likely to increase sharpness for an RGB image, while for an infrared image the brightness may become so high that the image blurs.
To solve the cross-modal (infrared-RGB) pedestrian re-identification problem, traditional deep-learning methods use different sub-networks to extract features from images of different modalities: the infrared sub-network performs feature extraction on infrared images and the RGB sub-network on RGB images; the branches of the two sub-networks are then fused in the later layers of the network to obtain a shared network forming a third branch, and the three branches are jointly trained with an identity embedding loss and a triplet loss to optimize the cross-modal pedestrian re-identification model. However, since different sub-networks tend to learn modality-specific features, joint learning in the later layers alone cannot learn features common across modalities; the single-modal features extracted by the two sub-networks do not apply to other modalities, cross-modal distribution alignment is poor, pedestrian re-identification is inaccurate, and performance is low. Using two sub-networks also complicates optimization: when the network capacity is too large, the model learns not only the patterns of the training data but also additional observation errors, so it performs well on the training set but poorly on the test set, its generalization ability is weak, and intra-modality overfitting easily occurs.
Therefore, in this embodiment, the acquired pedestrian image with an identity label is input into the cross-modal feature extraction model, and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image are determined; the cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module, a modality batch normalization identity embedding module, and a single-modal identity embedding module based on a mutual learning strategy.
It can be understood that the acquired labeled pedestrian image may be a single image or multiple images of the same pedestrian. For example, to search for images of pedestrian A captured under camera A, the images of pedestrian A from camera A can be input into the cross-modal feature extraction model; if camera B also captured pedestrian A at the same time, those images can be input as well. The cross-modal, infrared single-modal, and RGB single-modal generalization features generated from the multiple images of pedestrian A form a comparison database, which is compared one by one with the images to be re-identified to determine the re-identification result.
FIG. 2 is a schematic flowchart of the network training provided by the first embodiment of the present invention. As shown in FIG. 2, the feature extraction module is used to extract the infrared image features and RGB image features of the sample images. Since the feature extraction module is a single shared network, it can learn features common to the infrared and RGB image features of the sample images; compared with the traditional approach of extracting infrared and RGB image features with two separate sub-networks, the single network used in this embodiment makes the extracted image features generalize better.
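The following is a minimal PyTorch sketch of this shared-backbone idea. The backbone choice (ResNet-50), the input resolution, and the replication of single-channel infrared crops to three channels are illustrative assumptions; the patent does not name a specific network.

```python
import torch
import torch.nn as nn
from torchvision import models

class SharedFeatureExtractor(nn.Module):
    """Single shared backbone for both infrared and RGB inputs, in
    contrast to the two modality-specific sub-networks of the
    traditional approach."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Drop the classification head; keep the globally pooled feature.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, x):
        # x: (N, 3, H, W); single-channel infrared crops are assumed to
        # have been replicated to three channels before this call.
        return self.backbone(x).flatten(1)   # (N, 2048)

extractor = SharedFeatureExtractor()
rgb_batch = torch.randn(8, 3, 256, 128)   # RGB sample images
ir_batch = torch.randn(8, 3, 256, 128)    # infrared sample images
f_v = extractor(rgb_batch)                # RGB image features
f_i = extractor(ir_batch)                 # infrared image features
```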
In addition, the identity embedding loss is the basic method for pedestrian re-identification, treating the different images of each pedestrian as a classification task. However, directly applying the identity embedding loss to cross-modal pedestrian re-identification causes gradient vanishing due to the influence of images from different modalities, so the cross-modal feature extraction model cannot learn cross-modal generalization features well. Therefore, in this embodiment, the modality batch normalization identity embedding module normalizes the extracted infrared and RGB image features and then obtains the cross-modal generalization features, so that the module can better learn generalization features across modalities.
Meanwhile, to strengthen the extraction of pedestrian features from single-modal images, this embodiment uses a separate identity embedding for each single-modal branch (infrared and RGB) for optimization. By adopting a mutual learning strategy, each normalized single-modal feature can be regarded as a probability distribution over different pedestrians: the larger the probability, the higher the similarity between the normalized single-modal generalization feature and other images of that pedestrian. Therefore, the single-modal identity embedding module based on the mutual learning strategy normalizes the infrared image features to obtain infrared single-modal generalization features and normalizes the RGB image features to obtain RGB single-modal generalization features, thereby strengthening feature extraction for both infrared and RGB single-modal images.
In addition, when the cross-modal feature extraction model is trained, it is optimized by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, until a preset convergence condition is met; for example, model training stops when the number of training iterations reaches a threshold and the convergence condition is satisfied. By optimizing the training of the cross-modal feature extraction model, this embodiment can accurately extract the cross-modal, infrared single-modal, and RGB single-modal generalization features of labeled pedestrian images.
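A sketch of such a training loop follows, assuming the model, data loader, optimizer, and combined loss function are defined elsewhere; the iteration threshold used as the convergence condition is an arbitrary example value.

```python
def train(model, loader, optimizer, loss_fn, max_steps=10_000):
    """Optimize until the preset convergence condition is met; here the
    condition is simply an iteration-count threshold, as in the example
    above. loss_fn is assumed to return the combined training loss."""
    step = 0
    while step < max_steps:                  # preset convergence condition
        for ir_batch, rgb_batch, labels in loader:
            loss = loss_fn(model, ir_batch, rgb_batch, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= max_steps:
                return
```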
Step 120: Acquire an image to be re-identified across modalities, and determine the features of the image to be re-identified across modalities.
In this step, since the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images were obtained through the cross-modal feature extraction model in step 110, this embodiment acquires an image to be re-identified across modalities and determines its image features. For example, a histogram of oriented gradients (HOG) can be used to determine the features of the image to be re-identified, i.e., extracting image features by computing and accumulating gradient orientation histograms over local regions of the image. It should be noted that a grayscale difference statistics method may also be used to determine the features of the image to be re-identified, and the feature extraction method may be chosen according to the actual situation; this embodiment does not specifically limit it.
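A small sketch of HOG feature extraction with scikit-image; the crop size and all HOG parameter values below are assumptions for illustration.

```python
import numpy as np
from skimage.feature import hog

# Grayscale pedestrian crop standing in for the image to be
# re-identified across modalities.
image = np.random.rand(128, 64)

# HOG: compute and accumulate gradient-orientation histograms over
# local regions of the image, as described above.
query_feature = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    block_norm="L2-Hys",
)
```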
In this embodiment, after the features of the image to be re-identified across modalities are determined, they are compared with the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of the labeled pedestrian images, so it can be determined whether the pedestrian to be re-identified is a labeled pedestrian.
Step 130: Calculate the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and perform pedestrian re-identification.
In this step, based on the features of the image to be re-identified obtained in step 120 and the cross-modal, infrared single-modal, and RGB single-modal generalization features of the labeled pedestrian images obtained in step 110, the similarity between the step 120 features and each feature obtained in step 110 is calculated and the similarity results are sorted; the pedestrian identity corresponding to the cross-modal generalization feature, infrared single-modal generalization feature, or RGB single-modal generalization feature with the highest similarity to the features of the image to be re-identified is the identity to be determined.
For example, if image feature A to be re-identified across modalities has 100% similarity with the cross-modal generalization feature B1 of a labeled pedestrian image, 50% similarity with the infrared single-modal generalization feature B2, and 0% similarity with the RGB single-modal generalization feature B3, the features sorted by descending similarity are B1 > B2 > B3, so the pedestrian identity in the image corresponding to B1 is the identity of the pedestrian in the image to be re-identified.
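A minimal sketch of the ranking step. Cosine similarity is an assumption; the patent only requires a similarity measure whose top-ranked gallery feature determines the identity.

```python
import numpy as np

def rank_by_similarity(query, gallery):
    """Sort gallery features by descending cosine similarity to the
    query; the identity attached to the top entry is the result."""
    q = query / np.linalg.norm(query)
    scores = {name: float(np.dot(q, g / np.linalg.norm(g)))
              for name, g in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

query = np.random.rand(2048)        # feature A of the query image
gallery = {
    "B1": np.random.rand(2048),     # cross-modal generalization feature
    "B2": np.random.rand(2048),     # infrared single-modal feature
    "B3": np.random.rand(2048),     # RGB single-modal feature
}
ranking = rank_by_similarity(query, gallery)   # e.g. B1 > B2 > B3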
The cross-modality-based pedestrian re-identification method provided by the embodiment of the present invention inputs a labeled pedestrian image into the cross-modal feature extraction model to determine its cross-modal, infrared single-modal, and RGB single-modal generalization features, and performs pedestrian re-identification based on the similarity between the features of the image to be re-identified across modalities and those generalization features. The cross-modal feature extraction model is trained on cross-modal pedestrian re-identification sample images and includes a feature extraction module for extracting the infrared and RGB image features of the sample images, a modality batch normalization identity embedding module for obtaining cross-modal generalization features, and a single-modal identity embedding module for obtaining infrared and RGB single-modal generalization features. Extracting the infrared and RGB image features of the sample images through the same network, i.e., the feature extraction module, makes the extracted features generalize better; at the same time, the modality batch normalized identity embedding and the single-modal identity embedding strengthen the extraction of infrared and RGB single-modal generalization features and improve the similarity of the same pedestrian's image features in the model, yielding better modal distribution alignment. The model parameters are optimized according to the losses corresponding to the cross-modal, infrared single-modal, and RGB single-modal generalization features, so the model can accurately extract these features from labeled pedestrian images, determine their similarity to the features of the image to be re-identified, and accurately obtain the recognition result.
Based on the content of the above embodiments, as an optional embodiment, that the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features includes:
the modality batch normalization identity embedding module is used to input the infrared image features and RGB image features into a normalization function to obtain the cross-modal generalization features.
In this step, given that directly applying the identity embedding loss to cross-modal pedestrian re-identification causes gradient vanishing due to the influence of images from different modalities, preventing the cross-modal feature extraction model from learning cross-modal generalization features well, this embodiment inputs the infrared and RGB image features into a normalization function through the modality batch normalization identity embedding module to obtain the cross-modal generalization features, so that the module can better learn generalization features across modalities. The normalization function is a normalization operation with a mean of 0 and a standard deviation of 1.
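A sketch of this module using a standard batch-normalization layer, which normalizes each feature dimension to zero mean and unit standard deviation over the mixed infrared/RGB batch; the feature dimension and identity count below are assumptions.

```python
import torch
import torch.nn as nn

feat_dim, num_ids = 2048, 395      # illustrative dimensions
bn = nn.BatchNorm1d(feat_dim, affine=False)            # normalization function
classifier = nn.Linear(feat_dim, num_ids, bias=False)  # rows act as W_j

f_i = torch.randn(8, feat_dim)     # infrared image features
f_v = torch.randn(8, feat_dim)     # RGB image features
f = torch.cat([f_i, f_v], dim=0)   # one mixed-modality batch

cross_modal_feat = bn(f)               # cross-modal generalization features
logits = classifier(cross_modal_feat)  # identity-embedding logits
```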
The cross-modality-based pedestrian re-identification method provided by the embodiment of the present invention inputs the infrared image features and RGB image features into the normalization function through the modality batch normalization identity embedding module to obtain cross-modal generalization features, so that the module can better learn cross-modal generalization features and improve the similarity of the same pedestrian's image features in the cross-modal feature extraction model; the modal distribution alignment is thus better, which facilitates accurate pedestrian re-identification.
Based on the content of the above embodiments, as an optional embodiment, that the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
the single-modal identity embedding module based on the mutual learning strategy is used to:
input the infrared image features into a normalization function to obtain infrared single-modal generalization features; and input the RGB image features into a normalization function to obtain RGB single-modal generalization features.
In this step, to strengthen the extraction of pedestrian features from single-modal images by the cross-modal feature extraction model, this embodiment uses a separate identity embedding for each single-modal branch (infrared and RGB) for optimization. By adopting a mutual learning strategy, each normalized single-modal feature can be regarded as a probability distribution over different pedestrians: the larger the probability, the higher the similarity between the normalized single-modal generalization feature and other images of that pedestrian. Therefore, the single-modal identity embedding module based on the mutual learning strategy inputs the infrared image features into the normalization function to obtain infrared single-modal generalization features and inputs the RGB image features into the normalization function to obtain RGB single-modal generalization features, thereby strengthening feature extraction for both infrared and RGB single-modal images. The normalization function is a normalization operation with a mean of 0 and a standard deviation of 1.
The cross-modality-based pedestrian re-identification method provided by the embodiment of the present invention, through the single-modal identity embedding module based on the mutual learning strategy, inputs the infrared image features into the normalization function to obtain infrared single-modal generalization features and inputs the RGB image features into the normalization function to obtain RGB single-modal generalization features, thereby strengthening single-modal feature extraction and improving the similarity of the same pedestrian's image features in the cross-modal feature extraction model; the modal distribution alignment is thus better, which facilitates accurate pedestrian re-identification.
Based on the content of the above embodiments, as an optional embodiment, optimizing the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images includes:
obtaining, according to a cross-entropy loss function, a first loss result for the cross-modal generalization feature, a second loss result for the infrared single-modal generalization feature, and a third loss result for the RGB single-modal generalization feature;
obtaining, according to a loss function that minimizes the distribution distance, a fourth loss result for the infrared single-modal generalization feature and the RGB single-modal generalization feature;
optimizing the parameters of the cross-modal feature extraction model according to the sum of the first, second, third, and fourth loss results.
In this step, since the cross-modal feature extraction model is responsible for extracting the cross-modal generalization features, infrared single-modal generalization features, and RGB single-modal generalization features of labeled pedestrian images, its parameters need to be optimized to ensure accurate feature extraction.
Therefore, this embodiment obtains the first loss result for the cross-modal generalization feature according to the cross-entropy loss function, i.e.,
$$\mathcal{L}_{1}=-\sum_{f\in F_{i}\cup F_{v}}\log\frac{\exp(W_{a}^{\top}f)}{\sum_{j=1}^{N_{p}}\exp(W_{j}^{\top}f)}$$
where $\mathcal{L}_{1}$ denotes the first loss result, f denotes a feature, $W_{a}$ the parameter for pedestrian a, $W_{j}$ the parameter for pedestrian j, $N_{p}$ the number of pedestrians, $F_{i}$ the infrared single-modal generalization features, and $F_{v}$ the RGB single-modal generalization features.
Similarly, the second loss result for the infrared single-modal generalization feature is obtained according to the cross-entropy loss function, i.e.,
$$\mathcal{L}_{2}=-\sum_{f\in F_{i}}\log\frac{\exp(W_{a}^{\top}f)}{\sum_{j=1}^{N_{p}}\exp(W_{j}^{\top}f)}$$
where $\mathcal{L}_{2}$ denotes the second loss result and $F_{i}$ the infrared single-modal generalization features.
The third loss result for the RGB single-modal generalization feature is obtained according to the cross-entropy loss function, i.e.,
$$\mathcal{L}_{3}=-\sum_{f\in F_{v}}\log\frac{\exp(W_{a}^{\top}f)}{\sum_{j=1}^{N_{p}}\exp(W_{j}^{\top}f)}$$
where $\mathcal{L}_{3}$ denotes the third loss result and $F_{v}$ the RGB single-modal generalization features.
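Continuing the sketches above, the three cross-entropy identity losses can be computed as follows; the classifier matrices and batch layout are assumptions.

```python
import torch
import torch.nn.functional as F

def identity_ce_loss(features, weights, labels):
    """Cross-entropy over identity logits W f, matching the softmax
    form of the first, second, and third loss results above."""
    logits = features @ weights.t()       # (N, Np)
    return F.cross_entropy(logits, labels)

Np, d = 395, 2048                         # pedestrian count, feature dim
W_shared = torch.randn(Np, d, requires_grad=True)  # cross-modal classifier
W_ir = torch.randn(Np, d, requires_grad=True)      # infrared branch W_i
W_rgb = torch.randn(Np, d, requires_grad=True)     # RGB branch W_v

F_i = torch.randn(8, d)     # infrared single-modal generalization features
F_v = torch.randn(8, d)     # RGB single-modal generalization features
labels = torch.randint(0, Np, (8,))

loss1 = identity_ce_loss(torch.cat([F_i, F_v]), W_shared,
                         torch.cat([labels, labels]))   # first loss
loss2 = identity_ce_loss(F_i, W_ir, labels)             # second loss
loss3 = identity_ce_loss(F_v, W_rgb, labels)            # third loss
```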
The fourth loss result for the infrared single-modal generalization feature and the RGB single-modal generalization feature is obtained according to the loss function that minimizes the distribution distance, i.e.,
$$\mathcal{L}_{4}=\mathrm{KL}\left(p(y\mid f_{i};W_{i})\,\|\,p(y\mid f_{v};W_{v})\right)+\mathrm{KL}\left(p(y\mid f_{v};W_{v})\,\|\,p(y\mid f_{i};W_{i})\right)$$
where $\mathcal{L}_{4}$ denotes the fourth loss result, KL denotes the KL distance between distributions, y denotes the pedestrian probability, $f_{i}$ the infrared image features, $f_{v}$ the RGB image features, $W_{i}$ the infrared branch parameters, and $W_{v}$ the RGB branch parameters.
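A sketch of this mutual-learning loss as a symmetric KL term between the two single-modal identity distributions; whether the original formula is symmetric or one-sided is not recoverable from the text, so the symmetric variant is an assumption.

```python
import torch
import torch.nn.functional as F

def mutual_learning_loss(logits_ir, logits_rgb):
    """Pull each branch's softmax over pedestrian identities toward the
    other's. F.kl_div expects log-probabilities as its first argument."""
    log_p_ir = F.log_softmax(logits_ir, dim=1)
    log_p_rgb = F.log_softmax(logits_rgb, dim=1)
    kl_ir_to_rgb = F.kl_div(log_p_ir, log_p_rgb.exp(), reduction="batchmean")
    kl_rgb_to_ir = F.kl_div(log_p_rgb, log_p_ir.exp(), reduction="batchmean")
    return kl_ir_to_rgb + kl_rgb_to_ir

logits_ir = torch.randn(8, 395)    # W_i applied to infrared features f_i
logits_rgb = torch.randn(8, 395)   # W_v applied to RGB features f_v
loss4 = mutual_learning_loss(logits_ir, logits_rgb)
```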
Then, the parameters of the cross-modal feature extraction model are optimized according to the sum of the first, second, third, and fourth loss results, so that the cross-modal feature extraction model can accurately extract features.
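Continuing the preceding sketches (which define loss1 through loss4, W_shared, W_ir, and W_rgb), a single optimization step on the summed loss might look as follows; the optimizer choice and learning rate are assumptions.

```python
import torch

params = [W_shared, W_ir, W_rgb]   # plus backbone parameters in practice
optimizer = torch.optim.SGD(params, lr=0.01)

total_loss = loss1 + loss2 + loss3 + loss4
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```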
The cross-modality-based pedestrian re-identification method provided by the embodiment of the present invention optimizes the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, enabling the model to accurately extract the cross-modal, infrared single-modal, and RGB single-modal generalization features of labeled pedestrian images and improving the accuracy of re-identification results.
Based on the content of the above embodiments, as an optional embodiment, optimizing the cross-modal feature extraction model by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images further includes:
obtaining a fifth loss result for the cross-modal generalization feature according to a triplet loss function;
optimizing the parameters of the cross-modal feature extraction model according to the sum of the first, second, third, fourth, and fifth loss results.
In this step, to further reduce the feature differences among images of the same pedestrian so that the similarity of same-pedestrian features exceeds that of different pedestrians, this embodiment obtains the fifth loss result for the cross-modal generalization feature according to the triplet loss function, i.e.,
$$\mathcal{L}_{5}=\max\left(\lVert f_{a}-f_{p}\rVert_{2}-\lVert f_{a}-f_{n}\rVert_{2}+\xi,\;0\right)$$
where $\mathcal{L}_{5}$ denotes the fifth loss result, a denotes the center (anchor) feature, p a positive-example feature, n a negative-example feature, and ξ the margin.
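A minimal sketch of the fifth loss; the Euclidean distance and the margin value are assumptions, as the patent fixes only the triplet form with anchor a, positive p, and negative n.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on the anchor-positive vs anchor-negative distance gap."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

anchor = torch.randn(8, 2048)     # center (anchor) features a
positive = torch.randn(8, 2048)   # same-identity features p
negative = torch.randn(8, 2048)   # different-identity features n
loss5 = triplet_loss(anchor, positive, negative)
```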
The cross-modality-based pedestrian re-identification method provided by the embodiment of the present invention combines the fifth loss result obtained from the triplet loss function to optimize the cross-modal feature extraction model, so that the model can accurately extract the cross-modal, infrared single-modal, and RGB single-modal generalization features of labeled pedestrian images, improving the accuracy of re-identification results.
FIG. 3 is a schematic structural diagram of the cross-modality-based pedestrian re-identification device provided by the second embodiment of the present invention. As shown in FIG. 3, the device includes:
a first acquiring unit 310, used to acquire a pedestrian image with an identity label, input the labeled pedestrian image into the cross-modal feature extraction model, and determine the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image;
a second acquiring unit 320, used to acquire an image to be re-identified across modalities and determine the features of the image to be re-identified across modalities;
a recognition unit 330, used to calculate the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and perform pedestrian re-identification;
wherein the feature extraction module is used to extract the infrared image features and RGB image features of the sample images; the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features; and the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features;
wherein, when the cross-modal feature extraction model is trained, the model is optimized by calculating the losses of the cross-modal generalization feature, the infrared single-modal generalization feature, and the RGB single-modal generalization feature corresponding to the sample images, until a preset convergence condition is met.
The cross-modality-based pedestrian re-identification device described in this embodiment can be used to execute the cross-modality-based pedestrian re-identification method described in the first embodiment; its principles and technical effects are similar and are not described in detail here.
Based on the content of the above embodiments, as an optional embodiment, that the modality batch normalization identity embedding module is used to normalize the infrared image features and RGB image features to obtain cross-modal generalization features includes:
the modality batch normalization identity embedding module is used to input the infrared image features and RGB image features into a normalization function to obtain the cross-modal generalization features.
The cross-modality-based pedestrian re-identification device described in this embodiment can be used to execute the cross-modality-based pedestrian re-identification method described in the first embodiment; its principles and technical effects are similar and are not described in detail here.
Based on the content of the above embodiments, as an optional embodiment, that the single-modal identity embedding module based on the mutual learning strategy is used to normalize the infrared image features to obtain infrared single-modal generalization features, and to normalize the RGB image features to obtain RGB single-modal generalization features, includes:
the single-modal identity embedding module based on the mutual learning strategy is used to:
input the infrared image features into a normalization function to obtain infrared single-modal generalization features; and input the RGB image features into a normalization function to obtain RGB single-modal generalization features.
The cross-modality-based pedestrian re-identification device described in this embodiment can be used to execute the cross-modality-based pedestrian re-identification method described in the first embodiment; its principles and technical effects are similar and are not described in detail here.
FIG. 4 is a schematic structural diagram of the electronic device provided by the third embodiment of the present invention. As shown in FIG. 4, the electronic device may include a processor 410, a communications interface 420, a memory 430, and a communication bus 440, where the processor 410, the communications interface 420, and the memory 430 communicate with one another through the communication bus 440. The processor 410 can invoke the logic instructions in the memory 430 to execute the cross-modality-based pedestrian re-identification method, the method comprising: acquiring a pedestrian image with an identity label, inputting the labeled pedestrian image into the cross-modal feature extraction model, and determining the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image; acquiring an image to be re-identified across modalities and determining the features of the image to be re-identified across modalities; and calculating the similarity between the features of the image to be re-identified across modalities and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and performing pedestrian re-identification.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In another aspect, an embodiment of the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to execute the cross-modality-based pedestrian re-identification method provided by the above method embodiments, the method comprising: acquiring a pedestrian image with an identity label, inputting the labeled pedestrian image into the cross-modal feature extraction model, and determining the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image; acquiring an image to be re-identified across modalities and determining the features of the image to be re-identified across modalities; and calculating the similarity between those features and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and performing pedestrian re-identification.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the cross-modality-based pedestrian re-identification method provided by the above embodiments, the method comprising: acquiring a pedestrian image with an identity label, inputting the labeled pedestrian image into the cross-modal feature extraction model, and determining the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image; acquiring an image to be re-identified across modalities and determining the features of the image to be re-identified across modalities; and calculating the similarity between those features and the cross-modal generalization feature, infrared single-modal generalization feature, and RGB single-modal generalization feature of the labeled pedestrian image, and performing pedestrian re-identification.
以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行各个实施例或者实施例的某些部分所述的方法。
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (10)

  1. A cross-modality-based pedestrian re-identification method, characterized by comprising:
    acquiring identity-labeled pedestrian images, inputting the identity-labeled pedestrian images into a cross-modal feature extraction model, and determining cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features of the identity-labeled pedestrian images;
    acquiring an image to undergo cross-modal pedestrian re-identification, and determining image features of the image to undergo cross-modal pedestrian re-identification;
    computing similarities between the image features of the image to undergo cross-modal pedestrian re-identification and the cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features of the identity-labeled pedestrian images, to perform pedestrian re-identification;
    wherein the cross-modal feature extraction model is obtained by training on cross-modal pedestrian re-identification sample images and comprises a feature extraction module, a modality batch normalization identity embedding module, and a mutual-learning-based single-modality identity embedding module;
    wherein the feature extraction module is configured to extract infrared image features and RGB image features of the sample images; the modality batch normalization identity embedding module is configured to normalize the infrared image features and the RGB image features to obtain the cross-modal generalization features; and the mutual-learning-based single-modality identity embedding module is configured to separately normalize the infrared image features to obtain the infrared single-modality generalization features, and normalize the RGB image features to obtain the RGB single-modality generalization features;
    wherein, when the cross-modal feature extraction model is trained, the cross-modal feature extraction model is optimized by computing losses of the cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features corresponding to the sample images, until a preset convergence condition is met.
  2. The cross-modality-based pedestrian re-identification method according to claim 1, characterized in that the modality batch normalization identity embedding module being configured to normalize the infrared image features and the RGB image features to obtain the cross-modal generalization features comprises:
    the modality batch normalization identity embedding module being configured to feed the infrared image features and the RGB image features into a normalization function to obtain the cross-modal generalization features.
  3. The cross-modality-based pedestrian re-identification method according to claim 1, characterized in that the mutual-learning-based single-modality identity embedding module being configured to separately normalize the infrared image features to obtain the infrared single-modality generalization features, and normalize the RGB image features to obtain the RGB single-modality generalization features, comprises:
    the mutual-learning-based single-modality identity embedding module being configured to:
    feed the infrared image features into a normalization function to obtain the infrared single-modality generalization features, and feed the RGB image features into a normalization function to obtain the RGB single-modality generalization features.
  4. The cross-modality-based pedestrian re-identification method according to claim 1, characterized in that optimizing the cross-modal feature extraction model by computing the losses of the cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features corresponding to the sample images comprises:
    obtaining, according to a cross-entropy loss function, a first loss result for the cross-modal generalization features, a second loss result for the infrared single-modality generalization features, and a third loss result for the RGB single-modality generalization features;
    obtaining, according to a loss function that minimizes distribution distance, a fourth loss result for the infrared single-modality generalization features and the RGB single-modality generalization features;
    optimizing parameters of the cross-modal feature extraction model according to a sum of the first loss result, the second loss result, the third loss result, and the fourth loss result.
  5. The cross-modality-based pedestrian re-identification method according to claim 4, characterized in that optimizing the cross-modal feature extraction model by computing the losses of the cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features corresponding to the sample images further comprises:
    obtaining a fifth loss result for the cross-modal generalization features according to a triplet loss function;
    optimizing the parameters of the cross-modal feature extraction model according to a sum of the first loss result, the second loss result, the third loss result, the fourth loss result, and the fifth loss result.
  6. A cross-modality-based pedestrian re-identification apparatus, characterized by comprising:
    a first acquisition unit, configured to acquire identity-labeled pedestrian images, input the identity-labeled pedestrian images into a cross-modal feature extraction model, and determine cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features of the identity-labeled pedestrian images;
    a second acquisition unit, configured to acquire an image to undergo cross-modal pedestrian re-identification and determine image features of the image to undergo cross-modal pedestrian re-identification;
    a recognition unit, configured to compute similarities between the image features of the image to undergo cross-modal pedestrian re-identification and the cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features of the identity-labeled pedestrian images, to perform pedestrian re-identification;
    wherein the feature extraction module is configured to extract infrared image features and RGB image features of the sample images; the modality batch normalization identity embedding module is configured to normalize the infrared image features and the RGB image features to obtain the cross-modal generalization features; and the mutual-learning-based single-modality identity embedding module is configured to separately normalize the infrared image features to obtain the infrared single-modality generalization features, and normalize the RGB image features to obtain the RGB single-modality generalization features;
    wherein, when the cross-modal feature extraction model is trained, the cross-modal feature extraction model is optimized by computing losses of the cross-modal generalization features, infrared single-modality generalization features, and RGB single-modality generalization features corresponding to the sample images, until a preset convergence condition is met.
  7. The cross-modality-based pedestrian re-identification apparatus according to claim 6, characterized in that the modality batch normalization identity embedding module being configured to normalize the infrared image features and the RGB image features to obtain the cross-modal generalization features comprises:
    the modality batch normalization identity embedding module being configured to feed the infrared image features and the RGB image features into a normalization function to obtain the cross-modal generalization features.
  8. The cross-modality-based pedestrian re-identification apparatus according to claim 6, characterized in that the mutual-learning-based single-modality identity embedding module being configured to separately normalize the infrared image features to obtain the infrared single-modality generalization features, and normalize the RGB image features to obtain the RGB single-modality generalization features, comprises:
    the mutual-learning-based single-modality identity embedding module being configured to:
    feed the infrared image features into a normalization function to obtain the infrared single-modality generalization features, and feed the RGB image features into a normalization function to obtain the RGB single-modality generalization features.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the program, the processor implements the steps of the cross-modality-based pedestrian re-identification method according to any one of claims 1 to 5.
  10. A non-transitory computer-readable storage medium storing a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the cross-modality-based pedestrian re-identification method according to any one of claims 1 to 5.
PCT/CN2021/084753 2020-08-04 2021-03-31 Cross-modality-based pedestrian re-identification method and apparatus WO2022027986A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010772750.6A CN112016401B (zh) 2020-08-04 2020-08-04 Cross-modality-based pedestrian re-identification method and apparatus
CN202010772750.6 2020-08-04

Publications (1)

Publication Number Publication Date
WO2022027986A1 true WO2022027986A1 (zh) 2022-02-10

Family

ID=73498983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084753 WO2022027986A1 (zh) 2020-08-04 2021-03-31 Cross-modality-based pedestrian re-identification method and apparatus

Country Status (2)

Country Link
CN (1) CN112016401B (zh)
WO (1) WO2022027986A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550210A (zh) * 2022-02-21 2022-05-27 中国科学技术大学 Pedestrian re-identification method based on modality-adaptive mixing and invariant convolution decomposition
CN114663737A (zh) * 2022-05-20 2022-06-24 浪潮电子信息产业股份有限公司 Object recognition method and apparatus, electronic device, and computer-readable storage medium
CN114694185A (zh) * 2022-05-31 2022-07-01 浪潮电子信息产业股份有限公司 Cross-modal target re-identification method, apparatus, device, and medium
CN114882525A (zh) * 2022-04-21 2022-08-09 中国科学技术大学 Cross-modal pedestrian re-identification method based on a modality-specific memory network
CN114998925A (zh) * 2022-04-22 2022-09-02 四川大学 Robust cross-modal pedestrian re-identification method for twin noisy labels
CN116311387A (zh) * 2023-05-25 2023-06-23 浙江工业大学 Cross-modal pedestrian re-identification method based on feature intersection
CN116612439A (zh) * 2023-07-20 2023-08-18 华侨大学 Method for balancing modal domain adaptability and feature discriminability, and pedestrian re-identification method
CN116682144A (zh) * 2023-06-20 2023-09-01 北京大学 Multi-modal pedestrian re-identification method based on multi-level cross-modal difference reconciliation
CN116861361A (zh) * 2023-06-27 2023-10-10 河海大学 Dam deformation assessment method based on image-text multi-modal fusion
CN117422963A (zh) * 2023-09-11 2024-01-19 南通大学 Cross-modal place recognition method based on high-dimensional feature mapping and feature aggregation
CN117935172A (zh) * 2024-03-21 2024-04-26 南京信息工程大学 Visible-infrared pedestrian re-identification method and system based on spectral information filtering
CN118015694A (zh) * 2024-01-06 2024-05-10 哈尔滨理工大学 Visible-infrared cross-modal pedestrian image retrieval method based on a partition-aware network

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016401B (zh) * 2020-08-04 2024-05-17 杰创智能科技股份有限公司 Cross-modality-based pedestrian re-identification method and apparatus
CN112380369B (zh) * 2021-01-15 2021-05-28 长沙海信智能系统研究院有限公司 Training method, apparatus, device, and storage medium for an image retrieval model
CN113837024A (zh) * 2021-09-02 2021-12-24 北京新橙智慧科技发展有限公司 Multi-modality-based cross-border tracking method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598654A (zh) * 2019-09-18 2019-12-20 合肥工业大学 Multi-granularity cross-modal feature fusion pedestrian re-identification method and re-identification system
WO2020083831A1 (en) * 2018-10-22 2020-04-30 Future Health Works Ltd. Computer based object detection within a video or image
CN111260594A (zh) * 2019-12-22 2020-06-09 天津大学 Unsupervised multi-modal image fusion method
CN112016401A (zh) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Cross-modality-based pedestrian re-identification method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180173940A1 (en) * 2016-12-19 2018-06-21 Canon Kabushiki Kaisha System and method for matching an object in captured images
CN109740413B (zh) * 2018-11-14 2023-07-28 平安科技(深圳)有限公司 Pedestrian re-identification method and apparatus, computer device, and computer storage medium
CN109635728B (zh) * 2018-12-12 2020-10-13 中山大学 Heterogeneous pedestrian re-identification method based on asymmetric metric learning
CN110909605B (zh) * 2019-10-24 2022-04-26 西北工业大学 Cross-modal pedestrian re-identification method based on contrastive correlation
CN111325115B (zh) * 2020-02-05 2022-06-21 山东师范大学 Adversarial cross-modal pedestrian re-identification method and system with triple-constraint loss

Also Published As

Publication number Publication date
CN112016401B (zh) 2024-05-17
CN112016401A (zh) 2020-12-01

Similar Documents

Publication Publication Date Title
WO2022027986A1 (zh) Cross-modality-based pedestrian re-identification method and apparatus
Günther et al. Unconstrained face detection and open-set face recognition challenge
WO2020125216A1 (zh) Pedestrian re-identification method and apparatus, electronic device, and computer-readable storage medium
Lavi et al. Survey on deep learning techniques for person re-identification task
Xia et al. Toward kinship verification using visual attributes
Kang et al. Pairwise relational networks for face recognition
CN109376604B (zh) 一种基于人体姿态的年龄识别方法和装置
Kavitha et al. Evaluation of distance measures for feature based image registration using alexnet
WO2016145940A1 (zh) Face authentication method and apparatus
US10445602B2 (en) Apparatus and method for recognizing traffic signs
CN111611874B Face mask wearing detection method based on ResNet and Canny
CN111310662B Flame detection and recognition method and system based on an integrated deep network
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
CN104281572B Mutual-information-based target matching method and system
CN106485253B Pedestrian re-identification method using maximum-granularity structure descriptors
CN110909618A Pet identity recognition method and apparatus
CN105760858A Pedestrian detection method and apparatus based on Haar-like intermediate-layer filtering features
CN112016402A Unsupervised-learning-based domain adaptation method and apparatus for pedestrian re-identification
Demirkus et al. Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos
CN109492528A Pedestrian re-identification method based on Gaussian and deep features
CN111666976A Attribute-information-based feature fusion method, apparatus, and storage medium
CN110619280A Vehicle re-identification method and apparatus based on deep joint discriminative learning
Zhang et al. A deep neural network-based vehicle re-identification method for bridge load monitoring
Narang et al. Robust face recognition method based on SIFT features using Levenberg-Marquardt Backpropagation neural networks
Li et al. Reliable line segment matching for multispectral images guided by intersection matches

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21853157

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.06.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21853157

Country of ref document: EP

Kind code of ref document: A1