WO2023279935A1 - Target re-recognition model training method and device, and target re-recognition method and device - Google Patents

Target re-recognition model training method and device, and target re-recognition method and device

Info

Publication number
WO2023279935A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
feature
loss value
initial
identification model
Prior art date
Application number
PCT/CN2022/099257
Other languages
French (fr)
Chinese (zh)
Inventor
刘武 (Wu Liu)
梅涛 (Tao Mei)
Original Assignee
京东科技信息技术有限公司 (Jingdong Technology Information Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 京东科技信息技术有限公司 (Jingdong Technology Information Technology Co., Ltd.)
Publication of WO2023279935A1 publication Critical patent/WO2023279935A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure relates to the technical field of image recognition, and in particular to a training method for a target re-identification model, a target re-identification method, a device, an electronic device, a storage medium, a computer program product, and a computer program.
  • video surveillance cameras are placed in various environmental scenes of life and work. Common cameras use color video during the day and infrared video at night to record information around the clock.
  • cross-modal target re-identification aims to match targets between the three-primary-color images (Red Green Blue, RGB) collected by visible-light cameras and the infrared images (Infrared Radiation, IR) collected by infrared cameras. Since images of different modalities (RGB and IR) are heterogeneous, modality differences degrade matching performance.
  • Embodiments of the present disclosure propose a training method for a target re-identification model, a target re-identification method, a device, an electronic device, a storage medium, a computer program product, and a computer program, aiming to solve, at least to a certain extent, one of the technical problems in the related art.
  • the embodiment of the first aspect of the present disclosure proposes a training method for a target re-identification model, including: acquiring multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; acquiring multiple convolutional feature maps respectively corresponding to the multiple modalities, and acquiring multiple edge feature maps respectively corresponding to the multiple modalities; acquiring multiple kinds of feature distance information respectively corresponding to the multiple modalities; and training an initial re-identification model according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model.
  • the training of the initial re-identification model to obtain the target re-identification model includes:
  • the initial re-identification model is trained according to the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value, so as to obtain the target re-identification model.
  • the initial re-identification model includes: a first network structure for identifying perceptual loss values between the convolutional feature map and the edge feature map.
  • the processing of the plurality of convolutional feature maps and the plurality of edge feature maps using the initial re-identification model to obtain perceptual edge loss values includes:
  • the perceptual edge loss value is generated based on the plurality of first perceptual edge loss values and the plurality of second perceptual edge loss values.
  • the initial re-identification model includes: a batch normalization layer, and the acquisition of various feature distance information corresponding to the various modalities includes:
  • the process of using the initial re-identification model to process the various feature distance information to obtain a cross-modal center comparison loss value includes:
  • the first target distance is the first distance with the smallest value among the multiple first distances;
  • the cross-modal center comparison loss value is calculated according to the first target distance, multiple second distances, and the number of targets.
  • the initial re-identification model includes: a sequentially connected fully connected layer and an output layer, and the processing of the multiple images using the initial re-identification model to obtain an initial loss value includes:
  • An identity loss value is generated according to the plurality of category feature vectors and the corresponding encoding vectors, and the identity loss value is used as the initial loss value.
  • the processing of the plurality of images using the initial re-identification model to obtain an initial loss value includes:
  • the triplet sample set including: the plurality of images, the plurality of first images, and the plurality of second images, the multiple first images correspond to the same labeled target category, and the multiple second images correspond to different labeled target categories;
  • a ternary loss value is determined according to the plurality of first Euclidean distances and the plurality of second Euclidean distances, and the ternary loss value is used as the initial loss value.
  • the initial re-identification model is trained according to the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value to obtain the target re-identification model, which includes:
  • the re-identification model obtained through training is used as the target re-identification model.
  • the plurality of modalities include: a color image modality and an infrared image modality.
  • the embodiment of the second aspect of the present disclosure proposes a target re-identification method, including: acquiring a reference image and an image to be recognized.
  • the modalities of the reference image and the image to be recognized are different.
  • the reference image includes: a reference category;
  • the reference image and the image to be recognized are respectively input into the target re-identification model trained by the above-mentioned target re-identification model training method, so as to obtain the target corresponding to the image to be recognized output by the target re-identification model.
  • the target has a corresponding target category, and the target category matches the reference category.
  • the embodiment of the third aspect of the present disclosure proposes a training device for a target re-identification model, including: a first acquisition module, configured to acquire multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; a second acquisition module, configured to acquire multiple convolutional feature maps respectively corresponding to the multiple modalities, and multiple edge feature maps respectively corresponding to the multiple modalities; a third acquisition module, configured to acquire multiple kinds of feature distance information respectively corresponding to the multiple modalities; and a training module, configured to train an initial re-identification model according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model.
  • the training module includes:
  • the first processing submodule is used to process the plurality of images using the initial re-identification model to obtain an initial loss value
  • the second processing submodule is used to process the plurality of convolutional feature maps and the plurality of edge feature maps using the initial re-identification model to obtain a perceptual edge loss value;
  • the third processing submodule is used to process the various feature distance information using the initial re-identification model to obtain a cross-modal center comparison loss value
  • the training submodule is configured to train the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value, so as to obtain the target re-identification model.
  • the initial re-identification model includes: a first network structure for identifying perceptual loss values between the convolutional feature map and the edge feature map.
  • the second processing submodule is specifically used for:
  • the perceptual edge loss value is generated based on the plurality of first perceptual edge loss values and the plurality of second perceptual edge loss values.
  • the initial re-identification model includes: a batch normalization layer
  • the third acquisition module includes:
  • a normalization processing submodule configured to input the plurality of images into the batch normalization layer respectively, so as to obtain a plurality of feature vectors respectively corresponding to the plurality of images output by the batch normalization layer;
  • a central point determination submodule configured to determine, according to the multiple feature vectors, the feature center points of multiple targets corresponding to the multiple images
  • a distance determining submodule configured to determine a first distance between the feature center points of different targets, and determine a second distance between the feature center points corresponding to different modalities of the same target, the first distance and the second distance together constituting the various kinds of feature distance information.
  • the third processing submodule is specifically configured to: use the initial re-identification model to determine a first target distance from the multiple first distances, the first target distance being the first distance with the smallest value among the multiple first distances;
  • the cross-modal center contrast loss value is calculated according to the first target distance, the multiple second distances, and the number of targets.
  • the initial re-identification model includes: a sequentially connected fully connected layer and an output layer, and the first processing submodule is specifically used for:
  • An identity loss value is generated according to the plurality of category feature vectors and the corresponding encoding vectors, and the identity loss value is used as the initial loss value.
  • the first processing submodule is specifically used for:
  • the triplet sample set including: the plurality of images, the plurality of first images, and the plurality of second images, the multiple first images correspond to the same labeled target category, and the multiple second images correspond to different labeled target categories;
  • a ternary loss value is determined according to the plurality of first Euclidean distances and the plurality of second Euclidean distances, and the ternary loss value is used as the initial loss value.
  • the training submodule is specifically used for:
  • the re-identification model obtained through training is used as the target re-identification model.
  • the plurality of modalities include: a color image modality and an infrared image modality.
  • the embodiment of the fourth aspect of the present disclosure proposes a target re-identification device, including: a fourth acquisition module, configured to acquire a reference image and an image to be recognized.
  • the modalities of the reference image and the image to be recognized are different, and the reference image includes: a reference category; and a recognition module, configured to respectively input the reference image and the image to be recognized into the target re-identification model trained by the above-mentioned target re-identification model training method, so as to obtain the target corresponding to the image to be recognized output by the target re-identification model, wherein the target has a corresponding target category, and the target category matches the reference category.
  • the embodiment of the fifth aspect of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the training method of the target re-identification model described in any one of the embodiments of the present disclosure, or execute the target re-identification method described in any one of the embodiments of the present disclosure.
  • the embodiment of the sixth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause the computer to execute the training method of the target re-identification model or the target re-identification method described in any one of the embodiments of the present disclosure.
  • the embodiment of the seventh aspect of the present disclosure provides a computer program product, the computer program product including computer program code which, when run on a computer, causes the computer to execute the method described in any one of the embodiments of the present disclosure.
  • the embodiment of the eighth aspect of the present disclosure provides a computer program, the computer program including computer program code which, when run on a computer, causes the computer to execute the training method of the target re-identification model or the target re-identification method described in any one of the embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of a method for training a target re-identification model according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a network structure of a re-identification model provided according to an embodiment of the present disclosure;
  • FIG. 3 is a schematic flowchart of a method for training a target re-identification model according to another embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a first network structure provided according to an embodiment of the present disclosure.
  • Fig. 5 is a schematic diagram of a feature space structure of a target provided according to an embodiment of the present disclosure
  • FIG. 6 is a schematic flowchart of a method for training a target re-identification model according to another embodiment of the present disclosure
  • Fig. 7 is a training flowchart of a target re-identification model provided according to an embodiment of the present disclosure
  • FIG. 8 is a schematic flowchart of a method for re-identifying a target according to another embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a training device for a target re-identification model provided according to another embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a training device for a target re-identification model provided according to another embodiment of the present disclosure.
  • Fig. 11 is a schematic diagram of a target re-identification device provided according to another embodiment of the present disclosure.
  • Figure 12 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the present disclosure.
  • the technical solutions of the embodiments of the present disclosure provide a training method for a target re-identification model, which will be described below in conjunction with specific embodiments.
  • the execution subject of the training method of the target re-identification model in the embodiments of the present disclosure may be a training device for the target re-identification model. The device may be realized by software and/or hardware, and may be configured in an electronic device; the electronic device may include, but is not limited to, a terminal, a server, and the like.
  • FIG. 1 is a schematic flowchart of a method for training an object re-identification model according to an embodiment of the present disclosure. Referring to Fig. 1, the method includes step S101 to step S104.
  • S101 Acquire multiple images, each of which has multiple corresponding modalities and multiple corresponding labeled target categories.
  • the multiple images may be images collected by an image collection device in any possible scene, or may also be images obtained from the Internet, which is not limited.
  • The multiple images have multiple modalities, for example: a color image modality, an infrared image modality, and any other possible image modality, where the color image modality can be the RGB modality and the infrared image modality can be the IR modality; the various modalities are not limited here.
  • multiple images in the embodiments of the present disclosure may have RGB modality and IR modality.
  • there may be multiple target objects in the multiple images, for example: pedestrians, vehicles, and any other possible target objects. More specifically, the multiple target objects may be pedestrian 1, pedestrian 2, vehicle 1, vehicle 2, etc. Different pedestrians or vehicles may correspond to different categories; that is to say, the embodiments of the present disclosure may collect multiple images of various modalities for different target objects.
  • the information used to label the category of the target object can be called the labeled target category, where the labeled target category can be, for example, in the form of a numeric label, with different values representing different types of target objects. By labeling the target category, the target objects in the multiple images can be differentiated.
  • the multiple images can also be divided into a training set (train set) and a test set (test set), each of which includes images and the labeled target categories corresponding to the images.
  • S102 Acquire multiple convolutional feature maps corresponding to multiple modalities, and acquire multiple edge feature maps respectively corresponding to multiple modalities.
  • multiple convolution feature maps and multiple edge feature maps respectively corresponding to multiple modalities are further acquired.
  • the feature map obtained by performing convolution operations on images of various modalities may be called a convolution feature map.
  • Embodiments of the present disclosure can use any one or more convolutional layers in a neural network to perform convolution operations on images of various modalities, for example, using the ResNet Layer0 layer of a residual neural network to extract the multiple convolutional feature maps; the multiple convolutional feature maps can also be obtained in any other possible way, without limitation.
  • the edge feature map can represent the edge contour information of the target object in images of various modalities.
  • for example, the edge feature map can be obtained by performing a convolution operation on the image with a Sobel operator to extract its edge information;
  • the embodiments of the present disclosure can use the edge contour information of the target object as a guide during model training to optimize the modality-specific feature space, thereby realizing the mining of common features between modalities.
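  • As an illustrative sketch (not the patent's verbatim implementation), the Sobel-based edge feature map mentioned above could be computed in PyTorch roughly as follows; the function name and tensor layout are assumptions:

```python
import torch
import torch.nn.functional as F

def sobel_edge_map(img: torch.Tensor) -> torch.Tensor:
    """Extract an edge feature map from a (B, C, H, W) image batch by
    convolving each channel with the Sobel operator (gradient magnitude)."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device)
    ky = kx.t().contiguous()
    c = img.shape[1]
    wx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)  # one kernel per channel
    wy = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(img, wx, padding=1, groups=c)  # horizontal gradient
    gy = F.conv2d(img, wy, padding=1, groups=c)  # vertical gradient
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)
```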
  • multiple feature distance information corresponding to multiple modalities is further acquired.
  • the various kinds of feature distance information can be the distances between the feature center points of targets of different labeled target categories, and/or the distances between the feature center points of the same target corresponding to different modalities, or any other possible feature distance information, without limitation.
  • multiple feature vectors corresponding to the multiple images can be determined first, and the feature center points can then be determined according to the multiple feature vectors, so that the various kinds of feature distance information can be determined according to the feature center points.
  • S104 Train an initial re-identification model according to multiple images, multiple convolutional feature maps, multiple edge feature maps, multiple feature distance information, and multiple labeled target categories to obtain a target re-identification model.
  • the re-identification model in the embodiment of the present disclosure may be based on a convolutional neural network structure, specifically, a residual neural network ResNet50 may be used as the backbone network of the re-identification model.
  • Fig. 2 is a schematic diagram of a network structure of a re-identification model provided according to an embodiment of the present disclosure.
  • the embodiments of the present disclosure can divide ResNet50 into two parts: the convolutional layer of the initial stage (ResNet Layer0) can adopt a dual-stream design, and the convolutional layers of the next four stages (ResNet Layer1-4) can use a dual-stream shared-weight strategy to uniformly extract the information of the two modalities.
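  • For concreteness, the dual-stream backbone described above might be assembled as in the following sketch, assuming torchvision's ResNet-50; the class and attribute names are illustrative rather than the patent's code:

```python
import torch.nn as nn
from torchvision.models import resnet50

class DualStreamBackbone(nn.Module):
    """Stage-0 convolutions are modality-specific (dual-stream);
    stages 1-4 are shared by the two modalities."""
    def __init__(self):
        super().__init__()
        def stage0(net):
            return nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stage0_rgb = stage0(resnet50(weights=None))  # RGB branch
        self.stage0_ir = stage0(resnet50(weights=None))   # IR branch
        shared = resnet50(weights=None)                   # shared stages 1-4
        self.shared = nn.Sequential(shared.layer1, shared.layer2,
                                    shared.layer3, shared.layer4)

    def forward(self, x, modality: str):
        f0 = self.stage0_rgb(x) if modality == "rgb" else self.stage0_ir(x)
        return self.shared(f0)  # modality-common features
```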
  • the parameters of the initial re-identification model can be optimized and adjusted according to the relationships between the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories, until the model converges, to obtain the target re-identification model.
  • in the embodiments of the present disclosure, multiple images are acquired, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; multiple convolutional feature maps and multiple edge feature maps respectively corresponding to the multiple modalities are acquired; multiple kinds of feature distance information respectively corresponding to the multiple modalities are acquired; and the initial re-identification model is trained according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model. Therefore, the trained re-identification model can fully mine the features in images of various modalities and enhance the accuracy of image matching across different modalities, thereby improving the effect of cross-modal target re-identification. Furthermore, this solves the technical problem in the related art that network models do not sufficiently mine features in multi-modal images, which affects the effect of cross-modal target re-identification.
  • Fig. 3 is a schematic flowchart of a method for training an object re-identification model according to another embodiment of the present disclosure. Referring to Fig. 3, the method includes step S301 to step S307.
  • S301 Acquire multiple images, each of which has multiple corresponding modalities and multiple corresponding labeled target categories.
  • S302 Acquire multiple convolutional feature maps corresponding to multiple modalities, and acquire multiple edge feature maps corresponding to multiple modalities.
  • S303 Acquire various feature distance information respectively corresponding to multiple modalities.
  • S304 Process multiple images using the initial re-identification model to obtain an initial loss value.
  • the identity loss function (Id Loss) can be used to calculate the initial loss value of the initial re-identification model, or other loss functions can be used to determine the initial loss value, which is not limited.
  • the initial re-identification model may include a sequentially connected fully connected layer (FC) and an output layer (for example: a Softmax classifier). When the initial re-identification model is used to process the multiple images, the multiple images are input into the fully connected layer and the output layer in sequence.
  • assume that a batch contains B images during training, and let x_i^m, for example, represent one of the RGB or IR images; then i ∈ {1, 2, ..., B}.
  • after an image passes through the fully connected layer and the output layer, the obtained vector can be called a category feature vector, represented for example by p_i; the components of p_i can be written as p_{i,j}, where j ∈ {1, 2, ..., N} and N is the number of target categories in the multiple images.
  • a plurality of encoding vectors respectively corresponding to the plurality of labeled target categories are determined; for example, one-hot encoding can be used to encode the multiple labeled target categories to obtain the encoding vectors. An encoding vector can be represented, for example, by y_i, and the multiple encoding vectors can be expressed as {y_1, y_2, ..., y_B}.
  • identity loss values are generated according to the multiple category feature vectors and the corresponding multiple encoding vectors; that is to say, the embodiments of the present disclosure can use the identity loss function (Id Loss) to perform calculation on the multiple category feature vectors and the corresponding multiple encoding vectors to obtain the identity loss value, and the identity loss value is used as the initial loss value.
  • the identity loss function Id Loss can be expressed, for example, in the cross-entropy form $L_{id} = -\frac{1}{B}\sum_{i=1}^{B} y_i^{\top} \log p_i$.
  • the identity loss value is used as the initial loss value, which can make the model have a good pedestrian re-identification effect.
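  • As an illustration of the identity loss just described (cross-entropy between the category feature vectors p_i and the one-hot encoding vectors y_i), a minimal PyTorch sketch with assumed variable names:

```python
import torch
import torch.nn.functional as F

def identity_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: (B, N) class scores from the FC layer; labels: (B,) target
    category indices. Softmax cross-entropy against one-hot labels."""
    log_p = F.log_softmax(logits, dim=1)                       # log p_i
    y = F.one_hot(labels, num_classes=logits.size(1)).float()  # one-hot y_i
    return -(y * log_p).sum(dim=1).mean()                      # batch average
```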
  • the initial re-identification model may include a first network structure
  • FIG. 4 is a schematic structural diagram of the first network structure provided according to an embodiment of the present disclosure.
  • the first network structure can be, for example, a deep convolutional neural network VGGNet-16, which can identify perceptual loss values between the convolutional feature maps and the edge feature maps.
  • VGGNet-16 as the first network structure can deeply identify the loss between the convolutional feature map and the edge feature map, thereby improving the accuracy of the perceived loss value.
  • a plurality of convolution feature map parameters respectively corresponding to the plurality of convolution loss feature maps are determined, and a plurality of edge feature map parameters respectively corresponding to the plurality of edge loss feature maps are determined.
  • let φ_t(z) denote the multiple convolution loss feature maps and multiple edge loss feature maps extracted by stages 0 through t of the first network structure, where z denotes the input convolutional feature map or edge feature map, respectively. Assuming the shape of the convolution loss feature maps and the edge loss feature maps is C_t × H_t × W_t, then C_t × H_t × W_t can be used as the feature map parameters of the convolution loss feature map and the edge loss feature map.
  • the corresponding plurality of convolution loss feature maps are processed according to the plurality of convolution feature map parameters to obtain a plurality of first perceptual edge loss values, and the corresponding plurality of edge loss feature maps are processed according to the plurality of edge feature map parameters to obtain a plurality of second perceptual edge loss values.
  • the first perceptual edge loss value can be expressed as:
  • the second perceptual edge loss value can be expressed as:
  • the perceptual edge loss value is generated according to the multiple first perceptual edge loss values and the multiple second perceptual edge loss values; for example, the sum of the first perceptual edge loss values and the second perceptual edge loss values can be used as the perceptual edge loss value.
  • in this way, the edge information of the image can be used as a guide to mine the common information in the modality feature space, reducing the differences between different modalities and thereby improving the effect of cross-modal object re-identification.
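  • Because the PEF formula images are not reproduced here, the following is only a plausible sketch: it computes a standard perceptual loss between VGG-16 features of the convolutional feature map and of the edge feature map, normalized by C_t · H_t · W_t per stage. The stage boundaries and the assumption of 3-channel inputs (e.g. after a 1×1 projection) are mine:

```python
import torch.nn as nn
from torchvision.models import vgg16

class PerceptualEdgeLoss(nn.Module):
    """Compares a conv feature map and an edge feature map in the feature
    space of an ImageNet-pretrained VGG-16 (the 'first network structure').
    Inputs are assumed to already have 3 channels."""
    def __init__(self, stage_ends=(4, 9, 16, 23)):
        super().__init__()
        feats = vgg16(weights="IMAGENET1K_V1").features.eval()
        self.stages = nn.ModuleList(
            nn.Sequential(*feats[a:b])
            for a, b in zip((0,) + stage_ends[:-1], stage_ends))
        for p in self.parameters():          # the perceptual net is frozen
            p.requires_grad_(False)

    def forward(self, conv_map, edge_map):
        loss, x, y = 0.0, conv_map, edge_map
        for stage in self.stages:
            x, y = stage(x), stage(y)
            # mean() realizes the 1/(C_t*H_t*W_t) normalization per stage
            loss = loss + (x - y).pow(2).mean()
        return loss
```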
  • Embodiments of the present disclosure may also use an initial re-identification model to process various feature distance information, so as to obtain cross-modal center comparison loss values.
  • Fig. 5 is a schematic diagram of a feature space structure of an object provided according to an embodiment of the present disclosure.
  • the cross-modal center contrast loss can act on the common feature space of the modality.
  • the initial re-identification model can be used to process the various kinds of feature distance information, for example: the distances between the feature center points of targets of different categories, or the distances between the feature center points corresponding to different modalities of targets of the same category, to obtain the cross-modal center comparison loss value.
  • S307 Train an initial re-identification model according to the initial loss value, perceptual edge loss value, and cross-modal center comparison loss value, so as to obtain a target re-identification model.
  • the target loss value may first be generated according to the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value.
  • the target loss value may be, for example, the sum of the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value, which can be expressed as $L = L_{init} + L_{pef} + L_{cmcc}$, where $L_{pef}$ represents the perceptual edge loss value, $L_{init}$ represents the initial loss value, and $L_{cmcc}$ represents the cross-modal center comparison loss value.
  • the initial re-identification model is trained according to the target loss value; that is, the parameters of the re-identification model are adjusted according to the target loss value until the target loss value meets a set condition (for example, a model convergence condition), and the re-identification model obtained through training is used as the target re-identification model. Therefore, in the process of model training, the multi-task loss (that is, multiple loss values) is combined to optimize and adjust the modality-specific feature space and the common feature space, which enhances the cross-modal feature extraction ability of the model, enables the model to extract more discriminative features, and meets the feature requirements of cross-modal target re-identification, thereby improving the effect of target re-identification.
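  • A rough sketch of a training step that combines the loss values into the target loss value (an unweighted sum, per the description above); compute_losses and the other names are assumptions:

```python
def train_step(model, images, labels, modalities, optimizer):
    # Forward pass producing the three loss values described above.
    initial_loss, pef_loss, cmcc_loss = model.compute_losses(
        images, labels, modalities)
    target_loss = initial_loss + pef_loss + cmcc_loss  # L = L_init + L_pef + L_cmcc
    optimizer.zero_grad()
    target_loss.backward()   # backpropagate the multi-task loss
    optimizer.step()         # update the learnable parameters
    return target_loss.item()
```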
  • in the embodiments of the present disclosure, multiple images are acquired, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; multiple convolutional feature maps and multiple edge feature maps respectively corresponding to the multiple modalities are acquired; multiple kinds of feature distance information respectively corresponding to the multiple modalities are acquired; and the initial re-identification model is trained according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model. Therefore, the trained re-identification model can fully mine the features in images of various modalities and enhance the accuracy of image matching across different modalities, thereby improving the effect of cross-modal target re-identification.
  • this solves the technical problem in the related art that network models do not sufficiently mine features in multi-modal images, which affects the effect of cross-modal target re-identification.
  • using the identity loss value as the initial loss value can make the model have a better person re-identification effect.
  • VGGNet-16 as the first network structure can deeply identify the loss between the convolutional feature map and the edge feature map, thereby improving the accuracy of the perceived loss value.
  • the multi-task loss (that is, multiple loss values) is combined to optimize and adjust the modality-specific feature space and the common feature space, which enhances the cross-modal feature extraction ability of the model, enables the model to extract more discriminative features, and meets the feature requirements of cross-modal target re-identification, thereby improving the effect of target re-identification.
  • Fig. 6 is a schematic flowchart of a method for training an object re-identification model according to another embodiment of the present disclosure. Referring to Fig. 6, the method includes step S601 to step S610.
  • S601 Acquire multiple images, each of which has corresponding multiple modalities and multiple corresponding labeled target categories.
  • S602 Acquire multiple convolutional feature maps corresponding to multiple modalities, and acquire multiple edge feature maps corresponding to multiple modalities.
  • S603 Input multiple images into the batch normalization layer respectively, so as to obtain multiple feature vectors respectively corresponding to the multiple images output by the batch normalization layer.
  • the initial re-identification model also includes a batch normalization layer (Batch Normalization, BN).
  • in the operation of acquiring the various kinds of feature distance information corresponding to the various modalities, the multiple images are first respectively input into the batch normalization layer, so as to obtain the multiple feature vectors respectively corresponding to the multiple images output by the BN layer (represented, for example, by f_i^m).
  • S604 Determine, according to the multiple feature vectors, feature center points of multiple targets respectively corresponding to the multiple images.
  • S605 Determine the first distance between the feature center points of different targets, and determine the second distance between the feature center points corresponding to different modalities of the same target, the first distance and the second distance together constitute a variety of feature distance information .
  • the first distance may be represented by d inter .
  • the second distance between the feature center points corresponding to different modalities of the same target is determined; that is, the distance between the feature centers of the two modalities of targets of the same category is determined, which can be represented by d_intra.
  • the first distance and the second distance jointly constitute the various kinds of feature distance information. Therefore, determining the various kinds of feature distance information through the relationships between the feature center points of the targets can constrain the relationship between the modality centers and the category centers, and can well adjust the feature extraction ability of the model.
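  • For concreteness, the feature center points and the two kinds of distances could be computed as in this sketch (Euclidean distances; all names assumed):

```python
import torch

def center_distances(feats, labels, modalities):
    """feats: (B, D) BN-layer feature vectors; labels: (B,) category ids;
    modalities: (B,) 0 for RGB, 1 for IR. Returns d_inter and d_intra lists."""
    centers = {}  # (category, modality) -> feature center point
    for k in labels.unique().tolist():
        for m in (0, 1):
            mask = (labels == k) & (modalities == m)
            if mask.any():
                centers[(k, m)] = feats[mask].mean(dim=0)
    # d_intra: same category, centers of the two modalities.
    cats = {k for k, _ in centers}
    d_intra = [torch.dist(centers[(k, 0)], centers[(k, 1)])
               for k in cats if (k, 0) in centers and (k, 1) in centers]
    # d_inter: centers of different categories (pooled over modalities).
    cat_centers = {k: feats[labels == k].mean(dim=0) for k in cats}
    ks = sorted(cat_centers)
    d_inter = [torch.dist(cat_centers[a], cat_centers[b])
               for i, a in enumerate(ks) for b in ks[i + 1:]]
    return d_inter, d_intra
```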
  • S606 Process multiple images using the initial re-identification model to obtain an initial loss value.
  • multiple images may also be divided with reference to the multiple labeled target categories to obtain a triplet sample set, which may include: the multiple images, multiple first images, and multiple second images; the multiple first images in the set correspond to the same labeled target category, the multiple second images in the set correspond to different labeled target categories, an image and a first image can constitute a positive sample pair, and an image and a second image can constitute a negative sample pair.
  • the first Euclidean distance between the feature vector of the image and the feature vector of the first image is determined, where the feature vectors are output by the batch normalization layer; that is to say, the distance between the feature vector of the image and the feature vector of the first image, both output by the batch normalization (BN) layer, is calculated to obtain the first Euclidean distance.
  • a second Euclidean distance between the feature vector of the image and the feature vector of the second image may also be determined, and the first Euclidean distance and the second Euclidean distance may be represented by d, for example.
  • the ternary loss value is determined according to a plurality of first Euclidean distances and a plurality of second Euclidean distances, and the ternary loss value is used as an initial loss value, and the calculation formula of the initial loss value is as follows:
  • where d_ii+ denotes the first Euclidean distance and d_ii- denotes the second Euclidean distance.
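  • The exact WRT formula is not reproduced above; the sketch below uses the common soft-margin, softmax-weighted form that matches the description (positive-pair distances d_ii+ weighed against negative-pair distances d_ii-); treat the precise weighting as an assumption:

```python
import torch

def weighted_ternary_loss(feats, labels):
    """feats: (B, D) BN-layer features; labels: (B,) category ids.
    Soft-margin triplet loss with softmax-weighted pair distances."""
    dist = torch.cdist(feats, feats)                 # pairwise Euclidean d
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos, neg = same & ~eye, ~same
    losses = []
    for i in range(len(labels)):
        dp, dn = dist[i][pos[i]], dist[i][neg[i]]    # d_ii+, d_ii-
        wp = torch.softmax(dp, dim=0)                # harder positives weigh more
        wn = torch.softmax(-dn, dim=0)               # harder negatives weigh more
        losses.append(torch.nn.functional.softplus(
            (wp * dp).sum() - (wn * dn).sum()))      # log(1 + exp(.))
    return torch.stack(losses).mean()
```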
  • S607 Using the initial re-identification model to process multiple convolutional feature maps and multiple edge feature maps to obtain perceptual edge loss values.
  • S608 Use the initial re-identification model to determine a first target distance from the multiple first distances, where the first target distance is the first distance with the smallest value among the multiple first distances.
  • in the embodiments of the present disclosure, the first distance with the smallest value among the multiple first distances may be called the first target distance; for example, the minimum value of all d_inter can be used as the first target distance.
  • S609 Calculate and obtain a cross-modal center comparison loss value according to the first target distance, multiple second distances, and the number of targets.
  • that is, the cross-modal center contrast loss value is calculated according to the first target distance, the multiple second distances, and the number of targets.
  • the cross-modal center contrast loss value (may be referred to as CMCC loss) is calculated as follows:
  • in this way, the distance between different modalities of the same category can be shortened through the CMCC loss, while the distance between features of different categories is enlarged, thereby optimizing the distribution state of the features f_i^m extracted by the model and facilitating the use of the features of this layer for target re-identification matching at a later stage.
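  • The CMCC formula itself is likewise not reproduced above; as one plausible reading of the description (shorten the second distances d_intra, enlarge the smallest first distance, normalize by the number of targets), a sketch:

```python
import torch

def cmcc_loss(d_inter, d_intra):
    """d_intra: per-category distances between the two modality centers;
    d_inter: distances between centers of different categories.
    Minimizes d_intra while pushing up the smallest d_inter
    (the 'first target distance')."""
    n = len(d_intra)                    # number of targets (categories)
    d_min = torch.stack(d_inter).min()  # first target distance
    return (torch.stack(d_intra).sum() - d_min) / n
```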
  • S610 Train an initial re-identification model according to the initial loss value, perceptual edge loss value, and cross-modal center comparison loss value, so as to obtain a target re-identification model.
  • a target loss value is generated according to the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value.
  • the target loss value may be, for example, the sum of the initial loss value, the perceptual edge loss value, and the cross-modal center comparison loss value, which can be expressed as $L = L_{init} + L_{pef} + L_{cmcc}$, where $L_{pef}$ represents the perceptual edge loss value, $L_{init}$ represents the initial loss value, and $L_{cmcc}$ represents the cross-modal center comparison loss value.
  • an initial re-identification model is trained based on a target loss value.
  • in the embodiments of the present disclosure, multiple images are acquired, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; multiple convolutional feature maps and multiple edge feature maps respectively corresponding to the multiple modalities are acquired; multiple kinds of feature distance information respectively corresponding to the multiple modalities are acquired; and the initial re-identification model is trained according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model. Therefore, the trained re-identification model can fully mine the features in images of various modalities and enhance the accuracy of image matching across different modalities, thereby improving the effect of cross-modal target re-identification.
  • this solves the technical problem in the related art that network models do not sufficiently mine features in multi-modal images, which affects the effect of cross-modal target re-identification.
  • the relationship between the modality centers and the category centers can be constrained, and the feature extraction ability of the model can be well adjusted.
  • in this way, the distance between different modalities of the same category can be shortened through the CMCC loss, while the distance between features of different categories is enlarged, thereby optimizing the distribution state of the features f_i^m extracted by the model, which facilitates the later use of the features of this layer for target re-identification matching.
  • the backbone network of the target re-identification model is a convolutional neural network (ResNet50 is used here).
  • the convolutional layer (ResNet Layer0) in the initial stage adopts a dual-stream design
  • the convolutional layer (ResNet Layer1-4) in the next four stages uses a dual-stream shared weight strategy.
  • a multi-task loss function is used, as shown in Equation 1, which incorporates four loss functions, namely the identity loss (Id Loss), the weighted ternary loss (WRT Loss), the perceptual edge loss (PEF Loss), and the cross-modal center contrast loss (CMCC Loss).
  • the first two losses are loss functions commonly used in existing methods, and the latter two losses (PEF Loss and CMCC Loss) are loss functions newly proposed in this disclosure.
  • the first two losses are briefly introduced below, and then the latter two loss functions are explained in detail.
  • let x_i^m, for example, denote an input image, where rgb and ir represent the RGB image modality and the IR image modality respectively, m ∈ {rgb, ir}, and H and W represent the height and width of the image, respectively; 3 represents the number of channels of the image (an RGB image contains the three channels R, G, and B; IR images are converted to 3 channels by repeating their single channel 3 times).
  • assume that a batch contains B images during the training process and let x_i^m represent one of the RGB or IR images; then i ∈ {1, 2, ..., B}.
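  • The 3-channel conversion of the IR inputs mentioned above is a one-line tensor operation; the image size used here is an assumption:

```python
import torch

ir = torch.rand(1, 1, 288, 144)   # single-channel IR image (size assumed)
ir_3ch = ir.repeat(1, 3, 1, 1)    # repeat the channel 3 times -> (1, 3, H, W)
```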
  • Id Loss can be expressed, for example, in the cross-entropy form $L_{id} = -\frac{1}{B}\sum_{i=1}^{B} y_i^{\top} \log p_i$.
  • the WRT loss is calculated from the feature vectors obtained after the model's batch normalization (BN) layer and the L2-Norm operation.
  • the calculation formula of the loss function is as follows:
  • where d represents the Euclidean distance between feature vectors, and the two sets involved respectively represent the set of positive sample pairs and the set of negative sample pairs.
  • the perceptual edge loss acts on the modality-specific feature space; this part of the features is generated by the unshared ResNet Layer0.
  • the PEF loss directly optimizes the modality-specific feature space using the edge contour information of the target as a guide, thus enabling the mining of common features among modalities.
  • the calculation of the PEF loss involves two inputs: one is the convolutional feature map extracted by ResNet Layer0; the other branch uses the Sobel operator to perform a convolution operation on the input image of the original modality, extracting its edge information to obtain the edge feature map.
  • the perceptual loss between the edge feature map and the convolutional feature map is calculated in PEF
  • the VGGNet-16 model trained on ImageNet is used as the perceptual network
  • let φ_t(z) represent the feature map extracted by stages 0 through t of the perceptual network, assuming its shape is C_t × H_t × W_t.
  • the calculation formula of the PEF loss is as follows:
  • in this way, the edge contour information serving as prior knowledge is used to guide the common features of the modalities, which makes the modality-specific features extracted by the unshared Layer0 more consistent and helps to reduce the differences between modalities, so as to better realize the cross-modal target re-identification task.
  • the embodiment of the present disclosure proposes a new cross-modal center contrast loss, which acts on the common feature space of the modalities, that is, the space in which the feature vectors after the BN layer in Figure 2 (represented, for example, by f_i^m) are located.
  • d inter represents the distance between the centers of object features of different categories
  • d_intra represents the distance between the centers of the features of the two modalities of objects of the same category. Let $c_k^m$ denote, for example, the feature center of modality m of the k-th object; its calculation formula is the mean of the corresponding feature vectors, $c_k^m = \frac{1}{|S_k^m|}\sum_{i \in S_k^m} f_i^m$, where $S_k^m$ is the set of images of the k-th object in modality m.
  • Fig. 7 is a flow chart of training a target re-identification model according to an embodiment of the present disclosure. As shown in Figure 7, the following steps are included:
  • Step 1-1 Read the cross-modal target re-identification image data set, and obtain the original image and the category information of the corresponding target object;
  • the data set includes a training set (train set) and a test set (test set), each including the original images and the target category labels corresponding to the images.
  • during training, the images are input into the model, and the loss function is then calculated in combination with the category labels.
  • the test set is divided into a set to be queried (query) and a set to be matched (gallery), which is used to test the re-identification performance of the model;
  • Algorithm model hyperparameters include the size of the input image during model training, the batch size, the target objects and numbers of different modalities in the batch, the image data enhancement methods, the number of training iterations (Epoch), the learning rate adjustment strategy, and the type of optimizer used, as follows.
  • Batch size: 64 (including 8 objects, with 4 images of each object per modality);
  • Image data enhancement methods: random cropping and horizontal flipping;
  • the number of training iterations is: 200;
  • the learning rate increases linearly from 0.0005 to 0.005 during the first 10 epochs, is held at 0.005 for epochs 10-20, then decays to one-tenth of its value every 5 epochs, and is maintained at 0.000005 from the 35th epoch to the end of training.
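  • The learning-rate schedule above can be written as a LambdaLR multiplier on the base rate 0.005; reading the final plateau as starting at epoch 30 is one interpretation of the description:

```python
from torch.optim.lr_scheduler import LambdaLR

def lr_multiplier(epoch: int) -> float:
    """Multiplier on the base learning rate of 0.005."""
    if epoch < 10:                     # linear warm-up: 0.0005 -> 0.005
        return 0.1 + 0.9 * epoch / 10
    if epoch < 20:                     # hold at 0.005
        return 1.0
    # decay to one tenth every 5 epochs, floor at 0.000005 (= 0.005 * 0.001)
    return max(0.1 ** ((epoch - 20) // 5 + 1), 0.001)

# scheduler = LambdaLR(optimizer, lr_lambda=lr_multiplier)
```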
  • Step 1-2 According to the set batch size, the number of categories in the batch and the number of images under each category, organize the data of RGB and IR into a batch (Batch);
  • Step 1-3 Standardize the images, adjust them to the set width and height, and perform the specified data enhancement transformations on them; then load batches of data into GPU memory for later input into the training model, using the corresponding labels to participate in the later loss calculation.
  • Step 2-1 Input the image data of the two modalities respectively along the dual-stream feature extraction network (structure shown in Figure 2), and send the data of each modality to their respective entry branches;
  • Step 2-2 The input data is transferred layer by layer, performing the computation of each corresponding layer, passing sequentially through the modality-specific part and the modality-common part;
  • Step 2-3 Through the forward propagation of step 2-2, the intermediate features and the final classification prediction score can be obtained, which will be used for the multi-task loss calculation in the next stage.
  • Step 3-1 For the input data of a batch, the values of the four loss functions can be obtained according to the calculation methods of the above Equations 1-9;
  • Step 3-2 Add the four losses to get the final multi-task loss value.
  • Step 4-1 The implementation code of this disclosure uses the PyTorch deep learning framework with automatic differentiation, which supports backpropagation through the entire algorithm model directly from the calculated multi-task loss value and computes the gradient values of the learnable parameters;
  • Step 4-2 Use the set optimizer to update and optimize the learnable parameters of the model algorithm using the gradient calculated in step 4-1;
  • Step 4-3 Repeat all the above steps, and continuously update the model parameters in the process until the set number of training rounds is reached, and then stop the training process of the algorithm model.
  • Step 5-1 Divide the test set, use the IR image as the query set (query), and the RGB image as the matching set (gallery).
  • the test method is to use an IR image of an object as the query and match images of that object in the RGB image set, so as to test the cross-modal object re-identification performance of the model;
  • Step 5-2 During the test, read the images of the test set (including the query and gallery images), input the data of both modalities into the test model, and obtain the feature vector of each image (the feature vector after the BN layer in Figure 2);
  • Step 5-3 Use the cosine distance to measure the similarity between each query image and all gallery images, and then sort by distance to obtain, for each query image (IR image), a ranked list of matched gallery images (RGB images);
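  • A sketch of the cosine-distance matching in this step; names are assumptions:

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feats, gallery_feats):
    """query_feats: (Q, D) IR features; gallery_feats: (G, D) RGB features.
    Returns gallery indices per query, sorted by ascending cosine distance."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    cos_dist = 1 - q @ g.t()        # cosine distance matrix (Q, G)
    return cos_dist.argsort(dim=1)  # best match first
```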
  • Step 5-4 Calculate the evaluation indicators Rank-n and mAP commonly used in the target re-identification task, and evaluate the model performance by observing the indicator values;
  • Step 5-5 If the evaluation results do not meet the set requirements, the hyperparameters of the model can be adjusted and training of the algorithm model restarted from the first step of the process. If the evaluation indicators meet the requirements, the model weights are saved; the weights and the model code constitute the final cross-modal target re-identification solution.
  • the multi-task loss is used to optimize and adjust the modal feature space and common feature space, and complete the cross-modal target re-identification task end-to-end.
  • the perceptual edge loss is proposed, which can use the edge information of the image as a guide to mine the common information in the modality feature space, reducing the differences between different modalities.
  • a cross-modal center comparison loss is proposed, which acts on the common feature space. By constraining the relationship between the modal center and the category center, the feature extraction ability of the model can be well adjusted, so that the model can achieve excellent performance.
  • the feature space can be optimized; the division into a modality-specific feature space and a common feature space is proposed, with targeted adjustment and optimization, so as to realize an efficient end-to-end cross-modal target re-identification method.
  • the proposed perceptual edge loss can directly constrain the features of different modalities, introduce prior knowledge into the model feature extraction process, and enhance the cross-modal feature extraction capability of the model;
  • the proposed cross-modal center comparison loss enables the model to extract more discriminative features, which effectively reduces the difference between the modalities of similar objects and increases the feature differences between objects of different categories, which is conducive to the correct re-identification of cross-modal data by the model.
  • Fig. 8 is a schematic flowchart of a method for object re-identification provided according to another embodiment of the present disclosure. Referring to Fig. 8, the method includes step S801 to step S802.
  • S801 Acquire a reference image and an image to be recognized, where the modes of the reference image and the image to be recognized are different, and the reference image includes: a reference category.
  • the reference image and the image to be recognized may be images collected in any scene, and the modalities of the reference image and the image to be recognized are different.
  • the reference image can be an image of RGB modality, and the image to be recognized can be an image of IR modality; or the reference image can be an image of IR modality, and the image to be recognized can be an image of RGB modality, for This is not limited.
  • the reference image also corresponds to a reference category, wherein the reference category is used to describe the category of the target object in the reference image, for example: the category of the target object is a vehicle, a pedestrian, or any other possible category, which is not limited.
  • after the reference image and the image to be recognized are obtained, they are further input into the target re-identification model trained in the above embodiments, and the target corresponding to the image to be recognized and the corresponding target category can be output by the target re-identification model, where the target category matches the reference category, e.g. the target category and the reference category are the same vehicle.
  • the same object as the target object in the reference image is recognized from the image to be recognized, so as to achieve the purpose of cross-modal target re-identification.
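  • An illustrative usage sketch of the inference flow just described; model.extract and threshold are assumed names, not the patent's API:

```python
import torch

model.eval()
with torch.no_grad():
    ref_feat = model.extract(reference_image, modality="rgb")    # reference
    qry_feat = model.extract(image_to_recognize, modality="ir")  # to recognize
    sim = torch.nn.functional.cosine_similarity(ref_feat, qry_feat, dim=-1)
    is_same_target = sim > threshold  # categories match if similarity is high
```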
  • the reference image includes: a reference category
  • the reference image and the image to be recognized are respectively input into the target re-identification model trained by the above-mentioned training method, and the target corresponding to the image to be recognized output by the target re-identification model is obtained; the target has a corresponding target category, and the target category matches the reference category.
  • since the target re-identification model trained by the above-mentioned target re-identification model training method recognizes the image to be recognized, the features of the image to be recognized can be fully mined, the accuracy of image matching under different modalities can be enhanced, and the effect of cross-modal target re-identification can be improved.
  • Fig. 9 is a schematic diagram of a training device for a target re-identification model according to another embodiment of the present disclosure.
  • the training device 90 of the target re-identification model includes:
  • the first acquiring module 901 is configured to acquire multiple images, and the multiple images respectively have corresponding multiple modalities and corresponding multiple labeled target categories;
  • the second acquisition module 902 is configured to acquire multiple convolutional feature maps corresponding to multiple modalities, and multiple edge feature maps corresponding to multiple modalities respectively;
  • the third acquiring module 903 is configured to acquire various feature distance information respectively corresponding to various modalities.
  • the training module 904 is configured to train an initial re-identification model according to multiple images, multiple convolutional feature maps, multiple edge feature maps, multiple feature distance information, and multiple labeled target categories to obtain a target re-identification model.
  • Fig. 10 is a schematic diagram of a training device for a target re-identification model provided according to another embodiment of the present disclosure.
  • The training module 904 includes:
  • the first processing sub-module 9041, configured to process the multiple images using the initial re-identification model to obtain an initial loss value;
  • the second processing sub-module 9042, configured to process the multiple convolutional feature maps and the multiple edge feature maps using the initial re-identification model to obtain a perceptual edge loss value;
  • the third processing sub-module 9043, configured to process the multiple kinds of feature distance information using the initial re-identification model to obtain a cross-modal center contrast loss value; and
  • the training sub-module 9044, configured to train the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value, to obtain the target re-identification model.
  • In some embodiments, the initial re-identification model includes a first network structure, and the first network structure is used to identify the perceptual loss value between the convolutional feature maps and the edge feature maps.
  • The second processing sub-module 9042 is specifically configured to: input the multiple convolutional feature maps and the multiple edge feature maps into the first network structure to obtain multiple convolution loss feature maps respectively corresponding to the multiple convolutional feature maps and multiple edge loss feature maps respectively corresponding to the multiple edge feature maps; determine multiple convolution feature map parameters respectively corresponding to the multiple convolution loss feature maps, and multiple edge feature map parameters respectively corresponding to the multiple edge loss feature maps; process the corresponding convolution loss feature maps according to the convolution feature map parameters to obtain multiple first perceptual edge loss values; process the corresponding edge loss feature maps according to the edge feature map parameters to obtain multiple second perceptual edge loss values; and generate the perceptual edge loss value based on the multiple first perceptual edge loss values and the multiple second perceptual edge loss values. A schematic sketch of this computation follows.
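  • The disclosure does not pin down the internal form of the first network structure, so the following is only a schematic sketch under assumed choices: a frozen feature extractor `loss_net` stands in for the first network structure, the "feature map parameters" are taken to be per-map element counts used as normalization constants, and the final aggregation of the two groups of loss values is likewise an assumption.

```python
import torch

def perceptual_edge_loss(loss_net, conv_maps, edge_maps):
    """Schematic perceptual edge loss (assumed form, not the exact method).

    loss_net: a frozen network standing in for the first network structure;
    conv_maps, edge_maps: lists of (B, C, H, W) tensors, one pair per modality.
    """
    first_values, second_values = [], []
    for conv_map, edge_map in zip(conv_maps, edge_maps):
        conv_loss_map = loss_net(conv_map)   # convolution loss feature map
        edge_loss_map = loss_net(edge_map)   # edge loss feature map
        # Assumed feature map parameters: per-map element counts.
        conv_param = float(conv_loss_map.numel())
        edge_param = float(edge_loss_map.numel())
        diff = conv_loss_map - edge_loss_map
        first_values.append(diff.pow(2).sum() / conv_param)   # first values
        second_values.append(diff.abs().sum() / edge_param)   # second values
    return torch.stack(first_values).sum() + torch.stack(second_values).sum()
```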
  • In some embodiments, the initial re-identification model includes a batch normalization layer, and the third acquisition module 903 includes:
  • the normalization processing sub-module 9031, configured to input the multiple images into the batch normalization layer respectively, to obtain multiple feature vectors respectively corresponding to the multiple images output by the batch normalization layer;
  • the center point determination sub-module 9032, configured to determine, according to the multiple feature vectors, the feature center points of multiple targets respectively corresponding to the multiple images; and
  • the distance determination sub-module 9033, configured to determine first distances between the feature center points of different targets, and second distances between the feature center points of the same target corresponding to different modalities, the first distances and the second distances together constituting the multiple kinds of feature distance information.
  • The third processing sub-module 9043 is specifically configured to: determine a first target distance from the multiple first distances using the initial re-identification model, the first target distance being the smallest of the multiple first distances; and calculate the cross-modal center contrast loss value according to the first target distance, the multiple second distances, and the number of targets. A schematic sketch of the centers, distances, and loss follows.
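  • Since the text fixes the ingredients of this loss (the feature center points, the two kinds of distances, the smallest first distance, and the number of targets) but not their exact combination, the sketch below uses one plausible ratio that pulls the two modality centers of each target together while keeping different targets apart; the ratio itself is an assumption.

```python
import torch

def cross_modal_center_contrast_loss(feats, labels, modalities):
    """Schematic cross-modal center contrast loss.

    feats: (B, D) feature vectors output by the batch normalization layer;
    labels: (B,) labeled target categories; modalities: (B,) 0 = RGB, 1 = IR.
    Every (target, modality) pair is assumed to occur in the batch.
    """
    targets = labels.unique().tolist()
    centers = {(t, m): feats[(labels == t) & (modalities == m)].mean(dim=0)
               for t in targets for m in (0, 1)}  # feature center points

    # Second distances: centers of the same target across the two modalities.
    second = torch.stack([torch.dist(centers[(t, 0)], centers[(t, 1)])
                          for t in targets])

    # First distances: centers of different targets; keep the smallest one,
    # i.e., the "first target distance".
    first = [torch.dist(centers[(ti, m)], centers[(tj, m)])
             for i, ti in enumerate(targets) for tj in targets[i + 1:]
             for m in (0, 1)]
    first_target = torch.stack(first).min()

    n = len(targets)  # number of targets
    return second.sum() / (n * first_target + 1e-12)
```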
  • In some embodiments, the initial re-identification model includes a sequentially connected fully connected layer and output layer, and the first processing sub-module 9041 is specifically configured to: sequentially input the multiple images into the fully connected layer and the output layer to obtain multiple category feature vectors respectively corresponding to the multiple images output by the output layer; determine multiple encoding vectors respectively corresponding to the multiple labeled target categories; and generate an identity loss value according to the multiple category feature vectors and the corresponding multiple encoding vectors, the identity loss value being used as the initial loss value.
  • In some embodiments, the first processing sub-module 9041 is specifically configured to: divide the multiple images with reference to the multiple labeled target categories to obtain a triplet sample set, the triplet sample set including the multiple images, multiple first images, and multiple second images, where the multiple first images correspond to the same labeled target category and the multiple second images correspond to different labeled target categories; determine first Euclidean distances between the feature vectors of the images and the feature vectors of the first images, the feature vectors being output by the batch normalization layer; determine second Euclidean distances between the feature vectors of the images and the feature vectors of the second images; and determine a ternary (triplet) loss value according to the multiple first Euclidean distances and the multiple second Euclidean distances, the ternary loss value being used as the initial loss value. A schematic sketch follows.
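  • A minimal sketch of the ternary loss over triplet samples follows; the margin value is an assumed hyperparameter, and the function names are illustrative.

```python
import torch.nn.functional as F

def ternary_loss(anchor, positive, negative, margin=0.3):
    """Schematic ternary (triplet) loss over batch-normalized feature vectors.

    anchor and positive share a labeled target category (first images);
    negative has a different labeled target category (second images).
    """
    d_first = F.pairwise_distance(anchor, positive)   # first Euclidean distances
    d_second = F.pairwise_distance(anchor, negative)  # second Euclidean distances
    return F.relu(d_first - d_second + margin).mean()
```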
  • The training sub-module 9044 is specifically configured to: generate a target loss value according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value; and, if the target loss value satisfies a set condition, use the trained re-identification model as the target re-identification model. A schematic training step is sketched below.
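  • The text states only that a target loss value is generated from the three losses and checked against a set condition; the weighted sum and the stopping rule in the sketch below are assumptions.

```python
def train_step(optimizer, initial_loss, perceptual_edge_loss_value,
               center_contrast_loss_value, weights=(1.0, 1.0, 1.0)):
    """Schematic training step combining the three loss values."""
    w1, w2, w3 = weights
    target_loss = (w1 * initial_loss
                   + w2 * perceptual_edge_loss_value
                   + w3 * center_contrast_loss_value)
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    # The caller compares the returned value with the set condition
    # (e.g., falling below a threshold) to decide when training stops.
    return target_loss.item()
```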
  • In some embodiments, the multiple modalities include a color image modality and an infrared image modality.
  • With this device, multiple images having corresponding multiple modalities and corresponding multiple labeled target categories are acquired; multiple convolutional feature maps and multiple edge feature maps respectively corresponding to the multiple modalities are acquired; multiple kinds of feature distance information respectively corresponding to the multiple modalities are acquired; and the initial re-identification model is trained according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model. The trained re-identification model can therefore fully mine the features in images of multiple modalities and enhance the accuracy of image matching across different modalities, thereby improving the effect of cross-modal target re-identification. This solves the technical problem in the related art that network models do not mine the features in multi-modal images sufficiently, which degrades the effect of cross-modal target re-identification.
  • Fig. 11 is a schematic diagram of an object re-identification device according to another embodiment of the present disclosure.
  • The target re-identification device 100 includes:
  • the fourth acquisition module 1001, configured to acquire a reference image and an image to be recognized, the modalities of the reference image and the image to be recognized being different, and the reference image including a reference category; and
  • the recognition module 1002, configured to input the reference image and the image to be recognized respectively into the target re-identification model trained by the above target re-identification model training method, to obtain the target corresponding to the image to be recognized output by the model, the target having a corresponding target category, and the target category matching the reference category.
  • With this device, the image to be recognized is processed by the target re-identification model trained with the above training method to determine its corresponding target. The features of the image to be recognized can therefore be fully mined, the accuracy of image matching across different modalities is enhanced, and the effect of cross-modal target re-identification is improved.
  • The present disclosure also provides an electronic device, a readable storage medium, a computer program product, and a computer program.
  • An embodiment of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the target re-identification model training method of the embodiments of the present disclosure, or the target re-identification method of the embodiments of the present disclosure.
  • An embodiment of the present disclosure proposes a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the target re-identification model training method of the embodiments of the present disclosure, or the target re-identification method of the embodiments of the present disclosure.
  • An embodiment of the present disclosure also proposes a computer program product; when instructions in the computer program product are executed by a processor, the target re-identification model training method described in any one of the embodiments of the present disclosure, or the target re-identification method described in any one of the embodiments of the present disclosure, is executed.
  • An embodiment of the present disclosure proposes a computer program including computer program code; when the computer program code is run on a computer, the computer executes the target re-identification model training method described in any one of the embodiments of the present disclosure, or the target re-identification method described in any one of the embodiments of the present disclosure.
  • Figure 12 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the present disclosure.
  • The computer device 12 shown in Fig. 12 is only an example and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • Computer device 12 takes the form of a general-purpose computing device.
  • Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the system memory 28 and the processing unit 16).
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
  • These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Computer device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer device 12 and include both volatile and nonvolatile media, removable and non-removable media.
  • The memory 28 may include computer system readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32.
  • Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • The storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 12, commonly referred to as a "hard drive").
  • Although not shown in Fig. 12, a disk drive for reading from and writing to a removable non-volatile magnetic disk may be provided, as well as an optical disk drive for reading from and writing to a removable non-volatile optical disk, such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Video Disc Read Only Memory), or other optical media.
  • In these cases, each drive may be connected to the bus 18 via one or more data media interfaces.
  • Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the various embodiments of the present disclosure.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • The program modules 42 generally perform the functions and/or methods of the embodiments described in this disclosure.
  • The computer device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22.
  • The computer device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 20.
  • The network adapter 20 communicates with the other modules of the computer device 12 via the bus 18.
  • Although not shown, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
  • The processing unit 16 executes various functional applications and the training of the target re-identification model by running the programs stored in the system memory 28, for example, implementing the target re-identification model training method mentioned in the foregoing embodiments.
  • Various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
  • In the above embodiments, various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of, or a combination of, the following techniques known in the art: discrete logic circuits, application-specific integrated circuits (ASICs) with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
  • In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing module, each unit may exist physically separately, or two or more units may be integrated into one module.
  • The above integrated modules may be implemented in the form of hardware or in the form of software function modules. If the integrated modules are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Abstract

Disclosed are a target re-recognition model training method and device, and a target re-recognition method and device. The target re-recognition model training method comprises: acquiring a plurality of images, the plurality of images respectively having a plurality of corresponding modes and a plurality of corresponding annotated target categories; acquiring a plurality of convolution feature maps respectively corresponding to the plurality of modes, and acquiring a plurality of edge feature maps respectively corresponding to the plurality of modes; acquiring a plurality of pieces of feature distance information respectively corresponding to the plurality of modes; and training an initial re-recognition model according to the plurality of images, the plurality of convolution feature maps, the plurality of edge feature maps, the plurality of pieces of feature distance information, and the plurality of annotated target categories to obtain a target re-recognition model.

Description

Target re-identification model training method, target re-identification method and device
Cross-Reference to Related Applications
This application is based on, and claims priority to, the Chinese patent application with application No. 202110763047.3, filed on July 6, 2021, the entire content of which is hereby incorporated into this application by reference.
Technical Field
The present disclosure relates to the technical field of image recognition, and in particular to a target re-identification model training method, a target re-identification method, a device, an electronic device, a storage medium, a computer program product, and a computer program.
Background
As people pay more attention to safety, video surveillance cameras are placed in many of the environments where people live and work. Common cameras record information around the clock by capturing color video during the day and infrared video at night.
Cross-modal target re-identification aims to match the targets in the three-primary-color (Red Green Blue, RGB) images collected by visible-light cameras with those in the infrared (Infrared Radiation, IR) images collected by infrared cameras. Since images of different modalities (RGB and IR) are heterogeneous, the modality difference degrades matching performance.
When the network models in the related art perform cross-modal target re-identification, their mining of the features in RGB images and IR images is insufficient, and the model training process is not very stable, which affects the effect of cross-modal target re-identification.
Summary
Embodiments of the present disclosure propose a target re-identification model training method, a target re-identification method, a device, an electronic device, a storage medium, a computer program product, and a computer program, aiming to solve, at least to a certain extent, one of the technical problems in the related art.
An embodiment of a first aspect of the present disclosure proposes a target re-identification model training method, including: acquiring multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; acquiring multiple convolutional feature maps respectively corresponding to the multiple modalities, and acquiring multiple edge feature maps respectively corresponding to the multiple modalities; acquiring multiple kinds of feature distance information respectively corresponding to the multiple modalities; and training an initial re-identification model according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories, to obtain a target re-identification model.
In some embodiments, training the initial re-identification model according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model includes:
processing the multiple images using the initial re-identification model to obtain an initial loss value;
processing the multiple convolutional feature maps and the multiple edge feature maps using the initial re-identification model to obtain a perceptual edge loss value;
processing the multiple kinds of feature distance information using the initial re-identification model to obtain a cross-modal center contrast loss value; and
training the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value, to obtain the target re-identification model.
In some embodiments, the initial re-identification model includes a first network structure, and the first network structure is used to identify the perceptual loss value between the convolutional feature maps and the edge feature maps.
In some embodiments, processing the multiple convolutional feature maps and the multiple edge feature maps using the initial re-identification model to obtain the perceptual edge loss value includes:
inputting the multiple convolutional feature maps and the multiple edge feature maps into the first network structure to obtain multiple convolution loss feature maps respectively corresponding to the multiple convolutional feature maps, and multiple edge loss feature maps respectively corresponding to the multiple edge feature maps;
determining multiple convolution feature map parameters respectively corresponding to the multiple convolution loss feature maps, and determining multiple edge feature map parameters respectively corresponding to the multiple edge loss feature maps;
processing the corresponding multiple convolution loss feature maps according to the multiple convolution feature map parameters to obtain multiple first perceptual edge loss values;
processing the corresponding multiple edge loss feature maps according to the multiple edge feature map parameters to obtain multiple second perceptual edge loss values; and
generating the perceptual edge loss value according to the multiple first perceptual edge loss values and the multiple second perceptual edge loss values.
In some embodiments, the initial re-identification model includes a batch normalization layer, and acquiring the multiple kinds of feature distance information respectively corresponding to the multiple modalities includes:
inputting the multiple images into the batch normalization layer respectively, to obtain multiple feature vectors respectively corresponding to the multiple images output by the batch normalization layer;
determining, according to the multiple feature vectors, the feature center points of multiple targets respectively corresponding to the multiple images; and
determining first distances between the feature center points of different targets, and determining second distances between the feature center points of the same target corresponding to different modalities, the first distances and the second distances together constituting the multiple kinds of feature distance information.
In some embodiments, processing the multiple kinds of feature distance information using the initial re-identification model to obtain the cross-modal center contrast loss value includes:
determining a first target distance from the multiple first distances using the initial re-identification model, the first target distance being the first distance with the smallest value among the multiple first distances; and
calculating the cross-modal center contrast loss value according to the first target distance, the multiple second distances, and the number of targets.
In some embodiments, the initial re-identification model includes a sequentially connected fully connected layer and output layer, and processing the multiple images using the initial re-identification model to obtain the initial loss value includes:
sequentially inputting the multiple images into the fully connected layer and the output layer, to obtain multiple category feature vectors respectively corresponding to the multiple images output by the output layer;
determining multiple encoding vectors respectively corresponding to the multiple labeled target categories; and
generating an identity loss value according to the multiple category feature vectors and the corresponding multiple encoding vectors, and using the identity loss value as the initial loss value.
In some embodiments, processing the multiple images using the initial re-identification model to obtain the initial loss value includes:
performing image division on the multiple images with reference to the multiple labeled target categories to obtain a triplet sample set, the triplet sample set including the multiple images, multiple first images, and multiple second images, the multiple first images corresponding to the same labeled target category and the multiple second images corresponding to different labeled target categories;
determining a first Euclidean distance between the feature vector of an image and the feature vector of a first image, the feature vectors being output by the batch normalization layer;
determining a second Euclidean distance between the feature vector of the image and the feature vector of a second image; and
determining a ternary loss value according to the multiple first Euclidean distances and the multiple second Euclidean distances, and using the ternary loss value as the initial loss value.
In some embodiments, training the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value to obtain the target re-identification model includes:
generating a target loss value according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value; and
if the target loss value satisfies a set condition, using the trained re-identification model as the target re-identification model.
In some embodiments, the multiple modalities include a color image modality and an infrared image modality.
An embodiment of a second aspect of the present disclosure proposes a target re-identification method, including: acquiring a reference image and an image to be recognized, the modalities of the reference image and the image to be recognized being different, and the reference image including a reference category; and inputting the reference image and the image to be recognized respectively into the target re-identification model trained by the above target re-identification model training method, to obtain the target corresponding to the image to be recognized output by the target re-identification model, the target having a corresponding target category, and the target category matching the reference category.
An embodiment of a third aspect of the present disclosure proposes a target re-identification model training device, including: a first acquisition module, configured to acquire multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories; a second acquisition module, configured to acquire multiple convolutional feature maps respectively corresponding to the multiple modalities, and to acquire multiple edge feature maps respectively corresponding to the multiple modalities; a third acquisition module, configured to acquire multiple kinds of feature distance information respectively corresponding to the multiple modalities; and a training module, configured to train an initial re-identification model according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories, to obtain a target re-identification model.
In some embodiments, the training module includes:
a first processing sub-module, configured to process the multiple images using the initial re-identification model to obtain an initial loss value;
a second processing sub-module, configured to process the multiple convolutional feature maps and the multiple edge feature maps using the initial re-identification model to obtain a perceptual edge loss value;
a third processing sub-module, configured to process the multiple kinds of feature distance information using the initial re-identification model to obtain a cross-modal center contrast loss value; and
a training sub-module, configured to train the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value, to obtain the target re-identification model.
In some embodiments, the initial re-identification model includes a first network structure, and the first network structure is used to identify the perceptual loss value between the convolutional feature maps and the edge feature maps.
In some embodiments, the second processing sub-module is specifically configured to:
input the multiple convolutional feature maps and the multiple edge feature maps into the first network structure to obtain multiple convolution loss feature maps respectively corresponding to the multiple convolutional feature maps, and multiple edge loss feature maps respectively corresponding to the multiple edge feature maps;
determine multiple convolution feature map parameters respectively corresponding to the multiple convolution loss feature maps, and determine multiple edge feature map parameters respectively corresponding to the multiple edge loss feature maps;
process the corresponding multiple convolution loss feature maps according to the multiple convolution feature map parameters to obtain multiple first perceptual edge loss values;
process the corresponding multiple edge loss feature maps according to the multiple edge feature map parameters to obtain multiple second perceptual edge loss values; and
generate the perceptual edge loss value according to the multiple first perceptual edge loss values and the multiple second perceptual edge loss values.
In some embodiments, the initial re-identification model includes a batch normalization layer, and the third acquisition module includes:
a normalization processing sub-module, configured to input the multiple images into the batch normalization layer respectively, to obtain multiple feature vectors respectively corresponding to the multiple images output by the batch normalization layer;
a center point determination sub-module, configured to determine, according to the multiple feature vectors, the feature center points of multiple targets respectively corresponding to the multiple images; and
a distance determination sub-module, configured to determine first distances between the feature center points of different targets, and to determine second distances between the feature center points of the same target corresponding to different modalities, the first distances and the second distances together constituting the multiple kinds of feature distance information.
In some embodiments, the third processing sub-module is specifically configured to: determine a first target distance from the multiple first distances using the initial re-identification model, the first target distance being the first distance with the smallest value among the multiple first distances; and
calculate the cross-modal center contrast loss value according to the first target distance, the multiple second distances, and the number of targets.
In some embodiments, the initial re-identification model includes a sequentially connected fully connected layer and output layer, and the first processing sub-module is specifically configured to:
sequentially input the multiple images into the fully connected layer and the output layer, to obtain multiple category feature vectors respectively corresponding to the multiple images output by the output layer;
determine multiple encoding vectors respectively corresponding to the multiple labeled target categories; and
generate an identity loss value according to the multiple category feature vectors and the corresponding multiple encoding vectors, and use the identity loss value as the initial loss value.
In some embodiments, the first processing sub-module is specifically configured to:
perform image division on the multiple images with reference to the multiple labeled target categories to obtain a triplet sample set, the triplet sample set including the multiple images, multiple first images, and multiple second images, the multiple first images corresponding to the same labeled target category and the multiple second images corresponding to different labeled target categories;
determine a first Euclidean distance between the feature vector of an image and the feature vector of a first image, the feature vectors being output by the batch normalization layer;
determine a second Euclidean distance between the feature vector of the image and the feature vector of a second image; and
determine a ternary loss value according to the multiple first Euclidean distances and the multiple second Euclidean distances, and use the ternary loss value as the initial loss value.
In some embodiments, the training sub-module is specifically configured to:
generate a target loss value according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value; and
if the target loss value satisfies a set condition, use the trained re-identification model as the target re-identification model.
In some embodiments, the multiple modalities include a color image modality and an infrared image modality.
An embodiment of a fourth aspect of the present disclosure proposes a target re-identification device, including: a fourth acquisition module, configured to acquire a reference image and an image to be recognized, the modalities of the reference image and the image to be recognized being different, and the reference image including a reference category; and a recognition module, configured to input the reference image and the image to be recognized respectively into the target re-identification model trained by the above target re-identification model training method, to obtain the target corresponding to the image to be recognized output by the target re-identification model, the target having a corresponding target category, and the target category matching the reference category.
An embodiment of a fifth aspect of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the target re-identification model training method described in any one of the embodiments of the present disclosure, or to execute the target re-identification method described in any one of the embodiments of the present disclosure.
An embodiment of a sixth aspect of the present disclosure proposes a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the target re-identification model training method described in any one of the embodiments of the present disclosure, or to execute the target re-identification method described in any one of the embodiments of the present disclosure.
An embodiment of a seventh aspect of the present disclosure proposes a computer program product including computer program code; when the computer program code is run on a computer, the target re-identification model training method described in any one of the embodiments of the present disclosure, or the target re-identification method described in any one of the embodiments of the present disclosure, is executed.
An embodiment of an eighth aspect of the present disclosure proposes a computer program including computer program code; when the computer program code is run on a computer, the computer is caused to execute the target re-identification model training method described in any one of the embodiments of the present disclosure, or the target re-identification method described in any one of the embodiments of the present disclosure.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a target re-identification model training method provided according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the network structure of a re-identification model provided according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of a target re-identification model training method provided according to another embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of a first network structure provided according to an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the feature space structure of targets provided according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of a target re-identification model training method provided according to another embodiment of the present disclosure;
Fig. 7 is a training flowchart of a target re-identification model provided according to an embodiment of the present disclosure;
Fig. 8 is a schematic flowchart of a target re-identification method provided according to another embodiment of the present disclosure;
Fig. 9 is a schematic diagram of a target re-identification model training device provided according to another embodiment of the present disclosure;
Fig. 10 is a schematic diagram of a target re-identification model training device provided according to another embodiment of the present disclosure;
Fig. 11 is a schematic diagram of a target re-identification device provided according to another embodiment of the present disclosure; and
Fig. 12 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present disclosure, and should not be construed as limiting the present disclosure. On the contrary, the embodiments of the present disclosure cover all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
In view of the technical problem, mentioned in the Background, that the network models in the related art do not mine the features in multi-modal images sufficiently, which affects the effect of cross-modal target re-identification, the technical solutions of the embodiments of the present disclosure provide a target re-identification model training method, which is described below with reference to specific embodiments.
It should be noted that the execution subject of the target re-identification model training method of the embodiments of the present disclosure may be a target re-identification model training device, which may be implemented by software and/or hardware and may be configured in an electronic device; the electronic device may include, but is not limited to, a terminal, a server, and the like.
Fig. 1 is a schematic flowchart of a target re-identification model training method provided according to an embodiment of the present disclosure. Referring to Fig. 1, the method includes steps S101 to S104.
S101: Acquire multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories.
The multiple images may be images collected by an image acquisition device in any possible scene, or images obtained from the Internet, which is not limited here.
The multiple images have multiple modalities, such as a color image modality, an infrared image modality, and any other possible image modality, where the color image modality may be the RGB modality and the infrared image modality may be the IR modality; the multiple modalities are not limited here.
That is to say, the multiple images in the embodiments of the present disclosure may have the RGB modality and the IR modality. In practical applications, an image acquisition device (e.g., a camera) may collect color images or video frames (RGB modality) during the day and infrared images or video frames (IR modality) at night, so that multiple images with multiple modalities can be obtained.
There may be multiple target objects in the multiple images, such as pedestrians, vehicles, and any other possible target objects; more specifically, the multiple target objects may be pedestrian 1, pedestrian 2, vehicle 1, vehicle 2, and so on, and different pedestrians or vehicles may correspond to different categories. That is, the embodiments of the present disclosure may collect multiple images of multiple modalities for different target objects.
The information used to label the category of a target object may be called a labeled target category. The labeled target category may, for example, take the form of a score, with different scores representing target objects of different categories, so that the target objects in the multiple images can be distinguished by their labeled target categories.
In addition, the multiple images may be divided into a training set and a test set, each including images and the labeled target categories corresponding to the images.
S102: Acquire multiple convolutional feature maps respectively corresponding to the multiple modalities, and acquire multiple edge feature maps respectively corresponding to the multiple modalities.
After the multiple images are acquired, multiple convolutional feature maps and multiple edge feature maps respectively corresponding to the multiple modalities are further acquired.
A feature map obtained by performing a convolution operation on images of the multiple modalities may be called a convolutional feature map. The embodiments of the present disclosure may use any one or more convolutional layers in a neural network to perform the convolution operation on the images of the multiple modalities, for example, extracting the multiple convolutional feature maps with the Layer0 layer of the residual neural network ResNet, or the multiple convolutional feature maps may be obtained in any other possible way, which is not limited here.
An edge feature map can represent the edge contour information of the target object in the images of the multiple modalities. In the embodiments of the present disclosure, for example, the Sobel operator may be used to perform a convolution operation on the multiple images to extract the edge information of the target object and obtain the multiple edge feature maps, or the multiple edge feature maps may be obtained in any other possible way, which is not limited here.
That is to say, in order to resolve the feature differences between the RGB modality and the IR modality, the embodiments of the present disclosure may use the edge contour information of the target object as guidance during model training and optimize the modality-specific feature space, thereby mining the features common to the modalities.
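As an illustration of this edge-guidance idea, a minimal sketch of Sobel-based edge feature extraction is given below; the channel-averaging step and the gradient-magnitude formulation are common choices and are assumptions here, not the exact operator configuration of this disclosure.

```python
import torch
import torch.nn.functional as F

def sobel_edge_map(image):
    """Extract an edge feature map with the Sobel operator (schematic sketch).

    image: (B, 3, H, W) tensor; returns a (B, 1, H, W) gradient-magnitude map
    highlighting the edge contours of the target object.
    """
    gray = image.mean(dim=1, keepdim=True)  # collapse channels before filtering
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=image.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                 # vertical-gradient Sobel kernel
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)
```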
S103: Acquire multiple kinds of feature distance information respectively corresponding to the multiple modalities.
After the multiple convolutional feature maps and the multiple edge feature maps are acquired, multiple kinds of feature distance information respectively corresponding to the multiple modalities are further acquired.
The multiple kinds of feature distance information may be the distances between the feature center points of targets with different labeled target categories, and/or the distances between the feature center points of the same target corresponding to different modalities, or any other possible feature distance information, which is not limited here.
For example, in the process of determining the multiple kinds of feature distance information, multiple feature vectors corresponding to the multiple images may first be determined, and the feature center points may then be determined according to the multiple feature vectors, so that the multiple kinds of feature distance information can be determined from the feature center points; the specific way of calculating the multiple kinds of feature distance information can be found in the following embodiments.
S104: Train an initial re-identification model according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories, to obtain a target re-identification model.
The re-identification model of the embodiments of the present disclosure may be based on a convolutional neural network structure; specifically, the residual neural network ResNet50 may be used as the backbone network of the re-identification model.
Fig. 2 is a schematic diagram of the network structure of the re-identification model provided according to an embodiment of the present disclosure. As shown in Fig. 2, the embodiments of the present disclosure may divide ResNet50 into two parts: the convolutional layer of the initial stage (ResNet Layer0) may adopt a dual-stream design, while the convolutional layers of the following four stages (ResNet Layer1-4) may use a dual-stream weight-sharing strategy to extract the information of the two modalities in a unified way.
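A minimal sketch of this dual-stream design is given below; it uses torchvision's ResNet50 purely for illustration, and the pooling head and the string-typed modality switch are assumptions rather than the exact architecture of this disclosure.

```python
import torch.nn as nn
from torchvision.models import resnet50

class DualStreamBackbone(nn.Module):
    """Schematic two-stream ResNet50: modality-specific stem, shared Layer1-4."""

    def __init__(self):
        super().__init__()
        def make_stem():
            r = resnet50()
            # "ResNet Layer0": the stem before the four residual stages.
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.stem_rgb = make_stem()  # stream for the RGB modality
        self.stem_ir = make_stem()   # stream for the IR modality
        shared = resnet50()
        # "ResNet Layer1-4": weights shared by both streams.
        self.shared = nn.Sequential(shared.layer1, shared.layer2,
                                    shared.layer3, shared.layer4)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x, modality):
        stem = self.stem_rgb if modality == "rgb" else self.stem_ir
        return self.pool(self.shared(stem(x))).flatten(1)  # per-image feature
```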
During training, the parameters of the initial re-identification model (ResNet50) may be optimized and adjusted according to the relationships among the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories until the model converges, so as to obtain the target re-identification model.
In the embodiments of the present disclosure, multiple images having corresponding multiple modalities and corresponding multiple labeled target categories are acquired; multiple convolutional feature maps and multiple edge feature maps respectively corresponding to the multiple modalities are acquired; multiple kinds of feature distance information respectively corresponding to the multiple modalities are acquired; and the initial re-identification model is trained according to the multiple images, the multiple convolutional feature maps, the multiple edge feature maps, the multiple kinds of feature distance information, and the multiple labeled target categories to obtain the target re-identification model. The trained re-identification model can therefore fully mine the features in images of multiple modalities and enhance the accuracy of image matching across different modalities, thereby improving the effect of cross-modal target re-identification. This solves the technical problem in the related art that network models do not mine the features in multi-modal images sufficiently, which affects the effect of cross-modal target re-identification.
Fig. 3 is a schematic flowchart of a target re-identification model training method provided according to another embodiment of the present disclosure. Referring to Fig. 3, the method includes steps S301 to S307.
S301: Acquire multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories.
S302: Acquire multiple convolutional feature maps respectively corresponding to the multiple modalities, and acquire multiple edge feature maps respectively corresponding to the multiple modalities.
S303: Acquire multiple kinds of feature distance information respectively corresponding to the multiple modalities.
For the specific description of S301 to S303, refer to the above embodiments; details are not repeated here.
S304: Process the multiple images using the initial re-identification model to obtain an initial loss value.
In the operation of training the initial re-identification model, the multiple images are first processed using the initial re-identification model to obtain the initial loss value. For example, the identity loss function (Id Loss) may be used to calculate the initial loss value of the initial re-identification model, or another loss function may be used to determine the initial loss value, which is not limited here.
In some embodiments, as shown in Fig. 2, the initial re-identification model may include a sequentially connected fully connected (FC) layer and an output layer (e.g., a Softmax classifier). In the process of using the initial re-identification model to process the multiple images to obtain the initial loss value, the multiple images may first be sequentially input into the fully connected layer and the output layer to obtain the multiple category feature vectors output by the output layer respectively corresponding to the multiple images.
For example, rgb and ir may be used to denote the two modalities of the multiple images. Let $X^m = \{x^m \mid x^m \in \mathbb{R}^{H \times W \times 3}\}$ denote the input image set (training set or test set), where $m \in \{rgb, ir\}$, $H$ and $W$ denote the height and width of the image, and 3 denotes the number of channels (an RGB image contains the R, G and B channels; an IR image is converted to 3 channels by repeating its single channel 3 times). For example, if a batch contains $B$ images during training, let $x_i^m$ denote one of the RGB or IR images; then $i \in \{1, 2, \ldots, B\}$.
As shown in Fig. 2, after the input image $x_i^m$ passes through the network model to the final fully connected (FC) layer and the output layer (Softmax), the resulting vector may be called a category feature vector, denoted for example by $p_i$. The multiple category feature vectors corresponding to the multiple images may then be expressed as $p_i = [p_i^1, p_i^2, \ldots, p_i^N]$, where $j \in \{1, 2, \ldots, N\}$ and $N$ is the number of target categories in the multiple images.
In some embodiments, multiple encoding vectors respectively corresponding to the multiple labeled target categories are determined. For example, one-hot encoding may be applied to the multiple labeled target categories to obtain the encoding vectors. Denoting an encoding vector by $y_i$, the multiple encoding vectors may be expressed as $y_i = [y_i^1, y_i^2, \ldots, y_i^N]$.
In some embodiments, an identity loss value is generated according to the multiple category feature vectors and the corresponding multiple encoding vectors. That is, the embodiments of the present disclosure may apply the identity loss function (Id Loss) to the multiple category feature vectors and the corresponding multiple encoding vectors to obtain the identity loss value, and use the identity loss value as the initial loss value.
The identity loss function Id Loss may be expressed as:

$$\mathcal{L}_{id} = -\frac{1}{B} \sum_{i=1}^{B} \sum_{j=1}^{N} y_i^j \log p_i^j$$
It can be understood that the above example uses the identity loss value as the initial loss value only for illustration; in practical applications, other loss functions may also be used to determine the initial loss value, which is not limited here.
In the embodiments of the present disclosure, using the identity loss value as the initial loss value enables the model to achieve a good pedestrian re-identification effect.
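For illustration, a minimal PyTorch sketch of the identity loss described above follows. The class and tensor names (`IdentityLossHead`, `feats`, `labels`) and the feature dimension are assumptions introduced for the example, not part of the disclosure; Id Loss here is ordinary cross-entropy over the Softmax output against one-hot labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdentityLossHead(nn.Module):
    """FC layer + Softmax producing the category feature vectors p_i,
    followed by the identity (cross-entropy) loss against one-hot labels y_i."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)  # final FC layer

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.fc(features)            # (B, N)
        log_p = F.log_softmax(logits, dim=1)  # log of the Softmax output p_i
        # With one-hot y_i the double sum reduces to the true-class log-prob.
        return -log_p.gather(1, labels.unsqueeze(1)).mean()

# Hypothetical shapes: B=64 images, 2048-d features, N=395 identities.
head = IdentityLossHead(feat_dim=2048, num_classes=395)
feats = torch.randn(64, 2048)
labels = torch.randint(0, 395, (64,))
loss_id = head(feats, labels)
```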
S305: Process the multiple convolutional feature maps and the multiple edge feature maps using the initial re-identification model to obtain a perceptual edge loss value.
In some embodiments, the initial re-identification model may include a first network structure. Fig. 4 is a schematic structural diagram of the first network structure provided according to an embodiment of the present disclosure. As shown in Fig. 4, the first network structure may be, for example, the deep convolutional neural network VGGNet-16, which can compute the perceptual loss value between a convolutional feature map and an edge feature map. Using VGGNet-16 as the first network structure allows the loss between the convolutional feature maps and the edge feature maps to be measured at depth, thereby improving the accuracy of the perceptual loss value.
In some embodiments, as shown in Fig. 4, the multiple convolutional feature maps extracted by ResNet Layer0 and the multiple edge feature maps extracted by the Sobel operator may be input into VGGNet-16, where $\phi = \{\phi_1, \phi_2, \phi_3, \phi_4\}$ denotes the four stages of the VGGNet-16 network. Passing the multiple convolutional feature maps through the four stages yields the corresponding multiple convolutional loss feature maps, and passing the multiple edge feature maps through the four stages yields the multiple edge loss feature maps.
In some embodiments, multiple convolutional feature map parameters respectively corresponding to the multiple convolutional loss feature maps are determined, and multiple edge feature map parameters respectively corresponding to the multiple edge loss feature maps are determined.
Let $\phi_t(z)$ denote the feature map extracted by stages 0 to $t$ of the first network structure for the multiple convolutional loss feature maps and the multiple edge loss feature maps. Assuming a convolutional loss feature map or edge loss feature map has shape $C_t \times H_t \times W_t$, then $C_t \times H_t \times W_t$ may serve as the feature map parameter of that convolutional loss feature map or edge loss feature map.
The perceptual edge loss value is calculated as follows:

$$\ell_{PEF}(z, \hat{z}) = \sum_{t=1}^{4} \frac{1}{C_t H_t W_t} \left\| \phi_t(z) - \phi_t(\hat{z}) \right\|_2^2$$

where $z$ and $\hat{z}$ denote the input convolutional feature map and edge feature map, respectively.
In some embodiments, the corresponding multiple convolutional loss feature maps are processed according to the multiple convolutional feature map parameters to obtain multiple first perceptual edge loss values, and the corresponding multiple edge loss feature maps are processed according to the multiple edge feature map parameters to obtain multiple second perceptual edge loss values.
The first perceptual edge loss value may be expressed as $\mathcal{L}_{PEF}^{rgb} = \ell_{PEF}(z^{rgb}, \hat{z}^{rgb})$, and the second perceptual edge loss value may be expressed as $\mathcal{L}_{PEF}^{ir} = \ell_{PEF}(z^{ir}, \hat{z}^{ir})$, where $z^{rgb}$ and $z^{ir}$ denote the convolutional feature maps extracted by the respective ResNet Layer0 of the two modalities, and $\hat{z}^{rgb}$ and $\hat{z}^{ir}$ denote the edge feature maps of the corresponding modalities.
在一些实施例中,根据多个第一感知边缘损失值和多个第二感知边缘损失值,生成感知边缘损失值,例如:将第一感知边缘损失值和第二感知边缘损失值之和,作为该感知边缘损失值。In some embodiments, the perceptual edge loss value is generated according to multiple first perceptual edge loss values and multiple second perceptual edge loss values, for example: the sum of the first perceptual edge loss value and the second perceptual edge loss value, as the perceptual edge loss value.
感知边缘损失值表示为
Figure PCTCN2022099257-appb-000014
The perceptual edge loss value is expressed as
Figure PCTCN2022099257-appb-000014
In the embodiments of the present disclosure, incorporating the perceptual edge loss (PEF Loss) allows the edge information of the image to serve as a guide to mine the common information in the modality-specific feature space, reducing the differences between modalities and thereby improving the effect of cross-modal target re-identification.
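For illustration, a minimal PyTorch sketch of the perceptual edge loss under the assumptions of this section follows: Sobel filtering extracts the edge map from the original input image, and a frozen ImageNet-trained VGG16 (torchvision weights, an assumption of this sketch) provides the four perception stages φ1-φ4. The stage boundaries, the resizing of the edge map to match the Layer0 feature map, and the 3-channel rendering of the Layer0 feature map are illustrative choices, not specified by the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualEdgeLoss(nn.Module):
    def __init__(self):
        super().__init__()
        feats = vgg16(weights="IMAGENET1K_V1").features.eval()
        # Assumed stage split: conv blocks ending at each max-pool (phi_1..phi_4).
        cuts = [5, 10, 17, 24]
        self.stages = nn.ModuleList(
            nn.Sequential(*feats[a:b]) for a, b in zip([0] + cuts[:-1], cuts))
        for p in self.parameters():
            p.requires_grad_(False)

    @staticmethod
    def sobel(img: torch.Tensor) -> torch.Tensor:
        """Edge feature map via Sobel convolution, repeated to 3 channels."""
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        k = torch.stack([kx, kx.t()]).unsqueeze(1).to(img)   # (2,1,3,3)
        c = img.shape[1]
        g = F.conv2d(img, k.repeat(c, 1, 1, 1), padding=1, groups=c)
        return g.abs().sum(1, keepdim=True).repeat(1, 3, 1, 1)

    def forward(self, conv_feat: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # conv_feat: a 3-channel rendering of the Layer0 output (assumption);
        # image: the original modality input from which the edge map is taken.
        edge = F.interpolate(self.sobel(image), size=conv_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        z, z_hat = conv_feat, edge
        loss = conv_feat.new_zeros(())
        for stage in self.stages:
            z, z_hat = stage(z), stage(z_hat)
            loss = loss + F.mse_loss(z, z_hat)   # mean ~ (1/C_t H_t W_t)||.||^2
        return loss
```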
S306: Process the multiple kinds of feature distance information using the initial re-identification model to obtain a cross-modal center contrastive loss value.
Embodiments of the present disclosure may also process the multiple kinds of feature distance information with the initial re-identification model to obtain the cross-modal center contrastive loss value.
Fig. 5 is a schematic diagram of the feature space structure of targets provided according to an embodiment of the present disclosure. As shown in Fig. 5, the cross-modal center contrastive loss acts on the modality-common feature space. In the embodiments of the present disclosure, the initial re-identification model may process multiple kinds of feature distance information, for example, the distances between the feature center points of targets of different categories, or the distances between the feature center points of the same category of target in different modalities, to obtain the cross-modal center contrastive loss value.
S307: Train the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrastive loss value to obtain the target re-identification model.
In some embodiments, a target loss value may first be generated according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrastive loss value. The target loss value may be, for example, the sum of the three:

$$\mathcal{L} = \mathcal{L}_{init} + \mathcal{L}_{PEF} + \mathcal{L}_{CMCC}$$

where $\mathcal{L}_{PEF}$ denotes the perceptual edge loss value, $\mathcal{L}_{init}$ denotes the initial loss value, and $\mathcal{L}_{CMCC}$ denotes the cross-modal center contrastive loss value.
In some embodiments, the initial re-identification model is trained according to the target loss value. That is, the parameters of the re-identification model are adjusted according to the target loss value until the target loss value satisfies a set condition, for example a model convergence condition, at which point the trained re-identification model is taken as the target re-identification model. Thus, during model training, the multi-task loss (i.e., the multiple loss values) optimizes the modality-specific feature space and the modality-common feature space in a targeted manner, which enhances the cross-modal feature extraction capability of the model, enables the model to extract more discriminative features, and satisfies the feature requirements of cross-modal target re-identification, thereby improving the effect of target re-identification.
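A minimal sketch of combining the loss terms into the target loss follows; the function names and the specific convergence test are assumptions of the sketch (the disclosure only requires some set condition), not part of the disclosed method.

```python
import torch

def target_loss(loss_init: torch.Tensor,
                loss_pef: torch.Tensor,
                loss_cmcc: torch.Tensor) -> torch.Tensor:
    """Target loss as the plain sum of the three terms, as described above."""
    return loss_init + loss_pef + loss_cmcc

# Illustrative stopping rule: stop once the target loss stabilizes below a
# hypothetical tolerance (one possible instance of a "set condition").
def converged(history: list[float], eps: float = 1e-3) -> bool:
    return len(history) >= 2 and abs(history[-1] - history[-2]) < eps
```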
In the embodiments of the present disclosure, multiple images with corresponding modalities and labeled target categories are acquired; convolutional feature maps, edge feature maps, and feature distance information corresponding to the modalities are acquired; and the initial re-identification model is trained on these together with the labeled target categories to obtain the target re-identification model. The trained model can therefore fully mine the features in images of multiple modalities and enhance the accuracy of cross-modal image matching, improving the effect of cross-modal target re-identification and solving the technical problem in the related art that insufficient feature mining in multi-modal images degrades cross-modal target re-identification. In addition, using the identity loss value as the initial loss value gives the model a better pedestrian re-identification effect; using VGGNet-16 as the first network structure allows the loss between convolutional feature maps and edge feature maps to be measured at depth, improving the accuracy of the perceptual loss value; and during model training the multi-task loss optimizes the modality-specific and modality-common feature spaces in a targeted manner, enhancing the model's cross-modal feature extraction capability and enabling it to extract more discriminative features that satisfy the requirements of cross-modal target re-identification, thereby improving the re-identification effect.
Fig. 6 is a schematic flowchart of a method for training a target re-identification model according to another embodiment of the present disclosure. Referring to Fig. 6, the method includes steps S601 to S610.
S601: Acquire multiple images, the multiple images respectively having corresponding multiple modalities and corresponding multiple labeled target categories.
S602: Acquire multiple convolutional feature maps respectively corresponding to the multiple modalities, and acquire multiple edge feature maps respectively corresponding to the multiple modalities.
For detailed descriptions of S601 and S602, reference may be made to the foregoing embodiments; details are not repeated here.
S603: Input the multiple images into a batch normalization layer respectively to obtain multiple feature vectors output by the batch normalization layer and respectively corresponding to the multiple images.
In some embodiments, as shown in Fig. 2, the initial re-identification model further includes a batch normalization (BN) layer. In the operation of acquiring the multiple kinds of feature distance information respectively corresponding to the multiple modalities, the multiple images are first input into the batch normalization layer to obtain the multiple feature vectors (denoted, for example, by $f_i^m$) output by the BN layer and respectively corresponding to the multiple images.
S604: Determine, according to the multiple feature vectors, the feature center points of multiple targets respectively corresponding to the multiple images.
For example, suppose a batch contains targets of $P$ categories, each category containing $K$ RGB images and $K$ IR images, i.e., $B = 2 \times P \times K$. Let $c_k^m$ denote the feature center point of the $k$-th category of target in modality $m$; the feature center point may then be expressed as:

$$c_k^m = \frac{1}{K} \sum_{i=1}^{K} f_{k,i}^m$$

where $m \in \{rgb, ir\}$ and $f_{k,i}^m$ denotes the BN-layer feature vector of the $i$-th image of category $k$ in modality $m$. From this formula, $c_k^{rgb}$ and $c_k^{ir}$ can be calculated, and the feature center point of the $k$-th category of target is then $c_k = \frac{1}{2}(c_k^{rgb} + c_k^{ir})$.
S605: Determine first distances between the feature center points of different targets, and determine second distances between the feature center points of the same target in different modalities; the first distances and the second distances together constitute the multiple kinds of feature distance information.
In some embodiments, the first distance between the feature center points of different targets is determined, that is, the distance between the centers of the features of targets of different categories, which may be denoted by $d_{inter}$. Further, the second distance between the feature center points of the same target in different modalities may be determined, that is, the distance between the centers of the features of the two modalities of the same category of target, which may be denoted by $d_{intra}$; the first distances and the second distances together constitute the multiple kinds of feature distance information. Determining the feature distance information through the relationships among the targets' feature center points thus constrains the relationship between modality centers and category centers, which effectively tunes the feature extraction capability of the model.
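As an illustration, a short PyTorch sketch of computing the modality feature centers and the two kinds of distances follows. The batch layout (P identities, K images per modality, Euclidean distances) matches this section, while the function name, tensor layout, and diagonal masking are assumptions of the sketch.

```python
import torch

def centers_and_distances(f_rgb: torch.Tensor, f_ir: torch.Tensor):
    """f_rgb, f_ir: (P, K, D) BN-layer features grouped by identity.
    Returns class centers c_k, intra-class cross-modality distances d_intra (P,),
    and inter-class center distances d_inter (P, P) with the diagonal masked."""
    c_rgb = f_rgb.mean(dim=1)                 # (P, D) per-class RGB centers
    c_ir = f_ir.mean(dim=1)                   # (P, D) per-class IR centers
    c = 0.5 * (c_rgb + c_ir)                  # (P, D) class centers c_k
    d_intra = (c_rgb - c_ir).norm(dim=1)      # ||c_k^rgb - c_k^ir||_2
    d_inter = torch.cdist(c, c)               # pairwise ||c_k - c_l||_2
    d_inter = d_inter + torch.eye(len(c), device=c.device) * 1e9  # mask k == l
    return c, d_intra, d_inter

# Hypothetical batch: P=8 identities, K=4 images per modality, D=2048 features.
c, d_intra, d_inter = centers_and_distances(torch.randn(8, 4, 2048),
                                            torch.randn(8, 4, 2048))
```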
It can be understood that the above example is only an exemplary description of acquiring the multiple kinds of feature distance information; in practical applications, any other feasible manner may also be used, which is not limited here.
S606: Process the multiple images using the initial re-identification model to obtain an initial loss value.
In some embodiments, in the operation of determining the initial loss value, the multiple images may further be divided with reference to the multiple labeled target categories to obtain a triplet sample set. The triplet sample set may include the multiple images (denoted by $x_i$), multiple first images (denoted by $x_i^+$), and multiple second images (denoted by $x_i^-$), where the multiple first images correspond to the same labeled target category as $x_i$ and the multiple second images correspond to different labeled target categories; $x_i$ and $x_i^+$ may form a positive sample pair, and $x_i$ and $x_i^-$ may form a negative sample pair.
In some embodiments, a first Euclidean distance between the feature vector of an image and the feature vector of its first image is determined, the feature vectors being output by the batch normalization layer. That is, the distance between the BN-layer feature vector of the image and that of the first image is calculated to obtain the first Euclidean distance.
Furthermore, a second Euclidean distance between the feature vector of the image and the feature vector of its second image may also be determined; the first and second Euclidean distances may be denoted, for example, by $d$.
In some embodiments, a triplet loss value is determined according to the multiple first Euclidean distances and the multiple second Euclidean distances, and the triplet loss value is used as the initial loss value. The initial loss value is calculated as:

$$\mathcal{L}_{wrt} = \frac{1}{B} \sum_{i=1}^{B} \log\left(1 + \exp\left(\sum_{i^+ \in \mathcal{P}_i} w_{i,i^+} d_{i,i^+} - \sum_{i^- \in \mathcal{N}_i} w_{i,i^-} d_{i,i^-}\right)\right)$$

$$w_{i,i^+} = \frac{\exp(d_{i,i^+})}{\sum_{d \in \mathcal{P}_i} \exp(d)}, \qquad w_{i,i^-} = \frac{\exp(-d_{i,i^-})}{\sum_{d \in \mathcal{N}_i} \exp(-d)}$$

where $d_{i,i^+}$ denotes the first Euclidean distance, $d_{i,i^-}$ denotes the second Euclidean distance, and $\mathcal{P}_i$ and $\mathcal{N}_i$ denote the sets of positive sample pairs and negative sample pairs, respectively. Thus, during model training the weighted regularized triplet loss function (WRT Loss) can also be incorporated, introducing the concept of positive and negative samples so that classification predictions are more compact within a class and classes are pushed further apart.
S607: Process the multiple convolutional feature maps and the multiple edge feature maps using the initial re-identification model to obtain a perceptual edge loss value.
For a detailed description of S607, reference may be made to the foregoing embodiments; details are not repeated here.
S608: Determine a first target distance from the multiple first distances using the initial re-identification model, the first target distance being the first distance with the smallest value among the multiple first distances.
The first distance with the smallest value among the multiple first distances may be called the first target distance. For example, if $d_{inter}^{\min}$ denotes the minimum over all $d_{inter}$, then $d_{inter}^{\min}$ may serve as the first target distance.
S609: Calculate the cross-modal center contrastive loss value according to the first target distance, the multiple second distances, and the number of targets.
In some embodiments, the cross-modal center contrastive loss value (which may be called the CMCC loss) is calculated according to the first target distance, the multiple second distances, and the number of targets, for example as:

$$\mathcal{L}_{CMCC} = \frac{1}{P} \sum_{k=1}^{P} \frac{d_{intra}^{(k)}}{d_{inter}^{\min}}$$
In the embodiments of the present disclosure, the CMCC loss draws the different modalities of the same category closer while pushing the features of different categories further apart, thereby optimizing the distribution of the features $f_i^m$ extracted by the model and facilitating the later use of this layer's features for target re-identification matching.
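Building on the center/distance sketch above, a minimal CMCC loss sketch follows. The exact functional form in the original formula image is not recoverable from the text, so the ratio form shown here is an assumption consistent with the stated behavior (smaller intra-class modality gaps, larger gaps between the closest class centers).

```python
import torch

def cmcc_loss(d_intra: torch.Tensor, d_inter: torch.Tensor) -> torch.Tensor:
    """d_intra: (P,) same-class cross-modality center distances;
    d_inter: (P, P) different-class center distances (diagonal masked large).
    Minimizing pulls the two modality centers of a class together (numerator)
    and pushes the closest pair of class centers apart (denominator)."""
    d_inter_min = d_inter.min()            # smallest inter-class distance
    return (d_intra / (d_inter_min + 1e-12)).mean()

# Continuing the earlier hypothetical batch:
# c, d_intra, d_inter = centers_and_distances(f_rgb, f_ir)
# loss_cmcc = cmcc_loss(d_intra, d_inter)
```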
S610: Train the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrastive loss value to obtain the target re-identification model.
For example, a target loss value is generated according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrastive loss value; the target loss value may be, for example, the sum of these values:

$$\mathcal{L} = \mathcal{L}_{PEF} + \mathcal{L}_{id} + \mathcal{L}_{wrt} + \mathcal{L}_{CMCC}$$

where $\mathcal{L}_{PEF}$ denotes the perceptual edge loss value, $\mathcal{L}_{id}$ and $\mathcal{L}_{wrt}$ denote the initial loss values, and $\mathcal{L}_{CMCC}$ denotes the cross-modal center contrastive loss value. In some embodiments, the initial re-identification model is trained according to the target loss value.
In the embodiments of the present disclosure, multiple images with corresponding modalities and labeled target categories are acquired; convolutional feature maps, edge feature maps, and feature distance information corresponding to the modalities are acquired; and the initial re-identification model is trained on these together with the labeled target categories to obtain the target re-identification model. The trained model can therefore fully mine the features in images of multiple modalities and enhance the accuracy of cross-modal image matching, improving the effect of cross-modal target re-identification and solving the technical problem in the related art that insufficient feature mining in multi-modal images degrades cross-modal target re-identification. In addition, determining the feature distance information through the relationships among the targets' feature center points constrains the relationship between modality centers and category centers and effectively tunes the feature extraction capability of the model. Moreover, the CMCC loss draws the different modalities of the same category closer while pushing the features of different categories apart, optimizing the distribution of the features $f_i^m$ extracted by the model and facilitating the later use of this layer's features for target re-identification matching.
In practical applications, as shown in Fig. 2, the backbone network of the target re-identification model is a convolutional neural network (here ResNet50). Specifically, for the inputs of the two modalities, color images and infrared images, the present disclosure divides ResNet50 into two parts: the initial convolutional stage (ResNet Layer0) adopts a two-stream design, while the subsequent four convolutional stages (ResNet Layer1-4) use a weight-sharing strategy across the two streams to extract the information of the two modalities uniformly. The feature maps produced by the convolutional layers are then pooled (Generalized-mean (GeM) Pooling is used in the embodiments of the present disclosure) and passed through batch normalization (BN) to obtain the feature vector extracted for each image (used for re-identification matching during testing and application). During training, the feature vector further passes through the fully connected (FC) layer and a Softmax operation to obtain the classification scores for the target object.
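For illustration, a condensed PyTorch sketch of this two-stream backbone follows. The split point (Layer0 = conv1 + bn + relu + maxpool), the GeM implementation, and the module names are assumptions of the sketch; only the overall structure (unshared Layer0, shared Layer1-4, GeM pooling, BN neck, FC classifier) follows the description above.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class GeM(nn.Module):
    """Generalized-mean pooling: learnable exponent p (p=1 is average pooling)."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):
        x = x.clamp(min=self.eps).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)

class TwoStreamReID(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V1")
        layer0 = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer0_rgb = layer0                    # modality-specific stream
        self.layer0_ir = copy.deepcopy(layer0)      # modality-specific stream
        self.shared = nn.Sequential(net.layer1, net.layer2,
                                    net.layer3, net.layer4)  # shared weights
        self.pool = GeM()
        self.bn = nn.BatchNorm1d(2048)              # BN neck -> f_i^m
        self.fc = nn.Linear(2048, num_classes)      # classifier for Id Loss

    def forward(self, x: torch.Tensor, modality: str):
        z = self.layer0_rgb(x) if modality == "rgb" else self.layer0_ir(x)
        feat = self.bn(self.pool(self.shared(z)))   # matching feature vector
        return z, feat, self.fc(feat)               # Layer0 map, f, logits

# Hypothetical forward pass with the training input size 288x144.
model = TwoStreamReID(num_classes=395)
z, f, logits = model(torch.randn(2, 3, 288, 144), modality="rgb")
```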
During model training, a multi-task loss function is used, as shown in Equation 1, which fuses four loss functions: the identity loss (Id Loss), the weighted regularized triplet loss (WRT Loss), the perceptual edge loss (PEF Loss), and the cross-modal center contrastive loss (CMCC Loss). The first two are loss functions commonly used in existing methods, while the latter two (PEF Loss and CMCC Loss) are newly proposed in the present disclosure. The first two losses are briefly introduced below, followed by a detailed explanation of the latter two.

$$\mathcal{L} = \mathcal{L}_{id} + \mathcal{L}_{wrt} + \mathcal{L}_{PEF} + \mathcal{L}_{cmcc} \qquad (\text{Equation 1})$$
Suppose rgb and ir denote the RGB image modality and the IR image modality, respectively. Let $X^m = \{x^m \mid x^m \in \mathbb{R}^{H \times W \times 3}\}$ denote the input RGB and IR image datasets, where $m \in \{rgb, ir\}$, $H$ and $W$ denote the height and width of the image, and 3 denotes the number of channels (an RGB image contains the R, G and B channels; an IR image is converted to 3 channels by repeating its single channel 3 times). Suppose a batch contains $B$ images during training, and let $x_i^m$ denote one of the RGB or IR images; then $i \in \{1, 2, \ldots, B\}$.
(1) Identity loss (Id Loss) and weighted regularized triplet loss (WRT Loss)
(1.1) Identity loss (Id Loss):
As shown in Fig. 1(a), the input image $x_i^m$ passes through the network model to obtain the vector after the final fully connected (FC) layer and the Softmax operation, denoted here by $p_i$; the one-hot encoding of the corresponding label is denoted by $y_i$:

$$p_i = [p_i^1, p_i^2, \ldots, p_i^N], \qquad y_i = [y_i^1, y_i^2, \ldots, y_i^N] \qquad (\text{Equation 2})$$

where $j \in \{1, 2, \ldots, N\}$ and $N$ is the number of categories of target objects in the training set. Id Loss can then be expressed as:

$$\mathcal{L}_{id} = -\frac{1}{B} \sum_{i=1}^{B} \sum_{j=1}^{N} y_i^j \log p_i^j \qquad (\text{Equation 3})$$
(1.2) Weighted regularized triplet loss (WRT Loss):
As shown in Fig. 1(a), the WRT loss $\mathcal{L}_{wrt}$ is calculated from the feature vectors obtained after the model's batch normalization (BN) layer and an L2-Norm operation. The loss function is calculated as follows:

$$\mathcal{L}_{wrt} = \frac{1}{B} \sum_{i=1}^{B} \log\left(1 + \exp\left(\sum_{i^+ \in \mathcal{P}_i} w_{i,i^+} d_{i,i^+} - \sum_{i^- \in \mathcal{N}_i} w_{i,i^-} d_{i,i^-}\right)\right) \qquad (\text{Equation 4})$$

$$w_{i,i^+} = \frac{\exp(d_{i,i^+})}{\sum_{d \in \mathcal{P}_i} \exp(d)}, \qquad w_{i,i^-} = \frac{\exp(-d_{i,i^-})}{\sum_{d \in \mathcal{N}_i} \exp(-d)} \qquad (\text{Equation 5})$$
where $\{x_i, x_i^+, x_i^-\}$ denotes a triplet sample set comprising a sample $x_i$, a sample $x_i^+$ of the same category, and a sample $x_i^-$ of a different category; $x_i$ and $x_i^+$ form a positive sample pair, and $x_i$ and $x_i^-$ form a negative sample pair; $d$ denotes the Euclidean distance between feature vectors; and $\mathcal{P}_i$ and $\mathcal{N}_i$ denote the sets of positive sample pairs and negative sample pairs, respectively.
(2) Perceptual edge loss (PEF Loss)
As shown in Figs. 1(a) and (b), the perceptual edge loss acts on the modality-specific feature space, the features of which are generated by the unshared ResNet Layer0. To address the feature discrepancy between the RGB modality and the IR modality, the PEF loss uses the edge contour information of the target as a guide and directly optimizes the modality-specific feature space, thereby mining the features common to the modalities.
Specifically, as shown in Fig. 1(b), taking the loss calculation for one modality as an example, the PEF loss takes two inputs: one is the convolutional feature map extracted by ResNet Layer0; the other branch applies a Sobel-operator convolution to the original modality input image to extract its edge information and obtain the edge feature map. The PEF loss then computes the perceptual loss between the edge feature map and the convolutional feature map, using a VGGNet-16 model trained on ImageNet as the perception network, whose four stages are denoted by $\phi = \{\phi_1, \phi_2, \phi_3, \phi_4\}$. Let $\phi_t(z)$ denote the feature map extracted by stages 0 to $t$ of the perception network, with shape $C_t \times H_t \times W_t$. The PEF loss is calculated as follows:
$$\ell_{PEF}(z, \hat{z}) = \sum_{t=1}^{4} \frac{1}{C_t H_t W_t} \left\| \phi_t(z) - \phi_t(\hat{z}) \right\|_2^2 \qquad (\text{Equation 6})$$

where $z$ and $\hat{z}$ denote the input convolutional feature map and edge feature map, respectively. The PEF losses of the RGB and IR modalities are calculated as follows:

$$\mathcal{L}_{PEF} = \ell_{PEF}(z^{rgb}, \hat{z}^{rgb}) + \ell_{PEF}(z^{ir}, \hat{z}^{ir}) \qquad (\text{Equation 7})$$
where $z^{rgb}$ and $z^{ir}$ denote the convolutional feature maps extracted by the respective ResNet Layer0 of the two modalities, and $\hat{z}^{rgb}$ and $\hat{z}^{ir}$ denote the edge feature maps of the corresponding modalities; the final loss is the sum of the losses of the two modalities.
In the perceptual edge loss (PEF Loss), prior-knowledge edge contour information is used to guide the modality-common features, making the modality-specific features extracted by the unshared Layer0 more consistent, which helps reduce the differences between modalities and thus better accomplishes the cross-modal target re-identification task.
(3) Cross-modal center contrastive loss (CMCC Loss)
The embodiments of the present disclosure propose a new cross-modal center contrastive loss, which acts on the modality-common feature space, that is, the space of the feature vectors (denoted by $f_i^m$) after the BN layer in Fig. 1(a). Suppose a batch contains target objects of $P$ categories, each category containing $K$ RGB images and $K$ IR images, i.e., $B = 2 \times P \times K$. Let $d_{inter}$ denote the distance between the centers of the object features of different categories, and let $d_{intra}$ denote the distance between the centers of the features of the two modalities of objects of the same category. Let $c_k^m$ denote the feature center of the $k$-th category of object in modality $m$; it is calculated as:

$$c_k^m = \frac{1}{K} \sum_{i=1}^{K} f_{k,i}^m \qquad (\text{Equation 8})$$
where $m \in \{rgb, ir\}$. From Equation 8, $c_k^{rgb}$ and $c_k^{ir}$ can be calculated, and the center of the features of the $k$-th category of target object is then $c_k = \frac{1}{2}(c_k^{rgb} + c_k^{ir})$. The CMCC loss $\mathcal{L}_{cmcc}$ is then calculated as follows:

$$\mathcal{L}_{cmcc} = \frac{1}{P} \sum_{k=1}^{P} \frac{d_{intra}^{(k)}}{d_{inter}^{\min}} \qquad (\text{Equation 9})$$

where $d_{inter}^{\min}$ denotes the minimum over all $d_{inter}$. Optimizing this loss function draws the different modalities of the same category closer while pushing the features of different categories further apart, thereby optimizing the distribution of the features $f_i^m$ extracted by the model and facilitating the later use of this layer's features for target re-identification matching.
Fig. 7 is a flowchart of training the target re-identification model according to an embodiment of the present disclosure. As shown in Fig. 7, the training includes the following steps:
(1) Input image preprocessing stage
Step 1-1: Read the cross-modal target re-identification image dataset to obtain the original images and the category information of the corresponding target objects.
The dataset includes a training set and a test set, each comprising original images and the object category labels corresponding to the images. During training, the images are input to the model and the loss function is then calculated in combination with the category labels. During testing, the test set is divided into a query set and a gallery set, used to test the re-identification performance of the model.
Algorithm model hyperparameters include the input image size during training, the batch size, the target objects and their number per modality in a batch, the image data augmentation methods, the number of training epochs, the learning rate adjustment strategy, and the type of optimizer used, as follows.
Input image size during model training: 288×144;
Batch size: 64 (8 target objects, with 4 images per target object per modality);
Image data augmentation: random cropping and horizontal flipping;
Number of training epochs: 200;
Optimizer: Adam, with a weight decay of 0.0005;
Learning rate adjustment strategy:
The learning rate increases linearly from 0.0005 to 0.005 during the first 10 epochs, is held at 0.005 from epoch 10 to epoch 20, and then decays to one tenth of its value every 5 epochs until it reaches 0.000005 at epoch 35, where it remains until the end of training.
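A minimal sketch of this schedule as a PyTorch `LambdaLR` multiplier follows; the base learning rate is assumed to be 0.005, and the exact decay epochs are one interpretation of the description above.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

BASE_LR, FLOOR = 0.005, 0.000005

def lr_factor(epoch: int) -> float:
    """Multiplier on BASE_LR implementing the schedule described above."""
    if epoch < 10:                       # linear warmup 0.0005 -> 0.005
        return 0.1 + 0.9 * epoch / 10
    if epoch < 20:                       # plateau at 0.005
        return 1.0
    # decay by 10x every 5 epochs, clamped at the 0.000005 floor
    factor = 0.1 ** ((epoch - 20) // 5 + 1)
    return max(factor, FLOOR / BASE_LR)

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder parameters
optimizer = Adam(params, lr=BASE_LR, weight_decay=0.0005)
scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)  # call .step() per epoch
```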
Step 1-2: According to the set batch size, the number of categories in a batch, and the number of images per category, organize the data of the RGB and IR modalities into a batch; an illustrative sampler is sketched below.
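A sketch of such identity-balanced batch composition (P identities, K images per identity per modality) follows; the index structure and names are assumptions of the sketch, not part of the disclosure.

```python
import random
from collections import defaultdict

def make_batches(samples, p: int = 8, k: int = 4):
    """samples: list of (path, identity, modality) with modality in {'rgb','ir'}.
    Yields batches of 2*p*k samples: p identities, k RGB + k IR images each."""
    by_id = defaultdict(lambda: {"rgb": [], "ir": []})
    for path, pid, mod in samples:
        by_id[pid][mod].append(path)
    ids = [pid for pid, d in by_id.items()
           if len(d["rgb"]) >= k and len(d["ir"]) >= k]
    random.shuffle(ids)
    for i in range(0, len(ids) - p + 1, p):
        batch = []
        for pid in ids[i:i + p]:
            batch += random.sample(by_id[pid]["rgb"], k)
            batch += random.sample(by_id[pid]["ir"], k)
        yield batch
```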
Step 1-3: Normalize the images, resize them to the set width and height, and apply the specified data augmentation transforms; then load the batched data into GPU memory for subsequent input into the model being trained, with the corresponding labels participating in the later loss calculation.
(2) Feature extraction stage
Step 2-1: Input the image data of the two modalities along the two-stream feature extraction network (the structure shown in Fig. 2), feeding the data of each modality into its respective entry branch.
Step 2-2: The input data is propagated layer by layer with the corresponding computations, passing in turn through the modality-specific part and the modality-common part.
Step 2-3: Through the forward propagation of Step 2-2, the intermediate features and the final classification prediction scores are obtained, to be used for the multi-task loss calculation in the next stage.
(3) Multi-task loss calculation stage
Step 3-1: For the input data of a batch, obtain $\mathcal{L}_{id}$, $\mathcal{L}_{wrt}$, $\mathcal{L}_{PEF}$, and $\mathcal{L}_{cmcc}$ according to the calculations of Equations 1-9 above.
Step 3-2: Add the four losses to obtain the final multi-task loss value $\mathcal{L}$.
(4) Model iterative optimization stage
Step 4-1: The implementation code of the present disclosure uses the PyTorch deep learning framework with automatic differentiation, which supports backpropagation through the entire algorithm model directly from the computed multi-task loss value, calculating the gradient values of the learnable parameters.
Step 4-2: Use the configured optimizer with the gradients calculated in Step 4-1 to update and optimize the learnable parameters of the model.
Step 4-3: Repeat all the above steps, continuously updating the model parameters, until the set number of training epochs is reached, and then stop the training process of the algorithm model.
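These three steps correspond to a standard PyTorch training loop; a condensed sketch follows, reusing the hypothetical modules from the earlier sketches (`TwoStreamReID`, `wrt_loss`, `PerceptualEdgeLoss` as `pef`, `cmcc_loss`, `centers_and_distances`, and the identity-grouped batch layout are assumptions of this document's examples, not named in the disclosure).

```python
import torch
import torch.nn.functional as F

# model, optimizer, scheduler, train_loader, pef and the loss helpers are
# assumed to be defined as in the earlier sketches.
for epoch in range(200):
    for rgb, ir, labels in train_loader:          # identity-balanced batch
        z_rgb, f_rgb, logit_rgb = model(rgb, "rgb")
        z_ir, f_ir, logit_ir = model(ir, "ir")
        feats = torch.cat([f_rgb, f_ir])
        labs = torch.cat([labels, labels])
        loss = (F.cross_entropy(torch.cat([logit_rgb, logit_ir]), labs)  # Id
                + wrt_loss(F.normalize(feats, dim=1), labs)              # WRT
                + pef(z_rgb, rgb) + pef(z_ir, ir)                        # PEF
                + cmcc_loss(*centers_and_distances(                      # CMCC
                      f_rgb.view(8, 4, -1), f_ir.view(8, 4, -1))[1:]))
        optimizer.zero_grad()
        loss.backward()                           # autograd backprop (Step 4-1)
        optimizer.step()                          # parameter update (Step 4-2)
    scheduler.step()                              # per-epoch LR schedule
```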
(5) Model testing and evaluation stage
Step 5-1: Divide the test set, taking the IR images as the query set and the RGB images as the gallery set. The test uses an object's IR image as the query to match images of that object in the RGB image set, thereby measuring the model's cross-modal target re-identification performance.
Step 5-2: During testing, read the images of the test set (including the query and gallery images), input the data of both modalities into the test model, and obtain the feature vector of each image (the feature vector after the BN layer in Fig. 2) through the model's forward propagation and layer-by-layer computation.
Step 5-3: Use the cosine distance to measure the similarity between each query image and all gallery images, then sort by distance to obtain the list of gallery images (RGB images) matched to each query image (IR image).
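An illustrative sketch of this matching step follows; the feature matrices and names are assumptions, and cosine similarity is computed on L2-normalized features, so sorting by descending similarity is equivalent to sorting by ascending cosine distance.

```python
import torch
import torch.nn.functional as F

def rank_gallery(query_feats: torch.Tensor, gallery_feats: torch.Tensor):
    """query_feats: (Q, D) IR features; gallery_feats: (G, D) RGB features.
    Returns a (Q, G) tensor of gallery indices sorted best-to-worst match."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    sim = q @ g.t()                      # cosine similarity matrix
    return sim.argsort(dim=1, descending=True)

# Hypothetical features: 100 queries, 1000 gallery images, 2048-d.
ranking = rank_gallery(torch.randn(100, 2048), torch.randn(1000, 2048))
```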
Step 5-4: Calculate Rank-n and mAP, the evaluation metrics commonly used in target re-identification tasks, and evaluate the model performance by examining the metric values.
Step 5-5: If the evaluation results do not meet the set requirements, the hyperparameters of the model can be adjusted and the process restarted from its first step to continue training the algorithm model. If all evaluation metrics meet the requirements, the model weights are saved; the weights and the model code constitute the final cross-modal target re-identification solution.
In the technical solutions of the embodiments of the present disclosure:
1. A multi-task loss is used to optimize the modality-specific feature space and the modality-common feature space in a targeted manner, completing the cross-modal target re-identification task end to end.
2. A perceptual edge loss is proposed, which uses the edge information of the image as a guide to mine the common information in the modality-specific feature space, reducing the differences between modalities.
3. A cross-modal center contrastive loss is proposed, which acts on the common feature space; by constraining the relationship between modality centers and category centers, it effectively tunes the feature extraction capability of the model, enabling the model to achieve excellent performance.
Through this solution, the feature space can be optimized: the division into a modality-specific feature space and a modality-common feature space is proposed and each is adjusted and optimized in a targeted manner, realizing an efficient end-to-end cross-modal target re-identification method. In the embodiments, the proposed perceptual edge loss directly constrains the features of the different modalities, introducing prior knowledge into the model's feature extraction process and enhancing the model's cross-modal feature extraction capability; the proposed cross-modal center contrastive loss enables the model to extract more discriminative features, effectively reducing the inter-modality differences of same-category objects and increasing the feature differences between different categories, which helps the model correctly re-identify cross-modal data.
Fig. 8 is a schematic flowchart of a target re-identification method according to another embodiment of the present disclosure. Referring to Fig. 8, the method includes steps S801 to S802.
S801: Acquire a reference image and an image to be recognized, the reference image and the image to be recognized having different modalities, the reference image including a reference category.
The reference image and the image to be recognized may be images collected in any scene, and their modalities differ.
In some embodiments, the reference image may be an RGB-modality image and the image to be recognized an IR-modality image, or the reference image may be an IR-modality image and the image to be recognized an RGB-modality image; this is not limited.
Moreover, the reference image corresponds to a reference category, the reference category describing the category of the target object in the reference image, for example a vehicle, a pedestrian, or any other possible category; this is not limited.
S802: Input the reference image and the image to be recognized respectively into the target re-identification model trained by the above training method for a target re-identification model, to obtain the target corresponding to the image to be recognized output by the target re-identification model, the target having a corresponding target category that matches the reference category.
After the reference image and the image to be recognized are acquired as above, they are further input into the target re-identification model trained in the foregoing embodiments, and the target re-identification model outputs the target corresponding to the image to be recognized and its corresponding target category, where the target category matches the reference category, for example, the target category and the reference category correspond to the same vehicle.
That is, through the target re-identification model, the object identical to the target object in the reference image is recognized from the image to be recognized, achieving cross-modal target re-identification.
In the embodiments of the present disclosure, a reference image and an image to be recognized with different modalities are acquired, the reference image including a reference category; the reference image and the image to be recognized are respectively input into the target re-identification model trained by the training method for a target re-identification model, to obtain the target corresponding to the image to be recognized output by the model, the target having a corresponding target category that matches the reference category. Since the image to be recognized is processed by a target re-identification model trained by the above training method, the features of the image to be recognized can be fully mined and the accuracy of image matching across modalities enhanced, thereby improving the effect of cross-modal target re-identification.
图9是根据本公开另一实施例提供的目标重识别模型的训练装置的示意图。参考图9所示,该目标重识别模型的训练装置90包括:Fig. 9 is a schematic diagram of a training device for a target re-identification model according to another embodiment of the present disclosure. Referring to Fig. 9, the training device 90 of the target re-identification model includes:
第一获取模块901,用于获取多个图像,多个图像分别具有对应的多种模态和对应的多个标注目标类别;The first acquiring module 901 is configured to acquire multiple images, and the multiple images respectively have corresponding multiple modalities and corresponding multiple labeled target categories;
第二获取模块902,用于获取与多种模态分别对应的多个卷积特征图,并获取与多种模态分别对应的多个边缘特征图;The second acquisition module 902 is configured to acquire multiple convolutional feature maps corresponding to multiple modalities, and multiple edge feature maps corresponding to multiple modalities respectively;
第三获取模块903,用于获取与多种模态分别对应的多种特征距离信息;以及The third acquiring module 903 is configured to acquire various feature distance information respectively corresponding to various modalities; and
训练模块904,用于根据多个图像、多个卷积特征图、多个边缘特征图、多种特征距离信息,以及多个标注目标类别训练初始的重识别模型,以得到目标重识别模型。The training module 904 is configured to train an initial re-identification model according to multiple images, multiple convolutional feature maps, multiple edge feature maps, multiple feature distance information, and multiple labeled target categories to obtain a target re-identification model.
在一些实施例中,图10是根据本公开另一实施例提供的目标重识别模型的训练装置的 示意图。如图10所示,训练模块904,包括:In some embodiments, FIG. 10 is a schematic diagram of a training device for a target re-identification model provided according to another embodiment of the present disclosure. As shown in Figure 10, the training module 904 includes:
第一处理子模块9041,用于采用初始的重识别模型处理多个图像,以得到初始损失值;The first processing sub-module 9041 is used to process multiple images using an initial re-identification model to obtain an initial loss value;
第二处理子模块9042,用于采用初始的重识别模型处理多个卷积特征图和多个边缘特征图,以得到感知边缘损失值;The second processing sub-module 9042 is used to process multiple convolutional feature maps and multiple edge feature maps using the initial re-identification model to obtain perceptual edge loss values;
第三处理子模块9043,用于采用初始的重识别模型处理多种特征距离信息,以得到跨模态中心对比损失值;The third processing sub-module 9043 is used to process various feature distance information using the initial re-identification model to obtain cross-modal center comparison loss values;
训练子模块9044,用于根据初始损失值、感知边缘损失值、以及跨模态中心对比损失值训练初始的重识别模型,以得到目标重识别模型。The training sub-module 9044 is used to train the initial re-identification model according to the initial loss value, perceptual edge loss value, and cross-modal center comparison loss value, so as to obtain the target re-identification model.
在一些实施例中,初始的重识别模型包括:第一网络结构,第一网络结构用于识别卷积特征图和边缘特征图之间的感知损失值。In some embodiments, the initial re-identification model includes: a first network structure, and the first network structure is used to identify the perceptual loss value between the convolutional feature map and the edge feature map.
In some embodiments, the second processing submodule 9042 is specifically configured to:
input the plurality of convolutional feature maps and the plurality of edge feature maps into the first network structure, so as to obtain a plurality of convolution loss feature maps respectively corresponding to the convolutional feature maps and a plurality of edge loss feature maps respectively corresponding to the edge feature maps;
determine a plurality of convolutional feature map parameters respectively corresponding to the convolution loss feature maps, and determine a plurality of edge feature map parameters respectively corresponding to the edge loss feature maps;
process the corresponding convolution loss feature maps according to the convolutional feature map parameters, so as to obtain a plurality of first perceptual edge loss values;
process the corresponding edge loss feature maps according to the edge feature map parameters, so as to obtain a plurality of second perceptual edge loss values; and
generate the perceptual edge loss value according to the plurality of first perceptual edge loss values and the plurality of second perceptual edge loss values (see the sketch following this list).
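For illustration only, the following is a minimal PyTorch sketch of one way such a perceptual edge loss could be computed. The frozen VGG-16 front end standing in for the first network structure, the mean-squared comparison, and the use of the channel count and spatial size (C, H, W) as the feature map parameters are assumptions made for the sketch, not details fixed by this disclosure (the pretrained weights are fetched by torchvision):

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    class PerceptualEdgeLoss(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # A frozen VGG-16 front end stands in for the "first network structure".
            self.net = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
            for p in self.net.parameters():
                p.requires_grad_(False)

        def forward(self, conv_maps, edge_maps):
            # conv_maps / edge_maps: lists of (B, 3, H, W) tensors, one pair per modality.
            total = 0.0
            for cm, em in zip(conv_maps, edge_maps):
                fc = self.net(cm)  # "convolution loss feature map"
                fe = self.net(em)  # "edge loss feature map"
                c, h, w = fc.shape[1:]  # feature-map parameters: channels and spatial size
                # First/second perceptual edge loss terms, scaled by C*H*W before combining.
                total = total + F.mse_loss(fc, fe, reduction="sum") / (c * h * w)
            return total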
In some embodiments, as shown in Fig. 10, the initial re-identification model includes a batch normalization layer, and the third acquisition module 903 includes:
a normalization processing submodule 9031, configured to input the plurality of images into the batch normalization layer, so as to obtain a plurality of feature vectors, output by the batch normalization layer, respectively corresponding to the images;
a center point determination submodule 9032, configured to determine, according to the plurality of feature vectors, feature center points of a plurality of targets respectively corresponding to the images; and
a distance determination submodule 9033, configured to determine first distances between the feature center points of different targets, and to determine second distances between the feature center points of the same target in different modalities, where the first distances and the second distances together constitute the multiple kinds of feature distance information, as sketched below.
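For illustration only, a minimal sketch of building this feature distance information is given below. The (N, D) feature tensor from the batch normalization layer, the integer target and modality identifiers, and the Euclidean distance between centers are assumptions made for the sketch:

    import torch

    def feature_distance_info(feats, target_ids, modal_ids):
        # feats: (N, D) feature vectors output by the batch-normalization layer.
        centers = {}
        for t in target_ids.unique():
            for m in modal_ids.unique():
                mask = (target_ids == t) & (modal_ids == m)
                if mask.any():
                    centers[(int(t), int(m))] = feats[mask].mean(dim=0)

        first, second = [], []
        keys = list(centers)
        for i, (t1, m1) in enumerate(keys):
            for t2, m2 in keys[i + 1:]:
                d = torch.dist(centers[(t1, m1)], centers[(t2, m2)])
                if t1 != t2:
                    first.append(d)   # first distances: centers of different targets
                elif m1 != m2:
                    second.append(d)  # second distances: same target, different modalities
        return first, second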
In some embodiments, the third processing submodule 9043 is specifically configured to:
determine a first target distance from the plurality of first distances with the initial re-identification model, where the first target distance is the first distance with the smallest value among the plurality of first distances; and
calculate the cross-modal center contrast loss value according to the first target distance, the plurality of second distances, and the number of targets (see the sketch below).
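The exact combination of these quantities is not spelled out at this point, so the following sketch shows only one plausible instantiation, stated as an assumption: the second distances pull same-target cross-modal centers together, while a hinge with an assumed margin keeps the first target distance (the closest pair of different-target centers) large:

    import torch

    def cross_modal_center_contrast_loss(first_distances, second_distances,
                                         num_targets, margin=0.3):
        # First target distance: the smallest distance between centers of
        # different targets, per the description above.
        d_min = torch.stack(first_distances).min()
        # Pull same-target, cross-modal centers together, averaged per target.
        intra = torch.stack(second_distances).sum() / num_targets
        # Hinge on the closest inter-target pair; the margin value is an assumption.
        return intra + torch.relu(margin - d_min)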
In some embodiments, the initial re-identification model includes a fully connected layer and an output layer that are connected in sequence, and the first processing submodule 9041 is specifically configured to:
input the plurality of images in sequence into the fully connected layer and the output layer, so as to obtain a plurality of category feature vectors, output by the output layer, respectively corresponding to the images;
determine a plurality of encoding vectors respectively corresponding to the labeled target categories; and
generate an identity loss value according to the plurality of category feature vectors and the corresponding encoding vectors, and take the identity loss value as the initial loss value (see the sketch below).
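For illustration only, a minimal sketch of such an identity loss follows. Treating the encoding vectors as one-hot encodings of the labeled target categories and combining them with the category feature vectors by cross-entropy is an assumption consistent with, but not mandated by, the description:

    import torch
    import torch.nn.functional as F

    def identity_loss(category_logits, labels, num_classes):
        # category_logits: (N, num_classes) "category feature vectors" from the
        # output layer; labels: (N,) labeled target categories.
        one_hot = F.one_hot(labels, num_classes).float()  # "encoding vectors"
        log_probs = F.log_softmax(category_logits, dim=1)
        return -(one_hot * log_probs).sum(dim=1).mean()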
In some embodiments, the first processing submodule 9041 is specifically configured to:
divide the plurality of images with reference to the labeled target categories, so as to obtain a triplet sample set, where the triplet sample set includes the plurality of images, a plurality of first images, and a plurality of second images, the first images correspond to the same labeled target category, and the second images correspond to different labeled target categories;
determine a first Euclidean distance between the feature vector of an image and the feature vector of a first image, where the feature vectors are output by the batch normalization layer;
determine a second Euclidean distance between the feature vector of the image and the feature vector of a second image; and
determine a triplet loss value according to the plurality of first Euclidean distances and the plurality of second Euclidean distances, and take the triplet loss value as the initial loss value (see the sketch following this list).
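For illustration only, the following sketch computes a triplet loss of this kind with in-batch hard mining; the hardest-positive/hardest-negative selection and the margin value are assumptions made for the sketch:

    import torch

    def triplet_loss(feats, labels, margin=0.3):
        # feats: (N, D) batch-normalized feature vectors; labels: (N,) categories.
        # Assumes every batch mixes at least two labeled target categories.
        dist = torch.cdist(feats, feats)  # pairwise Euclidean distances
        same = labels.unsqueeze(0) == labels.unsqueeze(1)
        losses = []
        for i in range(len(feats)):
            d_pos = dist[i][same[i]].max()   # first Euclidean distance (hardest positive)
            d_neg = dist[i][~same[i]].min()  # second Euclidean distance (hardest negative)
            losses.append(torch.relu(d_pos - d_neg + margin))
        return torch.stack(losses).mean()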
In some embodiments, the training submodule 9044 is specifically configured to:
generate a target loss value according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value; and
if the target loss value satisfies a set condition, take the re-identification model obtained by training as the target re-identification model (see the sketch below).
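For illustration only, a minimal sketch of the target loss and one possible set condition follows; the equal weights and the stopping threshold are assumptions, since neither is fixed by the description:

    def target_loss(initial_loss, perceptual_edge_loss, center_contrast_loss,
                    weights=(1.0, 1.0, 1.0)):
        # Weighted sum of the three terms; equal weights are an assumption.
        w1, w2, w3 = weights
        return w1 * initial_loss + w2 * perceptual_edge_loss + w3 * center_contrast_loss

    def satisfies_set_condition(loss_value, threshold=1e-3):
        # One possible "set condition": the target loss falls below a threshold.
        return float(loss_value) < threshold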
In some embodiments, the modalities include a color image modality and an infrared image modality.
It should be noted that the foregoing explanation of the training method of the target re-identification model also applies to the apparatus of this embodiment of the present disclosure, and details are not repeated here.
In this embodiment of the present disclosure, a plurality of images are acquired, where the images respectively have corresponding modalities and corresponding labeled target categories; a plurality of convolutional feature maps and a plurality of edge feature maps respectively corresponding to the modalities are acquired; multiple kinds of feature distance information respectively corresponding to the modalities are acquired; and an initial re-identification model is trained according to the images, the convolutional feature maps, the edge feature maps, the feature distance information, and the labeled target categories, so as to obtain the target re-identification model. The trained re-identification model can therefore fully mine the features of images of multiple modalities and improve the accuracy of matching images of different modalities, thereby improving the effect of cross-modal target re-identification. This solves the technical problem in the related art that the network model does not sufficiently mine the features of multi-modal images, which degrades the effect of cross-modal target re-identification.
Fig. 11 is a schematic diagram of a target re-identification apparatus according to another embodiment of the present disclosure. Referring to Fig. 11, the target re-identification apparatus 100 includes:
a fourth acquisition module 1001, configured to acquire a reference image and an image to be recognized, where the reference image and the image to be recognized differ in modality and the reference image includes a reference category; and
a recognition module 1002, configured to input the reference image and the image to be recognized into the target re-identification model trained by the above training method, so as to obtain the target, output by the target re-identification model, that corresponds to the image to be recognized, where the target has a corresponding target category and the target category matches the reference category.
It should be noted that the foregoing explanation of the training method of the target re-identification model also applies to the apparatus of this embodiment of the present disclosure, and details are not repeated here.
In this embodiment of the present disclosure, the target re-identification model trained by the above training method can be used to recognize the image to be recognized and determine the corresponding target, as sketched below. In this way, the features of the image to be recognized can be fully mined and the accuracy of matching images of different modalities is improved, thereby improving the effect of cross-modal target re-identification.
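For illustration only, the following sketch shows one way a trained model could be applied at this stage. It assumes the model maps an image batch to (N, D) embedding features and that matching is done by cosine similarity against a labeled reference gallery; both are assumptions, not details fixed by this disclosure:

    import torch
    import torch.nn.functional as F

    def re_identify(model, reference_images, reference_categories, query_image):
        # model is assumed to map an image batch to (N, D) embedding features.
        model.eval()
        with torch.no_grad():
            ref = F.normalize(model(reference_images), dim=1)          # e.g. RGB gallery
            qry = F.normalize(model(query_image.unsqueeze(0)), dim=1)  # e.g. IR probe
        sims = (ref @ qry.t()).squeeze(1)  # cosine similarities
        best = int(sims.argmax())
        return reference_categories[best]  # target category matching the reference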
According to the embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, a computer program product, and a computer program.
To implement the above embodiments, an embodiment of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the training method of the target re-identification model, or the target re-identification method, of the embodiments of the present disclosure.
To implement the above embodiments, an embodiment of the present disclosure proposes a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause the computer to execute the training method of the target re-identification model, or the target re-identification method, of the embodiments of the present disclosure.
To implement the above embodiments, an embodiment of the present disclosure further proposes a computer program product; when the instructions in the computer program product are executed by a processor, the training method of the target re-identification model, or the target re-identification method, described in any embodiment of the present disclosure is executed.
To implement the above embodiments, an embodiment of the present disclosure proposes a computer program, where the computer program includes computer program code that, when run on a computer, causes the computer to execute the training method of the target re-identification model, or the target re-identification method, described in any embodiment of the present disclosure.
Fig. 12 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the present disclosure. The computer device 12 shown in Fig. 12 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 12, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and non-volatile media and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 12 and commonly referred to as a "hard drive").
Although not shown in Fig. 12, a disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") may be provided, as well as an optical disk drive for reading from and writing to a removable non-volatile optical disk (for example, a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present disclosure.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28; such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.
The computer device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, and the like), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (for example, a network card, a modem, and the like) that enables the computer device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the computer device 12 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 runs the programs stored in the system memory 28 to execute various functional applications and the training of the target re-identification model, for example, to implement the training method of the target re-identification model mentioned in the foregoing embodiments.
Other embodiments of the present disclosure will readily occur to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise constructions that have been described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
It should be noted that, in the description of the present disclosure, the terms "first", "second", and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. In addition, in the description of the present disclosure, unless otherwise specified, "a plurality of" means two or more.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
It should be understood that the parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing related hardware, and the program can be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, descriptions with reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", and the like mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although the embodiments of the present disclosure have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present disclosure, and those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.
All embodiments of the present disclosure may be executed alone or in combination with other embodiments, and all are regarded as within the scope of protection claimed by the present disclosure.

Claims (26)

1. A method for training a target re-identification model, the method comprising:
    acquiring a plurality of images, wherein the plurality of images respectively have corresponding modalities and corresponding labeled target categories;
    acquiring a plurality of convolutional feature maps respectively corresponding to the modalities, and acquiring a plurality of edge feature maps respectively corresponding to the modalities;
    acquiring multiple kinds of feature distance information respectively corresponding to the modalities; and
    training an initial re-identification model according to the plurality of images, the plurality of convolutional feature maps, the plurality of edge feature maps, the multiple kinds of feature distance information, and the plurality of labeled target categories, so as to obtain a target re-identification model.
2. The method according to claim 1, wherein the training of the initial re-identification model according to the plurality of images, the plurality of convolutional feature maps, the plurality of edge feature maps, the multiple kinds of feature distance information, and the plurality of labeled target categories to obtain the target re-identification model comprises:
    processing the plurality of images with the initial re-identification model to obtain an initial loss value;
    processing the plurality of convolutional feature maps and the plurality of edge feature maps with the initial re-identification model to obtain a perceptual edge loss value;
    processing the multiple kinds of feature distance information with the initial re-identification model to obtain a cross-modal center contrast loss value; and
    training the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value to obtain the target re-identification model.
3. The method according to claim 1 or 2, wherein the initial re-identification model comprises a first network structure, and the first network structure is used to identify a perceptual loss value between a convolutional feature map and an edge feature map.
4. The method according to claim 3, wherein the processing of the plurality of convolutional feature maps and the plurality of edge feature maps with the initial re-identification model to obtain the perceptual edge loss value comprises:
    inputting the plurality of convolutional feature maps and the plurality of edge feature maps into the first network structure, so as to obtain a plurality of convolution loss feature maps respectively corresponding to the convolutional feature maps and a plurality of edge loss feature maps respectively corresponding to the edge feature maps;
    determining a plurality of convolutional feature map parameters respectively corresponding to the convolution loss feature maps, and determining a plurality of edge feature map parameters respectively corresponding to the edge loss feature maps;
    processing the corresponding convolution loss feature maps according to the convolutional feature map parameters to obtain a plurality of first perceptual edge loss values;
    processing the corresponding edge loss feature maps according to the edge feature map parameters to obtain a plurality of second perceptual edge loss values; and
    generating the perceptual edge loss value according to the plurality of first perceptual edge loss values and the plurality of second perceptual edge loss values.
5. The method according to any one of claims 1 to 4, wherein the initial re-identification model comprises a batch normalization layer, and the acquiring of the multiple kinds of feature distance information respectively corresponding to the modalities comprises:
    inputting the plurality of images into the batch normalization layer, so as to obtain a plurality of feature vectors, output by the batch normalization layer, respectively corresponding to the images;
    determining, according to the plurality of feature vectors, feature center points of a plurality of targets respectively corresponding to the images; and
    determining first distances between the feature center points of different targets, and determining second distances between the feature center points of the same target in different modalities, wherein the first distances and the second distances together constitute the multiple kinds of feature distance information.
6. The method according to any one of claims 2 to 5, wherein the processing of the multiple kinds of feature distance information with the initial re-identification model to obtain the cross-modal center contrast loss value comprises:
    determining a first target distance from a plurality of first distances with the initial re-identification model, wherein the first target distance is the first distance with the smallest value among the plurality of first distances; and
    calculating the cross-modal center contrast loss value according to the first target distance, a plurality of second distances, and the number of targets.
7. The method according to any one of claims 2 to 6, wherein the initial re-identification model comprises a fully connected layer and an output layer that are connected in sequence, and the processing of the plurality of images with the initial re-identification model to obtain the initial loss value comprises:
    inputting the plurality of images in sequence into the fully connected layer and the output layer, so as to obtain a plurality of category feature vectors, output by the output layer, respectively corresponding to the images;
    determining a plurality of encoding vectors respectively corresponding to the labeled target categories; and
    generating an identity loss value according to the plurality of category feature vectors and the corresponding encoding vectors, and taking the identity loss value as the initial loss value.
8. The method according to any one of claims 5 to 7, wherein the processing of the plurality of images with the initial re-identification model to obtain the initial loss value comprises:
    dividing the plurality of images with reference to the labeled target categories, so as to obtain a triplet sample set, wherein the triplet sample set comprises the plurality of images, a plurality of first images, and a plurality of second images, the first images correspond to the same labeled target category, and the second images correspond to different labeled target categories;
    determining a first Euclidean distance between the feature vector of an image and the feature vector of a first image, wherein the feature vectors are output by the batch normalization layer;
    determining a second Euclidean distance between the feature vector of the image and the feature vector of a second image; and
    determining a triplet loss value according to a plurality of the first Euclidean distances and a plurality of the second Euclidean distances, and taking the triplet loss value as the initial loss value.
9. The method according to any one of claims 2 to 8, wherein the training of the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value to obtain the target re-identification model comprises:
    generating a target loss value according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value; and
    if the target loss value satisfies a set condition, taking the re-identification model obtained by training as the target re-identification model.
10. The method according to any one of claims 1 to 9, wherein the modalities comprise a color image modality and an infrared image modality.
11. A target re-identification method, comprising:
    acquiring a reference image and an image to be recognized, wherein the reference image and the image to be recognized differ in modality, and the reference image comprises a reference category; and
    inputting the reference image and the image to be recognized into a target re-identification model trained by the method for training a target re-identification model according to any one of claims 1 to 10, so as to obtain a target, output by the target re-identification model, corresponding to the image to be recognized, wherein the target has a corresponding target category, and the target category matches the reference category.
12. An apparatus for training a target re-identification model, comprising:
    a first acquisition module, configured to acquire a plurality of images, wherein the plurality of images respectively have corresponding modalities and corresponding labeled target categories;
    a second acquisition module, configured to acquire a plurality of convolutional feature maps respectively corresponding to the modalities, and to acquire a plurality of edge feature maps respectively corresponding to the modalities;
    a third acquisition module, configured to acquire multiple kinds of feature distance information respectively corresponding to the modalities; and
    a training module, configured to train an initial re-identification model according to the plurality of images, the plurality of convolutional feature maps, the plurality of edge feature maps, the multiple kinds of feature distance information, and the plurality of labeled target categories, so as to obtain a target re-identification model.
13. The apparatus according to claim 12, wherein the training module comprises:
    a first processing submodule, configured to process the plurality of images with the initial re-identification model to obtain an initial loss value;
    a second processing submodule, configured to process the plurality of convolutional feature maps and the plurality of edge feature maps with the initial re-identification model to obtain a perceptual edge loss value;
    a third processing submodule, configured to process the multiple kinds of feature distance information with the initial re-identification model to obtain a cross-modal center contrast loss value; and
    a training submodule, configured to train the initial re-identification model according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value to obtain the target re-identification model.
14. The apparatus according to claim 12 or 13, wherein the initial re-identification model comprises a first network structure, and the first network structure is used to identify a perceptual loss value between a convolutional feature map and an edge feature map.
15. The apparatus according to claim 13 or 14, wherein the second processing submodule is specifically configured to:
    input the plurality of convolutional feature maps and the plurality of edge feature maps into the first network structure, so as to obtain a plurality of convolution loss feature maps respectively corresponding to the convolutional feature maps and a plurality of edge loss feature maps respectively corresponding to the edge feature maps;
    determine a plurality of convolutional feature map parameters respectively corresponding to the convolution loss feature maps, and determine a plurality of edge feature map parameters respectively corresponding to the edge loss feature maps;
    process the corresponding convolution loss feature maps according to the convolutional feature map parameters to obtain a plurality of first perceptual edge loss values;
    process the corresponding edge loss feature maps according to the edge feature map parameters to obtain a plurality of second perceptual edge loss values; and
    generate the perceptual edge loss value according to the plurality of first perceptual edge loss values and the plurality of second perceptual edge loss values.
16. The apparatus according to any one of claims 12 to 15, wherein the initial re-identification model comprises a batch normalization layer, and the third acquisition module comprises:
    a normalization processing submodule, configured to input the plurality of images into the batch normalization layer, so as to obtain a plurality of feature vectors, output by the batch normalization layer, respectively corresponding to the images;
    a center point determination submodule, configured to determine, according to the plurality of feature vectors, feature center points of a plurality of targets respectively corresponding to the images; and
    a distance determination submodule, configured to determine first distances between the feature center points of different targets, and to determine second distances between the feature center points of the same target in different modalities, wherein the first distances and the second distances together constitute the multiple kinds of feature distance information.
17. The apparatus according to claim 16, wherein the third processing submodule is specifically configured to:
    determine a first target distance from a plurality of first distances with the initial re-identification model, wherein the first target distance is the first distance with the smallest value among the plurality of first distances; and
    calculate the cross-modal center contrast loss value according to the first target distance, a plurality of second distances, and the number of targets.
18. The apparatus according to any one of claims 13 to 17, wherein the initial re-identification model comprises a fully connected layer and an output layer that are connected in sequence, and the first processing submodule is specifically configured to:
    input the plurality of images in sequence into the fully connected layer and the output layer, so as to obtain a plurality of category feature vectors, output by the output layer, respectively corresponding to the images;
    determine a plurality of encoding vectors respectively corresponding to the labeled target categories; and
    generate an identity loss value according to the plurality of category feature vectors and the corresponding encoding vectors, and take the identity loss value as the initial loss value.
19. The apparatus according to any one of claims 16 to 18, wherein the first processing submodule is specifically configured to:
    divide the plurality of images with reference to the labeled target categories, so as to obtain a triplet sample set, wherein the triplet sample set comprises the plurality of images, a plurality of first images, and a plurality of second images, the first images correspond to the same labeled target category, and the second images correspond to different labeled target categories;
    determine a first Euclidean distance between the feature vector of an image and the feature vector of a first image, wherein the feature vectors are output by the batch normalization layer;
    determine a second Euclidean distance between the feature vector of the image and the feature vector of a second image; and
    determine a triplet loss value according to a plurality of the first Euclidean distances and a plurality of the second Euclidean distances, and take the triplet loss value as the initial loss value.
20. The apparatus according to any one of claims 13 to 19, wherein the training submodule is specifically configured to:
    generate a target loss value according to the initial loss value, the perceptual edge loss value, and the cross-modal center contrast loss value; and
    if the target loss value satisfies a set condition, take the re-identification model obtained by training as the target re-identification model.
21. The apparatus according to any one of claims 12 to 20, wherein the modalities comprise a color image modality and an infrared image modality.
22. A target re-identification apparatus, comprising:
    a fourth acquisition module, configured to acquire a reference image and an image to be recognized, wherein the reference image and the image to be recognized differ in modality, and the reference image comprises a reference category; and
    a recognition module, configured to input the reference image and the image to be recognized into a target re-identification model trained by the apparatus for training a target re-identification model according to any one of claims 12 to 21, so as to obtain a target, output by the target re-identification model, corresponding to the image to be recognized, wherein the target has a corresponding target category, and the target category matches the reference category.
23. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method according to any one of claims 1 to 10, or execute the method according to claim 11.
24. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of claims 1 to 10, or to execute the method according to claim 11.
25. A computer program product, wherein the computer program product comprises computer program code that, when run on a computer, executes the method according to any one of claims 1 to 10, or executes the method according to claim 11.
26. A computer program, wherein the computer program comprises computer program code that, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 10, or to execute the method according to claim 11.
PCT/CN2022/099257 2021-07-06 2022-06-16 Target re-recognition model training method and device, and target re-recognition method and device WO2023279935A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110763047.3 2021-07-06
CN202110763047.3A CN113408472B (en) 2021-07-06 2021-07-06 Training method of target re-identification model, target re-identification method and device

Publications (1)

Publication Number Publication Date
WO2023279935A1 true WO2023279935A1 (en) 2023-01-12

Family

ID=77685330

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099257 WO2023279935A1 (en) 2021-07-06 2022-06-16 Target re-recognition model training method and device, and target re-recognition method and device

Country Status (2)

Country Link
CN (1) CN113408472B (en)
WO (1) WO2023279935A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408472B (en) * 2021-07-06 2023-09-26 京东科技信息技术有限公司 Training method of target re-identification model, target re-identification method and device
CN114581838B (en) * 2022-04-26 2022-08-26 阿里巴巴达摩院(杭州)科技有限公司 Image processing method and device and cloud equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10579880B2 (en) * 2017-08-31 2020-03-03 Konica Minolta Laboratory U.S.A., Inc. Real-time object re-identification in a multi-camera system using edge computing
US11594006B2 (en) * 2019-08-27 2023-02-28 Nvidia Corporation Self-supervised hierarchical motion learning for video action recognition
CN111325115B (en) * 2020-02-05 2022-06-21 山东师范大学 Cross-modal countervailing pedestrian re-identification method and system with triple constraint loss
CN111931627A (en) * 2020-08-05 2020-11-13 智慧互通科技有限公司 Vehicle re-identification method and device based on multi-mode information fusion
CN111931637B (en) * 2020-08-07 2023-09-15 华南理工大学 Cross-modal pedestrian re-identification method and system based on double-flow convolutional neural network
CN112541421A (en) * 2020-12-08 2021-03-23 浙江科技学院 Pedestrian reloading identification method in open space

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492666A (en) * 2018-09-30 2019-03-19 北京百卓网络技术有限公司 Image recognition model training method, device and storage medium
US20200226421A1 (en) * 2019-01-15 2020-07-16 Naver Corporation Training and using a convolutional neural network for person re-identification
CN112115805A (en) * 2020-08-27 2020-12-22 山东师范大学 Pedestrian re-identification method and system with bimodal hard-excavation ternary-center loss
CN113408472A (en) * 2021-07-06 2021-09-17 京东数科海益信息科技有限公司 Training method of target re-recognition model, target re-recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GAO YAJUN, LIANG TENGFEI, JIN YI, GU XIAOYAN, LIU WU, LI YIDONG, LANG CONGYAN: "MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification", Proceedings of the 29th ACM International Conference on Multimedia (MM '21), ACM, 21 October 2021, pages 10, XP093022419 *
ZHU YUANXIN; YANG ZHAO; WANG LI; ZHAO SAI; HU XIAO; TAO DAPENG: "Hetero-Center loss for cross-modality person Re-identification", Neurocomputing, Elsevier, Amsterdam, NL, vol. 386, 28 December 2019, pages 97-109, XP086275248, ISSN: 0925-2312, DOI: 10.1016/j.neucom.2019.12.100 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350177A (en) * 2023-12-05 2024-01-05 西安热工研究院有限公司 Training method and device for ship unloader path generation model, electronic equipment and medium
CN117350177B (en) * 2023-12-05 2024-03-22 西安热工研究院有限公司 Training method and device for ship unloader path generation model, electronic equipment and medium
CN117670878A (en) * 2024-01-31 2024-03-08 天津市沛迪光电科技有限公司 VOCs gas detection method based on multi-mode data fusion
CN117670878B (en) * 2024-01-31 2024-04-26 天津市沛迪光电科技有限公司 VOCs gas detection method based on multi-mode data fusion

Also Published As

Publication number Publication date
CN113408472A (en) 2021-09-17
CN113408472B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
WO2023279935A1 (en) Target re-recognition model training method and device, and target re-recognition method and device
US9965719B2 (en) Subcategory-aware convolutional neural networks for object detection
CN104599275B (en) The RGB-D scene understanding methods of imparametrization based on probability graph model
CN107273458B (en) Depth model training method and device, and image retrieval method and device
CN111104867B (en) Recognition model training and vehicle re-recognition method and device based on part segmentation
US11023714B2 (en) Suspiciousness degree estimation model generation device
US20230041943A1 (en) Method for automatically producing map data, and related apparatus
Cho Weighted intersection over union (wIoU): a new evaluation metric for image segmentation
CN114663371A (en) Image salient target detection method based on modal unique and common feature extraction
CN114820655A (en) Weak supervision building segmentation method taking reliable area as attention mechanism supervision
CN115546553A (en) Zero sample classification method based on dynamic feature extraction and attribute correction
CN114612666A (en) RGB-D semantic segmentation method based on multi-modal contrast learning
Song et al. A novel deep learning network for accurate lane detection in low-light environments
CN105740879B (en) The zero sample image classification method based on multi-modal discriminant analysis
CN114332122A (en) Cell counting method based on attention mechanism segmentation and regression
WO2023160666A1 (en) Target detection method and apparatus, and target detection model training method and apparatus
CN116798070A (en) Cross-mode pedestrian re-recognition method based on spectrum sensing and attention mechanism
Mei et al. Adversarial multiscale feature learning for overlapping chromosome segmentation
CN108229491B (en) Method, device and equipment for detecting object relation from picture
CN113743251B (en) Target searching method and device based on weak supervision scene
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN114332993A (en) Face recognition method and device, electronic equipment and computer readable storage medium
CN112579824A (en) Video data classification method and device, electronic equipment and storage medium
CN114882525B (en) Cross-modal pedestrian re-identification method based on modal specific memory network
CN116052220B (en) Pedestrian re-identification method, device, equipment and medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE