WO2023082561A1 - Person re-identification method and system, and electronic device and storage medium - Google Patents

Person re-identification method and system, and electronic device and storage medium

Info

Publication number
WO2023082561A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
loss function
pedestrian
class center
network
Prior art date
Application number
PCT/CN2022/090217
Other languages
French (fr)
Chinese (zh)
Inventor
王立
郭振华
范宝余
赵雅倩
李仁刚
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023082561A1 publication Critical patent/WO2023082561A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • the present application relates to the technical field of deep learning, in particular to a pedestrian re-identification method and system, an electronic device and a storage medium.
  • Deep learning techniques can solve problems in the field of computer vision such as image classification, image segmentation and object detection. With the continuous development of deep learning technology, pedestrian re-identification technology has also made great progress.
  • Re-identification of pedestrians is an important image recognition technology, which is widely used in public security systems, traffic supervision and other fields.
  • Pedestrian re-identification searches cameras distributed in different locations to determine whether pedestrians in different camera fields of view are the same pedestrian.
  • related technologies usually improve the accuracy of pedestrian re-identification technology by building a more complex network structure.
  • deeper, wider or more complex networks usually bring a surge in the amount of parameters and computation. The increase in parameters is not conducive to the storage and deployment of portable devices, and the increase in computation is not conducive to application in scenarios with high real-time requirements.
  • the purpose of this application is to provide a method and system for pedestrian re-identification, an electronic device and a storage medium, which can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
  • the pedestrian re-identification method includes:
  • the knowledge of the auxiliary training model is transferred to the target model to obtain a pedestrian re-identification model
  • constructing an auxiliary training model and a target model based on a convolutional neural network includes:
  • constructing the auxiliary training model comprising at least two convolutional neural networks, and constructing the target model comprising at least two convolutional neural networks;
  • or, using a convolutional neural network comprising at least two head networks to construct the auxiliary training model, and using a convolutional neural network comprising at least two head networks to construct the target model;
  • the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer and a softmax layer.
  • determining the loss function of the auxiliary training model and the target model includes:
  • the cross-entropy loss function of the convolutional neural network is provided, and the cross-entropy loss function is used to calculate the cross-entropy loss of each of the convolutional neural networks;
  • a feature similarity loss function of the convolutional neural network is provided, and the feature similarity loss function is used to calculate the feature similarity loss between any two convolutional neural networks in the at least two convolutional neural networks;
  • a class center loss function of the convolutional neural network is provided, and the class center loss function is used to calculate the class center loss of each of the convolutional neural networks;
  • a loss function of the convolutional neural network that constrains the class center distance is provided, and this loss function is used to calculate, for each of the convolutional neural networks, the loss that constrains the class center distance;
  • the loss functions of the auxiliary training model and the target model are determined according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining class center distances.
  • the feature similarity loss function is: $L_m = \frac{1}{N}\sum_{n=1}^{N}\left\| e_n^u - e_n^v \right\|_2^2$
  • where $L_m$ represents the feature similarity loss, $N$ represents the number of samples, $n$ represents the $n$-th input sample, $u$ and $v$ represent the $u$-th and $v$-th networks, $e_n^u$ represents the embedding-layer output of the $n$-th input sample of the $u$-th network, and $e_n^v$ represents the embedding-layer output of the $n$-th input sample of the $v$-th network.
  • the class center loss function is: $L_{cc}^u = \frac{1}{N}\sum_{n=1}^{N}\left\| e_n^u - c_c^u \right\|_2^2$
  • where $L_{cc}^u$ represents the class center loss of the $u$-th network, $N$ the number of samples, $n$ the $n$-th input sample, and $u$ the $u$-th network; $e_n^u$ is the embedding feature of the $n$-th sample of the $u$-th network, the category of that sample being class $c$; and $c_c^u$ is the corresponding class center, i.e., the class center of class $c$ of the embedding-layer features of the $u$-th network.
  • the pedestrian re-identification method also includes: performing a weighted calculation on the most recently determined class center and the currently output embedding-layer feature to obtain the updated class center: $\hat{c}_c^u = \alpha \, c_c^u + \beta \, e_n^u$
  • where $\hat{c}_c^u$ represents the updated class center, $e_n^u$ the embedding feature of the $n$-th sample of the $u$-th network (whose category is class $c$), $c_c^u$ the corresponding class center of class $c$ of the embedding-layer features of the $u$-th network, and $\alpha$ and $\beta$ the weighting values.
  • before weighting the most recently determined class center and the currently output embedding-layer features, the method also includes: judging whether the feature classification corresponding to the currently output embedding-layer feature is correct, and entering the weighting step only if it is.
  • said constraining the class center distance includes: finding the minimum inter-class difference of each class center through hard sample mining, and using the minimum inter-class difference to constrain the class center distance.
  • the loss function constraining the class center distance is: $L_{cd}^u = -\frac{1}{C}\sum_{i=1}^{C}\left\| c_i^u - \tilde{c}_i^u \right\|_2$
  • where $L_{cd}^u$ represents the loss of the $u$-th convolutional neural network that constrains the class center distance, $C$ represents the number of sample categories, $c_i^u$ represents the class center of the $i$-th class of the $u$-th network, and $\tilde{c}_i^u$ represents the class center nearest to $c_i^u$.
  • determining the loss functions of the auxiliary training model and the target model includes:
  • adding the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance to obtain the loss functions of the auxiliary training model and the target model; the loss calculated by the loss functions of the auxiliary training model and the target model is the sum of the losses respectively calculated by these four component loss functions.
  • using the loss function to train the auxiliary training model and the target model includes:
  • Step a: initialize the weights of each network layer in the model to be trained, where the model to be trained is either the auxiliary training model or the target model;
  • Step b: select training data, input it into the model to be trained, propagate it forward so that it passes through each network layer in turn, and output the forward-propagation output value;
  • Step c: use a loss function to obtain the error between the forward-propagation output value and the target value;
  • Step d: backpropagate the error through the model to be trained to obtain the backpropagation error of each network layer;
  • Step e: update the weights of each network layer based on the backpropagation error;
  • Step f: repeat steps b to e, and end the training of the model to be trained when the error falls below an error threshold or when the repetitions reach a specified number.
  • constructing the auxiliary training model and the target model based on a convolutional neural network may also include: constructing them according to a preset rule, the preset rule being that the model complexity of the auxiliary training model is higher than that of the target model.
  • the present application also provides a pedestrian re-identification system, which includes:
  • a model training module configured to determine a loss function of the auxiliary training model and the target model, and use the loss function to train the auxiliary training model and the target model;
  • a knowledge transfer module configured to transfer the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model after the training of the auxiliary training model is completed;
  • a feature extraction module configured to input a pedestrian image to the pedestrian re-identification model to obtain the embedded layer features of the pedestrian image
  • the pedestrian re-identification module is used to compare the similarity between the embedded layer features of the pedestrian image and the embedded layer of the image to be queried, and output the pedestrian re-identification result according to the similarity comparison result.
  • the present application also provides a storage medium on which a computer program is stored, and when the computer program is executed, the steps performed by the above-mentioned pedestrian re-identification method are realized.
  • the present application also provides an electronic device, including a memory and a processor, the memory stores a computer program, and the processor implements the steps performed by the above pedestrian re-identification method when calling the computer program in the memory.
  • the present application provides a pedestrian re-identification method, including: constructing an auxiliary training model and a target model based on a convolutional neural network; determining the loss functions of the auxiliary training model and the target model, and using the loss functions to train the auxiliary training model and the target model; after the training of the auxiliary training model is completed, transferring the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model; inputting a pedestrian image into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian image; and comparing the similarity between the embedding-layer features of the pedestrian image and the embedding layer of the image to be queried, and outputting a pedestrian re-identification result according to the similarity comparison result.
  • the application constructs an auxiliary training model and a target model based on a convolutional neural network, and determines the loss functions of the auxiliary training model and the target model, and then uses the loss function to train the auxiliary training model and the target model.
  • the application transfers the knowledge learned in the auxiliary training model to the target model through knowledge transfer to obtain a pedestrian re-identification model. Since the pedestrian re-identification model includes knowledge learned from both the auxiliary training model and the target model, its accuracy can be improved without additional inference cost. Therefore, the present application can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
  • the present application also provides a pedestrian re-identification system, an electronic device and a storage medium, which have the above-mentioned beneficial effects and will not be repeated here.
  • FIG. 1 is a flowchart of a pedestrian re-identification method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the first model provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the second model provided by the embodiment of the present application.
  • FIG. 4 is a schematic diagram of the model retained for inference, provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a pedestrian re-identification application provided by the embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a pedestrian re-identification system provided by an embodiment of the present application.
  • Deeper, wider or more complex networks usually bring a surge in the number of parameters, which is not conducive to the storage and deployment of portable devices. For example, to realize the deployment of a real-time pedestrian detection and recognition program in a network camera, the network needs to have a small amount of parameters (easy to store) and a high recognition accuracy.
  • Deeper, wider or more complex networks usually increase the amount of computation, which is not conducive to scenarios with high real-time requirements, for example, the retrieval and tracking of criminal suspects, where a large computation delay can cause the entire system to miss the best opportunity and negatively affect system function. Therefore, how to improve the performance of the network without increasing the amount of parameters and computation has become a problem that needs to be solved.
  • this embodiment proposes a method for constructing, training, and inferring a convolutional neural network based on knowledge supervision.
  • This method can realize knowledge transfer (from a large model to a small model) without increasing the amount of parameters and computation, maximally mine the network's potential, and improve network performance.
  • multiple results for the same image can assist each other, so that more accurate results can be obtained by utilizing the knowledge learned by the group.
  • the multiple results include both final results and intermediate results.
  • This embodiment is based on the idea of knowledge-supervised learning.
  • one or more networks are established, and the networks realize knowledge transfer and improve the generalization ability of each model through mutual supervised learning.
  • FIG. 1 is a flow chart of a pedestrian re-identification method provided by an embodiment of the present application.
  • S101 Construct an auxiliary training model and a target model based on a convolutional neural network
  • this embodiment can establish an auxiliary training model and a target model including one or more convolutional neural networks based on the idea of knowledge-supervised learning.
  • the above-mentioned convolutional neural networks can realize knowledge transfer through mutual supervised learning to improve the generalization ability of each model.
  • the present application can construct the auxiliary training model and the target model based on a convolutional neural network in the following manner: construct the auxiliary training model comprising at least two convolutional neural networks and the target model comprising at least two convolutional neural networks; or, use a convolutional neural network comprising at least two head networks to construct the auxiliary training model, and use a convolutional neural network comprising at least two head networks to construct the target model;
  • the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer and a softmax layer.
  • FIG. 2 is a schematic diagram of the first model provided by the embodiment of the present application.
  • the schematic diagram of the model shows the implementation of establishing an auxiliary training model and a target model including two convolutional neural networks.
  • the auxiliary training model and/or the target model may be the model shown in FIG. 2 .
  • two convolutional neural networks Net1 and Net2 are established.
  • the two convolutional neural networks can be isomorphic or heterogeneous.
  • the output of the network can reduce the dimensionality of the feature map (Batchsize ⁇ Channel ⁇ H ⁇ W) into a vector through the Pooling layer.
  • e1 and e2 are used to represent the embedded layer features, and the dimensions of e1 and e2 are Batchsize ⁇ Channel.
  • the above model includes a backbone network and a head network.
  • the backbone network is used to extract features, and the head network is used to realize classification and loss function calculation.
  • the head network includes a pooling layer pool, an embedding layer, a fully connected layer fc, an output layer, and a softmax layer.
  • the head network can use the triplet loss function Triplet loss and the cross-entropy loss function for parameter adjustment.
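  • As a non-authoritative illustration of this head structure (pooling, embedding, fully connected, output and softmax layers), the following minimal PyTorch sketch may help; the class name, the choice of global average pooling, and the dimension parameters are assumptions for illustration and are not specified by the patent:

```python
import torch
import torch.nn as nn

class HeadNetwork(nn.Module):
    """Head network as in FIG. 2: pooling -> embedding -> fully connected -> softmax."""
    def __init__(self, in_channels: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # Batchsize x Channel x H x W -> vector
        self.embedding = nn.Linear(in_channels, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)  # fully connected layer fc

    def forward(self, feature_map: torch.Tensor):
        v = self.pool(feature_map).flatten(1)        # Batchsize x Channel
        e = self.embedding(v)                        # embedding-layer feature (e1 or e2)
        logits = self.fc(e)                          # output layer
        return e, logits, torch.softmax(logits, dim=1)  # softmax layer
```

  • Attaching two or more such heads to a single backbone would yield the multi-head model of FIG. 3.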
  • Fig. 3 is a schematic diagram of the second model provided by the embodiment of the present application.
  • the schematic diagram of the model shows the implementation of establishing an auxiliary training model and a target model including a convolutional neural network.
  • the model is a multi-head convolutional neural network (i.e., it has multiple head networks).
  • the auxiliary training model in this embodiment may be the model shown in any one of FIG. 2 and FIG. 3
  • the target model may also be the model shown in any one of FIG. 2 and FIG. 3
  • the complexity of the above-mentioned auxiliary training model is higher than that of the target model, and the complexity of the model can be measured by the amount of parameters and computation of the model.
  • the auxiliary training model and the target model based on a convolutional neural network can be constructed according to preset rules; wherein, the preset rule is that the model complexity of the auxiliary training model is higher than that of the The model complexity of the target model.
  • S102 Determine a loss function of the auxiliary training model and the target model, and use the loss function to train the auxiliary training model and the target model;
  • the loss functions of the auxiliary training model and the target model can be the same; specifically, they can be determined in the following manner: provide the cross-entropy loss function of the convolutional neural network, used to calculate the cross-entropy loss of each convolutional neural network; provide the feature similarity loss function, used to calculate the feature similarity loss between any two of the at least two convolutional neural networks; provide the class center loss function, used to calculate the class center loss of each convolutional neural network; provide the loss function constraining the class center distance, used to calculate, for each convolutional neural network, the loss constraining the class center distance; and determine the loss functions of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
  • the features of the embedding layer are finally used for feature matching retrieval, so constrained optimization of them is of great significance for the pedestrian re-identification task.
  • This embodiment designs a new loss function, and the implementation steps of this function are as follows:
  • Loss function (1) Calculation process: Add a fully connected layer (fc) after the embedding layer to obtain the fully connected layer features, and perform softmax normalization on the fully connected layer features, and finally calculate the loss through the cross entropy loss function
  • the superscript 1 represents the first branch.
  • Net1 and Net2 have different network structures and initialization weights, so their features e1 and e2 are diverse; their commonality is an excellent ability to represent pedestrians. Exploiting this commonality suppresses noise.
  • This embodiment provides a feature similarity loss function to implement a mutual learning mechanism, that is, e1 learns from e2, and e2 learns from e1 to obtain the feature similarity between the convolutional neural network Net1 and the convolutional neural network Net2 Loss L m .
  • here $L_m$ takes the form given above, $L_m = \frac{1}{N}\sum_{n=1}^{N}\left\| e_n^u - e_n^v \right\|_2^2$, where $n$ denotes the $n$-th input sample and $u$ and $v$ denote the $u$-th and $v$-th networks.
  • All samples of each batch are traversed; as mentioned above, if each batch contains N samples, the traversal runs N times. The samples are passed through each network in turn to obtain the embedding-layer output of each sample in each network. For example, for sample $x_n$, with 2 networks there are 2 embedding-layer outputs $e_n^1$ and $e_n^2$; similarly, with 3 networks there are 3 embedding-layer outputs.
  • a pairwise traversal is then performed. For example, with the two networks 1 and 2 of this embodiment, the above formula gives the feature similarity loss $L_m$ between the two networks. In the same way, with 3 networks there are 3 non-repeating pairs, (1, 2), (1, 3) and (2, 3), and the feature similarity loss $L_m(u, v)$ is calculated for each pair.
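  • A minimal sketch of this pairwise traversal, assuming the squared-L2 form of $L_m$ reconstructed above; the function name is illustrative:

```python
import itertools
import torch
import torch.nn.functional as F

def feature_similarity_loss(embeddings: list) -> torch.Tensor:
    """L_m: traverse all network pairs (1,2), (1,3), (2,3), ... and accumulate
    the mean squared distance between their embedding-layer outputs."""
    loss = embeddings[0].new_zeros(())
    for e_u, e_v in itertools.combinations(embeddings, 2):
        loss = loss + F.mse_loss(e_u, e_v)  # per-element mean of ||e_n^u - e_n^v||^2
    return loss
```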
  • Loss function (3) calculation process: the features e1 and e2 are similar, which has the following defect during mutual learning. In the initial stage of training, the network's predictions are very inaccurate, so e1 and e2 carry large deviations and noise; mutual learning between them may amount to learning inaccurate features from inaccurate features, which may not work well. To suppress this noise, this embodiment proposes an embedded-feature optimization method that uses the class center loss function to effectively reduce the noise of the embedded features. The specific implementation is as follows:
  • the core idea of the class center loss function is that the embedding-layer features of each image learn from their respective class centers. Because the class centers of the image samples are relatively stable, this effectively suppresses the deviation caused by embedding-layer features learning from other branches.
  • the learning method is: find the class centers of all sample categories; input all samples $x_1, \dots, x_N$ into each network in turn to obtain the embedding features of all samples, $e^1$ and $e^2$, where the superscripts 1 and 2 denote the different branches and the subscript $N$ denotes a total of $N$ samples; and, for the output of each network, calculate the class centers of the samples separately.
  • the $u$-th network makes each sample learn from the embedding-layer class center corresponding to its sample category, finally giving the class center loss $L_{cc}^u$ defined above, in which $c_c^u$ is the class center corresponding to sample $e_n^u$. The features of each network are traversed in turn, and the class center losses $L_{cc}^1$ and $L_{cc}^2$ are calculated for each network.
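  • A sketch of the class center loss for one network, assuming the centers are held in a (num_classes x embed_dim) tensor; names are illustrative:

```python
import torch

def class_center_loss(e: torch.Tensor, labels: torch.Tensor,
                      centers: torch.Tensor) -> torch.Tensor:
    """L_cc^u: each embedding e_n learns from the class center of its own
    category, which is more stable than learning from another branch."""
    return ((e - centers[labels]) ** 2).sum(dim=1).mean()  # (1/N) sum_n ||e_n - c_class(n)||^2
```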
  • the class centers can be updated with a first-in-first-out queue, taking the class center computed over the n steps nearest the current step as the real class center.
  • the class center may also be selected according to the classification probability corresponding to the embedding feature. That is: first judge whether the classification using this feature is correct, and if it is correct, it will be included in the calculation of the class center.
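  • The weighted update and the classification-correctness filter just described might look as follows; the values of α and β are illustrative assumptions, not taken from the patent:

```python
import torch

@torch.no_grad()
def update_centers(centers: torch.Tensor, e: torch.Tensor, labels: torch.Tensor,
                   logits: torch.Tensor, alpha: float = 0.9, beta: float = 0.1) -> None:
    """Class-center update c <- alpha * c + beta * e, counting a sample into the
    center calculation only if its current feature classifies it correctly."""
    correct = logits.argmax(dim=1) == labels      # keep only correctly classified samples
    for feat, cls in zip(e[correct], labels[correct]):
        centers[cls] = alpha * centers[cls] + beta * feat
```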
  • Loss function (4) calculation process: for each network, the positions of the class centers can be further constrained so that the centers of the classes are separated as much as possible, which helps distinguish different pedestrians and improves the discriminability of the network; that is, the features of individual pedestrians can be better separated.
  • a loss function that constrains the class center distance can be constructed: $L_{cd}^u = -\frac{1}{C}\sum_{i=1}^{C}\left\| c_i^u - \tilde{c}_i^u \right\|_2$, where $\tilde{c}_i^u$ is the class center nearest to $c_i^u$.
  • Hard sample mining here uses not the mean of the inter-class differences of all classes, but the minimum inter-class difference of all classes (i.e., the smallest class center distance).
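  • A sketch of this hard-sample-mining constraint, under the reconstruction above in which only each center's smallest inter-class distance is penalised:

```python
import torch

def class_center_distance_loss(centers: torch.Tensor) -> torch.Tensor:
    """L_cd^u: encourage separation by maximising each center's distance to its
    nearest neighbouring center (the minimum inter-class difference)."""
    mask = torch.eye(centers.size(0), dtype=torch.bool, device=centers.device)
    d = torch.cdist(centers, centers).masked_fill(mask, float("inf"))  # C x C distances
    return -d.min(dim=1).values.mean()        # minimise the negative of the min distances
```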
  • the loss functions (1)-(4) are combined to obtain the total loss function: $L_{loss} = L_{ce}^1 + L_{ce}^2 + L_m + L_{cc}^1 + L_{cc}^2 + L_{cd}^1 + L_{cd}^2$
  • $L_{loss}$ is the total loss of the model; $L_{ce}^1$ and $L_{ce}^2$ are the cross-entropy losses of the first and second convolutional neural networks; $L_m$ is the feature similarity loss between the first and second convolutional neural networks; $L_{cc}^1$ and $L_{cc}^2$ are the class center losses of the first and second convolutional neural networks; and $L_{cd}^1$ and $L_{cd}^2$ are the losses constraining the class center distance for the first and second convolutional neural networks.
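  • Combining the four terms is then a plain unweighted sum, as described; a minimal sketch (no weighting coefficients are stated in the source):

```python
def total_loss(ce_losses, l_m, cc_losses, cd_losses):
    """L_loss = L_ce^1 + L_ce^2 + L_m + L_cc^1 + L_cc^2 + L_cd^1 + L_cd^2."""
    return sum(ce_losses) + l_m + sum(cc_losses) + sum(cd_losses)
```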
  • This embodiment provides a network structure for multi-model knowledge collaborative training, which combines the above mutual learning loss, class center loss, and class-center optimization loss functions for supervised learning training.
  • the multi-model knowledge-supervised training method mines the features in the network's embedding layer, improves the discriminability of the embedding-layer features, and deletes the redundant models during inference, so no additional inference cost is required to improve accuracy.
  • This method also has broad application prospects in the field of image classification.
  • the model training idea of this embodiment is as follows: (1) build a plurality of network models with different network structures for training, usually selecting a larger model (i.e., the auxiliary training model) and a smaller model (i.e., the target model) to achieve knowledge transfer; calculate the cross-entropy loss, mutual learning loss, class center loss, and class-center optimization loss for all network models.
  • the cross entropy loss is calculated by the cross entropy loss function
  • the mutual learning loss is calculated by the feature similarity loss function
  • the class center loss is calculated by the class center loss function
  • the class center optimization loss is calculated by the loss function that constrains the class center distance. According to the above loss function, the network is trained to converge.
  • the convolutional neural network training process is divided into two stages. The first stage is the stage in which data propagates from the low level to the high level, i.e., the forward propagation stage. The other stage is, when the result of forward propagation does not match the expectation, propagating the error from the high level back to the low level, i.e., the backpropagation stage.
  • the training process includes the following steps (a code sketch follows the list):
  • Step 1: initialize the network layer weights, generally with random initialization.
  • Step 2: forward-propagate the input image data through the convolutional layers, down-sampling layers, fully connected layers and other layers to obtain the output value.
  • Step 3: calculate the error between the output value of the network and the target value (label).
  • Step 4: propagate the error back into the network, obtaining the backpropagation error of each layer (fully connected layer, convolutional layer, and so on) in turn.
  • Step 5: each layer adjusts all of its weight coefficients according to its backpropagation error, i.e., the weights are updated.
  • Step 6: randomly select new image data and return to Step 2 to obtain a new forward-propagation output value.
  • Step 7: iterate repeatedly; training ends when the error between the network output and the target value (label) falls below a certain threshold, or when the number of iterations exceeds a certain threshold.
  • Step 8: save the trained network parameters of all layers and store the trained weights.
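  • A compact sketch of steps 1 to 8, assuming a standard SGD optimizer and a generic loss function; the learning rate, thresholds, and file name are placeholders, not values from the patent:

```python
import torch

def train(model, loader, loss_fn, max_iters: int = 100000,
          err_threshold: float = 1e-3, lr: float = 1e-2):
    """Steps 1-8: random init (PyTorch's default), forward-propagate, compute
    the error, backpropagate, update weights; stop when the error is below a
    threshold or the iteration budget is exhausted, then save the weights."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    it = 0
    while it < max_iters:
        for images, labels in loader:                    # step 6: new image data each pass
            loss = loss_fn(model(images), labels)        # steps 2-3: forward pass and error
            opt.zero_grad()
            loss.backward()                              # step 4: backpropagate the error
            opt.step()                                   # step 5: update all weights
            it += 1
            if loss.item() < err_threshold or it >= max_iters:   # step 7: stop criteria
                torch.save(model.state_dict(), "weights.pt")     # step 8: save weights
                return
    torch.save(model.state_dict(), "weights.pt")
```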
  • constraining the class center distance includes: finding the minimum inter-class difference of each class center through hard sample mining, and using the minimum inter-class difference to constrain the class center distance.
  • the accuracy of training and inference of the neural network is thereby improved without increasing the amount of parameters or computation of the network during inference.
  • S103: After the training of the auxiliary training model is completed, the knowledge of the auxiliary training model is transferred to the target model; the auxiliary training model has learned knowledge information about pedestrian re-identification.
  • This embodiment can transfer the above knowledge information to the target model through knowledge transfer.
  • after the training is completed, the transfer is performed, and the target model carrying the knowledge of the auxiliary training model is used as the pedestrian re-identification model.
  • the above knowledge refers to features in the network; in this embodiment, multiple views of the same data provide additional regularization information, thereby improving network accuracy.
  • S104 Input the pedestrian image to the pedestrian re-identification model to obtain the embedded layer features of the pedestrian image;
  • the pedestrian image is input to the pedestrian re-identification model to obtain the embedded layer features of each pedestrian image.
  • S105 Compare the similarity between the embedded layer feature of the pedestrian image and the embedded layer of the image to be queried, and output a pedestrian re-identification result according to the similarity comparison result.
  • this embodiment compares the similarity between the embedding-layer features of the pedestrian images and the embedding layer of the image to be queried, and determines the pedestrian image with the highest similarity according to the comparison result, so that the pedestrian image with the highest similarity can be output as the pedestrian re-identification result.
  • an auxiliary training model and a target model based on a convolutional neural network are constructed, and loss functions of the auxiliary training model and the target model are determined, and then the auxiliary training model and the target model are trained using the loss function.
  • the knowledge learned in the auxiliary training model is transferred to the target model through knowledge transfer to obtain a pedestrian re-identification model. Since the pedestrian re-identification model includes knowledge learned from both the auxiliary training model and the target model, its accuracy can be improved without additional inference cost. Therefore, this embodiment can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
  • FIG. 4 is a schematic diagram of the model retained for inference, provided by an embodiment of the present application.
  • the inference process provided in this embodiment is as follows: remove all auxiliary training models, retain only one network model (i.e., the target model), load the pre-trained weights, and classify images or extract image features.
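  • A sketch of this pruned inference configuration; the stand-in network, embedding size, and weight file name are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the retained target model (backbone + single head);
# the auxiliary training networks are simply never instantiated at inference time.
target = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 128),                          # 128-d embedding head (assumed size)
)
target.load_state_dict(torch.load("weights.pt"))  # load the pre-trained weights
target.eval()                                     # feature extraction / classification only
```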
  • FIG. 5 is a schematic diagram of a pedestrian re-identification application provided by an embodiment of the present application.
  • conv represents the convolutional layer
  • bottleneck represents the bottleneck layer
  • the bottleneck layer is a characteristic building block of the ResNet architecture.
  • the input images 1, 2, 3 and the image to be queried are input into the network to obtain the embedding layer features in the network, and the images 1, 2, and 3 constitute the query data set for the pedestrian re-identification task.
  • the image to be queried is also input into the network to obtain the embedding layer features of the image to be queried.
  • the comparison method is to compute the distance between the embedding-layer feature of the image to be queried and all features in the query data set, that is, the distance between the vectors; the query data sample with the smallest distance is taken to be the same person as the image to be queried.
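  • This distance comparison can be sketched as follows, with the query data set of images 1, 2 and 3 acting as the gallery; the function name is illustrative:

```python
import torch

def re_identify(query_e: torch.Tensor, gallery_e: torch.Tensor) -> int:
    """Return the index of the gallery embedding nearest to the query embedding;
    the smallest-distance sample is judged to be the same person."""
    d = torch.cdist(query_e.unsqueeze(0), gallery_e)  # distances to every gallery feature
    return int(d.squeeze(0).argmin())
```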
  • the present invention proposes a new embedding-feature mining method and a multi-model collaborative training method, establishing a basis for feature mining by building multiple neural network models. Embedding mining between branches is realized through mutual learning of embedding features between two models and the construction of a new loss function. At the same time, the loss learned from each classification center is combined with the embedding features within each branch into a new loss function to train the entire network.
  • the training method proposed in this embodiment does not increase the amount of parameters and calculations during network inference. By optimizing the training process, the potential of the network is tapped so that it can achieve optimal performance, thereby showing better results in the inference process.
  • the present invention proposes a method of embedding feature mining based on multi-model knowledge supervision and collaborative training, which can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
  • FIG. 6 is a schematic structural diagram of a pedestrian re-identification system provided by an embodiment of the present application.
  • the system may include:
  • Model construction module 601 for constructing the auxiliary training model and target model based on convolutional neural network
  • a model training module 602 configured to determine a loss function of the auxiliary training model and the target model, and use the loss function to train the auxiliary training model and the target model;
  • a knowledge transfer module 603, configured to transfer the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model after the training of the auxiliary training model is completed;
  • a feature extraction module 604 configured to input a pedestrian image to the pedestrian re-identification model to obtain the embedded layer features of the pedestrian image;
  • the pedestrian re-identification module 605 is configured to perform a similarity comparison between the embedded layer features of the pedestrian image and the embedded layer of the image to be queried, and output a pedestrian re-identification result according to the similarity comparison result.
  • an auxiliary training model and a target model based on a convolutional neural network are constructed, and loss functions of the auxiliary training model and the target model are determined, and then the auxiliary training model and the target model are trained using the loss function.
  • the knowledge learned in the auxiliary training model is transferred to the target model through knowledge transfer to obtain a pedestrian re-identification model. Since the pedestrian re-identification model includes knowledge learned from the auxiliary training model and the target model, the accuracy of the pedestrian re-identification model can be improved without additional reasoning costs. Therefore, this embodiment can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
  • the model construction module 601 is configured to construct the auxiliary training model comprising at least two convolutional neural networks and the target model comprising at least two convolutional neural networks;
  • or, to use a convolutional neural network comprising at least two head networks to construct the auxiliary training model and a convolutional neural network comprising at least two head networks to construct the target model; wherein the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer and a softmax layer.
  • the model training module 602 is configured to provide the cross-entropy loss function of the convolutional neural network, used to calculate the cross-entropy loss of each convolutional neural network;
  • to provide the feature similarity loss function, used to calculate the feature similarity loss between any two of the at least two convolutional neural networks; to provide the class center loss function, used to calculate the class center loss of each convolutional neural network;
  • to provide the loss function constraining the class center distance, used to calculate, for each convolutional neural network, the loss constraining the class center distance; and to determine the loss functions of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
  • the class center update module is used to perform weighted calculation on the latest determined class center and the currently output embedding layer features to obtain the updated class center.
  • a judgment module, used to judge, before the weighted calculation of the most recently determined class center and the currently output embedding-layer features, whether the feature classification corresponding to the currently output embedding-layer feature is correct; if so, to enter the step of weighting the most recently determined class center and the currently output embedding-layer features.
  • constraining the class center distance includes: finding the minimum inter-class difference of each class center through hard sample mining; and using the minimum inter-class difference to constrain the class center distance.
  • the model construction module 601 is configured to construct the auxiliary training model and the target model based on a convolutional neural network according to a preset rule; wherein the preset rule is that the model complexity of the auxiliary training model is higher than the model complexity of the target model.
  • the present application also provides a storage medium on which a computer program is stored. When the computer program is executed, the steps provided in the above-mentioned embodiments can be realized.
  • the storage medium may include: a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
  • the present application also provides an electronic device, which may include a memory and a processor, where a computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided in the above embodiments can be implemented.
  • the electronic device may also include various network interfaces, power supplies and other components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

A person re-identification method, comprising: constructing an auxiliary training model and a target model, which are based on convolutional neural networks; determining loss functions of the auxiliary training model and the target model, and training the auxiliary training model and the target model by using the loss functions; after the training of the auxiliary training model is completed, migrating knowledge of the auxiliary training model to the target model, so as to obtain a person re-identification model; inputting a person image into the person re-identification model to obtain an embedding layer feature of the person image; and performing similarity comparison on the embedding layer feature of the person image and an embedding layer feature of an image to be queried, and outputting a person re-identification result according to a similarity comparison result. By means of the method, the accuracy of person re-identification can be improved without increasing a parameter amount and a calculation amount.

Description

A pedestrian re-identification method and system, electronic device and storage medium
This application claims priority to the Chinese patent application with application number 202111344388.3, titled "A pedestrian re-identification method, system, electronic device and storage medium", filed with the China Patent Office on November 15, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of deep learning, in particular to a pedestrian re-identification method and system, an electronic device and a storage medium.
Background
Deep learning techniques can solve problems in the field of computer vision such as image classification, image segmentation and object detection. With the continuous development of deep learning technology, pedestrian re-identification technology has also made great progress.
Pedestrian re-identification (Re-ID) is an important image recognition technology, widely used in public security systems, traffic supervision and other fields. It searches cameras distributed in different locations to determine whether pedestrians in different camera fields of view are the same pedestrian. To further improve network performance, related technologies usually improve the accuracy of pedestrian re-identification by building a more complex network structure. However, deeper, wider or more complex networks usually bring a surge in the amount of parameters and computation; the increase in parameters is not conducive to storage and deployment on portable devices, and the increase in computation is not conducive to application in scenarios with high real-time requirements.
Therefore, how to improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation is a technical problem that those skilled in the art currently need to solve.
Summary of the Invention
The purpose of this application is to provide a pedestrian re-identification method and system, an electronic device and a storage medium, which can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
To solve the above technical problem, this application provides a pedestrian re-identification method, which includes:
constructing an auxiliary training model and a target model based on a convolutional neural network;
determining loss functions of the auxiliary training model and the target model, and using the loss functions to train the auxiliary training model and the target model;
after the auxiliary training model is trained, transferring the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model;
inputting a pedestrian image into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian image;
comparing the embedding-layer features of the pedestrian image with the embedding layer of the image to be queried for similarity, and outputting a pedestrian re-identification result according to the similarity comparison result.
Optionally, constructing the auxiliary training model and the target model based on a convolutional neural network includes:
constructing the auxiliary training model comprising at least two convolutional neural networks, and constructing the target model comprising at least two convolutional neural networks;
or, constructing the auxiliary training model using a convolutional neural network comprising at least two head networks, and constructing the target model using a convolutional neural network comprising at least two head networks;
wherein the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer and a softmax layer.
Optionally, determining the loss functions of the auxiliary training model and the target model includes:
providing the cross-entropy loss function of the convolutional neural network, used to calculate the cross-entropy loss of each convolutional neural network;
providing the feature similarity loss function of the convolutional neural network, used to calculate the feature similarity loss between any two of the at least two convolutional neural networks;
providing the class center loss function of the convolutional neural network, used to calculate the class center loss of each convolutional neural network;
providing the loss function of the convolutional neural network that constrains the class center distance, used to calculate, for each convolutional neural network, the loss constraining the class center distance;
determining the loss functions of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
Optionally, the feature similarity loss function is:
$L_m = \frac{1}{N}\sum_{n=1}^{N}\left\| e_n^u - e_n^v \right\|_2^2$
where $L_m$ represents the feature similarity loss, $N$ the number of samples, $n$ the $n$-th input sample, $u$ and $v$ the $u$-th and $v$-th networks, $e_n^u$ the embedding-layer output of the $n$-th input sample of the $u$-th network, and $e_n^v$ the embedding-layer output of the $n$-th input sample of the $v$-th network.
Optionally, the class center loss function is:
$L_{cc}^u = \frac{1}{N}\sum_{n=1}^{N}\left\| e_n^u - c_c^u \right\|_2^2$
where $L_{cc}^u$ represents the class center loss of the $u$-th network, $N$ the number of samples, $n$ the $n$-th input sample, and $u$ the $u$-th network; $e_n^u$ represents the embedding feature of the $n$-th sample of the $u$-th network, the category of that sample being class $c$; and $c_c^u$ represents the corresponding class center, i.e., the class center of class $c$ of the embedding-layer features of the $u$-th network.
Optionally, the pedestrian re-identification method further includes:
performing a weighted calculation on the most recently determined class center and the currently output embedding-layer feature to obtain the updated class center, i.e., through the following formula:
$\hat{c}_c^u = \alpha \, c_c^u + \beta \, e_n^u$
where $\hat{c}_c^u$ represents the updated class center; $e_n^u$ represents the embedding feature of the $n$-th sample of the $u$-th network, the category of that sample being class $c$; $c_c^u$ represents the corresponding class center, i.e., the class center of class $c$ of the embedding-layer features of the $u$-th network; and $\alpha$ and $\beta$ represent the weighting values.
Optionally, before performing the weighted calculation on the most recently determined class center and the currently output embedding-layer feature, the method further includes:
judging whether the feature classification corresponding to the currently output embedding-layer feature is correct;
if so, entering the step of performing the weighted calculation on the most recently determined class center and the currently output embedding-layer feature.
Optionally, constraining the class center distance includes:
finding the minimum inter-class difference of each class center through hard sample mining;
using the minimum inter-class difference to constrain the class center distance.
Optionally, the loss function constraining the class center distance is:
$L_{cd}^u = -\frac{1}{C}\sum_{i=1}^{C}\left\| c_i^u - \tilde{c}_i^u \right\|_2$
where $L_{cd}^u$ represents the loss of the $u$-th convolutional neural network that constrains the class center distance, $C$ represents the number of sample categories, $c_i^u$ represents the class center of the $i$-th class of the $u$-th network, and $\tilde{c}_i^u$ represents the class center nearest to $c_i^u$.
Optionally, determining the loss functions of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance includes:
adding the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance to obtain the loss functions of the auxiliary training model and the target model; wherein the loss calculated by the loss functions of the auxiliary training model and the target model is the sum of the losses respectively calculated by the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
Optionally, using the loss functions to train the auxiliary training model and the target model includes:
Step a: initializing the weights of each network layer in the model to be trained, wherein the model to be trained is either the auxiliary training model or the target model;
Step b: selecting training data, inputting the training data into the model to be trained, propagating the training data forward in the model to be trained so that it passes through each network layer in turn, and outputting the forward-propagation output value;
Step c: using a loss function to obtain the error between the forward-propagation output value and the target value;
Step d: backpropagating the error in the model to be trained to obtain the backpropagation error of each network layer;
Step e: updating the weights of each network layer based on the backpropagation error;
Step f: repeating steps b to e, and ending the training of the model to be trained when the error is less than an error threshold, or when the repetitions reach a specified number.
Optionally, constructing the auxiliary training model and the target model based on convolutional neural networks includes:
constructing the auxiliary training model and the target model based on convolutional neural networks according to a preset rule, wherein the preset rule is that the model complexity of the auxiliary training model is higher than the model complexity of the target model.
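For illustration only, the preset rule can be checked by comparing parameter counts; the backbone choices, the variable names, and the number of identity classes below are assumptions that do not appear in the original disclosure:

```python
# Minimal sketch: verify the preset rule "auxiliary model complexity > target
# model complexity", using parameter count as the complexity measure.
import torch.nn as nn
from torchvision import models

def num_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

C = 751  # assumed number of pedestrian identity classes (illustrative)
aux_model = models.resnet50(num_classes=C)     # larger auxiliary training model
target_model = models.resnet18(num_classes=C)  # smaller target (deployed) model

assert num_params(aux_model) > num_params(target_model)
```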
The present application further provides a pedestrian re-identification system, including:
a model construction module, configured to construct an auxiliary training model and a target model based on convolutional neural networks;
a model training module, configured to determine a loss function of the auxiliary training model and the target model, and train the auxiliary training model and the target model using the loss function;
a knowledge transfer module, configured to transfer the knowledge of the auxiliary training model to the target model after the auxiliary training model has been trained, to obtain a pedestrian re-identification model;
a feature extraction module, configured to input a pedestrian image into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian image;
and a pedestrian re-identification module, configured to compare the embedding-layer features of the pedestrian image with the embedding layer of the image to be queried for similarity, and output a pedestrian re-identification result according to the similarity comparison result.
The present application further provides a storage medium on which a computer program is stored; when the computer program is executed, the steps performed by the above pedestrian re-identification method are implemented.
The present application further provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor, when invoking the computer program in the memory, implements the steps performed by the above pedestrian re-identification method.
The present application provides a pedestrian re-identification method, including: constructing an auxiliary training model and a target model based on convolutional neural networks; determining a loss function of the auxiliary training model and the target model, and training the auxiliary training model and the target model using the loss function; after the auxiliary training model has been trained, transferring the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model; inputting a pedestrian image into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian image; and comparing the embedding-layer features of the pedestrian image with the embedding layer of the image to be queried for similarity, and outputting a pedestrian re-identification result according to the similarity comparison result.
The present application constructs an auxiliary training model and a target model based on convolutional neural networks, determines a loss function of the auxiliary training model and the target model, and then trains both models using the loss function. After the auxiliary training model has been trained, the present application transfers the knowledge learned by the auxiliary training model to the target model through knowledge transfer to obtain a pedestrian re-identification model. Since the pedestrian re-identification model includes the knowledge learned by both the auxiliary training model and the target model, the accuracy of the pedestrian re-identification model can be improved without additional inference cost. Therefore, the present application can improve the accuracy of pedestrian re-identification without increasing the amount of parameters or computation. The present application also provides a pedestrian re-identification system, an electronic device, and a storage medium, which have the above beneficial effects and are not repeated here.
Description of Drawings
In order to explain the embodiments of the present application more clearly, the accompanying drawings required by the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a pedestrian re-identification method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a first model provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a second model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a model retention result provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a pedestrian re-identification application provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a pedestrian re-identification system provided by an embodiment of the present application.
Detailed Description
In order to make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The process described in the above embodiments is illustrated below through embodiments in practical applications.
With the continuous development of deep learning, deep learning networks have achieved remarkable performance in various fields. To further improve network performance, researchers usually continue to do so by constructing more complex network structures. However, improving network performance in this way has the following disadvantages: (1) Deeper, wider, or more complex networks usually bring a surge in the number of parameters, and the increase in parameters is unfavorable to storage and deployment on portable devices. For example, deploying a real-time pedestrian detection and recognition program in a network camera requires the network to have a small number of parameters (easy to store) and high recognition accuracy. (2) Deeper, wider, or more complex networks usually increase the amount of computation, which is unfavorable to scenarios with high real-time requirements, for example, the retrieval and tracking of criminal suspects; a large computation delay can make the entire system miss the best opportunity and negatively affect system functions. Therefore, how to improve network performance without increasing the amount of parameters and computation has become a problem to be solved.
To solve the above problems, this embodiment proposes a construction, training, and inference method for convolutional neural networks based on knowledge supervision, which can realize knowledge transfer (migrating knowledge from a large model to a small model) without increasing the amount of parameters and computation, maximally exploit the potential of the network, and improve network performance. In this embodiment, multiple results for the same image can assist each other, so that the knowledge learned by the group is used to obtain more accurate results, where the multiple results include both final results and intermediate results.
Based on the idea of knowledge-supervised learning, this embodiment first establishes one or more networks, and the networks realize knowledge transfer and improve the generalization ability of each model through mutual supervised learning.
Please refer to FIG. 1, which is a flowchart of a pedestrian re-identification method provided by an embodiment of the present application. The specific steps may include:
S101: Construct an auxiliary training model and a target model based on convolutional neural networks.
In this embodiment, an auxiliary training model and a target model including one or more convolutional neural networks may be established based on the idea of knowledge-supervised learning; these convolutional neural networks realize knowledge transfer through mutual supervised learning, thereby improving the generalization ability of each model.
As a feasible implementation, the auxiliary training model and the target model based on convolutional neural networks may be constructed in the following manner: constructing the auxiliary training model including at least two convolutional neural networks and constructing the target model including at least two convolutional neural networks; or constructing the auxiliary training model using a convolutional neural network including at least two head networks and constructing the target model using a convolutional neural network including at least two head networks, wherein the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer, and a softmax layer.
Please refer to FIG. 2, which is a schematic diagram of the first model provided by an embodiment of the present application; it shows an implementation that establishes an auxiliary training model and a target model each including two convolutional neural networks, and the auxiliary training model and/or the target model may be the model shown in FIG. 2. As shown in FIG. 2, two convolutional neural networks Net1 and Net2 are established; the two convolutional neural networks may be homogeneous or heterogeneous. The output of a network passes through the pooling layer, which reduces the feature map (Batchsize × Channel × H × W) to a vector; here e1 and e2 denote the embedding-layer features, whose dimensions are Batchsize × Channel. The above model includes a backbone model and a head model: the backbone network extracts features, and the head network performs classification and the loss computation. The head network includes a pooling layer (pool), an embedding layer, a fully connected layer (fc), an output layer, and a softmax layer, and its parameters can be adjusted using the triplet loss function (Triplet loss) and the cross-entropy loss function (cross-entropy loss).
Please refer to FIG. 3, which is a schematic diagram of the second model provided by an embodiment of the present application; it shows an implementation that establishes an auxiliary training model and a target model including one convolutional neural network, namely a multi-head convolutional neural network (i.e., with multiple head networks).
The auxiliary training model in this embodiment may be the model shown in either FIG. 2 or FIG. 3, and the target model may also be the model shown in either FIG. 2 or FIG. 3. The complexity of the auxiliary training model is higher than that of the target model, and the complexity of a model can be measured by its amount of parameters and computation. Specifically, this embodiment may construct the auxiliary training model and the target model based on convolutional neural networks according to a preset rule, wherein the preset rule is that the model complexity of the auxiliary training model is higher than the model complexity of the target model.
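For illustration only, a minimal sketch of the backbone-plus-head structure of FIG. 2 follows; the ResNet backbones, the layer sizes, the embedding dimension, and the class count are assumptions not taken from the disclosure:

```python
import torch
import torch.nn as nn
from torchvision import models

class Head(nn.Module):
    """Head network: pooling, embedding, fully connected (fc) and output;
    softmax is applied inside the cross-entropy loss during training."""
    def __init__(self, in_channels: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # pooling layer
        self.embedding = nn.Linear(in_channels, embed_dim)  # embedding layer
        self.fc = nn.Linear(embed_dim, num_classes)         # fully connected layer

    def forward(self, feat_map: torch.Tensor):
        v = self.pool(feat_map).flatten(1)   # Batchsize x Channel vector
        e = self.embedding(v)                # embedding feature (e1 or e2)
        logits = self.fc(e)                  # output layer (pre-softmax)
        return e, logits

# Two possibly heterogeneous networks Net1 and Net2, as in Figure 2.
net1 = nn.Sequential(*list(models.resnet50(weights=None).children())[:-2])
net2 = nn.Sequential(*list(models.resnet18(weights=None).children())[:-2])
head1 = Head(in_channels=2048, embed_dim=512, num_classes=751)
head2 = Head(in_channels=512, embed_dim=512, num_classes=751)
```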
S102: Determine a loss function of the auxiliary training model and the target model, and train the auxiliary training model and the target model using the loss function.
In this embodiment, the loss functions of the auxiliary training model and the target model may be the same, and both may be determined in the following manner: providing a cross-entropy loss function of the convolutional neural networks, used to compute the cross-entropy loss of each convolutional neural network; providing a feature similarity loss function of the convolutional neural networks, used to compute the feature similarity loss between any two of the at least two convolutional neural networks; providing a class center loss function of the convolutional neural networks, used to compute the class center loss of each convolutional neural network; providing a loss function of the convolutional neural networks that constrains the class center distance, used to compute, for each convolutional neural network, the loss constraining the class center distance; and determining the loss function of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
In the pedestrian re-identification task, the embedding-layer features are what is ultimately used for feature matching and retrieval, so constraining and optimizing them is of great significance to the task. This embodiment designs a new loss function, implemented in the following steps:
Loss function (1) calculation process: a fully connected layer (fc) is added after the embedding layer to obtain the fully connected layer features, softmax normalization is performed on the fully connected layer features, and finally the loss $L_{ce}^1$ is calculated through the cross-entropy loss function, where the superscript 1 denotes the first branch.
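A minimal sketch of this step (the function and variable names are illustrative, not from the disclosure):

```python
# Loss (1): fc logits computed after the embedding layer, then softmax plus
# cross-entropy; nn.CrossEntropyLoss applies log-softmax internally.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def branch_ce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: Batchsize x num_classes output of the fc layer of one branch
    return ce(logits, labels)  # L_ce^u for branch u
```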
Loss function (2) calculation process: the embedding features e1 and e2 of convolutional neural networks Net1 and Net2 should be similar, because both support the same pedestrian classification task during training, and during inference e1 and e2 are used for similarity comparison. The embedding features e1 and e2 of Net1 and Net2 are therefore functionally identical and should learn similar features.
Net1 and Net2 have different network structures and initialization weight coefficients, so e1 and e2 are diverse, but what they have in common is an excellent ability to represent pedestrians. To exploit this common strength and suppress noise, this embodiment provides a feature similarity loss function to implement a mutual learning mechanism, i.e., e1 learns from e2 and e2 learns from e1, yielding the feature similarity loss $L_m$ between convolutional neural network Net1 and convolutional neural network Net2.
Its loss function, for the pair of networks (u, v), takes the form:
$$L_m=\frac{1}{N}\sum_{n=1}^{N}\left\|e_n^u-e_n^v\right\|_2^2$$
where n denotes the n-th input sample, N denotes the number of samples, and u and v denote the u-th network and the v-th network. The formula can be summarized as follows:
All samples of each batch are traversed; as described above, assuming each batch contains N samples, the traversal runs N times. The samples are passed through each network in turn, and the embedding-layer outputs of the samples in each network are obtained. For example, for a sample $x_n$, assuming there are 2 networks, there are 2 embedding-layer outputs, $e_n^1$ and $e_n^2$; similarly, if there are 3 networks, there are 3 embedding-layer outputs.
The embedding-layer outputs of all networks are traversed pairwise. For example, this embodiment has 2 networks, 1 and 2, and the above formula is used to compute the feature similarity loss $L_m$ between the two networks. Similarly, assuming there are 3 networks, there are 3 non-repeating combinations: (1,2), (1,3), (2,3), and the feature similarity loss $L_m(u,v)$ is computed for each combination.
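An illustrative sketch of this pairwise traversal; the squared-L2 form and all names are assumptions consistent with the description above:

```python
# Loss (2): mutual-learning feature similarity loss over all network pairs.
import itertools
import torch

def feature_similarity_loss(embeddings: list) -> torch.Tensor:
    # embeddings: one Batchsize x Channel tensor per network, e.g. [e1, e2]
    loss = embeddings[0].new_zeros(())
    for e_u, e_v in itertools.combinations(embeddings, 2):  # (1,2),(1,3),(2,3),...
        loss = loss + ((e_u - e_v) ** 2).sum(dim=1).mean()  # squared-L2, averaged over N
    return loss
```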
Loss function (3) calculation process: the features e1 and e2 are similar, but mutual learning has the following defect. For example, at the start of the training process the network predictions are very inaccurate, and the network features e1 and e2 carry large deviations and noise, so the mutual learning between e1 and e2 may amount to inaccurate features learning from inaccurate features, which may not work well. To suppress this noise, this embodiment proposes an embedding feature optimization method: using a class center loss function can effectively reduce the noise of the embedding features. The specific implementation is as follows:
The core idea of constructing the class center loss function is that the embedding-layer features of each image learn toward their respective class centers. Because the class centers of the image samples are relatively stable, this effectively suppresses the deviation introduced when embedding-layer features learn from other branches. The learning method is: compute the class centers of all classes over all samples. All samples $x_N$ are input into each network in turn to obtain the embedding features of all samples, $e_n^1$ and $e_n^2$ (n = 1, ..., N), where the superscripts 1 and 2 denote different branches and the subscript runs over the N samples. For the output of each network, the class centers of the samples are computed separately. Assuming all samples $x_N$ contain C classes in total (i.e., C pedestrians), the class centers are computed with the following formula:
$$s_c^1=\frac{1}{N_c}\sum_{n:\,y_n=c}e_n^1$$
where $s_c^1$ denotes the class center of class c of the embedding-layer features of the first network, $y_n$ denotes the class label of the n-th sample, and $N_c$ denotes the number of samples of class c. There are C class centers in total, denoted $s_1^1,\dots,s_C^1$. Similarly, for multiple networks, the embedding-layer class centers of each network are computed separately. $e_n^1$ denotes the embedding feature of the n-th sample of the first network, whose class is class c.
After the u-th network makes each sample learn toward the embedding-layer class center corresponding to that sample's class, the resulting class center loss is:
$$L_{sc}^u=\frac{1}{N}\sum_{n=1}^{N}\left\|e_n^u-s_c^u\right\|_2^2$$
where $s_c^u$ denotes the class center corresponding to sample $e_n^u$. The features of each network are traversed in turn, and the class center loss function yields the loss of each network, $L_{sc}^1$ and $L_{sc}^2$.
Since the network is optimized by continuous iteration, its samples also keep changing relative to the class centers of each network, so the class centers of each class are updated dynamically via
$$\hat{s}_c^u=\alpha\,s_c^u+\beta\,e_n^u$$
where the superscript u denotes the u-th network. The class centers may be updated in a first-in-first-out stack manner, taking the class centers computed from the n steps nearest to the current step as the true class centers.
This embodiment may also select class centers according to the classification probability corresponding to the embedding feature, i.e., first judge whether classification using the feature is correct, and only when it is correct is the feature counted into the class center calculation.
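A sketch of the class-center computation, the class center loss, and the gated weighted update; the squared distance, the α and β values, and all names are assumptions consistent with the text:

```python
import torch

def class_centers(e: torch.Tensor, y: torch.Tensor, num_classes: int) -> torch.Tensor:
    # e: N x D embedding features of one network; y: N class labels
    centers = torch.zeros(num_classes, e.size(1), device=e.device)
    for c in range(num_classes):
        mask = y == c
        if mask.any():
            centers[c] = e[mask].mean(dim=0)  # mean embedding of class c
    return centers

def class_center_loss(e: torch.Tensor, y: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    # each embedding learns toward the center of its own class (L_sc^u)
    return ((e - centers[y]) ** 2).sum(dim=1).mean()

def update_centers(centers, e, y, logits, alpha=0.9, beta=0.1):
    # weighted update s_hat = alpha * s + beta * e, gated so that only
    # correctly classified samples are counted into the center calculation
    correct = logits.argmax(dim=1) == y
    new_centers = centers.clone()
    for n in torch.nonzero(correct).flatten().tolist():
        c = int(y[n])
        new_centers[c] = alpha * new_centers[c] + beta * e[n]
    return new_centers
```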
Loss function (4) calculation process: for each network, the positions of the class centers can be further constrained so that the centers of the classes are separated as far as possible, which helps distinguish the features of different pedestrians and improves the discriminability of the network; that is, the features of each pedestrian can be better separated. This embodiment may construct a loss function constraining the class center distance (the spacing between class centers), for example:
$$L_{ic}^u=-\frac{1}{C}\sum_{i=1}^{C}\left\|s_i^u-\tilde{s}_i^u\right\|_2$$
where $s_i^u$ denotes the class center of the i-th class of the u-th network, and $\tilde{s}_i^u$ denotes the class center nearest to $s_i^u$. This embodiment may use hard sample mining (difficult sample mining) to implement the loss optimization over class centers: hard sample mining does not take the average of the inter-class differences of all classes, but the minimum inter-class difference (i.e., the smallest class center distance) of all classes.
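An illustrative sketch of this hard mining over class centers; the negated-distance form mirrors the reconstructed formula above and is an assumption:

```python
import torch

def center_distance_loss(centers: torch.Tensor) -> torch.Tensor:
    # centers: C x D class centers of one network (loss L_ic^u)
    d = torch.cdist(centers, centers)                             # C x C pairwise distances
    d = d + torch.eye(len(centers), device=centers.device) * 1e9  # ignore self-distance
    nearest = d.min(dim=1).values   # minimum inter-class difference per center (hard mining)
    return -nearest.mean()          # push each center away from its nearest neighbor
```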
In this embodiment, loss functions (1) to (4) are combined into the total loss function:
$$L_{loss}=L_{ce}^1+L_{ce}^2+L_m+L_{sc}^1+L_{sc}^2+L_{ic}^1+L_{ic}^2$$
where $L_{loss}$ is the total loss of the model, $L_{ce}^1$ is the cross-entropy loss of the first convolutional neural network, $L_{ce}^2$ is the cross-entropy loss of the second convolutional neural network, $L_m$ is the feature similarity loss between the first and the second convolutional neural network, $L_{sc}^1$ is the class center loss of the first convolutional neural network, $L_{sc}^2$ is the class center loss of the second convolutional neural network, $L_{ic}^1$ is the loss constraining the class center distance of the first convolutional neural network, and $L_{ic}^2$ is the loss constraining the class center distance of the second convolutional neural network.
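A sketch that sums the four loss terms for two networks, reusing the helper functions sketched above; all names are illustrative:

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def total_loss(e1, e2, logits1, logits2, y, centers1, centers2):
    l = ce(logits1, y) + ce(logits2, y)                  # L_ce^1 + L_ce^2
    l = l + feature_similarity_loss([e1, e2])            # L_m
    l = l + class_center_loss(e1, y, centers1) \
          + class_center_loss(e2, y, centers2)           # L_sc^1 + L_sc^2
    l = l + center_distance_loss(centers1) \
          + center_distance_loss(centers2)               # L_ic^1 + L_ic^2
    return l
```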
This embodiment provides a network structure for multi-model knowledge collaborative training, which combines the above mutual learning loss, class center loss, and class-center optimization loss functions for supervised training. The multi-model knowledge-supervised training method mines the features of the network embedding layer to improve their discriminative power, and redundant models are removed at inference, so no additional inference cost is needed to improve accuracy; the method has broad application prospects in the field of image classification.
The following describes the process of training the models in the above embodiment. After a convolutional neural network is established, it needs to be trained until it converges, and the trained network weights are obtained after convergence. During inference, the trained weight coefficients of the network are preloaded to perform the final classification of the input data.
The model training idea of this embodiment is as follows: according to different network structures, multiple network models are built for training; usually a larger model (i.e., the auxiliary training model) and a smaller model (i.e., the target model) are selected to realize knowledge transfer. For all network models, the cross-entropy loss, the mutual learning loss, the class center loss, and the class-center optimization loss are computed, where the cross-entropy loss is computed by the cross-entropy loss function, the mutual learning loss by the feature similarity loss function, the class center loss by the class center loss function, and the class-center optimization loss by the loss function constraining the class center distance. The networks are trained to convergence according to the above loss functions.
The convolutional neural network training process is as follows. It is divided into two stages: the first stage is the stage in which data propagates from low levels to high levels, i.e., the forward propagation stage; the other stage is the stage in which, when the result of forward propagation does not match expectations, the error is propagated from high levels to low levels, i.e., the back-propagation stage. The training process includes the following steps:
Step 1: initialize the network layer weights, generally by random initialization;
Step 2: the input image data passes through the convolutional layers, down-sampling layers, fully connected layers, and other layers by forward propagation to obtain an output value;
Step 3: compute the error between the output value of the network and the target value (label);
Step 4: propagate the error back into the network, obtaining in turn the back-propagation error of each layer of the network: the fully connected layers, the convolutional layers, and so on;
Step 5: each layer of the network adjusts all weight coefficients in the network according to its back-propagation error, i.e., the weights are updated;
Step 6: randomly select new image data again, and return to Step 2 to obtain an output value by forward propagation;
Step 7: iterate continuously; when the error between the output value of the network and the target value (label) is less than a certain threshold, or the number of iterations exceeds a certain threshold, end the training;
Step 8: save the trained network parameters of all layers and store the trained weights.
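A hedged sketch of this training loop, reusing the helper functions above; the optimizer choice, the hyperparameters, and the file name are assumptions:

```python
import torch

def train(net1, net2, head1, head2, loader, epochs=60, err_threshold=1e-3):
    params = (list(net1.parameters()) + list(net2.parameters())
              + list(head1.parameters()) + list(head2.parameters()))
    opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # step 1: weights randomly initialized by the module constructors
    for _ in range(epochs):                               # step 7: bounded number of iterations
        for x, y in loader:                               # step 6: select new image data
            e1, logits1 = head1(net1(x))                  # step 2: forward propagation
            e2, logits2 = head2(net2(x))
            c1 = class_centers(e1.detach(), y, logits1.size(1))
            c2 = class_centers(e2.detach(), y, logits2.size(1))
            loss = total_loss(e1, e2, logits1, logits2, y, c1, c2)  # step 3: error vs. labels
            opt.zero_grad()
            loss.backward()                               # step 4: back-propagate the error
            opt.step()                                    # step 5: update the weights
            if loss.item() < err_threshold:               # step 7: error below threshold
                torch.save(net2.state_dict(), "target_model.pt")  # step 8: store weights
                return
    torch.save(net2.state_dict(), "target_model.pt")      # step 8: store weights
```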
Optionally, in this embodiment, constraining the class center distance includes: finding the minimum inter-class difference of each class center through hard sample mining, and constraining the class center distance using the minimum inter-class difference.
Before this step, training data for pedestrian re-identification may also be acquired, and the auxiliary training model and the target model are then trained separately with the training data. This embodiment improves the accuracy of the neural network in training and inference without increasing the amount of parameters and computation of the network at inference.
S103: After the auxiliary training model has been trained, transfer the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model.
After the auxiliary training model has been trained, it has learned knowledge about pedestrian re-identification; this embodiment can transfer that knowledge to the target model through knowledge transfer, and takes the trained target model to which the knowledge of the auxiliary training model has been transferred as the pedestrian re-identification model. The above knowledge refers to the features in the network; in this embodiment, multiple views of the same data provide additional regularization information, thereby improving network accuracy.
S104: Input pedestrian images into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian images.
After the pedestrian re-identification model is obtained, if a pedestrian re-identification task is received, pedestrian images are input into the pedestrian re-identification model to obtain the embedding-layer features of each pedestrian image.
S105: Compare the embedding-layer features of the pedestrian images with the embedding layer of the image to be queried for similarity, and output a pedestrian re-identification result according to the similarity comparison result.
In this embodiment, the embedding-layer features of the pedestrian images can be compared with the embedding layer of the image to be queried for similarity, and the pedestrian image with the highest similarity is determined according to the similarity comparison result, so that the pedestrian image with the highest similarity is taken as the pedestrian re-identification result.
This embodiment constructs an auxiliary training model and a target model based on convolutional neural networks, determines a loss function of the auxiliary training model and the target model, and then trains both models using the loss function. After the auxiliary training model has been trained, this embodiment transfers the knowledge learned by the auxiliary training model to the target model through knowledge transfer to obtain a pedestrian re-identification model. Since the pedestrian re-identification model includes the knowledge learned by both the auxiliary training model and the target model, the accuracy of the pedestrian re-identification model can be improved without additional inference cost. Therefore, this embodiment can improve the accuracy of pedestrian re-identification without increasing the amount of parameters or computation.
The following provides an example in which a model is trained using the knowledge collaborative network training method of the above embodiment and applied to the field of pedestrian re-identification. The training process has been described in detail above; the specific inference application method is explained below.
Please refer to FIG. 4, which is a schematic diagram of a model retention result provided by an embodiment of the present application. The inference process provided by this embodiment is as follows: remove all auxiliary training models, keep only a single network model (i.e., the target model), load the pre-trained weights, and classify images or extract image features.
At inference: remove the remaining models (the auxiliary training models) and keep only the main model (the target model). Please refer to FIG. 5, which is a schematic diagram of a pedestrian re-identification application provided by an embodiment of the present application. In FIG. 5, conv denotes a convolutional layer and bottleneck denotes a bottleneck layer, a specific network structure of ResNet. In the pedestrian re-identification application, input images 1, 2, and 3 and the image to be queried are input into the network to obtain the embedding-layer features; images 1, 2, and 3 constitute the query data set of the pedestrian re-identification task. The image to be queried is also input into the network to obtain its embedding-layer features. The embedding-layer features of the image to be queried are compared with all features in the query data set; the comparison method is to compute the distance between the embedding-layer features of the image to be queried and all features in the query data set, i.e., vector distances, and the query data sample with the smallest distance is the same person as the image to be queried.
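A hedged sketch of this retrieval step with only the target model retained; the names and tensor shapes are assumptions:

```python
import torch

@torch.no_grad()
def reid_query(net, head, query_img, gallery_imgs):
    # net/head: the retained target model; images are C x H x W tensors
    q_e, _ = head(net(query_img.unsqueeze(0)))                             # 1 x D query embedding
    g_e = torch.cat([head(net(g.unsqueeze(0)))[0] for g in gallery_imgs])  # G x D gallery embeddings
    dists = torch.cdist(q_e, g_e).squeeze(0)   # vector distances to every gallery feature
    best = int(dists.argmin())                 # smallest distance: same person as the query
    return best, dists
```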
For the pedestrian re-identification task, the discriminability of the embedding features directly affects the best attainable performance of the model, so how to mine the features of the model's embedding layer so that samples can be correctly classified and discriminated is extremely important. The present invention therefore proposes a new embedding feature mining method and a multi-model collaborative training method, which lay the foundation for feature mining by establishing multiple neural network models. Embedding mining between branches is realized through the mutual learning of embedding features between pairs of models and the construction of a new loss function; this is combined with the loss function by which the embedding features within a branch learn toward the centers of the classes, yielding a new combined loss function with which the entire network is trained.
The training method proposed in this embodiment does not increase the amount of parameters and computation at network inference; by optimizing the training process, it taps the potential of the network so that it can reach optimal performance, thereby showing better results during inference. For the pedestrian re-identification task, this embodiment proposes an embedding feature mining method with multi-model knowledge-supervised collaborative training, which can improve the accuracy of pedestrian re-identification without increasing the amount of parameters and computation.
Please refer to FIG. 6, which is a schematic structural diagram of a pedestrian re-identification system provided by an embodiment of the present application. The system may include:
a model construction module 601, configured to construct an auxiliary training model and a target model based on convolutional neural networks;
a model training module 602, configured to determine a loss function of the auxiliary training model and the target model, and train the auxiliary training model and the target model using the loss function;
a knowledge transfer module 603, configured to transfer the knowledge of the auxiliary training model to the target model after the auxiliary training model has been trained, to obtain a pedestrian re-identification model;
a feature extraction module 604, configured to input a pedestrian image into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian image;
and a pedestrian re-identification module 605, configured to compare the embedding-layer features of the pedestrian image with the embedding layer of the image to be queried for similarity, and output a pedestrian re-identification result according to the similarity comparison result.
This embodiment constructs an auxiliary training model and a target model based on convolutional neural networks, determines a loss function of the auxiliary training model and the target model, and then trains both models using the loss function. After the auxiliary training model has been trained, this embodiment transfers the knowledge learned by the auxiliary training model to the target model through knowledge transfer to obtain a pedestrian re-identification model. Since the pedestrian re-identification model includes the knowledge learned by both the auxiliary training model and the target model, the accuracy of the pedestrian re-identification model can be improved without additional inference cost. Therefore, this embodiment can improve the accuracy of pedestrian re-identification without increasing the amount of parameters or computation.
Optionally, the model construction module 601 is configured to construct the auxiliary training model including at least two convolutional neural networks and construct the target model including at least two convolutional neural networks; or to construct the auxiliary training model using a convolutional neural network including at least two head networks and construct the target model using a convolutional neural network including at least two head networks, wherein the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer, and a softmax layer.
Optionally, the model training module 602 is configured to provide a cross-entropy loss function of the convolutional neural networks, used to compute the cross-entropy loss of each convolutional neural network; to provide a feature similarity loss function of the convolutional neural networks, used to compute the feature similarity loss between any two of the at least two convolutional neural networks; to provide a class center loss function of the convolutional neural networks, used to compute the class center loss of each convolutional neural network; to provide a loss function of the convolutional neural networks that constrains the class center distance, used to compute, for each convolutional neural network, the loss constraining the class center distance; and to determine the loss function of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
Optionally, the system further includes:
a class center update module, configured to perform a weighted calculation on the most recently determined class center and the currently output embedding-layer feature to obtain an updated class center.
Optionally, the system further includes:
a judging module, configured to judge, before the weighted calculation is performed on the most recently determined class center and the currently output embedding-layer feature, whether the feature classification corresponding to the currently output embedding-layer feature is correct, and if so, to proceed to the step of performing the weighted calculation on the most recently determined class center and the currently output embedding-layer feature.
Optionally, with regard to the model training module 602, constraining the class center distance includes: finding the minimum inter-class difference of each class center through hard sample mining, and constraining the class center distance using the minimum inter-class difference.
Optionally, the model construction module 601 is configured to construct the auxiliary training model and the target model based on convolutional neural networks according to a preset rule, wherein the preset rule is that the model complexity of the auxiliary training model is higher than the model complexity of the target model.
Since the embodiments of the system part correspond to the embodiments of the method part, reference may be made to the description of the embodiments of the method part for the embodiments of the system part, and details are not repeated here.
The present application also provides a storage medium on which a computer program is stored; when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The present application also provides an electronic device, which may include a memory and a processor; a computer program is stored in the memory, and when the processor invokes the computer program in the memory, the steps provided by the above embodiments can be implemented. Of course, the electronic device may also include various network interfaces, a power supply, and other components.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts of the embodiments, reference may be made to one another. As for the system disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively brief, and reference may be made to the description of the method part where relevant. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from the principles of the present application, and these improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device including that element.

Claims (14)

  1. A pedestrian re-identification method, comprising:
    constructing an auxiliary training model and a target model based on convolutional neural networks;
    determining a loss function of the auxiliary training model and the target model, and training the auxiliary training model and the target model using the loss function;
    after the auxiliary training model has been trained, transferring the knowledge of the auxiliary training model to the target model to obtain a pedestrian re-identification model;
    inputting a pedestrian image into the pedestrian re-identification model to obtain the embedding-layer features of the pedestrian image;
    and comparing the embedding-layer features of the pedestrian image with the embedding layer of the image to be queried for similarity, and outputting a pedestrian re-identification result according to the similarity comparison result.
  2. The pedestrian re-identification method according to claim 1, wherein constructing the auxiliary training model and the target model based on convolutional neural networks comprises:
    constructing the auxiliary training model including at least two convolutional neural networks, and constructing the target model including at least two convolutional neural networks;
    or constructing the auxiliary training model using a convolutional neural network including at least two head networks, and constructing the target model using a convolutional neural network including at least two head networks;
    wherein the head network includes a pooling layer, an embedding layer, a fully connected layer, an output layer, and a softmax layer.
  3. The pedestrian re-identification method according to claim 2, wherein determining the loss function of the auxiliary training model and the target model comprises:
    providing a cross-entropy loss function of the convolutional neural networks, the cross-entropy loss function being used to calculate the cross-entropy loss of each convolutional neural network;
    providing a feature similarity loss function of the convolutional neural networks, the feature similarity loss function being used to calculate the feature similarity loss between any two of the at least two convolutional neural networks;
    providing a class center loss function of the convolutional neural networks, the class center loss function being used to calculate the class center loss of each convolutional neural network;
    providing a loss function of the convolutional neural networks constraining the class center distance, the loss function constraining the class center distance being used to calculate, for each convolutional neural network, the loss constraining the class center distance;
    and determining the loss function of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining the class center distance.
  4. The person re-identification method according to claim 3, wherein the feature similarity loss function is:

    L_m = \frac{1}{N} \sum_{n=1}^{N} \left\| x_n^{u} - x_n^{v} \right\|_2^2

    wherein L_m denotes the feature similarity loss, N denotes the number of samples, n denotes the n-th input sample, u and v denote the u-th network and the v-th network, x_n^{u} denotes the embedding-layer output of the n-th input sample of the u-th network, and x_n^{v} denotes the embedding-layer output of the n-th input sample of the v-th network.
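    A minimal sketch of the reconstructed feature similarity loss, assuming the mean squared-Euclidean form given above; function and variable names are illustrative:

```python
import torch

def feature_similarity_loss(emb_u, emb_v):
    """L_m = (1/N) * sum_n ||x_n^u - x_n^v||_2^2 over a batch of N samples.

    emb_u, emb_v: (N, D) embedding-layer outputs of networks u and v.
    """
    return ((emb_u - emb_v) ** 2).sum(dim=1).mean()
```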
  5. The person re-identification method according to claim 3, wherein the class center loss function is:
    L_{cen}^{u} = \frac{1}{N} \sum_{n=1}^{N} \left\| x_n^{u} - c_c^{u} \right\|_2^2

    wherein L_{cen}^{u} denotes the class center loss of the u-th network, N denotes the number of samples, n denotes the n-th input sample, u denotes the u-th network, x_n^{u} denotes the embedding feature of the n-th sample of the u-th network, the class of the n-th sample being class c, and c_c^{u} denotes the class center corresponding to x_n^{u}, that is, the class center of class c among the embedding-layer features of the u-th network.
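    A minimal sketch of the class center loss under the reconstruction above; the `(C, D)` center-matrix layout is an assumption:

```python
import torch

def class_center_loss(emb, labels, centers):
    """L_cen^u = (1/N) * sum_n ||x_n^u - c_{y_n}^u||_2^2.

    emb:     (N, D) embedding features of network u
    labels:  (N,)   class index of each sample
    centers: (C, D) one class center per class for network u
    """
    return ((emb - centers[labels]) ** 2).sum(dim=1).mean()
```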
  6. The person re-identification method according to claim 5, further comprising:
    performing a weighted calculation on the most recently determined class center and the currently output embedding-layer feature to obtain an updated class center, namely, by the following formula:

    \tilde{c}_c^{u} = \alpha \, c_c^{u} + \beta \, x_n^{u}

    wherein \tilde{c}_c^{u} denotes the updated class center, x_n^{u} denotes the embedding feature of the n-th sample of the u-th network, the class of the n-th sample being class c, c_c^{u} denotes the class center corresponding to x_n^{u}, that is, the class center of class c among the embedding-layer features of the u-th network, and α and β denote weighting values.
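    A minimal sketch of the weighted class-center update; the concrete values of the weighting values α and β are assumptions, not fixed by the claim:

```python
import torch

@torch.no_grad()
def update_center(centers, emb_n, c, alpha=0.9, beta=0.1):
    """Weighted update of the class-c center from the current embedding:
    c_c <- alpha * c_c + beta * x_n. The 0.9/0.1 split is illustrative."""
    centers[c] = alpha * centers[c] + beta * emb_n
    return centers
```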
  7. The person re-identification method according to claim 6, wherein before the weighted calculation is performed on the most recently determined class center and the currently output embedding-layer feature, the method further comprises:
    judging whether the feature classification corresponding to the currently output embedding-layer feature is correct; and
    if so, proceeding to the step of performing the weighted calculation on the most recently determined class center and the currently output embedding-layer feature.
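    A minimal sketch combining claims 6 and 7: the weighted update is applied only to samples whose feature classification is correct; the batched interface is an assumption:

```python
import torch

@torch.no_grad()
def update_centers_if_correct(centers, emb, logits, labels, alpha=0.9, beta=0.1):
    """Only samples whose predicted class matches the label update their
    class center, implementing the correctness check of claim 7 before
    the weighted update of claim 6."""
    correct = logits.argmax(dim=1) == labels          # (N,) boolean mask
    for x, y in zip(emb[correct], labels[correct]):
        centers[y] = alpha * centers[y] + beta * x
    return centers
```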
  8. The person re-identification method according to claim 3, wherein constraining the class center distances comprises:
    obtaining a minimum inter-class difference of each class center through hard sample mining; and
    constraining the class center distances by using the minimum inter-class difference.
  9. The person re-identification method according to claim 8, wherein the loss function constraining the class center distances is:

    L_d^{u} = -\frac{1}{C} \sum_{i=1}^{C} \left\| c_i^{u} - \hat{c}_i^{u} \right\|_2

    wherein L_d^{u} denotes the loss, of the u-th convolutional neural network, that constrains the class center distances, C denotes the number of sample classes, c_i^{u} denotes the class center of the i-th class of the u-th network, and \hat{c}_i^{u} denotes the class center nearest to c_i^{u}; minimising L_d^{u} enlarges the minimum inter-class distances found by the hard sample mining.
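    A minimal sketch of the class-center distance constraint, assuming the negated nearest-center distance form reconstructed above; `torch.cdist` performs the exhaustive pairwise search used here for the hard sample mining:

```python
import torch

def center_distance_loss(centers):
    """For every class center, hard-mine its nearest other center and
    penalise small minimum inter-class distances (reconstructed form)."""
    d = torch.cdist(centers, centers)                       # (C, C) pairwise distances
    eye = torch.eye(d.size(0), device=d.device, dtype=d.dtype)
    nearest = (d + eye * 1e12).min(dim=1).values            # exclude self-distance
    return -nearest.mean()                                  # minimising pushes centers apart
```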
  10. The person re-identification method according to claim 3, wherein determining the loss functions of the auxiliary training model and the target model according to the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining class center distances comprises:
    adding the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining class center distances to obtain the loss functions of the auxiliary training model and the target model;
    wherein the loss calculated by the loss function of the auxiliary training model or of the target model is the sum of the losses respectively calculated by the cross-entropy loss function, the feature similarity loss function, the class center loss function, and the loss function constraining class center distances.
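    A minimal sketch of the summation recited in claim 10; the per-network lists are an assumed bookkeeping convention:

```python
def total_loss(ce_losses, l_m, center_losses, dist_losses):
    """Sum of the four loss terms, as recited in claim 10. Unweighted
    summation is taken literally from the claim; per-term weights, if
    any, would be an implementation choice."""
    return sum(ce_losses) + l_m + sum(center_losses) + sum(dist_losses)
```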
  11. The person re-identification method according to claim 3, wherein training the auxiliary training model and the target model by using the loss functions comprises:
    step a: initialising the weights of each network layer in a model to be trained, wherein the model to be trained is either of the auxiliary training model and the target model;
    step b: selecting training data, inputting the training data into the model to be trained, propagating the training data forward through the model to be trained so that the training data passes through the network layers in sequence, and outputting a forward-propagation output value;
    step c: obtaining an error between the forward-propagation output value and a target value by using the loss function;
    step d: back-propagating the error through the model to be trained to obtain a back-propagation error of each network layer;
    step e: updating the weights of each network layer based on the back-propagation error; and
    step f: repeating steps b to e, and ending the training of the model to be trained when the error is smaller than an error threshold, or when the repetition reaches a specified number of times.
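    A minimal, non-limiting sketch of steps a to f of claim 11; the Kaiming initialisation, the Adam optimiser and all hyper-parameters are illustrative assumptions:

```python
import torch

def train(model, loader, loss_fn, epochs=100, err_threshold=1e-3, lr=3e-4):
    """Skeleton of steps a-f: init, forward, loss, backward, weight
    update, repeated until the error or the iteration budget is met."""
    for m in model.modules():                         # step a: initialise weights
        if isinstance(m, torch.nn.Linear):
            torch.nn.init.kaiming_normal_(m.weight)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                           # step f: repeat b-e
        for imgs, labels in loader:
            out = model(imgs)                         # step b: forward propagation
            err = loss_fn(out, labels)                # step c: error vs. target value
            opt.zero_grad()
            err.backward()                            # step d: back-propagate error
            opt.step()                                # step e: update layer weights
            if err.item() < err_threshold:
                return model                          # error below threshold
    return model                                      # iteration budget reached
```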
  12. The person re-identification method according to any one of claims 1 to 11, wherein constructing the auxiliary training model and the target model based on convolutional neural networks comprises:
    constructing the auxiliary training model and the target model based on convolutional neural networks according to a preset rule, wherein the preset rule is that the model complexity of the auxiliary training model is higher than the model complexity of the target model.
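    By way of non-limiting illustration of the preset rule of claim 12, one could choose a deeper backbone for the auxiliary training model than for the target model; the specific torchvision backbones, and the `weights=None` API of recent torchvision, are assumptions:

```python
from torchvision import models

# Auxiliary (teacher) model gets the higher-complexity backbone,
# target (student) model the lighter one, per the preset rule.
auxiliary_backbone = models.resnet50(weights=None)   # higher model complexity
target_backbone = models.resnet18(weights=None)      # lower model complexity
```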
  13. A person re-identification system, comprising:
    a model construction module, configured to construct an auxiliary training model and a target model based on convolutional neural networks;
    a model training module, configured to determine loss functions of the auxiliary training model and the target model, and to train the auxiliary training model and the target model by using the loss functions;
    a knowledge transfer module, configured to, after the auxiliary training model has been trained, transfer knowledge of the auxiliary training model to the target model to obtain a person re-identification model;
    a feature extraction module, configured to input a pedestrian image into the person re-identification model to obtain an embedding-layer feature of the pedestrian image; and
    a person re-identification module, configured to perform a similarity comparison between the embedding-layer feature of the pedestrian image and an embedding-layer feature of an image to be queried, and to output a person re-identification result according to a result of the similarity comparison.
  14. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor, when invoking the computer program in the memory, implements the steps of the person re-identification method according to any one of claims 1 to 11.
  15. A storage medium, wherein computer-executable instructions are stored in the storage medium, and the computer-executable instructions, when loaded and executed by a processor, implement the steps of the person re-identification method according to any one of claims 1 to 11.
PCT/CN2022/090217 2021-11-15 2022-04-29 Person re-identification method and system, and electronic device and storage medium WO2023082561A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111344388.3A CN114299442A (en) 2021-11-15 2021-11-15 Pedestrian re-identification method and system, electronic equipment and storage medium
CN202111344388.3 2021-11-15

Publications (1)

Publication Number Publication Date
WO2023082561A1 true WO2023082561A1 (en) 2023-05-19

Family

ID=80964180

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090217 WO2023082561A1 (en) 2021-11-15 2022-04-29 Person re-identification method and system, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN114299442A (en)
WO (1) WO2023082561A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299442A (en) * 2021-11-15 2022-04-08 苏州浪潮智能科技有限公司 Pedestrian re-identification method and system, electronic equipment and storage medium
CN116824695A (en) * 2023-06-07 2023-09-29 南通大学 Pedestrian re-identification non-local defense method based on feature denoising


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111968B (en) * 2021-04-30 2024-03-22 北京大米科技有限公司 Image recognition model training method, device, electronic equipment and readable storage medium
CN113191338B (en) * 2021-06-29 2021-09-17 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device and equipment and readable storage medium
CN113191461B (en) * 2021-06-29 2021-09-17 苏州浪潮智能科技有限公司 Picture identification method, device and equipment and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019029935A (en) * 2017-08-02 2019-02-21 キヤノン株式会社 Image processing system and control method thereof
CN112560631A (en) * 2020-12-09 2021-03-26 昆明理工大学 Knowledge distillation-based pedestrian re-identification method
CN113297906A (en) * 2021-04-20 2021-08-24 之江实验室 Knowledge distillation-based pedestrian re-recognition model compression method and evaluation method
CN113255604A (en) * 2021-06-29 2021-08-13 苏州浪潮智能科技有限公司 Pedestrian re-identification method, device, equipment and medium based on deep learning network
CN114299442A (en) * 2021-11-15 2022-04-08 苏州浪潮智能科技有限公司 Pedestrian re-identification method and system, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO, HAN; TIAN, YU-LONG; XU, FENG-YUAN; ZHONG, SHENG: "Survey of Deep Learning Model Compression and Acceleration", JOURNAL OF SOFTWARE, vol. 32, no. 1, 31 January 2021 (2021-01-31), pages 68 - 92, XP009546333, ISSN: 1000-9825, DOI: 10.13328/j.cnki.jos.006096 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311387A (en) * 2023-05-25 2023-06-23 浙江工业大学 Cross-modal pedestrian re-identification method based on feature intersection
CN116311387B (en) * 2023-05-25 2023-09-01 浙江工业大学 Cross-modal pedestrian re-identification method based on feature intersection

Also Published As

Publication number Publication date
CN114299442A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
WO2023082561A1 (en) Person re-identification method and system, and electronic device and storage medium
US11816149B2 (en) Electronic device and control method thereof
CN112905827B (en) Cross-modal image-text matching method, device and computer readable storage medium
CN111382868B (en) Neural network structure searching method and device
WO2023280065A1 (en) Image reconstruction method and apparatus for cross-modal communication system
US20160224903A1 (en) Hyper-parameter selection for deep convolutional networks
CN113204952B (en) Multi-intention and semantic slot joint identification method based on cluster pre-analysis
CN110298395B (en) Image-text matching method based on three-modal confrontation network
WO2023272995A1 (en) Person re-identification method and apparatus, device, and readable storage medium
CN113326377A (en) Name disambiguation method and system based on enterprise incidence relation
CN113065587B (en) Scene graph generation method based on hyper-relation learning network
CN109190521B (en) Construction method and application of face recognition model based on knowledge purification
US20220383127A1 (en) Methods and systems for training a graph neural network using supervised contrastive learning
CN111931641A (en) Pedestrian re-identification method based on weight diversity regularization and application thereof
WO2023272993A1 (en) Image recognition method and apparatus, and device and readable storage medium
CN114817673A (en) Cross-modal retrieval method based on modal relation learning
CN113723238B (en) Face lightweight network model construction method and face recognition method
CN113806582B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN113361627A (en) Label perception collaborative training method for graph neural network
CN114969367B (en) Cross-language entity alignment method based on multi-aspect subtask interaction
WO2023272994A1 (en) Person re-identification method and apparatus based on deep learning network, device, and medium
CN114818719A (en) Community topic classification method based on composite network and graph attention machine mechanism
CN110633689A (en) Face recognition model based on semi-supervised attention network
CN116015967B (en) Industrial Internet intrusion detection method based on improved whale algorithm optimization DELM
CN115830643A (en) Light-weight pedestrian re-identification method for posture-guided alignment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891372

Country of ref document: EP

Kind code of ref document: A1