CN112560631B - Knowledge distillation-based pedestrian re-identification method - Google Patents

Knowledge distillation-based pedestrian re-identification method Download PDF

Info

Publication number
CN112560631B
CN112560631B (application CN202011431855.1A)
Authority
CN
China
Prior art keywords
network
student
teacher
distillation
output
Prior art date
Legal status
Active
Application number
CN202011431855.1A
Other languages
Chinese (zh)
Other versions
CN112560631A (en)
Inventor
尚振宏
李粘粘
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202011431855.1A priority Critical patent/CN112560631B/en
Publication of CN112560631A publication Critical patent/CN112560631A/en
Application granted granted Critical
Publication of CN112560631B publication Critical patent/CN112560631B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The invention discloses a pedestrian re-identification method based on knowledge distillation, comprising the following steps: a pedestrian image training set is input into a teacher network, and the same data set is input into a student network; distillation is carried out simultaneously at multiple stages of the whole backbone network through the combined effect of student network transformation, feature distillation positions and a distance loss function, so that the feature output of the student network continuously approaches the feature output of the teacher network; the parameters of the student model are updated by minimizing a distillation loss function, training the student network; distance measurement is performed on the obtained feature vectors to retrieve the pedestrian target image with the highest similarity. The accuracy of the student network (resnet18) is thereby greatly improved, approaching that of the teacher network (resnet50). The method realizes person re-identification through knowledge-distillation transfer learning, effectively reducing computational complexity by replacing a large model with a small one while preserving the accuracy of the student model.

Description

Knowledge distillation-based pedestrian re-identification method
Technical Field
The invention relates to the field of computer vision and image processing, in particular to a pedestrian re-identification method based on knowledge distillation.
Background
The purpose of person re-identification is to find a particular pedestrian in a gallery of images taken by many different cameras. The difficulty of this problem lies in the fact that shooting angle, pedestrian pose, illumination intensity and occlusion can differ greatly between pictures. In the pedestrian re-identification module, a specified query image is compared with the pictures in the gallery, and pictures of the same person as the query image are retrieved. To compare the gallery pictures with the query picture, the system first extracts a feature representation describing each image, using either hand-crafted descriptors or a deep neural network. Usually the gallery features are computed and stored offline in advance, so that at test time only the features of the query image need to be extracted. Once extracted, the features can be compared with the gallery features by computing a similarity measure.
In actual application scenarios, computing resources are often limited, so the runtime cost of the algorithm must be optimized while the algorithm still maintains high accuracy. Pedestrian re-identification algorithms mainly comprise methods based on hand-crafted features and deep learning methods, and the accuracy of deep-learning-based pedestrian re-identification far exceeds that of the traditional hand-crafted approaches. However, the computational cost of running a deep neural network is high. A pedestrian re-identification method based on deep learning can therefore be adopted that reduces the computational cost relative to existing deep learning methods and better meets the requirements of actual scenarios.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a pedestrian re-identification method based on knowledge distillation, which reduces the computational overhead of existing deep learning methods and better meets the requirements of actual scenarios. A new knowledge distillation method is therefore proposed: a smaller model is trained with the support of a larger/deeper network, reducing the amount of computation while enabling the small model to achieve accuracy very close to that of the deep network.
In order to solve the technical problems, the technical scheme of the invention is as follows: a pedestrian re-identification method based on knowledge distillation comprises the following steps:
step 1, inputting a pedestrian image training set into a PCB with a resnet50 backbone serving as the teacher network, and inputting the same data set into a PCB with a resnet18 backbone serving as the student network;
step 2, carrying out distillation simultaneously at multiple stages of the whole backbone network through the combined effect of student network transformation, feature distillation positions and a distance loss function, so that the feature output of the student network continuously approaches the feature output of the teacher network;
step 3, updating the parameters of the student model by minimizing the distillation loss function L_distill, and training the student network;
step 4, performing distance measurement on the obtained feature vectors and retrieving the pedestrian target image with the highest similarity.
As a further description of the above technical solution: the teacher model in step 1 is a trained model, a complex model that performs the same task as the student model and is used to assist in training the student network. The teacher network is trained using a PCB network structure with resnet50 as the backbone, and the student network, a PCB with resnet18 as the backbone, imitates the teacher through the distillation method. The feature map output by the backbone network is divided evenly in the vertical direction into 6 parts, i.e. 6 tensors of spatial size 4 × 8; global average pooling is then applied to each part to obtain 6 features A; the channel dimension of each feature A is reduced by a 1 × 1 convolution, and each reduced feature is then connected to its own fully connected layer and softmax.
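As a hedged illustration of the PCB head described above, the following pure-Python sketch splits a backbone feature map into 6 horizontal stripes and applies global average pooling to each stripe. The function name and the toy channel count (C = 2) are our own assumptions for illustration, not part of the patented implementation; a 24 × 8 spatial map yields 6 stripes of 4 × 8 as in the text.

```python
def pcb_part_pool(feature_map, n_parts=6):
    """feature_map: nested list [C][H][W]; returns n_parts part features of length C."""
    C = len(feature_map)
    H = len(feature_map[0])
    W = len(feature_map[0][0])
    assert H % n_parts == 0, "height must divide evenly into stripes"
    stripe_h = H // n_parts
    parts = []
    for p in range(n_parts):
        rows = range(p * stripe_h, (p + 1) * stripe_h)
        # global average pooling over one stripe, per channel
        vec = [
            sum(feature_map[c][h][w] for h in rows for w in range(W)) / (stripe_h * W)
            for c in range(C)
        ]
        parts.append(vec)
    return parts

# A 2-channel 24x8 map (value = row index) yields 6 part features of length 2.
fmap = [[[float(h) for w in range(8)] for h in range(24)] for c in range(2)]
parts = pcb_part_pool(fmap)
print(len(parts), len(parts[0]))  # 6 part features, each of length C=2
```

In the real network each of the 6 pooled vectors would then pass through its own 1 × 1 convolution, fully connected layer and softmax, as described above.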
As a further description of the above technical solution: the student network transformation in step 2 is as follows: the dimensionality of the student network is changed by processing its feature map, raising it with a 1 × 1 convolution to the number of channels of the corresponding teacher network feature map, and the feature map before the ReLU is taken for distillation. The values in this feature map include both positive and negative numbers; the student network only needs to approach the positive values of the teacher network as closely as possible, while for negative values the student's output need not coincide exactly with the teacher's negative values — it only needs to be negative like the teacher's. In this way, after passing through the ReLU layer, the negative values of both the teacher network and the student network output 0.
As a further description of the above technical solution: in step 2, the distillation positions are selected at multiple down-sampling stages of the backbone network; when resnet is adopted as the backbone, distillation is performed at the ends of Conv2_x, Conv3_x, Conv4_x and Conv5_x of resnet. Structurally, the distillation method is divided into two parts: the first part distills at the different stages of the backbone network; the second part distills the features after the fully connected layer. Ultimately, the feature sFeatureD output by the student network after the fully connected layer should be as similar as possible to the feature tFeatureD output by the teacher network.
As a further description of the above technical solution: in step 2, for the loss function of the first part of distillation in the backbone network, let N, S ∈ R^(W×H×C) be the features extracted by the teacher network and the student network at each stage of the backbone network, and N_i, S_i ∈ R the values at the i-th position of the features, where R^(W×H×C) denotes the three-dimensional feature map with width W, height H and C channels. After the student feature used for distillation is converted by a 1 × 1 convolution and batch normalization to match the dimensions of the teacher feature used for distillation, the distance between the student and teacher features is calculated, as shown in formula (1);
d_p(N, S) = Σ_i δ_i,  where δ_i = 0 if S_i ≤ N_i ≤ 0, and δ_i = (N_i − S_i)^2 otherwise   (1)
In formula (1), N denotes the teacher's features, S the student's features, and d_p(N, S) the distance function. The distance loss computed by d_p(N, S) makes the output of the student network at multiple stages of the backbone more and more similar to the output of the teacher network at the corresponding stages, so that the human-body features extracted by the two networks also become more similar. With r denoting the transformation (1 × 1 convolution and batch normalization) applied after the student backbone extracts the feature map, the distillation loss function of the first part is defined as shown in formula (2):
L_distill1 = d_p(F_n, r(F_s))   (2)
In formula (2), F_n denotes the teacher features and F_s the student features before the transformation r.
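The exact form of d_p is rendered as an equation image in the original; assuming the partial-L2 distance implied by the surrounding ReLU discussion (no penalty at positions where both teacher and student values would be zeroed by the ReLU), a minimal pure-Python sketch over flattened features might look as follows. The transformation r (1 × 1 convolution plus batch normalization) is omitted here.

```python
def partial_l2(teacher, student):
    """Assumed partial L2 distance d_p: skip positions where S_i <= N_i <= 0."""
    total = 0.0
    for n_i, s_i in zip(teacher, student):
        if s_i <= n_i <= 0:
            # the ReLU would output 0 for both networks here, so no penalty
            continue
        total += (n_i - s_i) ** 2
    return total

# Only the second position is penalized: the first pair is jointly negative.
print(partial_l2([-1.0, 2.0], [-3.0, 1.0]))
```

This matches the text's intent: positive teacher values must be approached, while a student value that is "at least as negative" as a negative teacher value costs nothing.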
As a further description of the above technical solution: the second part of distillation in step 2 distills the extracted human-body features, i.e. the network features after the fully connected layer; the modified softmax function proposed by Hinton is shown in formula (3):
σ(z; T)_i = exp(z_i / T) / Σ_j exp(z_j / T)   (3)
In formula (3), T is a temperature parameter; when T = 1 this is the standard softmax function. As T increases, the probability distribution output by softmax becomes smoother, so that more of the teacher network's information can be utilized;
when the student network is trained, its softmax function uses the same T as the teacher network, and the loss function takes the soft labels output by the teacher network as the target; this loss is called the "distillation loss". Using the correct data labels (hard labels) during training gives better results: specifically, while computing the distillation loss, the standard loss with T = 1 is also computed using the hard labels; this loss is called the "student loss". The two losses are combined to obtain the distillation loss function of the second part, as shown in formula (4):
L_distill2(x; θ) = α·M(y, σ(z_s; T=1)) + β·M(σ(z_t; T=τ), σ(z_s; T=τ))   (4)
In formula (4), x is the input, θ the parameters of the student model, M the cross-entropy loss function, y the true label, σ the softmax function with temperature parameter T, τ > 1, α and β coefficients, and z_s, z_t the logits output by the student and the teacher, respectively.
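Formulas (3) and (4) can be sketched in pure Python as follows; the values of τ, α and β are illustrative assumptions (the patent does not fix them at this point).

```python
import math

def softmax_T(z, T=1.0):
    # formula (3): temperature-scaled softmax; larger T gives a smoother distribution
    exps = [math.exp(v / T) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(target, pred, eps=1e-12):
    # M in formula (4): cross entropy between a target distribution and a prediction
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

def distill2_loss(y, z_s, z_t, n_classes, tau=4.0, alpha=0.5, beta=0.5):
    # formula (4): alpha * student loss (hard label, T = 1)
    #            + beta * distillation loss (teacher soft labels, T = tau)
    hard = [1.0 if i == y else 0.0 for i in range(n_classes)]
    student_loss = cross_entropy(hard, softmax_T(z_s, T=1.0))
    distill_loss = cross_entropy(softmax_T(z_t, tau), softmax_T(z_s, tau))
    return alpha * student_loss + beta * distill_loss

# Toy logits for a 3-class problem (illustrative values only).
loss = distill2_loss(y=2, z_s=[0.5, 1.0, 3.0], z_t=[0.2, 0.8, 3.5], n_classes=3)
print(round(loss, 4))
```

Raising T visibly flattens the distribution, which is exactly why the soft teacher targets carry more inter-class information than hard labels.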
As a further description of the above technical solution: the final loss function for knowledge-distillation pedestrian re-identification obtained in step 3 is shown in formula (5):
L_distill = λ·L_distill1 + μ·L_distill2   (5)
In formula (5), λ and μ are constant weighting coefficients.
As a further description of the above technical solution: in step 4, the feature vector of the image to be identified is compared with the pedestrian feature vectors of the image set, and the pedestrian target image with the highest similarity is retrieved.
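Step 4 can be sketched as a nearest-neighbour search over pre-computed gallery features. Euclidean distance is used here as an assumed metric (this paragraph of the patent does not specify one), and the feature values are illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, gallery):
    """Return the index of the gallery feature closest to the query (highest similarity)."""
    return min(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))

# Toy gallery of pre-computed pedestrian feature vectors (illustrative values).
gallery = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
best = retrieve([1.0, 0.05], gallery)
print(best)  # index of the most similar gallery feature
```

In practice the gallery features would be the student network's outputs, computed and stored offline as described in the Background section.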
Compared with the prior art, the invention has the following beneficial effects: the invention provides a trade-off analysis between test-time accuracy and computational cost for the person re-identification problem, and proposes an improvement that optimizes this trade-off toward the configuration most appropriate to actual application conditions. To this end, the invention uses resnet50 (as teacher) to transfer knowledge into the more compact resnet18 (as student); the accuracy of resnet18 is thereby greatly improved, approaching that of resnet50. The method realizes person re-identification through knowledge-distillation transfer learning, effectively reducing computational complexity by replacing a large model with a small one while preserving the accuracy of the student model. The amount of computation is reduced, and the small model achieves accuracy very close to that of the deep network.
In the invention, because the dimensionality of the teacher network is higher, the 1 × 1 convolution plus batch normalization in the student network transformation keeps the dimensions of the student and teacher models as consistent as possible, which is more conducive to extracting feature information and makes training more stable. With the distance function used as the distance loss, when the loss is large, gradient back-propagation updates the parameters and minimizes the loss function, so that the student's output features continuously approach the teacher's features.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solutions of the present invention will be described in further detail with reference to the drawings and a specific example, but the present invention is not limited to the following embodiment.
Example 1
Step 1, inputting a pedestrian image training set into a PCB with a resnet50 backbone serving as the teacher network, and inputting the same data set into a PCB with a resnet18 backbone serving as the student network;
in the step 1, the teacher model is a trained model and a complex model which completes the same task with the student model and is used for assisting in training a student network. The teacher network trains by using a network structure of a PCB with a backbone network as resnet50, and the student network trains by simulating teachers by using a distillation method by using a PCB with a backbone network as resnet 18. The feature graph output by the backbone network is longitudinally and uniformly divided into 6 parts, namely 6 tensors with the space size of 4 × 8, then global average pooling is respectively carried out to obtain 6 features A, the features A are reduced into the number of channels by using 1 × 1 convolution, and then the full connection layer and softmax are respectively connected.
Step 2, distilling the student network simultaneously at multiple stages of the whole backbone network through the combined effect of student network transformation, feature distillation positions and a distance loss function, so that the feature output of the student network continuously approaches the feature output of the teacher network;
The student network transformation process is as follows: the dimensionality of the student network is changed. Because the numbers of output feature map channels of the teacher and student networks differ at the various stages of the backbone network, the difference between the teacher and student feature maps cannot be computed directly; we therefore process the student network's feature map, raising its dimensionality with a 1 × 1 convolution to the number of channels of the corresponding teacher feature map. Furthermore, the distillation method in this embodiment fully takes the characteristics of the ReLU into account: the feature map before the ReLU is taken for distillation, and its values include both positive and negative numbers. The student network only needs to approach the positive values of the teacher network as closely as possible; for negative values, the student's output need not coincide exactly with the teacher's negative values — it only needs to be negative like the teacher's. In this way, after passing through the ReLU layer, the negative values of both the teacher network and the student network output 0.
Distillation positions are selected at multiple down-sampling stages of the backbone network. When resnet is used as the backbone, distillation is performed at the ends of Conv2_x, Conv3_x, Conv4_x and Conv5_x of resnet. Structurally, the distillation method is divided into two parts: the first part distills at the different stages of the backbone network; the second part distills the features after the fully connected layer. Ultimately, it is desirable that the feature sFeatureD output by the student network after the fully connected layer be as similar as possible to the feature tFeatureD output by the teacher network.
For the loss function of the first part of distillation in the backbone network, let N, S ∈ R^(W×H×C) be the features extracted by the teacher network and the student network at each stage of the backbone network, and N_i, S_i ∈ R the values at the i-th position. After the student feature used for distillation is converted by a 1 × 1 convolution and batch normalization to the same dimensions as the teacher feature used for distillation, the distance between the student and teacher network features is calculated, as shown in formula (1);

d_p(N, S) = Σ_i δ_i,  where δ_i = 0 if S_i ≤ N_i ≤ 0, and δ_i = (N_i − S_i)^2 otherwise   (1)
In formula (1), N denotes the teacher's features, S the student's features, and d_p(N, S) the distance function. The distance loss computed by d_p(N, S) makes the output of the student network at multiple stages of the backbone more and more similar to the output of the teacher network at the corresponding stages, so that the human-body features extracted by the two networks also become more similar. With r denoting the transformation (1 × 1 convolution and batch normalization) applied after the student backbone extracts the feature map, the distillation loss function of the first part is defined as shown in formula (2):
L_distill1 = d_p(F_n, r(F_s))   (2);
In formula (2), F_n denotes the teacher features and F_s the student features before the transformation r;
the second part of distillation is to distill the extracted human body characteristics, namely the network characteristics behind the full connection layer; the second part of distillation in the step 2 is to distill the extracted human body characteristics, namely the network characteristics after the full connection layer, and an improved Soft max function proposed by Hinton is utilized, as shown in formula (3):
Figure BDA0002826774730000071
In formula (3), T is a temperature parameter. When T = 1 this is the standard softmax function; as T increases, the probability distribution output by softmax becomes smoother, making more of the teacher model's information available. When training the student, the student's softmax function uses the same T as the teacher, and the loss function takes the soft labels output by the teacher as the target; this loss is called the "distillation loss". Using the correct data labels (hard labels) during training gives better results: specifically, while computing the distillation loss, we also compute the standard loss with T = 1 using the hard labels; this loss is called the "student loss". The formula combining the two losses is shown in formula (4):
L_distill2(x; θ) = α·M(y, σ(z_s; T=1)) + β·M(σ(z_t; T=τ), σ(z_s; T=τ))   (4)
In formula (4), x is the input, θ the parameters of the student model, M the cross-entropy loss function, y the true label, σ the softmax function with temperature parameter T, τ > 1, α and β coefficients, and z_s, z_t the logits output by the student and the teacher, respectively.
Step 3, updating the parameters of the student model by minimizing the distillation loss function L_distill, and training the student network; the final loss function for knowledge-distillation pedestrian re-identification is shown in formula (5):
L_distill = λ·L_distill1 + μ·L_distill2   (5)
In formula (5), λ = 2 and μ = 6.
Step 4, comparing the feature vector of the image to be identified with the pedestrian feature vectors of the image set, performing distance measurement on the obtained feature vectors, and retrieving the pedestrian target image with the highest similarity.
In this embodiment, the input picture size is 384 × 128 and the batch size is 64; resnet50 serves as the backbone of the teacher network, with parameters trained on ImageNet used as the pre-trained model. SGD with momentum 0.9 is chosen as the optimizer. The initial learning rate for distillation training is 0.5; the learning rate is decayed to 0.05 at epoch 20 and to 0.0005 at epoch 40, and training stops at epoch 60. During training, pictures are pre-processed with random horizontal flipping for data augmentation. The method of this embodiment and existing methods are verified and compared on the Market-1501 and DukeMTMC-reID data sets. These two data sets contain problems found in practical applications: each identity appears in different cameras with different viewing angles, poses and illumination changes, so testing on them is both highly challenging and meaningful.
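The learning-rate schedule of this embodiment can be sketched as a piecewise-constant function; whether each decay applies exactly at or just after the stated epoch is an assumption of this sketch.

```python
def distill_lr(epoch):
    """Learning rate for the distillation training described above:
    0.5 initially, decayed to 0.05 at epoch 20 and to 0.0005 at epoch 40;
    training stops at epoch 60."""
    if epoch < 20:
        return 0.5
    if epoch < 40:
        return 0.05
    return 0.0005

# Full 60-epoch schedule as used in the embodiment.
schedule = [distill_lr(e) for e in range(60)]
print(schedule[0], schedule[20], schedule[40])
```

In a framework such as PyTorch this would typically be wired into the optimizer via a lambda/step scheduler rather than computed by hand.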
The results are shown in tables 1 and 2.
TABLE 1 Experimental results on Market-1501 data set
(The results of Table 1 are presented as an image in the original publication and are not reproduced here.)
TABLE 2 Experimental results on DukeMTMC-reiD data set
(The results of Table 2 are presented as an image in the original publication and are not reproduced here.)
In the experimental method of this example, PCB+MKD indicates that resnet18 was distilled by resnet50 as teacher at the last four stages of resnet; PCB+FKD indicates that resnet18 was distilled by resnet50 as teacher after the last fully connected layer; and PCB+MKD+FKD indicates that the four-stage distillation and the distillation after the fully connected layer were applied simultaneously, with resnet50 as teacher distilling resnet18.
During distillation, it is desirable that the feature output by the student network after the fully connected layer (sFeatureD) be as similar as possible to the feature output by the teacher network (tFeatureD). Although ultimately only an sFeatureD similar to tFeatureD is needed, satisfactory results are difficult to achieve if sFeatureD is distilled alone: because of the large difference between the student's backbone and the teacher's backbone, it is extremely difficult to make sFeatureD and tFeatureD as close as possible by distilling only these two features. We therefore perform distillation simultaneously at multiple stages of the whole backbone network to aid the distillation between sFeatureD and tFeatureD, which makes an sFeatureD similar to tFeatureD much easier to obtain. The experimental results in Tables 1 and 2 show that the performance of the student network can even exceed that of the teacher network, demonstrating the effectiveness of the proposed method.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (2)

1. A pedestrian re-identification method based on knowledge distillation is characterized by comprising the following steps:
step 1, inputting a pedestrian image training set into a PCB with a resnet50 backbone serving as the teacher network, and inputting the same data set into a PCB with a resnet18 backbone serving as the student network;
step 2, carrying out distillation simultaneously at multiple stages of the whole backbone network through the combined effect of student network transformation, feature distillation positions and a distance loss function, so that the feature output of the student network continuously approaches the feature output of the teacher network;
step 3, updating the parameters of the student model by minimizing the distillation loss function L_distill, and training the student network;
step 4, performing distance measurement on the obtained feature vectors and retrieving the pedestrian target image with the highest similarity;
the teacher network in step 1 is a trained model, a complex model that performs the same task as the student network and is used to assist in training the student network; the teacher network is trained using a PCB network structure with resnet50 as the backbone, and the student network, a PCB with resnet18 as the backbone, imitates the teacher model through the distillation method; the feature map output by the backbone network is divided evenly in the vertical direction into 6 parts, i.e. 6 tensors of spatial size 4 × 8; global average pooling is then applied to each part to obtain 6 features A; the channel dimension of each feature A is reduced by a 1 × 1 convolution, and each reduced feature is then connected to its own fully connected layer and softmax;
the student network transformation in step 2 is as follows: the dimensionality of the student network is changed by processing its feature map, raising it with a 1 × 1 convolution to the number of channels of the corresponding teacher network feature map, and the feature map before the ReLU is taken for distillation; the values in this feature map include both positive and negative numbers; the student network only needs to approach the positive values of the teacher network as closely as possible, while for negative values the student's output need not coincide exactly with the teacher's negative values — it only needs to be negative like the teacher's; in this way, after passing through the ReLU layer, the negative values of both the teacher network and the student network output 0;
in step 2, the distillation positions are selected at multiple down-sampling stages of the backbone network; when resnet is adopted as the backbone, distillation is performed at the ends of Conv2_x, Conv3_x, Conv4_x and Conv5_x of resnet; structurally, the distillation method is divided into two parts: the first part distills at the different stages of the backbone network, and the second part distills the features after the fully connected layer; ultimately, the feature sFeatureD output by the student network after the fully connected layer should be as similar as possible to the feature tFeatureD output by the teacher network;
in step 2, for the loss function of the first part of distillation in the backbone network, let N, S ∈ R^(W×H×C) be the features extracted by the teacher network and the student network at each stage of the backbone network, and N_i, S_i ∈ R the values at the i-th position of the features, where R^(W×H×C) denotes the three-dimensional feature map with width W, height H and C channels; after the student feature used for distillation is converted by a 1 × 1 convolution and batch normalization to match the dimensions of the teacher feature used for distillation, the distance between the student and teacher features is calculated, as shown in formula (1);
d_p(N, S) = Σ_{i=1}^{W×H×C} { 0, if N_i ≤ 0 and S_i ≤ 0; (N_i − S_i)², otherwise }    (1)
in formula (1), N represents the teacher's features, S represents the student's features, and d_p(N, S) represents the distance function; the distance loss calculated by d_p(N, S) makes the output of the student network at the several backbone stages more and more similar to the output of the teacher network at the corresponding stages, so that the human body features extracted by the two networks also become more similar; r is the transformation function, a 1 × 1 convolution followed by batch normalization applied after the feature map is extracted by the student backbone network, and the first-part distillation loss function is defined as shown in formula (2):
L_distill1 = d_p(F_n, r(F_s))    (2)
in formula (2), F_n represents the teacher's features and F_s represents the student's features before the transformation;
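A minimal NumPy sketch of the distance described for formula (1), under the reading above that positions where both the teacher and student responses are negative contribute nothing (ReLU zeroes both anyway); the function name is our own:

```python
import numpy as np

def partial_l2(n, s):
    # Distance in the spirit of formula (1): positions where both the
    # teacher response n and the student response s are negative are
    # skipped, since ReLU maps both to 0; all other positions are
    # penalized by the squared difference.
    n = np.asarray(n, dtype=float)
    s = np.asarray(s, dtype=float)
    keep = ~((n <= 0) & (s <= 0))
    return float(((n - s) ** 2 * keep).sum())
```

For example, two negative responses incur no penalty, while a positive teacher response against a smaller student response is penalized by the squared gap.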
the second-part distillation in step 2 distills the extracted human body features, i.e. the network features after the fully connected layer; the modified Softmax loss function proposed by Hinton is shown in formula (3):
q_i = exp(z_i / T) / Σ_j exp(z_j / T)    (3)
in formula (3), T is the temperature parameter; when T = 1 this is the standard softmax function, and as T increases the probability distribution output by the softmax becomes smoother, so that more of the teacher network's information can be used;
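The smoothing effect of the temperature can be checked with a small NumPy sketch of formula (3); the logits here are arbitrary illustrative values:

```python
import numpy as np

def softmax(z, T=1.0):
    # Formula (3): temperature-scaled softmax (max subtracted for stability).
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([6.0, 2.0, 1.0])  # arbitrary illustrative logits
p1 = softmax(logits, T=1.0)         # sharp: standard softmax
p4 = softmax(logits, T=4.0)         # smoother: secondary classes gain mass
```

At T = 4 the dominant class loses probability mass to the secondary classes, which is exactly the extra "dark knowledge" the student can learn from.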
when the student network is trained, its softmax function uses the same T as the teacher network, and the loss function takes the soft label output by the teacher network as the target; such a loss function is called the "distillation loss"; the effect is better when the correct data labels are also used during training, specifically by calculating the distillation loss while simultaneously calculating the standard loss with T = 1 using the hard label, which is called the "student loss"; combining the two losses gives the second-part distillation loss function, as shown in formula (4):
L_distill2(x; θ) = α · M(y, σ(z_s; T = 1)) + β · M(σ(z_t; T = τ), σ(z_s; T = τ))    (4)
in formula (4), x is the input, θ are the parameters of the student model, M is the cross-entropy loss function, y is the true label, σ is the softmax function parameterized by the temperature T, α and β are hyper-parameters, and z_s, z_t are the logits output by the student and the teacher, respectively;
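Formula (4) can be sketched in NumPy as follows, with M as cross-entropy and σ as the temperature softmax; the default values of α, β and τ are illustrative, not those of the patent:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # M(p, q) = -sum_i p_i * log(q_i)
    return float(-(np.asarray(p) * np.log(np.asarray(q) + eps)).sum())

def distill_loss2(y, z_s, z_t, alpha=0.5, beta=0.5, tau=4.0):
    # Formula (4): hard-label "student loss" at T = 1 plus the soft-label
    # loss against the teacher's temperature-smoothed output at T = tau.
    student = cross_entropy(y, softmax(z_s, T=1.0))
    soft = cross_entropy(softmax(z_t, T=tau), softmax(z_s, T=tau))
    return alpha * student + beta * soft
```

A student whose logits match the teacher's and point at the correct class incurs a smaller loss than one that contradicts both targets.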
the loss function of knowledge-distillation pedestrian re-identification finally obtained in step 3 is shown in formula (5):
L_distill = λ · L_distill1 + μ · L_distill2    (5)
in the formula (5), λ and μ are constants.
2. The knowledge distillation-based pedestrian re-identification method according to claim 1, wherein in step 4 the feature vector of the image to be identified is compared with the pedestrian feature vectors of the image set, and the pedestrian target image with the highest similarity is retrieved.
CN202011431855.1A 2020-12-09 2020-12-09 Knowledge distillation-based pedestrian re-identification method Active CN112560631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011431855.1A CN112560631B (en) 2020-12-09 2020-12-09 Knowledge distillation-based pedestrian re-identification method

Publications (2)

Publication Number Publication Date
CN112560631A CN112560631A (en) 2021-03-26
CN112560631B true CN112560631B (en) 2022-06-21

Family

ID=75060078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011431855.1A Active CN112560631B (en) 2020-12-09 2020-12-09 Knowledge distillation-based pedestrian re-identification method

Country Status (1)

Country Link
CN (1) CN112560631B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297906B (en) * 2021-04-20 2022-09-09 之江实验室 Knowledge distillation-based pedestrian re-recognition model compression method and evaluation method
CN113128460B (en) * 2021-05-06 2022-11-08 东南大学 Knowledge distillation-based multi-resolution pedestrian re-identification method
CN113344213A (en) * 2021-05-25 2021-09-03 北京百度网讯科技有限公司 Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN113269117B (en) * 2021-06-04 2022-12-13 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN113281048B (en) * 2021-06-25 2022-03-29 华中科技大学 Rolling bearing fault diagnosis method and system based on relational knowledge distillation
CN113515656B (en) * 2021-07-06 2022-10-11 天津大学 Multi-view target identification and retrieval method and device based on incremental learning
CN113505719B (en) * 2021-07-21 2023-11-24 山东科技大学 Gait recognition model compression system and method based on local-integral combined knowledge distillation algorithm
CN113360701B (en) * 2021-08-09 2021-11-02 成都考拉悠然科技有限公司 Sketch processing method and system based on knowledge distillation
CN113673254B (en) * 2021-08-23 2022-06-07 东北林业大学 Knowledge distillation position detection method based on similarity maintenance
CN113487614B (en) * 2021-09-08 2021-11-30 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113920540A (en) * 2021-11-04 2022-01-11 厦门市美亚柏科信息股份有限公司 Knowledge distillation-based pedestrian re-identification method, device, equipment and storage medium
CN114299442A (en) * 2021-11-15 2022-04-08 苏州浪潮智能科技有限公司 Pedestrian re-identification method and system, electronic equipment and storage medium
CN114549901B (en) * 2022-02-24 2024-05-14 杭州电子科技大学 Multi-network combined auxiliary generation type knowledge distillation method
CN115223117B (en) * 2022-05-30 2023-05-30 九识智行(北京)科技有限公司 Training and using method, device, medium and equipment of three-dimensional target detection model
CN115204394A (en) * 2022-07-05 2022-10-18 上海人工智能创新中心 Knowledge distillation method for target detection
CN116563642B (en) * 2023-05-30 2024-02-27 智慧眼科技股份有限公司 Image classification model credible training and image classification method, device and equipment
CN117612214B (en) * 2024-01-23 2024-04-12 南京航空航天大学 Pedestrian search model compression method based on knowledge distillation

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN108764462A (en) * 2018-05-29 2018-11-06 成都视观天下科技有限公司 A kind of convolutional neural networks optimization method of knowledge based distillation
CN110837761B (en) * 2018-08-17 2023-04-07 北京市商汤科技开发有限公司 Multi-model knowledge distillation method and device, electronic equipment and storage medium
US11636337B2 (en) * 2019-03-22 2023-04-25 Royal Bank Of Canada System and method for knowledge distillation between neural networks
CN110059740A (en) * 2019-04-12 2019-07-26 杭州电子科技大学 A kind of deep learning semantic segmentation model compression method for embedded mobile end
CN111160409A (en) * 2019-12-11 2020-05-15 浙江大学 Heterogeneous neural network knowledge reorganization method based on common feature learning
CN111126573B (en) * 2019-12-27 2023-06-09 深圳力维智联技术有限公司 Model distillation improvement method, device and storage medium based on individual learning
CN111626330B (en) * 2020-04-23 2022-07-26 南京邮电大学 Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation
CN112001278A (en) * 2020-08-11 2020-11-27 中山大学 Crowd counting model based on structured knowledge distillation and method thereof


Similar Documents

Publication Publication Date Title
CN112560631B (en) Knowledge distillation-based pedestrian re-identification method
CN111325111A (en) Pedestrian re-identification method integrating inverse attention and multi-scale deep supervision
CN109784258A (en) A kind of pedestrian's recognition methods again cut and merged based on Analysis On Multi-scale Features
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN110516095A (en) Weakly supervised depth Hash social activity image search method and system based on semanteme migration
Yang et al. Cross-domain visual representations via unsupervised graph alignment
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN115830637B (en) Method for re-identifying blocked pedestrians based on attitude estimation and background suppression
CN112232134A (en) Human body posture estimation method based on hourglass network and attention mechanism
CN110490028A (en) Recognition of face network training method, equipment and storage medium based on deep learning
CN113065409A (en) Unsupervised pedestrian re-identification method based on camera distribution difference alignment constraint
CN114170659A (en) Facial emotion recognition method based on attention mechanism
Schoneveld et al. Towards a general deep feature extractor for facial expression recognition
CN113592008B (en) System, method, device and storage medium for classifying small sample images
CN111144469B (en) End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network
CN115017366B (en) Unsupervised video hash retrieval method based on multi-granularity contextualization and multi-structure preservation
CN113887653B (en) Positioning method and system for tight coupling weak supervision learning based on ternary network
CN116311504A (en) Small sample behavior recognition method, system and equipment
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network
CN114821632A (en) Method for re-identifying blocked pedestrians
Zhang et al. Research On Face Image Clustering Based On Integrating Som And Spectral Clustering Algorithm
CN111401519B (en) Deep neural network unsupervised learning method based on similarity distance in object and between objects
Yang et al. Robust feature mining transformer for occluded person re-identification
LU102992B1 (en) Siamese network target tracking method based on channel and spatial attention mechanisms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No.727, Jingming South Road, Kunming, Yunnan 650500

Applicant after: Kunming University of Science and Technology

Address before: No.72, Jingming South Road, Chenggong District, Kunming, Yunnan 650000

Applicant before: Kunming University of Science and Technology

GR01 Patent grant