CN117456480A - Light vehicle re-identification method based on multi-source information fusion - Google Patents

Info

Publication number
CN117456480A
CN117456480A (application number CN202311769679.6A)
Authority
CN
China
Prior art date
Legal status
Granted
Application number
CN202311769679.6A
Other languages
Chinese (zh)
Other versions
CN117456480B (en)
Inventor
曾焕强
郑航杰
施一帆
朱建清
陈婧
沈剑楠
Current Assignee
Xingchen Technology Co ltd
Huaqiao University
Original Assignee
Xingchen Technology Co ltd
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Xingchen Technology Co ltd, Huaqiao University filed Critical Xingchen Technology Co ltd
Priority to CN202311769679.6A priority Critical patent/CN117456480B/en
Publication of CN117456480A publication Critical patent/CN117456480A/en
Application granted granted Critical
Publication of CN117456480B publication Critical patent/CN117456480B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of traffic, e.g. cars on the road, trains or boats
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/82 Arrangements using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a lightweight vehicle re-identification method based on multi-source information fusion, relating to the technical fields of computer vision and machine learning, which comprises the following steps: constructing a neural network comprising ResNet50 networks, a local feature fusion network and a mixed attention module connected in sequence; jointly training the neural network with a supervised contrastive loss and a multi-source information identification loss until convergence to obtain a teacher network; selecting a model with less computation and fewer parameters than the teacher network as a student network; supervising and training the student network through knowledge distillation until convergence to obtain a lightweight vehicle re-identification model; and outputting a re-identification result based on the lightweight vehicle re-identification model. The invention coordinates data from different sensors by means of multi-source information fusion to improve re-identification performance, and uses knowledge distillation to achieve high-quality re-identification under limited computing resources, thereby providing more flexibility for various application scenarios.

Description

Light vehicle re-identification method based on multi-source information fusion
Technical Field
The embodiments of the invention relate to the fields of computer vision and pattern recognition, and in particular to a lightweight vehicle re-identification method based on multi-source information fusion.
Background
Vehicle re-identification is the problem of judging whether vehicle images captured in non-overlapping areas belong to the same vehicle, in a traffic monitoring scene within a specific range, and has very important practical application value. Vehicle re-identification has a very wide range of applications, such as vehicle tracking, vehicle positioning and criminal investigation.
Besides factors such as illumination change, viewpoint change and occlusion, vehicle re-identification faces a particular difficulty: data from a single sensor can be limited by environmental conditions such as rain and fog, and existing vehicle re-identification methods suffer from insufficient identification accuracy under such environmental limitations. Data from different sensors (such as images, point clouds and radar data) have different characteristics, and multi-source information fusion can combine them to generate more comprehensive and richer feature representations, thereby improving recognition performance. At present, vehicle re-identification algorithms based on multi-source information mostly adopt complex deep network models, so the computation and parameter counts in the inference stage are large, which restricts inference speed and limits the deployment of vehicle re-identification in practical traffic monitoring systems.
Disclosure of Invention
Aiming at the problems in existing vehicle re-identification technology that identification accuracy is low when single-sensor data is limited by environmental conditions and that multi-sensor algorithms have excessive parameter counts, the invention provides a lightweight vehicle re-identification method based on multi-source information fusion. A mixed attention module is constructed for the multi-source information attributes, and fusion learning is performed on the multi-source information obtained by different sensors to obtain more comprehensive and accurate information; combined with a more reasonable knowledge distillation method, this effectively improves the accuracy of vehicle re-identification while taking into account the lightness and robustness of the model. High-quality re-identification can be achieved even with limited computing resources, thereby providing more flexibility for various application scenarios.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
a lightweight vehicle re-identification method based on multi-source information fusion comprises the following steps:
s11, constructing a neural network; the neural network comprises two ResNet50 networks, a local feature fusion network and a mixed attention module which are connected in sequence; each ResNet50 network is connected with a source diagram respectively; the source map comprises a vehicle depth map or an infrared map;
s12, performing joint training on the neural network by using the supervision comparison loss and the multisource information identification loss until convergence to obtain a teacher network; selecting a model with smaller calculation amount and smaller parameter amount than the teacher network as a student network;
s13, supervising the student network by using interlayer information of a teacher network through knowledge distillation, training until convergence, and obtaining a light vehicle re-identification model;
s14, calculating the feature similarity between the target vehicle image and the vehicle images in the candidate library based on the light vehicle re-identification model, and outputting a re-identification result according to the similarity.
Preferably, in S11, the local feature fusion network includes a feature channel splicing unit, a 1×1 convolution unit, a first semantic focusing unit, a second semantic focusing unit, and a feature fusion unit. The output feature map of one ResNet50 network and the output feature map of the other ResNet50 network are concatenated by the feature channel splicing unit to obtain a spliced feature map; the spliced feature map is passed through the 1×1 convolution unit to obtain a convolution feature map; the output feature map of the first ResNet50 network is multiplied element by element with the convolution feature map in the first semantic focusing unit to obtain a first semantic focusing feature map; the output feature map of the other ResNet50 network is multiplied element by element with the convolution feature map in the second semantic focusing unit to obtain a second semantic focusing feature map; the first and second semantic focusing feature maps are added element by element in the feature fusion unit to obtain a dual-source information fusion feature map, which is output to the mixed attention module.
Preferably, in S11, the mixed attention module includes channel attention and spatial attention. The dual-source information fusion feature map output by the local feature fusion network first passes through a convolution layer for feature extraction; the resulting feature map is processed by the channel attention module and multiplied by the convolution output, and the result serves as the input feature map of the spatial attention module. This input feature map is then multiplied by the output of the spatial attention module to perform semantic focusing and obtain an attention map; finally, the dual-source information fusion feature map and the attention map are added element by element to obtain the mixed attention feature.
Preferably, the channel attention and the spatial attention are expressed as follows:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

M_s(F') = σ(f([AvgPool(F'); MaxPool(F')]))

wherein M_c represents the channel attention; M_s represents the spatial attention; F represents the input feature map of the channel attention; F' represents the input feature map of the spatial attention; σ represents the Sigmoid activation function; f represents a convolution operation; MLP represents a multi-layer perceptron; and AvgPool and MaxPool represent mean pooling and maximum pooling, respectively.
Preferably, in S11, the shared feature extraction network employs a ResNet50 with the network structure [64, 'M', (64,64,256)×3, (128,128,512)×4, (256,256,1024)×6, (512,512,2048)×3], where each number represents a three-layer convolution + batch normalization + ReLU structure whose value gives the number of channels of the convolution layer, 'M' represents maximum pooling, and ×N represents N repetitions of the same convolution block.
Preferably, in S12, the data set used for training and testing is SVMD, and a MobileNet whose parameter count is below 20% of that of ResNet50 is selected as the student network.
Preferably, in S12, the supervised contrastive loss function L_sup is as follows:

L_sup = Σ_{i∈I} (-1 / |P(i)|) Σ_{p∈P(i)} log( exp(z_i · z_p / τ) / Σ_{a∈A(i)} exp(z_i · z_a / τ) )

wherein P(i) represents the index set of all positives in the multi-view batch that are distinct from i; |P(i)| represents the cardinality of P(i); τ represents a temperature parameter; z_i represents the anchor feature; z_p represents a positive feature; z_a represents a negative feature; A(i) represents the index set of negative examples; and I represents the entire index set;
the multi-source information identification loss function L_ms is as follows:

L_ms = L_id + λ · L_feat

wherein λ represents a hyper-parameter, and L_id and L_feat are expressed as:

L_id = -(1/N) Σ_{i=1}^{N} ( log p(y_i | x_i^a) + log p(y_i | x_i^b) )

L_feat = (1/N) Σ_{i=1}^{N} ‖ f_i^a - f_i^b ‖²

wherein L_id represents the identity classification loss; L_feat represents the feature loss; N represents the number of images; C represents the number of vehicle IDs over which the classifier p(·) is computed; x_i^a and x_i^b represent images from the different sources; p(y_i | x_i^a) represents the output prediction probability of identifying image x_i^a as its identity label y_i; p(y_i | x_i^b) represents the output prediction probability of identifying image x_i^b as its identity label y_i; and f^a and f^b represent the features of the two kinds of multi-source information, respectively.
The loss function of the teacher network is expressed as:

L_teacher = L_sup + β · L_ms

wherein β is a hyper-parameter.
Preferably, in S13, the loss function of knowledge distillation is:
L_KD = ‖ G_s - G_t ‖²_F

wherein G_s represents the similarity matrix calculated based on the feature matrix of the student; G_t represents the similarity matrix calculated based on the feature matrix of the teacher; and ‖·‖_F represents the Frobenius norm.
Preferably, supervising the student network with interlayer information of the teacher network through knowledge distillation and training until convergence to obtain the lightweight vehicle re-identification model specifically comprises:
using a stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.01, divided by 10 after every 5 epochs, and a batch size of 64; and supervising and training the selected student network under the knowledge distillation loss function until convergence to obtain the lightweight vehicle re-identification model.
Preferably, the step S14 specifically comprises:
based on the lightweight vehicle re-identification model, performing feature extraction on the multi-source images of the target vehicle and on the vehicle images in the candidate library through the student network, calculating their feature similarity by Euclidean distance, and outputting the vehicle image with the highest similarity as the vehicle re-identification result.
The embodiment of the invention has the following advantages:
(1) By constructing a mixed attention module from the multi-source information attributes of the vehicle, designing the feature representation method, optimizing the network model with richer multi-source information, and combining a knowledge distillation method, the trained lightweight vehicle re-identification model based on multi-source information fusion achieves high-accuracy vehicle re-identification while taking into account the lightness and robustness of the model; high-quality re-identification can be achieved even with limited computing resources, thereby providing more flexibility for various application scenarios;
(2) The method can be widely applied to intelligent video monitoring scenes, such as vehicle positioning, vehicle track prediction, criminal tracking and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
Fig. 1 is a flowchart of a lightweight vehicle re-identification method based on multi-source information fusion according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a framework of a lightweight vehicle re-identification method based on multi-source information fusion according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a local feature fusion network according to an embodiment of the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description, which, by way of illustration, describes certain specific embodiments but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Referring to fig. 1 and fig. 2, a lightweight vehicle re-identification method based on multi-source information fusion specifically includes the steps of:
step S11, constructing a neural network; the neural network comprises two ResNet50 networks, a local feature fusion network and a mixed attention module which are connected in sequence; each ResNet50 network is connected with a source diagram respectively; the source map is a vehicle depth map or an infrared map.
Specifically, referring to fig. 3, the local feature fusion network includes a feature channel splicing unit, a 1×1 convolution unit, a first semantic focusing unit, a second semantic focusing unit, and a feature fusion unit. The output feature map of one ResNet50 network and the output feature map of the other ResNet50 network are concatenated by the feature channel splicing unit to obtain a spliced feature map; the spliced feature map is passed through the 1×1 convolution unit to obtain a convolution feature map; the output feature map of the first ResNet50 network is multiplied element by element with the convolution feature map in the first semantic focusing unit to obtain a first semantic focusing feature map; the output feature map of the other ResNet50 network is multiplied element by element with the convolution feature map in the second semantic focusing unit to obtain a second semantic focusing feature map; the first and second semantic focusing feature maps are added element by element in the feature fusion unit to obtain a dual-source information fusion feature map, which is output to the mixed attention module.
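The data flow just described can be sketched in plain numpy. This is an illustrative sketch, not the patented implementation: the 1×1 convolution is modeled as a per-pixel linear map over channels, and the weight shape `w1x1` is an assumption.

```python
import numpy as np

def fuse_local_features(f_a, f_b, w1x1):
    """Sketch of the local feature fusion network.

    f_a, f_b : (C, H, W) output feature maps of the two ResNet50 branches.
    w1x1     : (C, 2C) weights of the 1x1 convolution mapping the 2C-channel
               concatenation back to C channels (hypothetical shape).
    """
    # Feature channel splicing unit: concatenate along the channel axis.
    stitched = np.concatenate([f_a, f_b], axis=0)          # (2C, H, W)
    # 1x1 convolution unit: a per-pixel linear map over channels.
    conv = np.tensordot(w1x1, stitched, axes=([1], [0]))   # (C, H, W)
    # First and second semantic focusing units: element-wise products.
    focus_a = f_a * conv
    focus_b = f_b * conv
    # Feature fusion unit: element-wise sum -> dual-source fusion feature map.
    return focus_a + focus_b
```

The element-wise products let the shared convolution output act as a semantic gate on each branch before the two branches are summed.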
The mixed attention module includes channel attention and spatial attention. The dual-source information fusion feature map output by the local feature fusion network first passes through a convolution layer for feature extraction; the resulting feature map is processed by the channel attention module and multiplied by the convolution output, and the result serves as the input feature map of the spatial attention module. This input feature map is then multiplied by the output of the spatial attention module to perform semantic focusing and obtain an attention map; finally, the dual-source information fusion feature map and the attention map are added element by element to obtain the mixed attention feature.
The shared feature extraction network employs a ResNet50 with the network structure [64, 'M', (64,64,256)×3, (128,128,512)×4, (256,256,1024)×6, (512,512,2048)×3], where each number represents a three-layer convolution + batch normalization + ReLU structure whose value gives the number of channels of the convolution layer, 'M' represents maximum pooling, and ×N represents N repetitions of the same convolution block.
Channel attention and spatial attention are expressed as follows:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

M_s(F') = σ(f([AvgPool(F'); MaxPool(F')]))

wherein M_c represents the channel attention; M_s represents the spatial attention; F represents the input feature map of the channel attention; F' represents the input feature map of the spatial attention; σ represents the Sigmoid activation function; f represents a convolution operation; MLP represents a multi-layer perceptron; and AvgPool and MaxPool represent mean pooling and maximum pooling, respectively.
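The channel and spatial attention described above follow the familiar CBAM pattern. The numpy sketch below is an illustration under stated assumptions: the MLP bottleneck weights `w0`/`w1` are hypothetical, and the convolution of the spatial branch is replaced by a simple average of the two pooled maps to keep the sketch dependency-free.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w0, w1):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).

    f : (C, H, W) input feature map; w0 (C//r, C) and w1 (C, C//r) are the
    shared bottleneck-MLP weights (hypothetical shapes).
    Returns a (C, 1, 1) per-channel attention vector.
    """
    avg = f.mean(axis=(1, 2))                       # AvgPool over space -> (C,)
    mx = f.max(axis=(1, 2))                         # MaxPool over space -> (C,)
    mlp = lambda v: w1 @ np.maximum(w0 @ v, 0.0)    # ReLU bottleneck MLP
    return sigmoid(mlp(avg) + mlp(mx)).reshape(-1, 1, 1)

def spatial_attention(f):
    """M_s(F) = sigmoid(conv([AvgPool(F); MaxPool(F)])); the conv is
    simplified here to averaging the two channel-pooled maps."""
    avg = f.mean(axis=0, keepdims=True)             # (1, H, W)
    mx = f.max(axis=0, keepdims=True)               # (1, H, W)
    return sigmoid(0.5 * (avg + mx))
```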
In this embodiment, the method further includes performing data enhancement on a depth map or an infrared map of the vehicle in the vehicle re-identification dataset, and inputting the image after data enhancement into the neural network.
Specifically, the vehicle re-identification data set images are first flipped horizontally and vertically, and then random erasing enhancement is applied, in which a rectangular area of the image is selected at random and its pixels are erased with random values; finally the data set images are input into the feature extraction network.
20,000 images (20 images per class) were randomly extracted from the training set as the validation set, with the remainder used as the training set. The final performance of each model was measured using the original validation set as the test set. Each input image is first flipped horizontally and vertically and then given random erasing enhancement, which weakens noise in the sample data and improves the stability of the model; the image resolution is 224×224.
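A minimal sketch of the random-erasing step: a rectangle is chosen at random and overwritten with random values. The erased-region size (a quarter of each side) and the RNG interface are assumptions for illustration; the text does not fix them.

```python
import numpy as np

def random_erase(img, rng, side_frac=0.25):
    """Randomly select a rectangle of the image and erase its pixels
    with random values (sketch; rectangle size is a fixed fraction)."""
    h, w = img.shape[:2]
    eh = max(1, int(h * side_frac))
    ew = max(1, int(w * side_frac))
    top = rng.integers(0, h - eh + 1)               # random rectangle position
    left = rng.integers(0, w - ew + 1)
    out = img.copy()
    # Overwrite the selected rectangle with random values in [0, 1).
    out[top:top + eh, left:left + ew] = rng.random((eh, ew) + img.shape[2:])
    return out
```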
Step S12, jointly training the neural network with the supervised contrastive loss and the multi-source information identification loss until convergence to obtain a teacher network; and selecting a model with less computation and fewer parameters than the teacher network as the student network.
Specifically, the data set used for training and testing is SVMD, and a MobileNet whose parameter count is below 20% of that of ResNet50 is selected as the student network.
The supervised contrastive loss function L_sup is as follows:

L_sup = Σ_{i∈I} (-1 / |P(i)|) Σ_{p∈P(i)} log( exp(z_i · z_p / τ) / Σ_{a∈A(i)} exp(z_i · z_a / τ) )

wherein P(i) represents the index set of all positives in the multi-view batch that are distinct from i; |P(i)| represents the cardinality of P(i); τ represents a temperature parameter; z_i represents the anchor feature; z_p represents a positive feature; z_a represents a negative feature; A(i) represents the index set of negative examples; and I represents the entire index set;
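The supervised contrastive term can be transcribed directly in numpy, assuming L2-normalized features and dot-product similarity (a sketch of the standard formulation, not the exact patented code):

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over features z (N, D) with integer
    identity labels (N,).  Features are L2-normalized internally."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                              # pairwise similarities
    n = len(labels)
    total = 0.0
    for i in range(n):
        mask_a = np.arange(n) != i                   # A(i): all but the anchor
        mask_p = mask_a & (labels == labels[i])      # P(i): positives distinct from i
        if not mask_p.any():
            continue
        log_denom = np.log(np.exp(sim[i][mask_a]).sum())
        total += -(sim[i][mask_p] - log_denom).mean()
    return total / n
```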
The multi-source information identification loss function L_ms is as follows:

L_ms = L_id + λ · L_feat

wherein λ represents a hyper-parameter, and L_id and L_feat are expressed as:

L_id = -(1/N) Σ_{i=1}^{N} ( log p(y_i | x_i^a) + log p(y_i | x_i^b) )

L_feat = (1/N) Σ_{i=1}^{N} ‖ f_i^a - f_i^b ‖²

wherein L_id represents the identity classification loss; L_feat represents the feature loss; N represents the number of images; C represents the number of vehicle IDs over which the classifier p(·) is computed; x_i^a and x_i^b represent images from the different sources; p(y_i | x_i^a) represents the output prediction probability of identifying image x_i^a as its identity label y_i; p(y_i | x_i^b) represents the output prediction probability of identifying image x_i^b as its identity label y_i; and f^a and f^b represent the features of the two kinds of multi-source information, respectively.
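A sketch of the multi-source information identification loss, under the assumption that the identity term is a cross-entropy over both sources and the feature term is a squared L2 distance between the two sources' features (the exact forms in the patent figures are not recoverable from the text):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def multi_source_id_loss(logits_a, logits_b, feats_a, feats_b, labels, lam=1.0):
    """L_ms = L_id + lam * L_feat: two-source identity cross-entropy plus a
    feature-consistency term between the two sources (assumed squared L2)."""
    n = len(labels)
    p_a = softmax(logits_a)[np.arange(n), labels]    # p(y_i | x_i^a)
    p_b = softmax(logits_b)[np.arange(n), labels]    # p(y_i | x_i^b)
    l_id = -(np.log(p_a) + np.log(p_b)).mean()
    l_feat = np.mean(np.sum((feats_a - feats_b) ** 2, axis=1))
    return l_id + lam * l_feat
```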
The loss function of the teacher network is expressed as:

L_teacher = L_sup + β · L_ms

wherein β is a hyper-parameter.
Step S13, supervising the student network with interlayer information of the teacher network through knowledge distillation and training until convergence to obtain the lightweight vehicle re-identification model.
Specifically, the loss function of knowledge distillation is:

L_KD = ‖ G_s - G_t ‖²_F

wherein G_s and G_t represent the similarity matrices calculated based on the feature matrices of the student and the teacher, respectively, and ‖·‖_F is the Frobenius norm. A stochastic gradient descent (SGD) optimizer is used with an initial learning rate of 0.01, divided by 10 after every 5 epochs, and a batch size of 64. The selected student network is trained under this loss function until convergence to obtain the lightweight vehicle re-identification model.
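The interlayer supervision can be sketched as similarity-preserving distillation: batch similarity (Gram) matrices built from student and teacher features are compared under the Frobenius norm. The row normalization below is an assumption.

```python
import numpy as np

def similarity_matrix(feats):
    """Row-normalized Gram matrix of a (batch, dim) feature matrix."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def kd_loss(student_feats, teacher_feats):
    """Frobenius-norm distance between the student's and teacher's
    batch similarity matrices (similarity-preserving distillation)."""
    g_s = similarity_matrix(student_feats)
    g_t = similarity_matrix(teacher_feats)
    return np.linalg.norm(g_s - g_t, ord="fro") ** 2
```

Distilling pairwise similarities rather than raw features lets the student match the teacher's relational structure even when the two feature dimensions differ.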
Step S14, calculating the feature similarity between the target vehicle image and the vehicle images in the candidate library based on the lightweight vehicle re-identification model, and outputting a re-identification result according to the similarity.
Specifically, based on the lightweight vehicle re-identification model, the Euclidean distances between the features output by the feature learning network for the target vehicle image and for each vehicle image in the candidate library are calculated; the distances are sorted from small to large, and the vehicle image with the highest similarity (smallest distance) is output as the vehicle re-identification result.
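The retrieval step reduces to a nearest-neighbour search under Euclidean distance; a minimal sketch:

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted by ascending Euclidean distance to the
    query feature; index 0 is the re-identification result."""
    d = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(d)
```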
According to this embodiment, by constructing the mixed attention module from the multi-source information attributes of the vehicle, designing the feature representation method, optimizing the network model with richer multi-source information, and combining the knowledge distillation method, the trained lightweight vehicle re-identification model based on multi-source information fusion achieves high-accuracy vehicle re-identification while taking into account the lightness and robustness of the model; high-quality re-identification can be achieved even with limited computing resources, thereby providing more flexibility for various application scenarios.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the invention, and that equivalent modifications and variations of the invention in light of the spirit of the invention will be covered by the claims of the present invention.

Claims (9)

1. A lightweight vehicle re-identification method based on multi-source information fusion is characterized by comprising the following steps:
s11, constructing a neural network; the neural network comprises two ResNet50 networks, a local feature fusion network and a mixed attention module which are connected in sequence; each ResNet50 network is connected with a source diagram respectively; the source map is a vehicle depth map or an infrared map;
s12, performing joint training on the neural network by using the supervision comparison loss and the multisource information identification loss until convergence to obtain a teacher network; selecting a model with smaller calculation amount and smaller parameter amount than the teacher network as a student network;
s13, supervising the student network by using interlayer information of a teacher network through knowledge distillation, training until convergence, and obtaining a light vehicle re-identification model;
s14, calculating the feature similarity between the target vehicle image and the vehicle images in the candidate library based on the light vehicle re-identification model, and outputting a re-identification result according to the similarity;
in S11, the local feature fusion network comprises a feature channel splicing unit, a 1×1 convolution unit, a first semantic focusing unit, a second semantic focusing unit and a feature fusion unit; the output feature map of one ResNet50 network and the output feature map of the other ResNet50 network are concatenated by the feature channel splicing unit to obtain a spliced feature map; the spliced feature map is passed through the 1×1 convolution unit to obtain a convolution feature map; the output feature map of the first ResNet50 network is multiplied element by element with the convolution feature map in the first semantic focusing unit to obtain a first semantic focusing feature map; the output feature map of the other ResNet50 network is multiplied element by element with the convolution feature map in the second semantic focusing unit to obtain a second semantic focusing feature map; the first and second semantic focusing feature maps are added element by element in the feature fusion unit to obtain a dual-source information fusion feature map; and the dual-source information fusion feature map is output to the mixed attention module.
2. The method for re-identification of a lightweight vehicle based on multi-source information fusion according to claim 1, wherein in S11, the mixed attention module comprises channel attention and spatial attention; the dual-source information fusion feature map output by the local feature fusion network first passes through a convolution layer for feature extraction; the resulting feature map is processed by the channel attention module and multiplied by the convolution output, and the result serves as the input feature map of the spatial attention module; this input feature map is multiplied by the output of the spatial attention module to perform semantic focusing and obtain an attention map; finally, the dual-source information fusion feature map and the attention map are added element by element to obtain the mixed attention feature.
3. The lightweight vehicle re-identification method based on multi-source information fusion of claim 2, wherein the channel attention and the spatial attention are expressed as follows:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

M_s(F') = σ(f([AvgPool(F'); MaxPool(F')]))

wherein M_c represents the channel attention; M_s represents the spatial attention; F represents the input feature map of the channel attention; F' represents the input feature map of the spatial attention; σ represents the Sigmoid activation function; f represents a convolution operation; MLP represents a multi-layer perceptron; and AvgPool and MaxPool represent mean pooling and maximum pooling, respectively.
4. The method for re-identifying a lightweight vehicle based on multi-source information fusion according to claim 1, wherein in S11, the shared feature extraction network adopts a ResNet50 with a network structure of [64, 'M', (64,64,256)×3, (128,128,512)×4, (256,256,1024)×6, (512,512,2048)×3], wherein each number represents a convolution + batch normalization + ReLU activation three-layer unit, with the number indicating the channel count of the convolution layer; 'M' represents maximum pooling; and ×N represents the number of times the same convolution block is repeated.
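As a sanity check on the claimed configuration string, the following snippet expands it and counts weighted convolution layers; with the final fully-connected classification layer this recovers the "50" of ResNet50. The list encoding here is a hypothetical transcription of the claim's notation.

```python
# Expand the claimed ResNet50 configuration and count convolution layers.
cfg = [64, 'M',                      # stem conv (64 channels), then max pooling 'M'
       [(64, 64, 256)] * 3,          # stage 1: bottleneck block repeated 3 times
       [(128, 128, 512)] * 4,        # stage 2: repeated 4 times
       [(256, 256, 1024)] * 6,       # stage 3: repeated 6 times
       [(512, 512, 2048)] * 3]       # stage 4: repeated 3 times

conv_layers = 1                      # the stem convolution before 'M'
for stage in cfg[2:]:
    for block in stage:
        conv_layers += len(block)    # each bottleneck tuple lists 3 conv layers
print(conv_layers)                   # 49 conv layers; +1 FC layer gives "ResNet50"
```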
5. The method for re-identifying a lightweight vehicle based on multi-source information fusion according to claim 1, wherein in S12, the data set used for training and testing is SVMD, and a MobileNet whose parameter count is less than 20% of that of ResNet50 is selected as the student network.
6. The method for re-identifying a lightweight vehicle based on multi-source information fusion according to claim 1, wherein in S12, the loss function of the supervised contrastive loss L_SCL is as follows:

L_SCL = Σ_{i∈I} (−1 / |P(i)|) Σ_{p∈P(i)} log( exp(z_i·z_p / τ) / Σ_{a∈A(i)} exp(z_i·z_a / τ) )

wherein P(i) represents the index set of all positives in the multiview batch that are different from i; C represents the number of categories; |P(i)| is the cardinality of P(i); τ represents the temperature parameter; z_i represents the anchor feature; z_p represents a positive feature; z_n represents a negative feature; N(i) represents the index set of the negatives; A(i) = P(i) ∪ N(i) represents the entire index set;

the loss function of the multi-source information identification loss L_MI is as follows:

L_MI = L_id + λ·L_f

wherein λ represents a hyper-parameter, and L_id and L_f are expressed as:

L_id = −(1/N) Σ_{i=1}^{N} ( log p(y_i | x_i^a) + log p(y_i | x_i^b) )

L_f = (1/N) Σ_{i=1}^{N} ‖ f_i^a − f_i^b ‖²

wherein L_id represents the identity classification loss; L_f represents the feature loss; N represents the number of images; C represents the number of vehicle IDs; W represents the classifier; x_i^a and x_i^b represent images from different sources; p(y_i | x_i^a) represents the output prediction probability of identifying image x_i^a as its identity label y_i; p(y_i | x_i^b) represents the output prediction probability of identifying image x_i^b as its identity label y_i; f^a and f^b respectively represent the features of the two kinds of multi-source information;

the loss function of the teacher network is expressed as:

L_teacher = L_MI + β·L_SCL

wherein β is a hyper-parameter.
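The supervised contrastive term used to train the teacher can be sketched in NumPy as commonly formulated: for each anchor, positives share its identity label and the denominator sums over all other samples in the batch. This is an illustrative sketch (the patent's exact image-based formula is not reproducible verbatim); all names are hypothetical.

```python
import numpy as np

def sup_con_loss(z, labels, tau=0.07):
    """Supervised contrastive loss sketch.

    z      : (B, D) feature matrix (rows are L2-normalized inside).
    labels : (B,) identity labels; same label => positive pair.
    tau    : temperature parameter.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / tau                 # pairwise scaled cosine similarities
    b = len(labels)
    loss = 0.0
    for i in range(b):
        others = [a for a in range(b) if a != i]                   # A(i): all but the anchor
        pos = [p for p in others if labels[p] == labels[i]]        # P(i): same-identity samples
        if not pos:
            continue
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # -(1/|P(i)|) * sum_p [ z_i.z_p/tau - log sum_a exp(z_i.z_a/tau) ]
        loss += -np.mean([sim[i, p] - log_denom for p in pos])
    return loss / b
```

Intuitively, the loss drops when same-identity features cluster and different-identity features spread apart.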
7. The method for re-identifying a lightweight vehicle based on multi-source information fusion according to claim 1, wherein in S13, the loss function of knowledge distillation is:

L_KD = ‖ G_s − G_t ‖²_F

wherein G_s represents the similarity matrix calculated based on the student's feature matrix; G_t represents the similarity matrix calculated based on the teacher's feature matrix; ‖·‖_F represents the Frobenius norm of a matrix.
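A similarity-matrix distillation loss of this form can be sketched in a few NumPy lines: build a pairwise similarity matrix from each network's batch features, row-normalize, and take the squared Frobenius norm of the difference. The row normalization and batch-size scaling are common choices assumed here for illustration, not details taken from the patent.

```python
import numpy as np

def sim_kd_loss(f_s, f_t):
    """Similarity-matrix knowledge distillation loss (sketch).

    f_s : (B, Ds) student feature matrix for a batch.
    f_t : (B, Dt) teacher feature matrix for the same batch.
    Feature dimensions may differ; only the (B, B) similarity matrices are compared.
    """
    def sim(f):
        g = f @ f.T                                          # (B, B) similarity matrix
        return g / np.linalg.norm(g, axis=1, keepdims=True)  # row-normalize
    diff = sim(f_s) - sim(f_t)
    return np.linalg.norm(diff, 'fro') ** 2 / f_s.shape[0] ** 2
```

Because only the B×B similarity structure is matched, the student is free to use far fewer feature dimensions than the teacher, which is what makes this suitable for a lightweight student network.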
8. The method for re-identification of a lightweight vehicle based on multi-source information fusion according to claim 1, wherein, through knowledge distillation, the student network is supervised by the inter-layer information of the teacher network and trained until convergence to obtain the lightweight vehicle re-identification model, specifically comprising: using a stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.01, divided by 10 after every 5 epochs, and a batch size of 64; and performing supervised training of the selected student network through the knowledge distillation loss function until convergence, obtaining the lightweight vehicle re-identification model.
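The claimed step schedule (start at 0.01, divide by 10 after every 5 epochs) can be written as a one-line function. This is an illustrative sketch; epoch indexing from 0 is an assumption.

```python
def learning_rate(epoch, base_lr=0.01, step=5, factor=10.0):
    """SGD step schedule from the claim: base_lr divided by `factor`
    after every `step` epochs (epochs counted from 0)."""
    return base_lr / factor ** (epoch // step)
```

In a training loop, the optimizer's learning rate would simply be reset to `learning_rate(epoch)` at the start of each epoch.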
9. The method for re-identifying a lightweight vehicle based on multi-source information fusion according to claim 1, wherein S14 specifically comprises:
based on the lightweight vehicle re-identification model, performing feature extraction on the multi-source image of the target vehicle and the vehicle images in the candidate gallery respectively through the student network model; calculating the feature similarity between the multi-source image and the vehicle images in the candidate gallery via the Euclidean distance; and outputting the vehicle image with the highest similarity as the vehicle re-identification result.
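The retrieval step reduces to a nearest-neighbor search in feature space: smallest Euclidean distance means highest similarity. A minimal NumPy sketch (feature extraction itself is assumed done by the student network; names are hypothetical):

```python
import numpy as np

def re_identify(query_feat, gallery_feats):
    """Rank candidate-gallery vehicle features by Euclidean distance to the
    query feature; the closest one is returned as the re-identification result.

    query_feat    : (D,) feature of the target vehicle's multi-source image.
    gallery_feats : (N, D) features of the candidate-gallery vehicle images.
    """
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)  # (N,) distances
    return int(np.argmin(dists)), dists                         # best index + all distances
```

Returning all distances as well makes it easy to produce a full ranking list (e.g. for Rank-5 or mAP evaluation) rather than only the top match.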
CN202311769679.6A 2023-12-21 2023-12-21 Light vehicle re-identification method based on multi-source information fusion Active CN117456480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311769679.6A CN117456480B (en) 2023-12-21 2023-12-21 Light vehicle re-identification method based on multi-source information fusion

Publications (2)

Publication Number Publication Date
CN117456480A true CN117456480A (en) 2024-01-26
CN117456480B CN117456480B (en) 2024-03-29

Family

ID=89584026

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240580A (en) * 2021-04-09 2021-08-10 暨南大学 Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
CN114005096A (en) * 2021-11-09 2022-02-01 河北工业大学 Vehicle weight recognition method based on feature enhancement
CN114022697A (en) * 2021-09-18 2022-02-08 华侨大学 Vehicle re-identification method and system based on multitask learning and knowledge distillation
CN114299559A (en) * 2021-12-27 2022-04-08 杭州电子科技大学 Finger vein identification method based on lightweight fusion global and local feature network
CN114912532A (en) * 2022-05-20 2022-08-16 电子科技大学 Multi-source heterogeneous sensing data fusion method for automatic driving automobile
CN116703947A (en) * 2023-06-30 2023-09-05 太原科技大学 Image semantic segmentation method based on attention mechanism and knowledge distillation
WO2023207437A1 (en) * 2022-04-28 2023-11-02 长安大学 Scene flow digital twin method and system based on dynamic trajectory flow

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU Peng; YAO Jianmin; LIN Zhixian; YAN Qun; GUO Tailiang: "Portrait segmentation network for mobile devices incorporating an attention mechanism", Chinese Journal of Liquid Crystals and Displays, no. 06, 15 June 2020 (2020-06-15) *
QUAN Yu; LI Zhixin; ZHANG Canlong; MA Huifang: "Object detection model fusing a deep dilated network and a lightweight network", Acta Electronica Sinica, no. 02, 15 February 2020 (2020-02-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zeng Huanqiang

Inventor after: Zheng Hangjie

Inventor after: Shi Yifan

Inventor after: Zhu Jianqing

Inventor after: Chen Jing

Inventor after: Shen Jiannan

Inventor after: Xia Zhixian

Inventor before: Zeng Huanqiang

Inventor before: Zheng Hangjie

Inventor before: Shi Yifan

Inventor before: Zhu Jianqing

Inventor before: Chen Jing

Inventor before: Shen Jiannan

GR01 Patent grant