CN117636298A - Vehicle re-identification method, system and storage medium based on multi-scale feature learning - Google Patents

Vehicle re-identification method, system and storage medium based on multi-scale feature learning

Info

Publication number
CN117636298A
Authority
CN
China
Prior art keywords
vehicle
layer
feature vector
network model
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311552917.8A
Other languages
Chinese (zh)
Inventor
熊凌宇
胡志辉
黄茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202311552917.8A priority Critical patent/CN117636298A/en
Publication of CN117636298A publication Critical patent/CN117636298A/en
Pending legal-status Critical Current


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle re-identification method, system and storage medium based on multi-scale feature learning. The method comprises the following steps: acquiring and preprocessing a training set and a query library of vehicle images; training a re-identification network model with the preprocessed training set; inputting each vehicle image in the preprocessed query library into the trained re-identification network model to obtain a feature vector of each vehicle image; obtaining an average feature vector of each type of vehicle from these feature vectors; inputting the preprocessed vehicle image to be detected into the trained re-identification network model to obtain the feature vector of the vehicle to be detected; and calculating the Euclidean distance between the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle, and selecting the vehicle type with the minimum distance as the recognition result. Through the re-identification network model, the invention can effectively identify vehicles of different resolutions while markedly reducing the overall number of network parameters, thereby lowering the deployment cost.

Description

Vehicle re-identification method, system and storage medium based on multi-scale feature learning
Technical Field
The invention belongs to the field of computer vision, and relates to a vehicle re-identification method, a system, computer equipment and a computer readable storage medium based on multi-scale feature learning.
Background
Vehicle re-identification can be regarded as a retrieval problem: a query image or video is used to retrieve vehicles with the same identity from a gallery or video library captured by other cameras. Because the captured vehicles are affected by the weather, illumination and background at the time of shooting, image quality varies considerably. How to overcome these difficulties and improve re-identification accuracy is currently a major research focus and challenge.
Early vehicle re-identification work typically distinguished vehicles with manually designed features such as SIFT and HOG. In many cases, however, such features cannot be extracted reliably because of vehicle occlusion, low image resolution and similar factors. Moreover, manually designed features have limited expressive power and each focuses on a single cue, so they cannot be applied to different scenes.
Since neural network models first achieved the best performance on the image classification task, re-identification networks based on convolutional neural networks have made tremendous progress over the past decade. Current re-identification methods mainly rely on local features, dividing a vehicle into different parts and learning the features of each part, but global semantic information is easily ignored. In addition, some deep-learning-based vehicle re-identification networks use multi-stage pipelines such as the PROVID network; because every task module of such a multi-stage network must be trained separately, the network cannot be optimized end to end and the whole pipeline is cumbersome.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention provides a vehicle re-identification method, system, computer device and computer-readable storage medium based on multi-scale feature learning. The constructed re-identification network model can effectively identify vehicles of different resolutions while keeping the model parameters and computation low (with competitive algorithm performance); at the same time, the lightweight convolution module and the inverse bottleneck blocks used in the re-identification network model markedly reduce the overall number of network parameters, thereby lowering the deployment cost.
A first object of the present invention is to provide a vehicle re-recognition method based on multi-scale feature learning.
A second object of the present invention is to provide a vehicle re-identification system based on multi-scale feature learning.
A third object of the present invention is to provide a computer device.
A fourth object of the present invention is to provide a computer-readable storage medium.
The first object of the present invention can be achieved by adopting the following technical scheme:
a vehicle re-identification method based on multi-scale feature learning, the method comprising:
acquiring a training set and a query library of vehicle images, and preprocessing the vehicle images in the training set and the query library;
training the re-identification network model by utilizing the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
respectively inputting all the vehicle images in the preprocessed query library into a trained re-recognition network model to obtain feature vectors of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
inputting the preprocessed vehicle image to be detected into a trained re-recognition network model to obtain a feature vector of the vehicle to be detected;
and calculating the Euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the minimum distance as a final recognition result.
Further, the inverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features of different scales, and the method comprises the following steps:
firstly, a convolution is applied to the input features to lift the dimension, and different receptive fields are then obtained on a plurality of branches by using different numbers of Light 3×3 convolution blocks; meanwhile, padding and stride are adjusted so that the feature maps output by all branches have the same dimensions and can be directly combined along the channel dimension;
after the multi-branch feature maps are combined, the attention mechanism layer assigns different weights to different channels so that the model parameters concentrate on the more useful channel features; the result is reduced in dimension by a convolution and residual-connected with the input features; finally, the connected feature map is passed through a ReLU activation function to obtain the output feature map.
Further, the Light 3×3 convolution block is obtained by splitting a standard 3×3 convolution block into a depth-wise convolution and a point-wise convolution, so as to reduce the number of parameters and the calculation amount.
Further, the allocating different weights for different channels by the attention mechanism layer includes:
firstly, carrying out global average pooling on input features in channel dimension, then carrying out dimension reduction through a full connection layer, and taking a ReLU as an activation function; and then, carrying out dimension lifting through a full connection layer to ensure that the output dimension is consistent with the number of channels of the feature map, taking Sigmoid as an activation function, and finally, distributing the output value as a weight to different channels.
Further, when training the re-identification network model, each round of data enhancement is performed on the vehicle images in the preprocessed training set so as to improve the generalization capability of the model; the data enhancements include random horizontal flipping and random erasure.
Further, the loss function of the re-recognition network model uses a cross entropy loss function with label smoothing.
Further, the preprocessing includes scaling the vehicle image to a set size and performing normalization processing.
The second object of the invention can be achieved by adopting the following technical scheme:
a vehicle re-identification system based on multi-scale feature learning, the system comprising:
the acquisition module is used for acquiring a training set and a query library of the vehicle images and preprocessing the vehicle images in the training set and the query library;
the training module is used for training the re-identification network model by utilizing the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
the first generation module is used for respectively inputting all the vehicle images in the preprocessed query library into the trained re-recognition network model to obtain the feature vector of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
the second generation module is used for inputting the preprocessed image of the vehicle to be detected into the trained re-recognition network model to obtain the feature vector of the vehicle to be detected;
the recognition module is used for calculating Euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the smallest distance as a final recognition result.
The third object of the present invention can be achieved by adopting the following technical scheme:
a computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the vehicle re-identification method based on multi-scale feature learning when executing the program stored in the memory.
The fourth object of the present invention can be achieved by adopting the following technical scheme:
a computer-readable storage medium storing a program which, when executed by a processor, implements the above-described vehicle re-recognition method based on multi-scale feature learning.
Compared with the prior art, the invention has the following beneficial effects:
1. The constructed vehicle re-identification network model is a single-stage model that can be optimized end to end directly, which reduces training difficulty and cost. Multi-scale feature learning is introduced into the model and features of different scales are fused at different stages, which effectively strengthens the feature extraction and generalization abilities of the model so that vehicles of different scales can be distinguished;
2. The invention uses depthwise separable convolutions and introduces an attention mechanism, so the model has fewer parameters and less computation, infers faster and is easier to deploy; the design of the inverse bottleneck block preserves network depth and width while keeping the model lightweight, reducing the number of parameters by 80% compared with mainstream deep network models;
3. By introducing the inverse bottleneck block for multi-scale feature learning, the invention achieves strong vehicle recognition performance. Metrics such as mAP, Rank-1, Rank-5 and Rank-10 are superior to current networks such as ResNet, ShuffleNet, Inception v2 and PROVID, and the vehicle to be identified can be found in practical tests.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a vehicle re-identification method based on multi-scale feature learning according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a vehicle re-identification network structure according to embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the reverse bottleneck block according to embodiment 1 of the present invention;
fig. 4 is a schematic diagram of the structure of the Light 3×3 convolution of embodiment 1 of the present invention compared with a standard convolution, wherein: (A) is a standard convolution and (B) is a Light 3 x 3 convolution;
FIG. 5 is a schematic diagram of the structure of the attention mechanism layer of embodiment 1 of the present invention;
FIG. 6 is a block diagram showing the structure of a vehicle re-recognition system based on multi-scale feature learning according to embodiment 2 of the present invention;
fig. 7 is a block diagram showing the structure of a computer device according to embodiment 3 of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art without making any inventive effort based on the embodiments of the present invention are within the scope of protection of the present invention. It should be understood that the description of the specific embodiments is intended for purposes of illustration only and is not intended to limit the scope of the present application.
Example 1:
as shown in fig. 1, the vehicle re-identification method based on multi-scale feature learning provided in this embodiment includes the following steps:
s101, acquiring a training set and a query library of vehicle images, and preprocessing the vehicle images in the training set and the query library.
This embodiment obtains a vehicle dataset from the public database (DOI: 10.1109/ICME.2016.7553002); its training set contains 37778 vehicle images and its query library contains 1678 vehicle images. If a dataset is built independently, each vehicle class in the training set should contain no fewer than 100 sample images and each vehicle class in the query library no fewer than 10 sample images. In addition, the sample images of each vehicle should cover a plurality of different shooting angles, times and places.
The data preprocessing includes scaling the image to 256×128 and normalizing the data: for each channel, the mean of all samples is subtracted and the result is divided by the standard deviation, so that every channel follows a Gaussian distribution with mean 0 and variance 1.
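As a concrete illustration, the preprocessing described above could be expressed with torchvision transforms roughly as follows; this is only a sketch, and the per-channel mean/std values are placeholders for the statistics computed over all training samples.

```python
# Sketch of the preprocessing step, assuming PyTorch/torchvision.
# DATASET_MEAN / DATASET_STD are placeholders for the per-channel statistics
# computed over all training samples, as described in the text.
from torchvision import transforms

DATASET_MEAN = [0.5, 0.5, 0.5]   # placeholder values
DATASET_STD = [0.5, 0.5, 0.5]    # placeholder values

preprocess = transforms.Compose([
    transforms.Resize((256, 128)),                      # (height, width)
    transforms.ToTensor(),                              # HWC [0,255] -> CHW [0,1]
    transforms.Normalize(DATASET_MEAN, DATASET_STD),    # zero mean, unit variance per channel
])
```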
S102, building a re-identification network model.
As shown in fig. 2, the re-identification network model is an 11-layer network. The image is fed into the first layer, and the feature map output by each layer serves as the input of the next layer.
Specifically, the first layer uses the inverse bottleneck block of fig. 3 to turn the 256×128×3 (height×width×channel) input image into a 256×128×32 feature map, which the max pooling layer then reduces to 128×64×32; the second layer uses an inverse bottleneck block to turn the 128×64×32 input into a 128×64×48 feature map, which max pooling with stride 2 reduces to 64×32×48; the third layer uses an inverse bottleneck block to map the 64×32×48 input to a 64×32×48 feature map; the fourth layer uses an inverse bottleneck block to turn the 64×32×48 feature map into 64×32×64, followed by a max pooling layer with stride 2 that reduces it to 32×16×64; the fifth layer maps the 32×16×64 input to a 32×16×64 feature map; the sixth layer turns the 32×16×64 feature map into 32×16×96, and max pooling with stride 2 reduces it to 16×8×96; the seventh layer maps the 16×8×96 feature map to a 16×8×96 feature map; the eighth layer uses an inverse bottleneck block to turn the 16×8×96 feature map into 16×8×128, followed by a max pooling layer with stride 2 that reduces it to 8×4×128; the ninth (GAP) layer applies global average pooling to the 8×4×128 feature map of the eighth layer along the channel dimension and flattens it into a 128-dimensional feature vector; the tenth layer lifts the features to 512 dimensions; the last layer is the classification layer whose outputs correspond to the vehicle classes, 163 in this embodiment. The overall network has 1.94M parameters, far fewer than current mainstream networks such as ResNet50 with 25.05M parameters and the lightweight MobileNetV2 with 3.21M parameters.
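The layer arrangement above can be summarized with the following sketch. It assumes an InvertedBottleneck module with the structure of fig. 3 (sketched further below together with the LightConv3x3 and ChannelAttention helpers it uses); the class names and the return_features flag are illustrative, not taken from the patent.

```python
# Sketch of the eleven-layer re-identification backbone described above.
# InvertedBottleneck is assumed to follow fig. 3 and is sketched further below.
import torch.nn as nn

class ReIDBackbone(nn.Module):
    def __init__(self, num_classes=163, feat_dim=512):
        super().__init__()
        # (in_channels, out_channels, followed_by_max_pool) for layers 1-8
        cfg = [(3, 32, True), (32, 48, True), (48, 48, False), (48, 64, True),
               (64, 64, False), (64, 96, True), (96, 96, False), (96, 128, True)]
        blocks = []
        for in_ch, out_ch, pool in cfg:
            blocks.append(InvertedBottleneck(in_ch, out_ch))
            if pool:                                    # layers 1, 2, 4, 6, 8
                blocks.append(nn.MaxPool2d(kernel_size=2, stride=2))
        self.features = nn.Sequential(*blocks)          # 256x128x3 -> 8x4x128
        self.gap = nn.AdaptiveAvgPool2d(1)              # layer 9: global average pooling
        self.fc = nn.Linear(128, feat_dim)              # layer 10: 128 -> 512-d feature
        self.classifier = nn.Linear(feat_dim, num_classes)  # layer 11: class scores

    def forward(self, x, return_features=False):
        f = self.fc(self.gap(self.features(x)).flatten(1))  # 512-d feature vector
        return f if return_features else self.classifier(f)
```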
As shown in fig. 3, the inverse bottleneck block first applies a 1×1 convolution to the input features to lift the channel dimension and then extracts features of different scales with a multi-branch structure: different receptive fields are obtained on four branches by stacking different numbers of Light 3×3 convolution blocks, while padding and stride are adjusted so that the feature maps output by all branches have the same dimensions and can be directly combined along the channel dimension. After the four branch feature maps are combined, an attention mechanism assigns different weights to different channels so that the model parameters concentrate on the more useful channel features. The result is then reduced in dimension by a 1×1 convolution and residual-connected with the input features, and the output feature map is finally obtained through a ReLU nonlinear activation function.
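A minimal sketch of this block follows. It assumes four branches stacking one to four Light 3×3 blocks, channel-wise concatenation of the branch outputs, an SE-style ChannelAttention layer (both helpers are sketched after fig. 4 and fig. 5 below), an expansion ratio of 4, and a 1×1 projection on the shortcut when the input and output channel counts differ; the expansion ratio and that projection are assumptions, not taken from the text.

```python
# Sketch of the inverted bottleneck block of fig. 3 (assumptions noted above).
import torch
import torch.nn as nn

class InvertedBottleneck(nn.Module):
    def __init__(self, in_ch, out_ch, expand=4, branches=4):
        super().__init__()
        mid = in_ch * expand                            # 1x1 convolution lifts the dimension
        self.expand = nn.Sequential(nn.Conv2d(in_ch, mid, 1, bias=False),
                                    nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        # branch i stacks (i + 1) Light 3x3 blocks -> progressively larger receptive field
        self.branches = nn.ModuleList([
            nn.Sequential(*[LightConv3x3(mid, mid) for _ in range(i + 1)])
            for i in range(branches)])
        self.attention = ChannelAttention(mid * branches)  # weight the combined channels
        self.reduce = nn.Sequential(nn.Conv2d(mid * branches, out_ch, 1, bias=False),
                                    nn.BatchNorm2d(out_ch))  # 1x1 dimension reduction
        # assumed 1x1 projection so the residual addition is valid when channels change
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, bias=False)
                         if in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.expand(x)
        y = torch.cat([b(y) for b in self.branches], dim=1)  # same spatial size per branch
        y = self.reduce(self.attention(y))
        return self.relu(y + self.shortcut(x))                # residual connection, then ReLU
```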
The ReLU function is specifically defined as:
ReLU(x)=max(0,x)
where x represents each element of the input feature map; the ReLU activation function maps values less than zero to 0 and leaves the remaining values unchanged.
As shown in fig. 4, the Light 3×3 convolution splits a conventional convolution into a depth-wise convolution and a point-wise convolution. With the same receptive field, the split version reduces the number of parameters and the amount of computation to roughly 1/k² of the original, where k is the convolution kernel size, typically 3, hence the name Light 3×3 convolution block. The normalization layer normalizes the features of different channels to a Gaussian distribution with mean 0 and variance 1, which can be regarded as a regularization means that helps avoid gradient explosion or vanishing; finally, ReLU is used as the activation function.
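A sketch of such a Light 3×3 block is shown below; the class name is illustrative. Per layer, the parameters drop from roughly k·k·C_in·C_out for a standard convolution to k·k·C_in + C_in·C_out for the split version, which is close to the 1/k² ratio mentioned above when the output channel count is large.

```python
# Sketch of the Light 3x3 block of fig. 4: depth-wise 3x3 + point-wise 1x1,
# followed by batch normalization and ReLU.
import torch.nn as nn

class LightConv3x3(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)     # one 3x3 filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)  # channel mixing
        self.bn = nn.BatchNorm2d(out_ch)          # normalize each channel towards N(0, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))
```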
As shown in fig. 5, the attention mechanism layer first performs global average pooling on the input features along the channel dimension, then reduces the dimension through a first fully connected layer with ReLU as the activation function, and restores the dimension through a second fully connected layer so that the output matches the number of feature map channels. Sigmoid is used as the activation function so that the output values lie in [0, 1]; finally, the output values are assigned to the channels as weights, and every element of each channel is multiplied by the weight assigned to that channel.
The Sigmoid function is specifically defined as:
Sigmoid(x) = 1 / (1 + exp(-x))
where x ∈ R^(1×C) denotes the feature vector output by the preceding fully connected layer; the Sigmoid function maps each of its elements into [0, 1].
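This corresponds to a squeeze-and-excitation style module; a sketch is given below, where the reduction ratio r is an assumed hyperparameter not specified in the text.

```python
# Sketch of the channel attention layer of fig. 5 (SE-style channel re-weighting).
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # global average pooling per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),         # first FC layer: dimension reduction
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),         # second FC layer: back to channel count
            nn.Sigmoid())                               # weights in [0, 1]

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # multiply each channel by its weight
```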
S103, training the re-recognition network model by utilizing the preprocessed training set.
After training of the network model is completed, the 512-dimensional feature vector is output directly from the tenth layer.
Specific training parameter details:
the training batch size is set to 64; the learning rate is set to 0.0015; the weight decay is set to 5e-4; the SGD optimizer is used, and training runs for 200 epochs in total;
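These hyperparameters translate into a training loop along the following lines; `model`, `train_loader` and the label-smoothing criterion (sketched after the loss formula below) are assumed to exist, and the momentum value is an assumption since the text only names SGD.

```python
# Sketch of the training configuration: batch size 64, lr 0.0015,
# weight decay 5e-4, SGD, 200 epochs.
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.0015,
                            momentum=0.9,               # assumed; not given in the text
                            weight_decay=5e-4)

for epoch in range(200):
    for images, labels in train_loader:                 # batches of 64 preprocessed images
        loss = criterion(model(images), labels)         # label-smoothing cross entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```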
data enhancement, including random horizontal flipping and random erasure, is performed at each batch. The random horizontal overturn is to randomly overturn the image left and right, and the random erasure is to randomly select a certain part of the image to be blocked, and the pixel mean value of the image is used for replacing. Random flipping and random erasing are performed with a certain set probability value, which is adjustable, while the area of the blocked pixels is also adjustable, depending on the different scenes and tasks faced.
In this embodiment, 0.4 probability inversion and erasure area of 20×10 pixels are selected. The original training samples can be subjected to data enhancement in each round of batch in the training process, so that a plurality of new samples can be added in each training, and the generalization capability of the model is improved.
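With torchvision, this per-batch augmentation could look roughly as follows. It is assumed to be applied to the already normalized tensors from the preprocessing step; the 0.4 probability follows the text (assumed here to apply to both operations), the scale/ratio values approximate a 20×10-pixel region of a 256×128 image, and filling with 0 on normalized data roughly corresponds to the per-image mean fill described above.

```python
# Sketch of the data enhancement step (random horizontal flip + random erasing),
# applied to normalized image tensors.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.4),              # random left-right mirroring
    transforms.RandomErasing(p=0.4,
                             scale=(0.006, 0.006),       # ~200 px of a 256x128 image
                             ratio=(2.0, 2.0),           # height:width = 2 -> ~20x10 px
                             value=0),                   # ~channel mean after normalization
])
```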
The loss function is a cross-entropy loss with label smoothing, whose specific form is:
p_i = exp(z_i) / Σ_{j=1..K} exp(z_j)
Loss = -Σ_{i=1..K} q_i · log(p_i)
where K is the number of classes; z_i is the i-th output of the last layer, i.e. the score of the sample for the i-th class; p_i is the probability of assigning the sample to the i-th class; q is the class-label vector of the sample and q_i its i-th component; y is the true class of the sample, so q_i equals 1-ε when the sample actually belongs to class i = y and equals ε/(K-1) otherwise; ε can be understood as the probability that a sample label is wrong. When a label is wrong with probability ε, each of the other classes carries part of that probability of being the true one, so every class prediction retains some information; this raises the model's tolerance to mislabelled data and improves its generalization ability.
In this embodiment ε is set to 0.1. The cross-entropy loss Loss characterizes the similarity between the two vectors q and p: the higher the similarity, the smaller the loss.
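The loss above can be written directly as a small function; a sketch follows (recent PyTorch versions also expose the same idea as nn.CrossEntropyLoss(label_smoothing=0.1), with a slightly different smoothing convention).

```python
# Sketch of the label-smoothing cross-entropy loss with epsilon = 0.1.
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    K = logits.size(1)                                   # number of classes
    log_p = F.log_softmax(logits, dim=1)                 # log p_i
    q = torch.full_like(log_p, eps / (K - 1))            # q_i = eps/(K-1) for wrong classes
    q.scatter_(1, targets.unsqueeze(1), 1.0 - eps)       # q_y = 1 - eps for the true class
    return -(q * log_p).sum(dim=1).mean()                # Loss = -sum_i q_i * log(p_i)

criterion = label_smoothing_ce
```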
S104, respectively inputting all the vehicle images in the query library into the trained re-recognition network model to obtain the feature vector of each image; and obtaining the average feature vector of each type of vehicle according to all the feature vectors of each type of vehicle.
And carrying out arithmetic average on all the feature vectors of each type of vehicle to obtain an average feature vector of each type of vehicle.
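As an illustration, step S104 could be implemented roughly as below, reusing the return_features flag assumed in the backbone sketch.

```python
# Sketch of S104: embed every query-library image and average the feature
# vectors of each vehicle class.
from collections import defaultdict
import torch

@torch.no_grad()
def build_gallery(model, gallery_loader):
    model.eval()
    feats = defaultdict(list)
    for images, labels in gallery_loader:
        vectors = model(images, return_features=True)    # 512-d feature vectors
        for v, y in zip(vectors, labels):
            feats[int(y)].append(v)
    # arithmetic mean of all feature vectors of each vehicle class
    return {y: torch.stack(v).mean(dim=0) for y, v in feats.items()}
```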
S105, preprocessing the image of the vehicle to be detected, and then inputting the image into the trained re-recognition network model to obtain the feature vector of the vehicle to be detected.
S106, calculating Euclidean distance between the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the smallest distance as a final recognition result.
The Euclidean distance is calculated as:
d_i = sqrt( Σ_t (x_t - y_{i,t})² ),  i = 1, …, K
where x_t is the t-th component of the feature vector of the vehicle to be detected, y_{i,t} is the t-th component of the average feature vector of the i-th vehicle class in the query library, the sum runs over the feature dimensions, and K is the total number of vehicle classes in the query library.
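Steps S105 and S106 then reduce to embedding the probe image and taking the class with the smallest Euclidean distance; a sketch follows, with `gallery` being the class-mean dictionary built above.

```python
# Sketch of S105/S106: embed the probe image, compute d_i to every class mean
# and return the class with the smallest distance.
import torch

@torch.no_grad()
def identify(model, image, gallery):
    model.eval()
    f = model(image.unsqueeze(0), return_features=True).squeeze(0)   # 512-d probe feature
    class_ids = list(gallery.keys())
    means = torch.stack([gallery[i] for i in class_ids])             # K x 512 class means
    d = torch.linalg.norm(means - f, dim=1)                          # Euclidean distances d_i
    return class_ids[int(torch.argmin(d))]                           # nearest vehicle class
```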
Those skilled in the art will appreciate that all or part of the steps in a method implementing the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all illustrated operations be performed in order to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Example 2:
as shown in fig. 6, the present embodiment provides a vehicle re-recognition system based on multi-scale feature learning, which includes an acquisition module 601, a training module 602, a first generation module 603, a second generation module 604, and a recognition module 605, wherein:
the acquiring module 601 is configured to acquire a training set and a query library of vehicle images, and perform preprocessing on the vehicle images in the training set and the query library;
the training module 602 is configured to train the re-recognition network model by using the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
the first generating module 603 is configured to input all the vehicle images in the preprocessed query library into the trained re-recognition network model to obtain feature vectors of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
the second generating module 604 is configured to input the preprocessed image of the vehicle to be tested into a trained re-recognition network model to obtain a feature vector of the vehicle to be tested;
the recognition module 605 is configured to calculate the euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and select the vehicle type with the smallest distance as the final recognition result.
Specific implementation of each module in this embodiment may be referred to embodiment 1 above, and will not be described in detail herein; it should be noted that, in the system provided in this embodiment, only the division of the above functional modules is used as an example, in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to perform all or part of the functions described above.
Example 3:
this embodiment provides a computer device, which may be a computer. As shown in fig. 7, its components are connected through a system bus 701: the processor 702 provides computing and control capabilities; the memory includes a nonvolatile storage medium 706 and an internal memory 707, where the nonvolatile storage medium 706 stores an operating system, a computer program and a database, and the internal memory 707 provides an environment for running the operating system and the computer program in the nonvolatile storage medium. When the processor 702 executes the computer program stored in the memory, the vehicle re-recognition method based on multi-scale feature learning of embodiment 1 above is implemented, as follows:
acquiring a training set and a query library of vehicle images, and preprocessing the vehicle images in the training set and the query library;
training the re-identification network model by utilizing the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
respectively inputting all the vehicle images in the preprocessed query library into a trained re-recognition network model to obtain feature vectors of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
inputting the preprocessed vehicle image to be detected into a trained re-recognition network model to obtain a feature vector of the vehicle to be detected;
and calculating the Euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the minimum distance as a final recognition result.
Example 4:
the present embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the vehicle re-recognition method based on multi-scale feature learning of embodiment 1 described above, as follows:
acquiring a training set and a query library of vehicle images, and preprocessing the vehicle images in the training set and the query library;
training the re-identification network model by utilizing the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
respectively inputting all the vehicle images in the preprocessed query library into a trained re-recognition network model to obtain feature vectors of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
inputting the preprocessed vehicle image to be detected into a trained re-recognition network model to obtain a feature vector of the vehicle to be detected;
and calculating the Euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the minimum distance as a final recognition result.
The computer readable storage medium of the present embodiment may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In summary, the present invention provides a vehicle re-identification method, system, computer device and computer-readable storage medium based on multi-scale feature learning. A vehicle re-identification dataset is acquired and subjected to data preprocessing and data enhancement; the vehicle re-identification network model is trained with a label-smoothing cross-entropy loss and an SGD (stochastic gradient descent) optimizer; the vehicle image to be queried is uniformly scaled to a fixed size (256×128) and fed into the trained re-identification network to generate a 512-dimensional feature vector; the Euclidean distance between the feature vector of the vehicle to be detected and the feature vectors pre-generated from the query library is calculated; and similar vehicles are output in order of similarity. By learning and fusing features of different scales, the features extracted by the model are more discriminative. In addition, the low-parameter design makes inference faster, reduces the number of parameters by nearly 80% compared with classical convolutional networks, and better meets practical deployment requirements.
The above-mentioned embodiments are only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art according to the technical solution and the inventive concept of the present invention, within the scope disclosed by the present patent, falls within the protection scope of the present invention.

Claims (10)

1. A vehicle re-identification method based on multi-scale feature learning, the method comprising:
acquiring a training set and a query library of vehicle images, and preprocessing the vehicle images in the training set and the query library;
training the re-identification network model by utilizing the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
respectively inputting all the vehicle images in the preprocessed query library into a trained re-recognition network model to obtain feature vectors of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
inputting the preprocessed vehicle image to be detected into a trained re-recognition network model to obtain a feature vector of the vehicle to be detected;
and calculating the Euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the minimum distance as a final recognition result.
2. The vehicle re-identification method of claim 1, wherein the inverse bottleneck block extracts features of different scales using a multi-branch structure and a channel attention mechanism, comprising:
firstly, carrying out convolution dimension lifting on input features, and then obtaining different receptive fields on a plurality of branches by using different numbers of Light 3 multiplied by 3 convolution blocks; meanwhile, the dimension of the feature graphs output by each path is the same through filling and step length adjustment so as to ensure that the feature graphs can be directly overlapped according to the channels;
after the multi-path feature graphs are overlapped, different weights are distributed to different channels through the attention mechanism layer, so that model parameters are concentrated into more useful channel features; performing residual connection with the input features after convolution dimension reduction; and finally, the connected feature map is subjected to a ReLU activation function to obtain an output feature map.
3. The vehicle re-recognition method according to claim 2, wherein the Light 3 x 3 convolution block is a standard 3 x 3 convolution block split into a depth-wise convolution and a point-wise convolution to reduce the number of parameters and the amount of computation.
4. The vehicle re-identification method of claim 2, wherein the assigning different weights to different channels by the attention mechanism layer comprises:
firstly, carrying out global average pooling on input features in channel dimension, then carrying out dimension reduction through a full connection layer, and taking a ReLU as an activation function; and then, carrying out dimension lifting through a full connection layer to ensure that the output dimension is consistent with the number of channels of the feature map, taking Sigmoid as an activation function, and finally, distributing the output value as a weight to different channels.
5. The vehicle re-recognition method according to claim 1, wherein, when training the re-recognition network model, each round of data enhancement is performed on the vehicle image in the preprocessed training set to improve the model generalization ability; the data enhancements include random horizontal flipping and random erasure.
6. The vehicle re-recognition method of claim 1, wherein the loss function of the re-recognition network model uses a cross entropy loss function with label smoothing.
7. The vehicle re-recognition method according to any one of claims 1 to 6, characterized in that the preprocessing includes scaling the vehicle image to a set size and performing normalization processing.
8. A vehicle re-identification system based on multi-scale feature learning, the system comprising:
the acquisition module is used for acquiring a training set and a query library of the vehicle images and preprocessing the vehicle images in the training set and the query library;
the training module is used for training the re-identification network model by utilizing the preprocessed training set; the re-recognition network model comprises eleven layers, wherein each of the first eight layers comprises an inverse bottleneck block, the first, second, fourth, sixth and eighth layers are connected with a maximum pooling layer after the inverse bottleneck block, the ninth layer is a global average pooling layer, the tenth layer is a full-connection layer, and the last layer is a classification layer, wherein the output of each layer is used as the input of the next layer, and the tenth layer outputs the feature vector of an input image; the reverse bottleneck block adopts a multi-branch structure and a channel attention mechanism to extract features with different scales;
the first generation module is used for respectively inputting all the vehicle images in the preprocessed query library into the trained re-recognition network model to obtain the feature vector of each vehicle image; obtaining an average feature vector of each type of vehicle according to all feature vectors of each type of vehicle image;
the second generation module is used for inputting the preprocessed image of the vehicle to be detected into the trained re-recognition network model to obtain the feature vector of the vehicle to be detected;
the recognition module is used for calculating Euclidean distance according to the feature vector of the vehicle to be detected and the average feature vector of each type of vehicle in the query library, and selecting the vehicle type with the smallest distance as a final recognition result.
9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the vehicle re-identification method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the vehicle re-identification method according to any one of claims 1-7.
CN202311552917.8A 2023-11-21 2023-11-21 Vehicle re-identification method, system and storage medium based on multi-scale feature learning Pending CN117636298A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311552917.8A CN117636298A (en) 2023-11-21 2023-11-21 Vehicle re-identification method, system and storage medium based on multi-scale feature learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311552917.8A CN117636298A (en) 2023-11-21 2023-11-21 Vehicle re-identification method, system and storage medium based on multi-scale feature learning

Publications (1)

Publication Number Publication Date
CN117636298A true CN117636298A (en) 2024-03-01

Family

ID=90024563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311552917.8A Pending CN117636298A (en) 2023-11-21 2023-11-21 Vehicle re-identification method, system and storage medium based on multi-scale feature learning

Country Status (1)

Country Link
CN (1) CN117636298A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911795A (en) * 2024-03-18 2024-04-19 杭州食方科技有限公司 Food image recognition method, apparatus, electronic device, and computer-readable medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination