CN114663861A

CN114663861A - Vehicle re-identification method based on dimension decoupling and non-local relation

Info

Publication number: CN114663861A
Application number: CN202210531995.9A
Authority: CN
Inventors: 王成; 孟庆兰; 田鑫; 郑艳丽; 姜刚武; 庞希愚; 栗士涛; 李曦; 周厚仁; 郑美凤; 孙珂
Original assignee: Shandong Jiaotong University
Current assignee: Shandong Jiaotong University
Priority date: 2022-05-17
Filing date: 2022-05-17
Publication date: 2022-06-24
Anticipated expiration: 2042-05-17
Also published as: CN114663861B

Abstract

The invention relates to the technical field of vehicle heavy identification, in particular to a vehicle heavy identification method based on dimension decoupling and non-local relation, which comprises the following steps: copying partial residual layers after ResNet-50 into three branches with the same structure, and sequentially introducing a global feature extraction mechanism, a non-local relation capture mechanism and a dimension decoupling mechanism into each branch behind the residual layer; the non-local relationship capturing mechanism comprises a non-local relationship capturing module based on a channel and a non-local relationship capturing module based on a space, wherein the non-local relationship capturing module and the non-local relationship capturing module respectively perform noise reduction on the non-local relationship between the channel and the space level, and meanwhile different weights are distributed for the relationship between the channel and the position on the channel and the space level; the dimension decoupling mechanism is used for decoupling the space and the channel, and a part of the characteristics are concentrated in a specific subspace. The invention solves the problems of large intra-class difference and small inter-class difference in vehicle weight identification.

Description

Vehicle re-identification method based on dimension decoupling and non-local relation

Technical Field

The invention relates to the technical field of vehicle re-identification, in particular to a vehicle re-identification method based on dimension decoupling and non-local relation.

Background

With the use and popularization of automobiles, a great number of technical problems related to vehicle management and scheduling are derived to be solved urgently. Vehicle weight recognition has been reported by researchers in the industry as one of the technical difficulties associated with vehicle management scheduling. The vehicle weight recognition aims to find vehicle images belonging to the same identity from images shot by different cameras. In recent years, the vehicle re-identification algorithm based on deep learning has the advantages of unique adaptivity, strong identification precision and the like, so that the deep learning theory is widely applied to the field of vehicle re-identification. At present, the problem of vehicle heavy identification is that the vehicle heavy identification network has low generalization capability and network accuracy due to the problem of large intra-class difference and small inter-class difference.

Disclosure of Invention

The invention aims to overcome the defects in the prior art, provides a vehicle re-identification method based on dimension decoupling and non-local relation, adopts a global feature extraction mechanism, a non-local relation capture mechanism and a dimension decoupling mechanism to design, solves the problems of larger intra-class difference and smaller inter-class difference in the vehicle re-identification problem, and can improve the generalization capability and network precision of a vehicle re-identification network.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a vehicle re-identification method based on dimension decoupling and non-local relation comprises the following steps:

a convolutional neural network Resnet50 is used as a backbone, partial residual error layers (res _ conv4_ 2-res _ conv5) behind ResNet-50 are copied into three branches with the same structure, and a global feature extraction mechanism, a non-local relationship capture mechanism and a dimension decoupling mechanism are sequentially introduced into each branch behind the residual error layers;

the global feature extraction mechanism is used for extracting global features of the vehicle, and feature graphs generated by a res _ conv5 layer are sequentially input into a Global Average Pooling (GAP) and a channel dimension reduction module consisting of a 1 × 1 convolution, Batch Normalization (BN) and a ReLU activation function to obtain 256-dimensional feature representation;

the non-local relationship capturing mechanism comprises a non-local relationship capturing module based on a channel and a non-local relationship capturing module based on a space, wherein the non-local relationship capturing module and the non-local relationship capturing module respectively perform noise reduction on the non-local relationship between the channel and the space level, and meanwhile different weights are distributed for the relationship between the channel and the position on the channel and the space level; the mechanism aims to dig out useful non-local relations so as to improve the performance of the network;

the dimension decoupling mechanism is used for stripping mutual interference of information between the channel and the space, and decoupling the space and the channel, so that a part of features are concentrated in a specific subspace.

Further, the non-local relationship capture mechanism is specifically operative to: simultaneously inputting the tensors output by the res _ conv5 layer into a space-based non-local relation capturing module and a channel-based non-local relation capturing module; the feature size through the spatial and channel non-local relationship capture module is unchanged, and then the dimensionality is reduced from 2048 to 256 through Global Average Pooling (GAP), 1 × 1 convolution, Bulk Normalization (BN), and ReLU operations.

Further, the dimensional decoupling mechanism is specifically operative to: firstly, a feature map is divided into two parts from the horizontal direction in a space dimension to obtain two feature maps, and then the feature maps are decoupled on average in a channel dimension, namely the two obtained feature maps are divided into two parts on the channel dimension, aiming at cutting off redundant channels in the channel dimension in the two spaces, so that the calculated amount is reduced, and the relation between the spaces and the channels is stripped; in the third branch, the invention takes two gray features in the feature map generated by res _ conv5 as two decoupled subspaces; after the decoupling operation, independently performing feature extraction operations in two subspaces: the feature maps were subjected to Global Average Pooling (GAP), 1 × 1 convolution, Bulk Normalization (BN), and ReLU operations, reducing the dimensionality from 2048 to 256.

Further, the composition structure of the space-based non-local relationship capture module is as follows: is provided with

Is an input to a feature extraction module, wherein

And

respectively the width and the height of the input tensor,

is the number of channels; wherein the function

Is input as a pair

To carry out

A depth separable convolution operation with a convolution kernel of size

Then obtain

Each size is

And then deforming both of them into

Then spliced in the spatial dimension to obtain

The matrix is marked as A;

for the input tensor

Performing 1 × 1 convolution operation, and performing deformation operation to obtain

Then multiplying A by the B matrix to determine a non-local relationship and obtain a value of

A matrix of (a); using for each column on the basis of the derived non-local relation

Activating functions to obtain a suitable probability matrix

：

；

Has a size of

Then, the sum of each row of the probability matrix is calculated to obtain a weight vector

Each element of the vector

Represents the first

A weight of each spatial location;

expressed as:

；

space-based non-local relationship capture module final output characteristics

Comprises the following steps:

；

in the feature extraction module,

function is as

A convolution kernel of size

May separate the convolutions. A set of depth separable convolution operations is used to obtain a global set of key value distributions to measure the importance of each point, and thusAs the size of the convolution kernel is

This means that each convolution fuses all feature points to obtain a global feature representation, and different weights are assigned to different positions. Wherein

For adjustable hyper-parameters, it is found through experiments

The present network achieves optimal performance.

Furthermore, in order to supplement the function of the space-based non-local relationship capturing module and improve the performance of the network, the channel-based non-local relationship capturing module is added in the network, and the module has a similar principle with the space-based non-local relationship capturing module, and is different in that the module focuses on the relationship between the channels, and the performance of the network is improved by establishing the non-local relationship among a plurality of groups of channels.

The channel-based non-local relationship capture module comprises the following components: first, the original features are combined

Input to two 1 × 1 convolution operations; the first 1 x 1 convolution is used to remove the channel from

Is compressed to

Wherein

To be hyper-parametric and then deforming it into

Of size of

(ii) a After the second 1 × 1 convolution operation, the distortion is

Of size of

Will be

Is transposed with respect to

Matrix multiplication is carried out to obtain the value of

Is given as a global reference

(ii) a Will be provided with

Each column of

Activating the function to obtain a probability matrix; then, the sum of each row of the probability matrix is calculated, which is a function

Representation (this operation and solving in a space-based non-local relationship capture Module

Is similar) to obtain a size of

Global feature mask based on channel

，

；

Then will be

And input features

After element dot multiplication, and

adding to obtain a final output characteristic representation; channel-based non-local relationship capture module final output characteristics

Comprises the following steps:

。

the invention has the technical effects that:

compared with the prior art, the vehicle re-identification method based on the dimension decoupling and the non-local relation is designed by adopting a global feature extraction mechanism, a non-local relation capture mechanism and a dimension decoupling mechanism, the global feature extraction mechanism is used for capturing relatively complete and coarse-grained features, the non-local relation capture mechanism is used for respectively extracting significant information from a feature map obtained by a backbone network on the spatial dimension and the channel dimension, so that the network can extract the features with finer granularity, the space and the channel are thoroughly decoupled by the dimension decoupling mechanism, and a part of the features are concentrated on a specific subspace. According to the invention, different useful information is respectively extracted through the three branches, and the three branches can assist each other, so that the performance of the model is optimal, and the generalization capability and the network precision of the vehicle weight recognition network are greatly improved.

Drawings

FIG. 1 is a vehicle re-identification network architecture diagram based on dimensional decoupling and non-local relationships in accordance with the present invention;

FIG. 2 is a diagram of a space-based non-local relationship capture module architecture according to the present invention;

FIG. 3 is a diagram of a lane-based non-local relationship capture module architecture according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the specification.

Example 1:

the embodiment relates to a vehicle re-identification method based on dimension decoupling and non-local relation, which comprises the following steps:

the convolutional neural network Resnet50 is used as a backbone network to enhance the feature extraction capability of the network, the network structure is shown in FIG. 1, partial residual error layers (res _ conv4_ 2-res _ conv5) behind ResNet-50 are copied into three branches with the same structure, and a global feature extraction mechanism, a non-local relationship capture mechanism and a dimension decoupling mechanism are sequentially introduced into each branch behind the residual error layer;

the global feature extraction mechanism is used for extracting global features of the vehicle, called global branches; the global branch inputs the feature maps generated by the res _ conv5 layer to a Global Average Pooling (GAP) and a channel dimension reduction module consisting of 1 × 1 convolution, Batch Normalization (BN) and ReLU activation functions in sequence to obtain 256-dimensional feature representation;

the non-local relationship capture mechanism comprises a channel-based non-local relationship capture module and a space-based non-local relationship capture module; firstly, simultaneously inputting the tensor output by the res _ conv5 layer into a space-based non-local relation capturing module and a channel-based non-local relation capturing module, and capturing highly detailed space and channel correlation by adopting a parallel structure; the non-local relation capture module based on the channel and the non-local relation capture module based on the space respectively perform noise reduction on the non-local relation between the channel and the space level, and meanwhile different weights are distributed for the relation between the channels and the positions on the channel and the space level; the mechanism aims to dig out useful non-local relations and further improve the performance of the network. The feature size of the spatial and channel non-local relation capture module is unchanged, the subsequent operation is the same as the global branch, and the dimensionality is reduced from 2048 to 256 through Global Average Pooling (GAP), 1 × 1 convolution, Batch Normalization (BN) and ReLU operation. Because the non-local relations of the input features are not all useful, only the specific non-local relations have a remarkable effect on improving the network precision, but the redundant non-local relations can become interference factors for capturing the features with the identifiability by the network, therefore, the non-local relation capturing mechanism solves the problems that the calculation amount is too large due to the fact that all relations of all positions are calculated by the existing network, the redundant non-local relations occupy calculation resources and interfere the network to capture the features which really play the role of identification, and improves the network precision of the vehicle re-identification network.

The dimension decoupling mechanism is used for stripping mutual interference of information between the channel and the space and decoupling the space and the channel, so that a part of features are concentrated in a specific subspace; the specific operation is that the feature map is firstly divided into two parts from the horizontal direction in the space dimension to obtain two feature maps, and then the two feature maps are averagely decoupled in the channel dimension, namely the two feature maps are respectively divided into two parts in the channel direction, aiming at cutting off redundant channels in the channel dimension in the two spaces, so that the calculated amount is reduced, the relation existing between the spaces and the channels is stripped, and compared with the previous method that each subspace is associated with all the channels, the decoupling branch really extracts the local features from only partial channels and subspaces. In the third branch, as shown in fig. 1, the present invention takes the two gray features in the feature map generated by res _ conv5 as the two decoupled subspaces. And independently executing the feature extraction operation in the two subspaces after the decoupling operation, wherein the feature extraction operation is the same as the subsequent operation of the first two branches. Because the hard division strategies are used for simply dividing the input features on the spatial level, the feature information of each region subjected to hard division on the channel dimension is ignored. Because the features to be paid attention to are different among the partitions after hard partitioning, the region of interest in the channel dimension also changes among the partitions. Therefore, the invention adopts a more accurate characteristic capture mode, namely a dimension decoupling mechanism, can reduce most of calculated amount, and effectively breaks through the problem that the information between the subspace and the channel interferes with each other to influence the network precision when the existing hard division strategy extracts fine-grained characteristics.

As shown in fig. 2, the structure of the space-based non-local relationship capture module is as follows: is provided with

Input data for a feature extraction module, wherein

And

respectively the width and the height of the input tensor,

is the number of channels. Wherein the function

Is to input

To carry out

A (a)

For adjustable hyper-parameters) depth separable convolution operation with a convolution kernel size of

Then obtain

Each size is

And then deforming both of them into

Then splicing is carried out in the spatial dimension to obtain

Of the matrix of (a). Let the matrix be A.

For the input tensor

A matrix of (a); using for each column on the basis of the derived non-local relationship

Activating functions to obtain a suitable probability matrix

：

；

Has a size of

Each element of the vector

Represents the first

A weight of each spatial location;

expressed as:

；

space-based non-local relationship capture module final output characteristics

Comprises the following steps:

；

in the feature extraction module,

function is as

A convolution kernel of size

Can separate the convolutions. The importance of each point is measured by obtaining a global set of key-value distributions using a set of depth-separable convolution operations, and because the size of the convolution kernel is

For adjustable hyper-parameters, it is found through experiments

The present network achieves optimal performance.

In order to supplement the functions of the space-based non-local relationship capture module and improve the performance of the network, the invention adds the channel-based non-local relationship capture module in the network, and the module has a similar principle with the space-based non-local relationship capture module, and is different in that the module focuses on the relationship between channels, and the performance of the network is improved by establishing the non-local relationship among a plurality of groups of channels.

As shown in fig. 3, the component structure of the channel-based non-local relationship capture module is: first, the original features are combined

Is compressed to

（

For adjustable hyper-parameters, the network will

Set to 4) and then deformed into

(size

). After the second 1 × 1 convolution operation, the distortion is

(size

) Will be

Is transposed with respect to

Matrix multiplication is carried out to obtain the value of

Is given as a global reference

. Will be provided with

Each column of

The function is activated to obtain a probability matrix. Then, the sum of each row of the probability matrix is calculated, which is a function

Operation is similar) to obtain a dimension of

Based on the global feature mask of the channel

，

；

Then will be

And input features

After element dot multiplication, and

the addition yields the final output characteristic representation. Channel-based non-local relationship capture module final output characteristics

Comprises the following steps:

。

in order to improve the learning and identification capability of the network, the invention adopts a cross entropy loss function and a triple loss function to constrain the network:

the invention uses ResNet-50 as backbone network, setting batch size to be 16, training round number to be 450 rounds, and adjusting size of image to be 256 × 256 before inputting into network. In the training phase, the 256-dimensional features after dimensionality reduction are constrained by using the triple loss training. In addition, the reduced-dimension features are passed through a full connectivity layer (fc) to convert the 256-dimension features into the number of vehicle IDs for the dataset, and then cross-entropy loss is used for training constraints. In the testing stage, the Euclidean distance is used for carrying out similarity measurement on the vehicle images.

The invention provides a vehicle re-identification network based on dimension decoupling and non-local relation, which is used for a vehicle re-identification task. In both dimension-based decoupling and non-local relationship networks, three branches are used to learn a variety of useful information. The first branch captures relatively complete and coarse-grained features; for branch two, the significance information of the feature graph obtained by the backbone network is respectively extracted on the space dimension and the channel dimension, so that the network can extract features with finer granularity. The third branch makes a complete decoupling of the space and the channel, with some features dedicated to a particular one of the subspaces. In general, three branches respectively extract different useful information, and the three branches can assist each other to optimize the performance of the model.

The above embodiments are only specific examples of the present invention, and the scope of the present invention includes but is not limited to the above embodiments, and any suitable changes or modifications by those of ordinary skill in the art, which are consistent with the claims of the present invention, shall fall within the scope of the present invention.

Claims

1. A vehicle re-identification method based on dimension decoupling and non-local relation is characterized in that: the method comprises the following steps:

using a convolutional neural network Resnet50 as a backbone, copying partial residual error layers after ResNet-50 into three branches with the same structure, and sequentially introducing a global feature extraction mechanism, a non-local relationship capture mechanism and a dimension decoupling mechanism into each branch after the residual error layers;

the global feature extraction mechanism is used for extracting global features of the vehicle, and feature maps generated by a res _ conv5 layer are sequentially input into a global average pooling module and a channel dimension reduction module consisting of a 1 × 1 convolution, batch standardization and a ReLU activation function to obtain 256-dimensional feature representation;

2. The method of claim 1 for vehicle re-identification based on dimensional decoupling and non-local relationships, characterized by: the non-local relationship capture mechanism is specifically operative to: simultaneously inputting the tensors output by the res _ conv5 layer into a space-based non-local relation capturing module and a channel-based non-local relation capturing module; the feature size through the spatial and channel non-local relationship capture module is unchanged, and then the dimensionality is reduced from 2048 to 256 through global average pooling, 1 × 1 convolution, batch normalization and ReLU operation.

3. The method of claim 1 for vehicle re-identification based on dimensional decoupling and non-local relationships, characterized by: the dimension decoupling mechanism is specifically operative to: firstly, dividing a feature map into two parts in the horizontal direction in a space dimension to obtain two feature maps, and then equally dividing the two obtained feature maps into two parts in a channel dimension; after the decoupling operation, independently performing feature extraction operations in two subspaces: the feature map is subjected to global average pooling, 1 × 1 convolution, batch normalization and ReLU operation, and the dimensionality is reduced from 2048 to 256.

4. The method of claim 1 for vehicle re-identification based on dimensional decoupling and non-local relationships, characterized by: the composition structure of the space-based non-local relationship capture module is as follows: is provided with

Is an input to a feature extraction module, wherein

And

respectively is an input sheetThe width and the height of the volume are,

is the number of channels; wherein the function

Is to input

To carry out

A depth separable convolution operation with a convolution kernel of size

Then obtain

Each size is

And then deforming both of them into

Then splicing is carried out in the spatial dimension to obtain

The matrix is marked as A;

for the input tensor

Activating functions to obtain a suitable probability matrix

：

；

Has a size of

Each element of the vector

Represents the first

A weight of each spatial location;

expressed as:

；

space-based non-local relationship capture module final output characteristics

Comprises the following steps:

；

in the feature extraction module,

function is as

A convolution kernel of size

May separate the convolutions.

5. The method of claim 1 for vehicle re-identification based on dimensional decoupling and non-local relationships, characterized by: the channel-based non-local relationship capture module comprises the following components: first, the original features are combined

Is compressed to

In which

Is ultraParameter, then deforming it into

Of size of

(ii) a After the second 1 × 1 convolution operation, the distortion is

Of size of

Will be

Is transposed with respect to

Matrix multiplication is carried out to obtain the value of

Is given as a global reference

(ii) a Will be provided with

Each column of

Expressed, get the size of

Based on the global feature mask of the channel

，

；

Then will be

And input features

After element dot multiplication, and

Comprises the following steps:

。

6. the method for vehicle re-identification based on dimensional decoupling and non-local relationship as claimed in any one of claims 1-5, wherein: and (3) constraining the network by adopting a cross entropy loss function and a triplet loss function:

using ResNet-50 as a backbone network, setting the batch processing size to be 16, the number of training rounds to be 450 rounds, and adjusting the size of an image to be 256 multiplied by 256 before the image is input into the network; in the training stage, the 256-dimensional features after dimension reduction are constrained by utilizing triple loss training; in addition, the 256-dimensional features are changed into the number of vehicle IDs of the data set through a full connection layer after the features are subjected to dimension reduction, and then cross entropy loss is used for training and constraining; in the testing stage, the Euclidean distance is used for carrying out similarity measurement on the vehicle images.