CN113723356B

CN113723356B - Vehicle re-identification method and device with complementary heterogeneous characteristic relationships

Info

Publication number: CN113723356B
Application number: CN202111078976.7A
Authority: CN
Inventors: 李甲; 赵佳健; 赵一凡; 郭鑫; 赵沁平
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2023-09-19
Anticipated expiration: 2041-09-15
Also published as: CN113723356A

Abstract

The invention discloses a vehicle re-identification method and device with complementary heterogeneous characteristic relationships, comprising the following steps: acquiring a vehicle image, inputting the vehicle image into a convolutional neural network, and extracting to obtain heterogeneous characteristics of a plurality of different layers; constructing a graph relationship complementation module, and fusing a plurality of heterogeneous features of different layers from a low layer to a high layer based on the relationship by using the graph relationship complementation module to obtain a cross-layer complementation feature; extracting local features of the vehicle image through progressive central pooling operation, and carrying out hetero-relationship fusion on the local features and the highest-level complementary features in the cross-layer complementary features by utilizing a graph relationship complementary module to obtain hetero-complementary features; and splicing the cross-layer complementary features and the heterogeneous complementary features to obtain the vehicle image characterization features comprising multiple layers of semantic information and multiple layers of local area information. The invention can be widely applied to computer vision systems in the fields of urban traffic, public safety, automatic driving and the like.

Description

Vehicle re-identification method and device with complementary heterogeneous characteristic relationships

Technical Field

The invention relates to the field of computer vision and multimedia analysis, in particular to a vehicle re-identification method and device with complementary heterogeneous characteristic relations.

Background

Given a vehicle image, the purpose of vehicle re-identification is to be able to find images of the vehicle from different cameras in the vehicle database. Vehicle re-identification has gained more and more attention from researchers because of the wide application prospect in urban public safety and intelligent transportation systems. With the disclosure of numerous data sets and the application of deep learning in recent years, vehicle re-identification has made significant progress.

Disclosure of Invention

In light of the above-mentioned actual needs and key problems, the present invention aims to: the vehicle re-identification method based on the heterogeneous characteristic relation complementation is characterized in that a vehicle image is input, different heterogeneous characteristics are extracted through a depth network, then a graph relation complementation module is used for realizing complementation between the characteristics based on the relation, and finally the characteristic characteristics of the vehicle are output.

The invention comprises the following 4 steps:

step S100, acquiring a vehicle image, inputting the vehicle image into a convolutional neural network ResNet, and extracting to obtain a plurality of heterogeneous features of different layers, wherein the heterogeneous features of the different layers are heterogeneous features from low layers to high layers;

step S200, a graph relationship complementation module is constructed, and a plurality of heterogeneous features of different levels are fused from low level to high level by utilizing the graph relationship complementation module and based on the relationship, so as to obtain a cross-layer complementation feature, wherein the cross-layer complementation feature is a multi-layer complementation feature from low level to high level;

step S300, extracting local features of a vehicle image through progressive central pooling operation, and carrying out heterogeneous relation fusion on the local features and the complementary features of the highest level in the cross-layer complementary features by using a graph relation complementary module to obtain heterogeneous complementary features, wherein the local features comprise local region information;

and step S400, splicing the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image characterization features comprising multiple layers of semantic information and multiple layers of local area information, wherein a triple loss function and a cross entropy loss function are adopted to monitor and optimize a network in the training stage from step S100 to step S400.

The biggest difficulty in vehicle weight identification is that the same vehicle image features taken at different angles are significantly different, e.g. the front and rear of the vehicle have a large profile difference. For this difficulty, the current deep learning method can be divided into two types: data-driven and feature complementary. The data-driven type method considers that enough data is relied on to solve the difficulty, but the real data acquisition cost in reality is too high, and for this reason, the method uses a three-dimensional (3D) rendering model or generates a large amount of synthetic data in an anti-learning mode. Whereas current feature complementary methods mainly employ high-discrimination local area features to supplement global features. In order to accurately locate a local area with high recognition, the current method adopts additional labeling information such as a key point locating label, a detection frame label, a component segmentation label and the like to assist network learning of corresponding local features.

The method disclosed by the invention belongs to a vehicle re-identification method for carrying out feature complementation by utilizing heterogeneous features extracted from a deep network, and has two beneficial characteristics compared with the feature complementation network: 1) No additional image labeling information is needed, so that labor cost is saved, and the practicability of the method is improved; 2) The method has the advantages of supplementing key local area characteristics and complementing semantic information of different layers.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a general framework diagram of a vehicle re-identification method implementation with complementary heterogeneous feature relationships of the present invention;

FIG. 2 is a flow chart of some embodiments of a vehicle re-identification method of the present invention with heterogeneous signature complementation;

FIG. 3 is a block diagram of the complementary relationships of the graphs in the vehicle re-identification methods S200 and S300 of the present invention;

FIG. 4 is a flowchart of the steps of a method S200 for identifying a vehicle re-using the heterogeneous characteristic relationship complementation of the present invention;

fig. 5 is a flowchart of steps of a vehicle re-identification method S300 with complementary heterogeneous characteristics.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.

It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

It should be noted that references to "one", "a plurality" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that "one or more" is intended to be understood as "one or more" unless the context clearly indicates otherwise.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 2 is a flow chart of some embodiments of a vehicle re-identification method of the present invention with heterogeneous signature complementation.

And step S100, acquiring a vehicle image, inputting the vehicle image into a convolutional neural network ResNet, and extracting to obtain a plurality of heterogeneous characteristics with different layers.

In some embodiments, the execution subject of the vehicle re-recognition method with complementary heterogeneous feature relationships may acquire a vehicle image, input the vehicle image into a convolutional neural network res net, and extract a plurality of heterogeneous features with different levels. Wherein the plurality of heterogeneous features of different levels are heterogeneous features from a low level to a high level. ResNet is a feature extractor that includes 4 stages. For convolutional neural network ResNet, the execution body selects the output characteristics of the last layer of the plurality of network blocks (the number of the plurality of network blocks can be dynamically adjusted according to specific situations) in the 3 stages after extraction as heterogeneous characteristics of a plurality of different layers. For this design, different architectures of ResNet and various variant networks (e.g. ResNeXt, SE-Net, etc.) can be used, and the last layer of features of the corresponding stage corresponding network block can be extracted.

As an example, the above-described execution subject may select 3 stages after extraction to be S2, S3, and S4 in fig. 1. S2 represents the second stage. S3 represents the third stage. S4 represents the fourth stage.

And step 200, constructing a graph relationship complementation module, and fusing a plurality of heterogeneous features of different layers from a low layer to a high layer based on the relationship by using the graph relationship complementation module to obtain a cross-layer complementation feature.

Fig. 4 is a flowchart showing steps of a vehicle re-identification method S200 with complementary heterogeneous characteristic relationships according to the present invention. The step flow of S200 is as follows:

in step S210, the features in the S2 stage in ResNet are subjected to graph relationship complementation. The execution subject builds a graph relationship complementation module, and utilizes the graph relationship complementation module to fuse heterogeneous features of a plurality of different layers from a low layer to a high layer and based on the relationship so as to obtain cross-layer complementation features. Wherein the cross-layer complementary feature is a multi-layer complementary feature from low level to high level. The method can comprise the following steps:

the first step, by carrying out dot multiplication operation and limitation of a preset threshold value alpha on the heterogeneous feature vectors V, a relation coefficient matrix A of the heterogeneous feature vectors is obtained by using the following formula:

where a represents a matrix of relationship coefficients. A is that _ij Representation about V _i and V_j Is a matrix of relationship coefficients of (a). V represents a heterogeneous feature vector. V (V) _i Representing the i-th heterogeneous feature vector. i represents a sequence number. V (V) _j Representing the j-th heterogeneous feature vector. j represents a sequence number.Representing the transpose of the j-th heterogeneous feature vector. Alpha represents a predetermined threshold.

And secondly, regularizing the relation coefficient matrix A to obtain a regularized relation coefficient matrix.

And thirdly, multiplying the regularized relation coefficient matrix by a heterogeneous feature vector V, and performing feature complementation based on the relation to obtain cross-layer complementation features.

As an example, feature maps extracted in S2 within the res net are compressed into a vector v by a global averaging pooling function (GAP, global average pooling) and a 1 x 1 convolution layer is used to reduce the dimension of the feature vector. Then, all the heterogeneous feature vectors are spliced into a heterogeneous feature vector V by a splicing operation C ():

V＝C(W ₁ V ₁ ，...，W _k V _k )。

where V represents a heterogeneous feature vector. C () represents a splicing operation. W represents a matrix of learnable parameters in a 1 x 1 convolutional layer. W (W) ₁ Representing the 1 st learnable parameter matrix. V (V) ₁ Representing the 1 st heterogeneous feature vector. W (W) _k Representing the kth learnable parameter matrix. k represents a sequence number. V (V) _k Representing the kth heterogeneous feature vector.

And then, through a graph relationship complementation module, each feature vector fuses the complementation information based on the relationship of other vectors. In the graph relationship complementation module, the relationship coefficient matrix A of the heterogeneous feature vectors is obtained by carrying out dot multiplication operation and limitation of a preset threshold value alpha on the heterogeneous feature vectors V. And then, carrying out L1 regularization on the relation coefficient matrix A, namely L1 norm regularization, wherein the L1 norm regularization is to add the L1 norm into the cost function, so that the learned result can meet the sparsification, and the feature extraction is convenient. The individual values of each row in the constraint relation matrix are between (0, 1). Then, graph regularization (2016, TN Kipf and MWelling) is carried out, so that the relation matrix approximates to the Laplace matrix, the regularized relation coefficient matrix is multiplied by the heterogeneous feature matrix V, and feature complementation based on the relation is carried out, so that cross-layer complementation features are obtained.

In some optional implementations of some embodiments, the above-mentioned fusing, from a low level to a high level, of heterogeneous features of different levels by using a graph relationship complementation module and based on a relationship, to obtain a cross-layer complementation feature may further include the following steps:

the cross-layer complementary features are multiplied by a matrix of learnable parameters W and processed through a neuron removal layer dropout, a Batch regularization layer Batch Norm, and an activation function ReLU, as shown in fig. 3. The cross-layer complementary feature is further enhanced using the following formula. At the same time, to prevent gradient vanishing, the method adds a residual connection:

wherein ,representing cross-layer complementary features of constant feature dimensions. ReLU () represents an activation function. BN () represents bulk regularization layer operations. Dropout () represents a neuron removal layer operation. A represents a matrix of relationship coefficients. V represents a heterogeneous feature vector. W (W) _a Representing a first matrix of learnable parameters. W (W) _b Representing a second matrix of learnable parameters. />Representing cross-layer complementary features of feature dimension compression.

In the graph relation complementation module, a two-time learnable parameter matrix is used, and the first parameter matrix is used for keeping the original properties of the cross-layer complementation feature without changing the feature vector dimension. The effect of using the parameter matrix a second time is to reduce the dimension of the heterogeneous feature vector for reducing the complexity of subsequent operations.

Optionally, the building graph relationship complementation module, using the graph relationship complementation module to fuse heterogeneous features of different layers from low layer to high layer and based on relationship, to obtain cross-layer complementation features, may include the following steps:

the first step, complementary semantic information is carried out on the low-level heterogeneous characteristics through a graph relation complementary module, and the complementary heterogeneous characteristics are obtained through the following formula:

wherein ,representing the heterogeneous character after complementation. G () represents the graph relationship complementation module process. C () represents a splicing operation. W represents a matrix of parameters that can be learned. W (W) ₁ Representing the 1 st learnable parameter matrix. V (V) ₁ Representing the 1 st heterogeneous feature vector. W (W) _k Representing the kth learnable parameter matrix. k represents a sequence number. V (V) _k Representing the kth heterogeneous feature vector.

Secondly, splicing the complementary heterogeneous characteristics into a characteristic vector, inputting the characteristic vector into a next layer, carrying out characteristic fusion with the heterogeneous characteristics of a higher layer, and obtaining the complementary heterogeneous characteristics in the next layer through the following formula:

where V' represents the complementary heterogeneous features in the next layer. C () represents a splicing operation. W'. ₁ Representing the 1 st learnable parameter matrix in the next layer. V'. ₁ Representing the 1 st heterogeneous feature vector in the next layer. W'. _u Representing the u-th learnable parameter matrix in the next layer. u represents a sequence number. V'. _u Representing the u-th heterogeneous feature vector in the next layer. W'. _u+1 Representing the u+1th learnable parameter matrix in the next layer.Representing the heterogeneous character after complementation.

Step S220, the features in the S3 stage and the complementary features of the step S210 are subjected to graph relation complementation. The feature vector is generated after the feature map extracted in S3 in ResNet is operated in the same way as in the step S210, and all the feature vectors in the S3 stage and the complementary heterogeneous feature output in the step S210The complementary heterogeneous features V' in the next layer (e.g., S4) are generated by stitching together. And then, through a graph relationship complementation module, each feature vector fuses the complementation information based on the relationship of other vectors and is used as a complementation vector to be transmitted into S4.

And step S230, carrying out relation complementation on the characteristics in the step S4 and the complementary characteristics in the step S220, separating the spliced characteristics, and transmitting the complementary characteristics of the highest level into the step S300. After the feature map extracted in S3 in ResNet is operated in the same way as in the step S220, a complementary feature vector fused with semantic information of different layers is obtained, then the feature vector representing information of different layers is separated through a separation operation, and the feature vector representing information of the highest layer is transmitted into the step S300 to be complementary with further feature of the feature expansion of the local area; and the other vectors are passed to step S400 as part of the final feature.

Step S300, extracting local features of the vehicle image through progressive central pooling operation, and carrying out heterogeneous relation fusion on the local features and the complementary features of the highest level in the cross-layer complementary features by utilizing a graph relation complementary module to obtain heterogeneous complementary features.

In some embodiments, the executing body may extract a local feature of the vehicle image through progressive central pooling operation, and perform hetero-relationship fusion on the local feature and a complementary feature of a highest level in the cross-layer complementary features by using a graph relationship complementary module, where the local feature includes local area information.

And based on the highest-level complementary feature in the cross-layer complementary features, carrying out information complementation on the local features and the highest-level complementary features under a graph relationship complementary module, so that the highest-level complementary features fused with low-level semantic information obtain the complementation of local region information, and the heterogeneous complementary features are obtained.

Fig. 5 is a flowchart of steps of a vehicle re-identification method S300 with complementary heterogeneous characteristics. The step flow of S300 is as follows:

step S310, a progressive center pooling operation and a mapping operation are adopted to obtain local area characteristics. The progressive center pooling operation may include the steps of:

the first step, based on priori knowledge, progressive center pooling operation is adopted, the center of an image with the size of X multiplied by Y is used as a fixed point, a sensing area is gradually enlarged, and mask tensors M of S local areas with different sizes and based on the image center are extracted. The prior knowledge is that in vehicle re-identification, the vehicle is located at the middle position of the image, and S mask tensors M of different sizes based on the local area of the center of the image are extracted through the following formula:

where M represents the mask tensor.Representing the kth mask tensor. x represents the abscissa of the position coordinates of the pixel point in the image. y represents the ordinate of the position coordinates of the pixel points in the image. k represents a sequence number. X represents the width of the image. Y represents the high of the image. />Representing the square of the radius of the kth local area. R represents the radius of the local area. R is R _k Representing the radius of the kth local area. R is in the range of->And->The value range of k is k less than or equal to S;

secondly, taking the position invariance of the convolutional neural network into consideration, extracting a corresponding regional characteristic map from global characteristics through mapping operation and global pooling operation, carrying out linear change through a learnable parameter matrix and a learnable bias vector, and obtaining local by using the following formulaFeature map F ^r ：

wherein ,the kth local feature map is represented. F (F) ^r A local feature map is shown. W represents a matrix of parameters that can be learned. W (W) _k Representing the kth learnable parameter matrix. Phi denotes a global pooling operation. P () represents a mapping operation. F (F) ^g Representing a global feature map. M is M ^k Representing the kth mask tensor. B (B) _k Representing the kth learnable bias vector. B represents a learnable bias vector. k represents a sequence number. S denotes the total number of mask tensors.

In step S320, the local features are complementary to the highest-level features in step S230, and the process goes to step S400. And (3) forming a feature vector by the same operation as the step (S210), splicing all the feature vectors and the feature vector representing the highest-layer information output in the step (S230) together to form a vector matrix, carrying out relationship-based fusion complementation on each feature vector by a graph relationship complementation module to form a complementation vector comprising local key information, and then transmitting the complementation vector into the step (S400).

And step S400, splicing the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image characterization features comprising multiple layers of semantic information and multiple layers of local area information.

In some embodiments, the executing entity may splice the cross-layer complementary feature and the heterogeneous complementary feature to obtain the vehicle image characterization feature including the multi-layer semantic information and the multi-layer local area information. In the training phase from step S100 to step S400, the triple loss function and the cross entropy loss function are adopted to monitor and optimize the network.

As an example, feature dimension compression is performed on all cross-layer complementary features and local region complementary features according to importance of the cross-layer complementary features, the feature dimension of a higher layer is higher, the feature dimension containing larger region information is higher, then the cross-layer complementary features and the local region complementary features are spliced into a final feature vector, and a triple loss function and a cross entropy loss function are adopted to monitor and optimize a network in a training stage.

It will be appreciated that the elements described in a vehicle re-identification apparatus with complementary heterogeneous characteristics correspond to the steps of the method described with reference to fig. 2. Thus, the operations, features and the beneficial effects described above for the method are equally applicable to a vehicle re-identification device with complementary heterogeneous feature relationships and the units contained therein, and are not described herein again.

The foregoing description is only of the preferred embodiments of the present disclosure and description of the principles of the technology being employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but encompasses other technical features formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention. Such as the above-described features, are mutually substituted with (but not limited to) the features having similar functions disclosed in the embodiments of the present disclosure.

Claims

1. A vehicle re-identification method with complementary heterogeneous characteristic relationships comprises the following steps:

step S400, splicing cross-layer complementary features and heterogeneous complementary features to obtain vehicle image characterization features comprising multiple layers of semantic information and multiple layers of local area information, wherein in the training stage from step S100 to step S400, a triple loss function and a cross entropy loss function are adopted to monitor and optimize a network;

the progressive center pooling operation includes the steps of:

gradually expanding the sensing region by taking the center of an image with the size of X multiplied by Y as a fixed point based on priori knowledge, and extracting mask tensors M of local regions based on the image center with S different sizes, wherein the priori knowledge is that in vehicle re-identification, a vehicle is positioned at the middle position of the image, and the mask tensors M of the local regions based on the image center with S different sizes are extracted through the following formula:

where M represents the mask tensor,represents the kth mask tensor, X represents the abscissa of the position coordinates of the pixel points in the image, Y represents the ordinate of the position coordinates of the pixel points in the image, k represents the number, X represents the width of the image, Y represents the height of the image, and->Represents the square of the radius of the kth local area, R represents the radius of the local area, R _k The radius of the kth partial region is represented, and R is in the range of +.>And->The value range of k is k less than or equal to S;

taking the position invariance of the convolutional neural network into consideration, extracting a corresponding regional characteristic map from global characteristics through mapping operation and global pooling operation, carrying out linear change through a learnable parameter matrix and a learnable bias vector, and obtaining a local characteristic map F by using the following formula ^r ：

wherein ,represents the kth partial feature map, F ^r Representing a local feature map, W representing a matrix of learnable parameters, W _k Represents the kth learnable parameter matrix, phi represents the global pooling operation, P () represents the mapping operation, F ^g Representing global feature map, M ^k Represents the kth mask tensor, B _k Represents the kth learnable bias vector, B represents the learnable bias vector, k represents the sequence number, and S represents the total number of mask tensors.

2. The method of claim 1, wherein the fusing, from a low level to a high level and based on a relationship, the heterogeneous features of the plurality of different levels with the graph relationship complementation module to obtain the cross-layer complementation feature comprises:

the relation coefficient matrix A of the heterogeneous eigenvectors is obtained by carrying out dot multiplication operation and limitation of a preset threshold value alpha on the heterogeneous eigenvectors V in pairs and utilizing the following formula:

wherein A represents a relation coefficient matrix, A _ij Representation about V _i and V_j V represents heterogeneous eigenvectors, V _i Represents the i-th heterogeneous feature vector, i represents the sequence number, V _j Represents the j-th heterogeneous feature vector, j represents the sequence number,representing a transpose of the j-th heterogeneous feature vector, α representing a predetermined threshold;

regularizing the relation coefficient matrix A to obtain a regularized relation coefficient matrix;

and multiplying the regularized relation coefficient matrix by a heterogeneous feature vector V, and performing feature complementation based on the relation to obtain cross-layer complementation features.

3. The method of claim 2, wherein the fusing, from a low level to a high level and based on a relationship, the heterogeneous features of the plurality of different levels with the graph relationship complementation module to obtain the cross-layer complementation feature further comprises:

multiplying the cross-layer complementary features by a learnable parameter matrix W, and further enhancing the cross-layer complementary features by a neuron removal layer dropout, a Batch regularization layer Batch Norm, and an activation function ReLU by using the following formula:

wherein ,representing cross-layer complementary features with unchanged feature dimensions, reLU () represents an activation function, BN () represents a bulk regularization layer operation, dropout () represents a neuron removal layer operation, A represents a matrix of relationship coefficients, V represents a heterogeneous feature vector, W _a Representing a first matrix of learnable parameters, W _b Representing a second learnable parameter matrix, +.>Representing cross-layer complementary features of feature dimension compression.

4. The method of claim 3, wherein the constructing a graph-relationship complementation module, using the graph-relationship complementation module to fuse heterogeneous features of a plurality of different layers from a low layer to a high layer and based on a relationship, obtains a cross-layer complementation feature, comprises:

complementary semantic information is carried out on the low-level heterogeneous features through a graph relation complementary module, and the complementary heterogeneous features are obtained through the following formula:

wherein ,representing the complementary heterogeneous features, G () representing the graph relationship complementary module process, C () representing the concatenation operation, W representing the learnable parameter matrix, W ₁ Represents the 1 st learnable parameter matrix, V ₁ Represents the 1 st heterogeneous feature vector, W _k Represents the kth learnable parameter matrix, k represents the sequence number, V _k Representing a kth heterogeneous feature vector;

splicing the complementary heterogeneous characteristics into a characteristic vector, inputting the characteristic vector into the next layer, carrying out characteristic fusion with the heterogeneous characteristics of a higher layer, and obtaining the complementary heterogeneous characteristics in the next layer through the following formula:

wherein V 'represents the complementary heterogeneous characteristics in the next layer, C () represents the splicing operation, W' ₁ Representing the 1 st learnable parameter matrix in the next layer, V' ₁ Representing the 1 st heterogeneous eigenvector in the next layer, W' _u Representing the u-th learnable in the next layerU represents the sequence number, V' _u Representing the u-th heterogeneous eigenvector in the next layer, W' _u+1 Representing the u +1 th learnable parameter matrix in the next layer,representing the heterogeneous character after complementation.

5. The method of claim 4, wherein the performing heterostructure fusion between the local feature and the highest level complementary feature in the cross-layer complementary features by using the graph relationship complementary module to obtain a heterogeneous complementary feature comprises:

6. A vehicle re-identification device with complementary heterogeneous characteristic relationships, comprising:

step S100, an acquisition unit is configured to acquire a vehicle image, input the vehicle image into a convolutional neural network ResNet, and extract a plurality of heterogeneous features with different levels, wherein the heterogeneous features with different levels are heterogeneous features from low level to high level;

step 200, a fusion unit is configured to construct a graph relationship complementation module, and fusion is carried out from a low level to a high level on a plurality of heterogeneous features of different levels by using the graph relationship complementation module, so as to obtain a cross-layer complementation feature, wherein the cross-layer complementation feature is a multi-layer complementation feature from the low level to the high level;

step S300, a heterogeneous relation fusion unit is configured to extract local features of a vehicle image through progressive central pooling operation, and perform heterogeneous relation fusion on the local features and complementary features of the highest level in cross-layer complementary features by using a graph relation complementary module to obtain heterogeneous complementary features, wherein the local features comprise local area information;

step S400, a splicing unit is configured to splice the cross-layer complementary features and the heterogeneous complementary features to obtain vehicle image characterization features comprising multi-layer semantic information and multi-layer local area information, wherein in the training stage from step S100 to step S400, a triple loss function and a cross entropy loss function are adopted to monitor and optimize a network;

the progressive center pooling operation includes the steps of: