CN113128441A

CN113128441A - Attribute- and state-guided structure embedding vehicle re-identification system and method

Info

Publication number: CN113128441A
Application number: CN202110465670.0A
Authority: CN
Inventors: 郑爱华; 高亚飞; 李洪潮; 李成龙; 汤进; 罗斌
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2021-04-28
Filing date: 2021-04-28
Publication date: 2021-07-16
Anticipated expiration: 2041-04-28
Also published as: CN113128441B

Abstract

An attribute- and state-guided structure embedding vehicle re-identification system and method, belonging to the technical field of computer vision, address the problems of large intra-class differences between images of the same vehicle and small inter-class differences between different vehicles in vehicle re-identification, and learn discriminative features for vehicle re-identification through attribute-based enhancement and state-based weakening. The system comprises a residual network module, an attribute-based enhancement and expansion module, a state-based weakening and contraction module, and a global structure embedding module; the method comprises two stages, a training stage and a testing stage. Attribute enhancement and expansion increase the feature distance between different vehicles, and state weakening and contraction reduce the feature distance between images of the same vehicle, so that the network learns vehicle features better.

Description

Attribute- and state-guided structure embedding vehicle re-identification system and method

Technical Field

The invention belongs to the technical field of computer vision, and relates to an attribute- and state-guided structure embedding vehicle re-identification system and method.

Background

The task of vehicle re-identification is to match vehicle images across non-overlapping surveillance cameras. Because of its wide application in video surveillance, social security, smart cities and smart traffic, it is an active and challenging computer vision task that has attracted extensive attention. Despite recent breakthroughs in vehicle re-identification, it still faces two serious challenges.

As shown in fig. 1(a) and (b), there are large intra-class differences between pictures of the same vehicle in different states (e.g., different camera angles, vehicle viewing angles, and shooting times). As shown in fig. 1(b), (c) and (d), there are only slight inter-class differences between different vehicles, especially when two vehicles have the same or similar attributes, such as color, model, and manufacturer. Therefore, handling the large intra-class differences of the same vehicle and the small inter-class differences between different vehicles has become an important task in vehicle re-identification.

Existing solutions take different approaches to the two challenges described above. Representative methods fall mainly into four categories: 1) global-feature-based methods, which extract global hand-crafted or deep features of the vehicle image through a specific metric learning method; 2) path-based methods, which generally use spatio-temporal information to remove implausible vehicles at the inference stage and refine the retrieval results; 3) viewpoint-based methods, which aim to handle viewpoint changes through metric learning and learn multi-view features for vehicle Re-ID; 4) methods based on local information enhancement, which enlarge inter-class differences in vehicle re-identification by providing stable and discriminative cues.

However, with respect to the two major challenges of vehicle re-identification, namely large intra-class differences for the same vehicle and small inter-class differences between different vehicles, the existing solutions have the following disadvantages:

1) global-feature-based methods consider only the overall appearance of the vehicle image, and often struggle to capture intra-class similarities and inter-class differences;

2) path-based methods ignore, in the vehicle feature learning stage, the appearance changes of the vehicle caused by spatio-temporal changes;

3) viewpoint-based methods significantly reduce intra-class variation, but they ignore the intrinsic state factors of the vehicle (such as camera viewpoint and capture time) and the challenge of subtle inter-class variation;

4) methods based on local information enhancement require local region extraction, and local region extraction models usually need a large amount of labeled data, which is time-consuming and labor-intensive.

In the prior art, the document 'Vehicle re-identification method research based on multiple attributes' (Xiamen University, Like), published in 2019, discloses a vehicle re-identification algorithm that fuses vehicle angles, provides a vehicle re-identification algorithm that fuses vehicle colors and vehicle types, and constructs a vehicle re-identification demonstration system based on a web platform. Although this document considers the attribute information of the vehicle and obtains good vehicle re-identification performance, it only splices the attribute information directly onto the vehicle feature as auxiliary information, and does not consider optimizing the attribute information and the vehicle feature at the same time. The document 'Research on a multi-view sparse fusion and multi-scale attention vehicle re-identification method' (Anhui University, Dong's King), published in 2020, designs a vehicle re-identification network based on a multi-scale attention mechanism: it extracts a vehicle feature map with a backbone network, generates vehicle feature maps at other scales by a quadratic interpolation method, and sends them to the corresponding sub-networks, each of which contains spatial channel attention modules. After the sub-networks are trained separately, the multi-scale attention feature maps are fused using a joining layer, and the entire network is fine-tuned. The network obtains more robust features by fusing multi-scale complementary information and by mining discriminative local details with an attention mechanism; however, this document does not take attribute and state information into account.

Disclosure of Invention

The invention aims to design an attribute- and state-guided structure embedding vehicle re-identification method and system, and to solve the problems of large intra-class differences between images of the same vehicle and small inter-class differences between different vehicles in vehicle re-identification.

The invention solves the technical problems through the following technical scheme:

an attribute- and state-guided structure embedding vehicle re-identification system, which learns discriminative features for vehicle re-identification through attribute-based enhancement and state-based weakening, comprising: a residual network module, an attribute-based enhancement and expansion module, a state-based weakening and contraction module, and a global structure embedding module;

the residual network module is used for extracting a visible light feature map and making three copies of it, wherein one copy is input into the attribute-based enhancement and expansion module, another copy is input into the state-based weakening and contraction module, and the remaining copy passes through global average pooling and a fully connected operation to obtain category scores;

the attribute-based enhancement and expansion module is used for enhancing the discriminability of vehicle features through identity-related attribute information, and for increasing the feature distance between different vehicles by means of an attribute-based expansion loss function;

the state-based weakening and contraction module is used for weakening the state information that interferes with identification, and for reducing the intra-class feature distance by means of a state-based contraction loss function;

the global structure embedding module is used for obtaining a final vehicle feature vector from the attribute-based enhancement and state-based weakening operations, calculating a category score from the final feature vector, passing it to the retrieval library, and judging whether the vehicle appears in other cameras.

The technical scheme of the invention provides an attribute- and state-guided structure embedding vehicle re-identification method, which increases the feature distance between different vehicles through attribute enhancement and expansion, and reduces the feature distance between images of the same vehicle through state weakening and contraction, thereby helping the network to learn vehicle features better.

A method applied to the attribute- and state-guided structure embedding vehicle re-identification system, characterized by comprising two stages: a training stage and a testing stage;

the training stage comprises the following steps:

step 1): acquiring a visible light image of a vehicle, and inputting the visible light image into the system;

step 2): extracting a visible light feature map through a residual network and making three copies of it, wherein one copy is input into the attribute-based enhancement and expansion module, another copy is input into the state-based weakening and contraction module, and the remaining copy passes through global average pooling and a fully connected operation to obtain category scores; calculating the classification loss between the predicted category and the true value with a cross-entropy function for each of the three constraints of attribute, state and identity, and adding the three classification losses in proportion to form a multi-label classification loss function that supervises the prediction of all three categories;

step 3): expanding the feature distribution of the attributes through the attribute-based expansion loss function, thereby increasing the inter-class attribute differences; contracting the distribution of the states through the state-based contraction loss function, thereby reducing the intra-class state differences;

step 4): using a global structure embedding loss function, training all samples through heterogeneous sample separation and homogeneous sample aggregation, so that the vehicle feature distance of heterogeneous samples increases with the expansion of the attributes until it reaches the upper bound, and the vehicle feature distance of homogeneous samples decreases with the contraction of the states until it falls below the lower bound;

step 5): defining the final loss of the model as the sum of the multi-label classification loss, the attribute-based expansion loss, the state-based contraction loss and the global structure embedding loss; in each training round, the attribute- and state-guided structure embedding system backpropagates the final loss value and, with the reduction of the loss value as the objective, updates the model parameters of the residual blocks and the parameters of the attribute enhancement module and the state weakening module by an adaptive momentum stochastic gradient descent method; after multiple iterations, when the loss value of the system no longer decreases, all parameters are regarded as optimal and the training of the system is finished;

the testing stage comprises the following steps:

step a): loading a visible light image of a vehicle and inputting the visible light image into a trained system;

step b): extracting a visible light feature map through the trained residual network;

step c): passing the visible light feature map through the trained attribute-based enhancement and expansion module and the trained state-based weakening and contraction module, respectively, to obtain an attribute-enhanced feature tensor and a state-weakened feature tensor;

step d): obtaining a final vehicle feature vector through the attribute-based enhancement and state-based weakening operations;

step e): calculating a category score from the final feature vector, passing it to the retrieval library, and judging whether the vehicle appears in other cameras.

As a further improvement of the technical solution of the present invention, the method in step 3) for expanding the feature distribution of the attributes through the attribute-based expansion loss function, thereby increasing the inter-class attribute differences, comprises:

inputting a vehicle image I with an image size of 256 × 256, and obtaining a feature map T of the vehicle image through a 50-layer residual network, where T = ResNet50(I);

making three copies of the feature map T of the vehicle image, the first of which is passed into the attribute-based enhancement and expansion module;

then inputting the feature map T into the 1 × 1 convolution blocks corresponding to the different attributes to obtain the attribute-related feature maps:

performing a batch normalization (BN) operation and a rectified linear unit (ReLU) operation on the attribute-related feature maps:

performing a global average pooling operation on the attribute-related feature maps to obtain the attribute-related features:

introducing the attribute-related labels

and adding a fully connected (FC) layer, so that the attribute-related features are constrained by the labels in end-to-end training, the attribute-related label constraint being formulated as:

then calculating, for the current vehicle, the feature distribution relative to the attribute mean of all vehicles

where ‖·‖₂ denotes the l₂ norm, i.e., the square root of the sum of the squared features:

designing an attribute-based expansion loss function that forces the network to keep expanding the attribute-related feature distribution during end-to-end training, the attribute expansion being formulated as:

wherein i denotes the type of attribute, which in this embodiment is, respectively: the color attribute of the vehicle, the type attribute of the vehicle, and the manufacturer attribute of the vehicle;

during training, the system backpropagates the loss in the direction that decreases the expansion loss, so that the attribute feature distribution keeps expanding and the feature distances between different vehicles are driven further apart;

using a sigmoid function to normalize the attribute-related feature map

to the range 0 to 1, and taking its element-wise product with the vehicle feature map T to obtain the attribute-related enhancement map T^e, as follows:

during the iterations of the system, the attribute expansion is performed continuously, and the attribute-related feature map

is used to enhance the vehicle feature map; the enhanced vehicle feature map T′ is calculated as:

T′ = T + T^e

where T^e denotes the attribute-related enhancement map, T denotes the feature map of the vehicle image, and T′ denotes the enhanced vehicle feature map.
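
The enhancement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the feature map is represented as a list of channels, each a flat list of spatial values; the names conv1x1, relu, sigmoid, enhance and attr_weights are hypothetical; batch normalization is omitted; and only a single attribute branch is shown.

```python
import math

def conv1x1(feature_map, weights):
    """1 x 1 convolution: mix channels independently at each spatial position.

    feature_map: list of C channels, each a flat list of N spatial values.
    weights: C_out x C matrix (hypothetical learned parameters).
    """
    n_positions = len(feature_map[0])
    return [
        [sum(w * channel[p] for w, channel in zip(row, feature_map))
         for p in range(n_positions)]
        for row in weights
    ]

def relu(feature_map):
    return [[max(0.0, v) for v in channel] for channel in feature_map]

def sigmoid(feature_map):
    return [[1.0 / (1.0 + math.exp(-v)) for v in channel] for channel in feature_map]

def enhance(feature_map, attr_weights):
    """Return T' = T + T^e, with T^e = sigmoid(attribute map) * T element-wise."""
    attr_map = relu(conv1x1(feature_map, attr_weights))  # BN omitted in this sketch
    gate = sigmoid(attr_map)                             # values in the range 0 to 1
    return [
        [t + g * t for t, g in zip(t_ch, g_ch)]
        for t_ch, g_ch in zip(feature_map, gate)
    ]
```

With an all-zero weight matrix the gate equals 0.5 everywhere, so every value of T is scaled by 1.5; larger attribute responses push the gate towards 1 and the scaling towards 2.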

As a further improvement of the technical solution of the present invention, the method in step 3) for contracting the distribution of the states through the state-based contraction loss function, thereby reducing the intra-class state differences, comprises:

making three copies of the feature map T of the vehicle image, the second of which is passed into the state-based weakening and contraction module;

then inputting the feature map T into the 1 × 1 convolution blocks corresponding to the different states to obtain the state-independent feature maps:

performing a batch normalization (BN) operation and a rectified linear unit (ReLU) operation on the state-independent feature maps

and performing a global average pooling operation on the state-independent feature maps to obtain the state-independent features:

introducing the state-independent labels

and adding a fully connected (FC) layer, so that the state-independent features are constrained by the labels in end-to-end training, the state-independent label constraint being formulated as:

then calculating, for the current vehicle, the feature distribution relative to the mean of the different states over all vehicles

where ‖·‖₂ denotes the l₂ norm, i.e., the square root of the sum of the squared features;

designing a state-based contraction loss function that forces the network to keep contracting the feature distributions of the different states during end-to-end training, the state contraction being formulated as:

during training, the network backpropagates the loss in the direction that decreases the contraction loss, so that the state feature distribution keeps contracting and the feature distances between images of the same vehicle are driven to keep shrinking;

using a sigmoid function to normalize the state-independent feature map

to the range 0 to 1, and taking its element-wise product with the vehicle feature map T to obtain the state-independent weakening map T^w, as follows:

during the iterations of the system, the state contraction is performed continuously, and the state-independent feature map

is used to weaken the enhanced vehicle feature map T′ to obtain the final vehicle feature map T″, which is calculated as:

T″ = T′ - T^w

where T″ denotes the final vehicle feature map, T′ denotes the enhanced vehicle feature map, and T^w denotes the state-independent weakening map.
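
The weakening step can be sketched in the same spirit. This is an illustrative sketch only: the function names sigmoid and weaken are hypothetical, feature maps are represented as lists of channels with flat lists of spatial values, and state_map is assumed to be the output of the state branch after the 1 × 1 convolution, BN and ReLU operations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def weaken(base, enhanced, state_map):
    """Return T'' = T' - T^w, with T^w = sigmoid(state map) * T element-wise.

    base:      original feature map T
    enhanced:  attribute-enhanced feature map T'
    state_map: output of the state branch (assumed precomputed)
    """
    return [
        [e - sigmoid(s) * t for t, e, s in zip(t_ch, e_ch, s_ch)]
        for t_ch, e_ch, s_ch in zip(base, enhanced, state_map)
    ]
```

Combining the two formulas in the text gives T″ = T + T^e - T^w, so a position where the attribute gate and the state gate are equal is left unchanged, while a stronger state response reduces the value at that position.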

As a further improvement of the technical solution of the present invention, the method in step 4) for training all samples through heterogeneous sample separation and homogeneous sample aggregation comprises:

for the input image I, obtaining the final vehicle feature map T″ through the attribute-based enhancement and expansion module and the state-based weakening and contraction module, and performing a global average pooling operation on T″ to obtain the feature vector of the image I:

f = GAP(T″)

for any heterogeneous sample pair (I_i, I_j) in the training set, the corresponding vehicle features (f_i, f_j) and the corresponding attribute features can be obtained

The designed attribute-driven heterogeneous sample separation constraint is as follows:

where y_ij = 0 indicates a heterogeneous sample pair,

represents the attribute features of the heterogeneous sample pair (I_i, I_j), and d_ij represents the Euclidean distance between the vehicle features of the heterogeneous sample pair; the attribute-driven heterogeneous sample separation gives the feature learning of heterogeneous samples an attribute-related weight in the metric learning stage, so that the vehicle feature distance of heterogeneous samples increases with the expansion of the attributes until it reaches the upper bound of 1;

for any homogeneous sample pair (I_i, I_j) in the training set, the corresponding vehicle features (f_i, f_j) and the corresponding state features can be obtained

where y_ij indicates a homogeneous sample pair,

represents the state features of the homogeneous sample pair (I_i, I_j), and d_ij represents the Euclidean distance between the vehicle features of the homogeneous sample pair; the state-driven homogeneous sample aggregation gives the feature learning of homogeneous samples a state-related weight in the metric learning stage, so that the vehicle feature distance of homogeneous samples decreases with the contraction of the states until it is less than the lower bound of 0.3.
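
The quantities named above, the pooled feature vector f, the Euclidean distance d_ij and the two distance bounds, can be illustrated with a short sketch. The weighted constraint formulas themselves are not reproduced in this text, so the sketch deliberately covers only the pooling, the distance and a check against the bounds of 1 and 0.3; the function names and the data layout (a list of channels, each a flat list of spatial values) are assumptions made for illustration.

```python
import math

UPPER_BOUND = 1.0  # distance that heterogeneous pairs should reach (from the text)
LOWER_BOUND = 0.3  # distance that homogeneous pairs should fall below (from the text)

def gap(feature_map):
    """Global average pooling: one mean value per channel, giving f = GAP(T'')."""
    return [sum(channel) / len(channel) for channel in feature_map]

def euclidean(f_i, f_j):
    """Euclidean distance d_ij between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))

def pair_satisfied(f_i, f_j, same_vehicle):
    """Check whether a pair already meets its distance bound."""
    d_ij = euclidean(f_i, f_j)
    return d_ij < LOWER_BOUND if same_vehicle else d_ij >= UPPER_BOUND
```

A pair that already satisfies its bound would be expected to contribute little to training, while pairs that violate it are the ones the separation and aggregation constraints act on.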

As a further improvement of the technical solution of the present invention, the global structure embedding loss function in step 5) is:

during training, the system backpropagates the loss in the direction that decreases the global structure embedding loss, so that the vehicle feature distance of heterogeneous samples increases with the expansion of the attributes until it reaches the upper bound of 1, and the vehicle feature distance of homogeneous samples decreases with the contraction of the states until it is less than the lower bound of 0.3; finally, the multi-label classification loss value, the attribute-based expansion loss value, the state-based contraction loss value and the global structure embedding loss value are backpropagated, and the update is iterated 120 times to obtain the optimal network parameters.

The invention has the advantages that:

(1) The technical scheme of the invention increases the feature distance between different vehicles through attribute enhancement and expansion, and reduces the feature distance between images of the same vehicle through state weakening and contraction, thereby helping the network to learn vehicle features better.

(2) According to the technical scheme, when the vehicle re-identification task is executed in the testing stage, the attribute information of the vehicle is obtained directly by means of the attribute labels used in the training stage; by enhancing the attribute information of the vehicle, most vehicles that do not share the same attributes can be filtered out during re-identification, and by expanding the attribute information of the vehicle, vehicles with the same identity among those sharing the same attributes can be mined more effectively. Meanwhile, the camera environment, viewpoint information and time information of the tested vehicle are determined by means of the state labels used in the training stage; by weakening the state information of the vehicle, the re-identification model can more easily recognize vehicles with the same identity, and by contracting the state information of the vehicle, vehicles with the same identity can be retrieved more effectively during re-identification.

(3) Starting from a single vehicle test image, the technical scheme of the invention associates the attribute information and the state information of the vehicle, making it easier to find vehicles with the same identity in a huge vehicle retrieval library, and provides a simple and effective strategy for deploying a fast re-identification system for intelligent transportation.

Drawings

FIG. 1 is a block diagram of the system of the present invention;

FIG. 2 is a block diagram of the system of the present invention;

FIG. 3 is a block diagram of the global structure embedding module of the present invention;

FIG. 4 is a flow chart of a training phase of the method of the present invention;

FIG. 5 is a flow chart of a testing phase of the method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art on the basis of the embodiments given herein without creative effort shall fall within the protection scope of the present invention.

The technical scheme of the invention is further described below with reference to the drawings and specific embodiments:

As shown in fig. 2, an attribute- and state-guided structure embedding vehicle re-identification system, which learns discriminative features for vehicle re-identification through attribute-based enhancement and state-based weakening, includes: a residual network module, an attribute-based enhancement and expansion module, a state-based weakening and contraction module, and a global structure embedding module.

As shown in fig. 3, the global structure embedding module is configured to obtain a final vehicle feature vector from the attribute-based enhancement and state-based weakening operations, calculate a category score from the final feature vector, pass the category score to the retrieval library, and determine whether the vehicle appears in another camera.

Calculating the category score from the final feature vector specifically includes: calculating the classification loss between the predicted category and the true value with a cross-entropy function for each of the three constraints of attribute, state and identity, and adding the three classification losses in proportion to form a multi-label classification loss function that supervises the prediction of all three categories.
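
As a minimal sketch of this multi-label classification loss, the three cross-entropy terms can be combined in a weighted sum. The text states only that the terms are added in proportion, so the weights below are hypothetical placeholders, as are the function names; a single attribute head and a single state head are shown for brevity.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; target is the true class index."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def multi_label_loss(id_logits, id_label,
                     attr_logits, attr_label,
                     state_logits, state_label,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of identity, attribute and state classification losses.

    The weights are assumed values; the source does not give the proportions.
    """
    w_id, w_attr, w_state = weights
    return (w_id * cross_entropy(id_logits, id_label)
            + w_attr * cross_entropy(attr_logits, attr_label)
            + w_state * cross_entropy(state_logits, state_label))
```

In a full system there would be one classification head per attribute (color, type, manufacturer) and per state, each contributing its own cross-entropy term.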

Attribute enhancement and expansion increase the feature distance between different vehicles, and state weakening and contraction reduce the feature distance between images of the same vehicle, so that the network learns vehicle features better.

As shown in figs. 4 and 5, an attribute- and state-guided structure embedding vehicle re-identification method includes two stages: a training stage and a testing stage.

1. The training phase comprises the following steps

Step 1: a visible light image of a vehicle is acquired and transmitted as input to the network.

Step 2: A visible light feature map is extracted through a residual network commonly used in the field of vehicle re-identification, and three copies of it are made: one copy is input into the attribute-based enhancement and expansion module, another copy is input into the state-based weakening and contraction module, and the remaining copy passes through global average pooling and a fully connected operation to obtain category scores.

The classification loss between the predicted category and the true value is calculated with a cross-entropy function for each of the three constraints of attribute, state and identity, and the three classification losses are added in proportion to form a multi-label classification loss function that supervises the prediction of all three categories.

Step 3: In addition to attribute-based enhancement and state-based weakening, an attribute-based expansion loss and a state-based contraction loss are further proposed. The attribute-based expansion loss expands the feature distribution of the attributes, thereby increasing the inter-class attribute differences. The state-based contraction loss contracts the distribution of the states, thereby reducing the intra-class state differences.

Step 4: A state- and attribute-guided global structure embedding loss is used, so that negative samples with smaller attribute differences and positive samples with larger state differences obtain larger gradient magnitudes.

Step 5: The final loss of the model is defined as the sum of the multi-label classification loss, the attribute-based expansion loss, the state-based contraction loss, and the global structure embedding loss. In each training round, the attribute- and state-guided structure embedding network backpropagates the final loss value and, with the reduction of the loss value as the objective, updates the model parameters of the residual blocks and the parameters of the attribute enhancement module and the state weakening module by an adaptive momentum stochastic gradient descent method. After a certain number of iterations, when the loss value of the network no longer decreases, all parameters are regarded as optimal and the training of the network is finished.

The specific training mode comprises the following steps:

(1) The batch size is set to 64 in each training iteration, i.e., the network reads 64 images at a time. Each image is resized to 256 × 256 pixels and padded with 10 zero-valued pixels on every side, and is then randomly cropped to a 256 × 256 image. Each image is flipped horizontally with a probability of 0.5. Each image is decoded as 32-bit floating-point raw data, i.e., its pixel values are floating-point numbers in [0, 1], and its R, G and B channels are normalized with means of 0.485, 0.456 and 0.406 and standard deviations of 0.229, 0.224 and 0.225, respectively.
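
The preprocessing above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the pipeline actually used: the resize step is omitted, a single-channel image is represented as a list of rows, and the function names are hypothetical; the padding, crop size, flip probability, means and standard deviations are the values given in the text.

```python
import random

MEAN = (0.485, 0.456, 0.406)  # per-channel means for R, G, B (from the text)
STD = (0.229, 0.224, 0.225)   # per-channel standard deviations (from the text)

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))

def pad_and_random_crop(image, pad=10, size=256, rng=random):
    """Zero-pad a 2-D image (list of rows) on every side, then crop size x size."""
    width = len(image[0])
    blank_rows = lambda: [[0.0] * (width + 2 * pad) for _ in range(pad)]
    padded = (blank_rows()
              + [[0.0] * pad + list(row) + [0.0] * pad for row in image]
              + blank_rows())
    top = rng.randint(0, len(padded) - size)
    left = rng.randint(0, len(padded[0]) - size)
    return [row[left:left + size] for row in padded[top:top + size]]

def random_flip(image, p=0.5, rng=random):
    """Mirror the image horizontally with probability p."""
    return [row[::-1] for row in image] if rng.random() < p else image
```

For a 256 × 256 input with 10 pixels of padding, the padded image is 276 × 276, so the crop offset in each direction is drawn from 0 to 20.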

(2) For a vehicle image input into the network, the corresponding feature map is first extracted through a trained 50-layer residual network. Three copies of the feature map are made: one copy is input into the attribute-based enhancement and expansion module, another copy is input into the state-based weakening and contraction module, and the remaining copy passes through global average pooling and a fully connected operation to obtain category scores. On the one hand, the multi-label classification loss is obtained by combining the classification losses for attributes, states and identities, to ensure that the category predicted by each module is consistent with the true value. At the same time, the attribute-based expansion loss increases the inter-class attribute differences, and the state-based contraction loss reduces the intra-class state differences. On the other hand, the attribute- and state-guided global structure embedding loss gives larger gradient magnitudes to negative samples with smaller attribute differences and to positive samples with larger state differences. The final loss value of the attribute- and state-guided structure embedding network model is the sum of the multi-label classification loss, the attribute-based expansion loss, the state-based contraction loss, and the attribute- and state-guided global structure embedding loss.

(3) The loss value is backpropagated and the parameters are updated, and the iteration is repeated until the network converges. The loss obtained in the previous step is backpropagated through the network and the model parameters are updated with an optimizer. The optimizer adopts a stochastic gradient descent method, and the learning rate is set dynamically according to the number of training rounds: the initial learning rate is 0.00035, and it is reduced by a factor of 10 at the 40th round and again at the 70th round. There are 120 training rounds in total.
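
The learning-rate schedule can be written as a small step function. This sketch assumes that each training round corresponds to one epoch counted from 0 and that the reduction takes effect from the milestone round onward; the function name and the parameter names are hypothetical, while the numeric values are those given in the text.

```python
def learning_rate(epoch, base_lr=0.00035, milestones=(40, 70), gamma=0.1):
    """Step schedule: multiply the rate by gamma at each milestone epoch."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# Rates over the 120 training rounds described in the text.
schedule = [learning_rate(epoch) for epoch in range(120)]
```

Under these assumptions the rate is 0.00035 for rounds 0 to 39, one tenth of that for rounds 40 to 69, and one hundredth for rounds 70 to 119.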

2. The testing phase comprises the following steps

Step 1: a visible light image of the vehicle is loaded. The network that has been trained is input.

Step 2: and extracting a visible light characteristic diagram through a trained residual error network.

And step 3: the visible light characteristic diagram is respectively subjected to a trained attribute-based enhancing and expanding module and a state-based weakening and contracting module, and an attribute enhanced characteristic tensor and a state weakened characteristic tensor are obtained.

And 4, step 4: and obtaining a final vehicle feature vector through attribute-based enhancement and state-based weakening operation.

And 5: and calculating a category score through the final feature vector, transmitting the category score to a search library, and judging whether the vehicle appears in other cameras.

The specific test mode comprises the following steps:

1) A vehicle image is input. Each image is resized to 256 × 256 pixels and fed into the trained network.

2) The network computes the category scores after attribute enhancement and state weakening.

3) After the category score of the vehicle is obtained, the vehicle features are compared against the image retrieval gallery to determine whether the vehicle appears in other cameras.
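The comparison against the retrieval gallery in step 3) can be sketched as a nearest-neighbour ranking. This is only an illustration: it assumes that the comparison uses the Euclidean distance between feature vectors, and the gallery keys and feature values are made-up placeholders.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query, gallery):
    """Return gallery ids ordered from the nearest to the farthest feature."""
    return sorted(gallery, key=lambda gid: euclidean(query, gallery[gid]))


# Hypothetical gallery: image id -> feature vector from the trained network.
gallery = {
    "cam2_a": [0.9, 0.1],
    "cam3_b": [0.1, 0.8],
    "cam5_c": [0.8, 0.2],
}
ranking = rank_gallery([1.0, 0.0], gallery)
```

The gallery images at the top of `ranking` are the candidates for the same vehicle seen by other cameras.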

3. Specific workflow of the expansion phase

(1) A vehicle image I of size 256 × 256 is input, and its feature map T = ResNet50(I) is obtained through the 50-layer residual network.

(2) The feature map T of the vehicle image is copied in triplicate, with the first copy passing into the attribute-based enhancement and expansion module.

(3) The feature map T is then input into a 1 × 1 convolution block associated with the color attribute to obtain a feature map related to the color attribute:

(4) Batch normalization (BN) and a rectified linear unit (ReLU) are applied to the feature map related to the color attribute.

(5) Global average pooling is applied to the feature map related to the color attribute to obtain the features related to the color attribute.

(6) Labels for the color attribute are introduced and a fully connected (FC) layer is added, so that the color-attribute features are constrained by the labels during end-to-end training. The label constraint of the color attribute is formulated as:

(7) The label constraints of the multiple (M) attributes are applied simultaneously, formulated as:

(8) Then the mean of the color-attribute features of the current vehicle relative to all vehicles is computed, where ‖·‖₂ denotes the l₂ norm, i.e. the square root of the sum of the squared feature components.

(9) An attribute-based expansion loss function is designed to force the network to keep expanding the distribution of attribute-related features during end-to-end training. The expansion formula of the attribute is:

(10) The multiple (M) attributes are expanded simultaneously, formulated as:

During training, the loss is back-propagated in the direction that decreases the expansion loss, so that the distribution of attribute-related features keeps expanding (for example, the black and gray attribute features move apart). This drives the feature distances between different vehicles to grow, so that images of different vehicles are more easily distinguished by the network.
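The idea behind steps (8) to (10) can be illustrated with a small Python sketch. The expansion formula itself is not reproduced in this text, so the form below is an assumption made only for illustration: the loss is the negative average l₂ distance of each attribute feature from the mean feature, so that lowering the loss spreads the features apart. The names `mean_vector`, `l2_norm` and `expansion_loss` are hypothetical.

```python
import math


def mean_vector(features):
    """Component-wise mean of a list of equally sized feature vectors."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


def l2_norm(v):
    """Square root of the sum of the squared components."""
    return math.sqrt(sum(x * x for x in v))


def expansion_loss(features):
    """Assumed form: negative mean l2 distance to the mean feature."""
    mu = mean_vector(features)
    spread = [l2_norm([x - m for x, m in zip(f, mu)]) for f in features]
    return -sum(spread) / len(spread)


# Two attribute features that lie farther apart give a lower (better) loss.
close = expansion_loss([[0.0, 0.0], [2.0, 0.0]])
far = expansion_loss([[0.0, 0.0], [4.0, 0.0]])
```

With this assumed form, minimizing the loss by gradient descent pushes attribute features such as "black" and "gray" away from their common mean.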

4. Specific workflow of the enhancement phase

Because the expansion of the color attribute is performed continuously over the iterations of the network, a more discriminative color feature map is obtained. This more discriminative color feature map is then used to enhance the vehicle feature map T:

(1) The color feature map is normalized to the range 0-1 with a sigmoid function and multiplied element-wise with the vehicle feature map, formulated as:

(2) The enhancement map of the multiple (M) attributes can be formulated as:

(3) The enhancement map of the multiple attributes is added element-wise to the original vehicle feature map:

T′＝T+T^e
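The enhancement steps above can be sketched in a few lines of Python. Feature maps are flattened to plain lists for brevity, and the sketch assumes that the enhancement map T^e of the M attributes is the sum of the per-attribute maps; the combination formula is not reproduced in this text. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Map a real value into the range 0-1."""
    return 1.0 / (1.0 + math.exp(-x))


def enhance(T, attribute_maps):
    """Return T' = T + T^e for a flattened feature map T.

    attribute_maps holds M attribute feature maps with the same length as T.
    """
    T_e = [0.0] * len(T)
    for A in attribute_maps:
        for k, (t, a) in enumerate(zip(T, A)):
            # Sigmoid-normalized attribute map times the vehicle feature map.
            T_e[k] += sigmoid(a) * t
    return [t + e for t, e in zip(T, T_e)]


# One attribute map with zero activation gives a gate of 0.5 per element.
T_prime = enhance([2.0, 4.0], [[0.0, 0.0]])
```

Because the sigmoid gate lies between 0 and 1, each attribute adds at most one extra copy of the original activation to T′.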

5. Specific workflow of the contraction phase

(3) The feature map T is then input into a 1 × 1 convolution block associated with the camera state to obtain a feature map related to the camera state:

(5) Global average pooling is applied to the feature map related to the camera state to obtain the features related to the camera state.

(6) Labels for the camera state are introduced and a fully connected (FC) layer is added, so that the camera-state features are constrained by the labels during end-to-end training. The label constraint of the camera state is formulated as:

(7) The label constraints of the multiple (N) states are applied simultaneously, formulated as:

(8) Then the mean of the camera-state features of the current vehicle relative to all vehicles is computed.

(9) A state-based contraction loss function is designed to force the network to keep contracting the distribution of state-related features during end-to-end training. The contraction formula of the state is:

(10) The multiple (N) states are contracted simultaneously, formulated as:

During training, the loss is back-propagated in the direction that decreases the contraction loss, so that the distribution of state-related features (for example, camera 139 and camera 79) keeps contracting. This drives the feature distances between images of the same vehicle to shrink, so that images of the same vehicle in different states are more easily recognized by the network as the same vehicle.

6. Specific workflow of the weakening phase

Because the contraction of the camera state is performed continuously over the iterations of the network, a camera feature map that is irrelevant to vehicle discrimination is obtained. This camera feature map is therefore used to weaken the vehicle feature map T.

(2) The weakening map of the multiple (N) states can be formulated as:

(3) The weakening map of the multiple states is subtracted element-wise from the enhanced vehicle feature map to obtain the final vehicle feature map:

T″＝T′-T^w
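The weakening step can be sketched in the same way. Following the description, each state feature map is normalized with a sigmoid and multiplied element-wise with the vehicle feature map T; summing the N per-state maps into T^w is an assumption, since the combination formula is not reproduced in this text. Feature maps are flattened to lists and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Map a real value into the range 0-1."""
    return 1.0 / (1.0 + math.exp(-x))


def weaken(T, T_prime, state_maps):
    """Return T'' = T' - T^w for flattened feature maps.

    T is the original vehicle feature map, T_prime the enhanced map,
    and state_maps holds N state feature maps of the same length.
    """
    T_w = [0.0] * len(T)
    for S in state_maps:
        for k, (t, s) in enumerate(zip(T, S)):
            # Sigmoid-normalized state map times the vehicle feature map.
            T_w[k] += sigmoid(s) * t
    return [tp - w for tp, w in zip(T_prime, T_w)]


# One state map with zero activation removes half of T from T'.
T_final = weaken([2.0, 4.0], [3.0, 6.0], [[0.0, 0.0]])
```

Here `T_final` plays the role of T″, the map that is later pooled into the vehicle feature vector.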

7. Attribute-driven heterogeneous sample separation

(1) For the input image I, the final vehicle feature map T″ is obtained through the attribute-based enhancement and expansion module and the state-based weakening and contraction module. Global average pooling is applied to T″ to obtain the feature vector of the image I:

f＝GAP(T″)

(2) For any heterogeneous sample pair (I_i, I_j) in the training set, the corresponding vehicle features (f_i, f_j) and the corresponding attribute features can be obtained

wherein y_ij = 0 means that this is a heterogeneous sample pair,

represents the attribute features of the heterogeneous sample pair (I_i, I_j), and d_ij represents the Euclidean distance between the vehicle features of the heterogeneous sample pair. Attribute-driven heterogeneous sample separation gives the feature learning of heterogeneous samples an attribute-related weight in the metric learning stage, so that the vehicle feature distance of heterogeneous samples increases as the attributes expand, until the upper bound 1 is reached.
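The two quantities used in this step, the pooled feature vector f = GAP(T″) and the Euclidean distance d_ij between the feature vectors of a pair, can be computed as in the following sketch. Feature maps are represented as nested lists of shape [channels][height][width]; the function names are hypothetical.

```python
import math


def gap(feature_map):
    """Global average pooling: one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def pair_distance(T_i, T_j):
    """Euclidean distance d_ij between the pooled features of two images."""
    f_i, f_j = gap(T_i), gap(T_j)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


# Two toy feature maps with 2 channels of size 2 x 2.
T_a = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
T_b = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
d_ab = pair_distance(T_a, T_b)
```

In this toy case the pooled vectors are [1, 0] and [1, 2], so the pair is already beyond the upper bound 1 mentioned above.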

8. State-driven gathering of homogeneous samples

f＝GAP(T″)

(2) For any homogeneous sample pair (I_i, I_j) in the training set, the corresponding vehicle features (f_i, f_j) and the corresponding state features can be obtained

wherein y_ij indicates that this is a homogeneous sample pair,

represents the state features of the homogeneous sample pair (I_i, I_j), and d_ij represents the Euclidean distance between the vehicle features of the homogeneous sample pair. State-driven gathering of homogeneous samples gives the feature learning of homogeneous samples a state-related weight in the metric learning stage, so that the vehicle feature distance of homogeneous samples decreases as the states contract, until it is less than the lower bound 0.3.

9. Global structure embedding

(1) Since heterogeneous sample separation and homogeneous sample gathering operate on all samples of the training set, the overall constraint is named the global structure embedding module, and the global structure embedding loss function is:

During training, the loss is back-propagated in the direction that decreases the global structure embedding loss, so that the vehicle feature distance of heterogeneous samples increases as the attributes expand, until the upper bound 1 is reached, and the vehicle feature distance of homogeneous samples decreases as the states contract, until it is less than the lower bound 0.3. Images of different vehicles are thus easier for the network to distinguish, and images of the same vehicle in different states are easier for the network to recognize as the same vehicle.

(2) Finally, the multi-label classification loss, the attribute-based expansion loss, the state-based contraction loss, and the global structure embedding loss are back-propagated, and the parameters are updated over 120 iterations to obtain the optimal network parameters.
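The behaviour of the global structure embedding constraint, with its upper bound 1 for heterogeneous pairs and lower bound 0.3 for homogeneous pairs, can be illustrated with a hinge-style sketch. The loss formula itself is not reproduced in this text, so the form below is an assumption: each pair contributes only while its bound is violated, scaled by a weight that stands in for the attribute-related or state-related weight. The function name and the tuple layout are hypothetical.

```python
def embedding_loss(pairs, upper=1.0, lower=0.3):
    """Assumed hinge-style loss over (distance, same_vehicle, weight) tuples."""
    total = 0.0
    for distance, same_vehicle, weight in pairs:
        if same_vehicle:
            # Homogeneous pair: penalized while its distance exceeds 0.3.
            total += weight * max(0.0, distance - lower)
        else:
            # Heterogeneous pair: penalized while its distance is below 1.
            total += weight * max(0.0, upper - distance)
    return total / len(pairs)


# One heterogeneous pair that is too close, one homogeneous pair too far apart.
violating = embedding_loss([(0.4, False, 1.0), (0.5, True, 2.0)])
# Pairs that already satisfy both bounds contribute nothing.
satisfied = embedding_loss([(1.2, False, 1.0), (0.1, True, 2.0)])
```

A larger weight on a pair scales its gradient, which matches the stated aim of giving larger gradient magnitudes to negative samples with small attribute differences and to positive samples with large state differences.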

In this embodiment, feature learning and metric learning are trained with an end-to-end deep neural network. Both the feature learning stage and the metric learning stage embed attribute information (color, vehicle type, and manufacturer) and state information: the camera number (which usually identifies where the camera is installed, so there are as many numbers as shooting locations), the vehicle viewpoint (five viewpoints: front, rear, side, front-side, and rear-side), and the shooting time (0-23, covering the 24 hours of a day).

The attribute information of the vehicle is exploited so that, as the different attribute information is continuously expanded during end-to-end learning, the feature distances between different vehicles grow. This is more meaningful than directly concatenating features and yields better vehicle re-identification performance.

In addition, the state information of the vehicle is also taken into account, which is likewise original. The information of the vehicle is divided into two groups: one group is considered helpful for vehicle re-identification (the attributes of the vehicle), and the other is considered to interfere with vehicle discrimination (the states of the vehicle).

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An attribute and state guided structure embedded vehicle re-identification system, characterized by learning discriminative features for vehicle re-identification through attribute-based enhancement and state-based weakening, comprising: a residual network module, an attribute-based enhancement and expansion module, a state-based weakening and contraction module, and a global structure embedding module;

2. A method applying the attribute and state guided structure embedded vehicle re-identification system of claim 1, characterized by comprising two stages: a training stage and a testing stage;

the training phase comprises the following steps:

the testing stage comprises the following steps:

3. The method for embedding an attribute and state guided structure in a vehicle re-identification system according to claim 2, wherein the method for expanding the feature distribution of the attribute by the expansion loss function of the attribute in the step 3) so as to increase the difference of the attribute between the classes is as follows:

introducing attribute related tags

using sigmoid functions to correlate feature maps with attributes

T′＝T+T^e

4. The method for embedding attribute and state guided structure in vehicle re-identification system according to claim 2, wherein the method for narrowing the distribution of the states by the shrinkage loss function of the states in step 3) so as to reduce the difference of the states in the class is:

T_j^st＝ReLU(BN(T_j^st))

f_j^st＝GAP(T_j^st)

introducing state independent tags

using a sigmoid function, the state-related feature map T_j^st is normalized to 0-1 and multiplied element-wise with the vehicle feature map T to obtain the state-related weakening map T^w, formulated as:

in the iterative process of the system, the contraction of the state is performed continuously, and the state-related feature map T_j^st is used to weaken the enhanced vehicle feature map T′ to obtain the final vehicle feature map T″, calculated as:

T″＝T′-T^w

5. The method for embedding the attribute-and-state-guided structure in the vehicle re-identification system according to claim 2, wherein the method for training all the samples by heterogeneous sample separation and homogeneous sample perturbation in the step 4) comprises the following steps:

f＝GAP(T″)

for any heterogeneous sample pair (I_i, I_j) in the training set, the corresponding vehicle features (f_i, f_j) and the corresponding attribute features can be obtained

wherein y_ij = 0 means that this is a heterogeneous sample pair,

wherein y_ij indicates that this is a homogeneous sample pair,

represents the state features of the homogeneous sample pair (I_i, I_j), and d_ij represents the Euclidean distance between the vehicle features of the homogeneous sample pair; state-driven gathering of homogeneous samples gives the feature learning of homogeneous samples a state-related weight in the metric learning stage, so that the vehicle feature distance of homogeneous samples decreases as the states contract, until it is less than the lower bound 0.3.

6. The method for attribute-and-state-guided structure-embedded vehicle re-identification system according to claim 2, wherein the function of global structure embedding loss in step 5) is: