WO2021174863A1

WO2021174863A1 - Method for training vehicle model-year recognition model and method for recognizing vehicle model year

Info

Publication number: WO2021174863A1
Application number: PCT/CN2020/121514
Authority: WO
Inventors: 叶丹丹; 晋兆龙; 邹文艺
Original assignee: 苏州科达科技股份有限公司
Priority date: 2020-03-05
Filing date: 2020-10-16
Publication date: 2021-09-10
Also published as: CN111340026B; CN111340026A

Abstract

A method for training a vehicle model-year recognition model and a method for recognizing a vehicle model year, relating to the technical field of vehicle recognition. The training method comprises: acquiring a vehicle sample image having labeling information (S11); inputting the vehicle sample image into a feature extraction module, so as to obtain at least two groups of features (S12); on the basis of the at least two groups of features, obtaining a region of interest corresponding to each group of features and a score value thereof (S13); fusing the regions of interest and the vehicle sample image and then inputting same into a classification module, so as to obtain whole image classification features, features of the regions of interest, and features of the whole image fused with the regions of interest (S14); calculating loss function values according to the described three kinds of features and the score value of each region of interest (S15); and on the basis of the labeling information and the loss function values, updating parameters of the feature extraction module and the classification module, so as to optimize a vehicle model year recognition model (S16). The training method improves the accuracy of a recognition model, and provides a foundation for subsequent application in a recognition method.

Description

Training method of vehicle model recognition model and vehicle model recognition method

This application requires the prior patent application of the State Intellectual Property Office of China with the application number of CN202010137345.7 and the filing date of 20200305 as priority, and the content of the prior patent application text is fully incorporated into this patent application by reference.

Technical field

The present invention relates to the technical field of vehicle recognition, in particular to a training method of a vehicle year model recognition model and a vehicle year model recognition method.

Background technique

Vehicles have become an indispensable means of transportation in modern life. As an important carrier and alternative behavior tool, the monitoring and recognition of vehicle information has also become an important issue for intelligent transportation and safe cities. Intelligent analysis of vehicle data, on the one hand, can facilitate traffic management, such as license plate recognition at parking lot bayonet; on the other hand, it can effectively assist in traffic control, such as the capture and information recording of illegal vehicles, licensed vehicles, and traffic accidents and crimes. Vehicle tracking, etc.

Convolutional Neural Networks (CNN) has been widely used in image pattern recognition, including vehicle attribute recognition, because it is not affected by target translation, zoom, tilt, and other deformations to a certain degree. Many experts and scholars also use this technology. Many books have been published.

Among them, in order to overcome the problem of low learning efficiency and ineffective improvement of accuracy due to the increase in the depth of Convolutional Neural Networks (CNN), Deep Residual Networks (ResNet for short) were introduced in 2015. It was proposed and quickly applied to the field of vehicle model recognition. Generally speaking, during recognition, ResNet will be used twice. The first time ResNet is used to input the sample image to get the whole image feature and regional feature, and the second time ResNet is used to input the regional feature to get the region of interest. Features, and finally classify the features of the region of interest to obtain the classification of the sample image. The inventor found in the research process of the deep residual network that using ResNet twice to obtain the features of the region of interest will increase the training time and the amount of calculation, and only use the features of the region of interest to obtain the classification of the sample image, ignoring Features at other levels cause inaccurate recognition.

Summary of the invention

In view of this, the embodiments of the present invention provide a training method for a vehicle year recognition model and a vehicle year recognition method to solve the problem of insufficient recognition.

According to the first aspect, an embodiment of the present invention provides a training method for a vehicle model recognition model, including:

Acquiring a sample vehicle image with label information; wherein the label information includes the vehicle brand and year model in the vehicle sample image;

Input the vehicle sample image into a feature extraction module to obtain at least two sets of features of the vehicle sample image;

Based on the at least two sets of features, obtain the region of interest corresponding to each set of features and its score value;

The fusion of the region of interest and the vehicle sample image is input into the classification module to obtain the entire image classification feature of the sample image, the feature of the region of interest, and the fusion of the entire image and the region of interest Features; wherein, the vehicle year model recognition model includes the feature extraction module and the classification module;

Calculating a loss function value according to the entire image classification feature of the sample image, the feature of the region of interest, the feature after the entire image is fused with the region of interest, and the score value of each region of interest;

Based on the annotation information of the vehicle sample image and the loss function value, the parameters of the feature extraction module and the classification module are updated to optimize the vehicle year recognition model.

In the method for training a vehicle model recognition model provided by the embodiment of the present invention, at least two sets of features of the vehicle sample image are extracted through a feature extraction module, and the regions of interest corresponding to the at least two sets of features and their score values are obtained; The fusion of the region of interest and the vehicle sample image is input into the classification module to obtain the classification feature of the entire image of the sample image, the feature of the region of interest, and the fusion of the entire image and the region of interest Feature; and based on the label information of the vehicle sample image and the loss function value, the feature extraction module and the parameters of the classification module are updated to optimize the vehicle year recognition model. The method extracts at least two sets of features, and inputs the region of interest and the vehicle sample image into the classification module, and optimizes the recognition model according to the loss function, which not only improves the sense of hierarchy of feature extraction, but also The corresponding loss function is used to update the parameters of the recognition model, thereby improving the accuracy of recognition.

With reference to the first aspect, in the first implementation manner of the first aspect, the obtaining the region of interest corresponding to each set of features and its score value based on the at least two sets of features includes:

Use each set of features to generate multiple candidate regions corresponding to each set of features;

Based on the multiple candidate regions, a region of interest corresponding to each set of features and its score value are generated.

The training method of the vehicle model recognition model provided by the embodiment of the present invention generates a plurality of candidate regions corresponding to each set of features by using each set of features, and generates a sense corresponding to each set of features based on the multiple candidate regions. The region of interest and its score value can accurately screen out the region of interest corresponding to each set of features, which provides a basis for subsequent training.

With reference to the first implementation manner of the first aspect, in the second implementation manner of the first aspect, the generating a region of interest corresponding to each set of features and a score value based on the multiple candidate regions includes:

Calculating the score value of each candidate region;

It is determined that the candidate region with the highest score value is the region of interest.

The training method of the vehicle model recognition model provided by the embodiment of the present invention determines that the region of interest is the candidate region with the highest score value by calculating the score value of each candidate region, and further improves the feeling. The accuracy of the region of interest provides a basis for subsequent training.

With reference to the first aspect, in a third implementation manner of the first aspect, the fusion of the region of interest and the vehicle sample image is input into a classification module to obtain the entire image classification feature of the sample image, the The features of the region of interest and the features of the entire image fused with the region of interest include:

The fusion of the region of interest and the vehicle sample image is input into a classification module; wherein the output of the classification module is the annual model classification of the vehicle sample image;

Extracting the output of the last pooling layer of the classification module to obtain the overall characteristics of the vehicle sample image;

The overall feature of the vehicle sample image is segmented to obtain the overall image classification feature of the sample image and the feature of the region of interest.

According to the training method of the vehicle model recognition model provided by the embodiment of the present invention, the region of interest and the vehicle sample image are fused and then input into a classification module to obtain the model year classification of the vehicle sample image, wherein the feeling The region of interest is the candidate region with the highest score for each group of features. The vehicle sample image is fused with the candidate region with the highest score for each group of features and then input into the classification module, which can improve the accuracy of the classification module and shorten the classification time.

With reference to the first aspect, in a fourth implementation manner of the first aspect, the classification feature of the entire image according to the sample image, the feature of the region of interest, the feature of the entire image fused with the region of interest, and each Calculating the loss function value for each of the score values of the regions of interest, including:

Fusing the entire image feature of the sample image with the feature of the region of interest to obtain a fusion feature;

Calculating the value of the fusion loss function based on the fusion feature;

Use the features of the region of interest to calculate the component loss function value;

Calculate the value of the overall image loss function by using the entire image classification feature of the sample image;

Calculating the level loss function value corresponding to each region of interest by using the feature of each region of interest and its corresponding score value;

Calculate the loss function value based on each of the level loss function value, the component loss function value, the fusion loss function value, and the entire graph loss function value.

With reference to the fourth implementation manner of the first aspect, in the fifth implementation manner of the first aspect, the loss function value is calculated using the following formula:

Among them, Loss ₁ is the component loss function, Loss ₂ is the fusion loss function, Loss ₃ is the entire graph loss function,

as well as

Respectively are the level loss functions corresponding to the region of interest.

The training method of the vehicle model recognition model provided by the embodiment of the present invention calculates the loss function and the grade loss function by using the fusion feature, all the features of the region of interest, and the classification of the sample image, and calculates the loss function according to a certain amount. The sum of the weights can accurately reflect the difference between the classification of the vehicle year recognition model and the actual classification. Through the gap, the parameters of the vehicle year recognition model can be further optimized, which further improves the classification accuracy of the model. Spend.

According to the second aspect, an embodiment of the present invention provides a method for identifying the year of a vehicle, including:

Obtain an image of the target vehicle;

The target vehicle image is input into the vehicle year recognition model to obtain the year of the target vehicle image; wherein the vehicle year recognition model is the vehicle year recognition model according to any one of claims 1-6. The training method of this model is obtained.

According to the method for recognizing the vehicle year model provided by the embodiment of the present invention, the year model of the target vehicle image is obtained by inputting the target vehicle image into the vehicle year model recognition model for classification, wherein the vehicle year model recognition model is The use of at least two sets of features of the sample image and the sample image for joint training and parameter optimization with the loss function value can ensure the accuracy of the year recognition of the target vehicle image.

According to the third aspect, an embodiment of the present invention provides a training device for a vehicle model recognition model, including:

The first acquisition module is configured to acquire a sample vehicle image with label information; wherein the label information includes the vehicle brand and year model in the vehicle sample image;

The first feature extraction module is configured to input the vehicle sample image into the feature extraction module to obtain at least two sets of features of the vehicle sample image;

The scoring module is configured to obtain the region of interest corresponding to each group of features and its score value based on the at least two groups of features;

The second feature extraction module is used to combine

The calculation module is used to calculate the entire image classification feature of the sample image, the feature of the region of interest, the feature after the entire image is fused with the region of interest, and the score value of each region of interest. Loss function value;

The parameter optimization module is configured to update the parameters of the feature extraction module and the classification module based on the annotation information of the vehicle sample image and the loss function value, so as to optimize the vehicle year recognition model.

According to the training device for the vehicle model recognition model provided by the embodiment of the present invention, at least two sets of features of the vehicle sample image are extracted through a feature extraction module and the region of interest corresponding to the at least two sets of features and its score value are obtained; The fusion of the region of interest and the vehicle sample image is input into the classification module to obtain the classification feature of the entire image of the sample image, the feature of the region of interest, and the feature of the fusion of the entire image and the region of interest And based on the labeling information of the vehicle sample image and the loss function value, the parameters of the feature extraction module and the classification module are updated to optimize the vehicle year recognition model. The method extracts at least two sets of features, and inputs the region of interest and the vehicle sample image into the classification module, and optimizes the recognition model according to the loss function, which not only improves the sense of hierarchy of feature extraction, but also The corresponding loss function is used to update the parameters of the recognition model, thereby improving the accuracy of recognition.

According to the fourth aspect, an embodiment of the present invention provides a vehicle year model recognition device, including:

The second acquisition module is used to acquire the target vehicle image;

The recognition module is used to input the target vehicle image into a vehicle year recognition model to obtain the year of the target vehicle image; wherein the vehicle year recognition model is based on the first aspect or any one of the first aspects It is obtained by training the training method of the vehicle model year model described in the item implementation mode.

According to the vehicle year recognition device provided by the embodiment of the present invention, the year of the target vehicle image is obtained by inputting the target vehicle image into the vehicle year recognition model for classification, wherein the vehicle year recognition model is The use of at least two sets of features of the sample image and the sample image for joint training and parameter optimization with the loss function value can ensure the accuracy of the year recognition of the target vehicle image.

According to the fifth aspect, an embodiment of the present invention provides an electronic device, including:

A memory and a processor, the memory and the processor are communicatively connected to each other, and computer instructions are stored in the memory, and the processor executes the first aspect or any one of the first aspects by executing the computer instructions The training method of the vehicle year recognition model according to the embodiment, or the vehicle year recognition method according to the second aspect or any one of the second aspects.

Description of the drawings

In order to more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the following will briefly introduce the drawings that need to be used in the specific embodiments or the description of the prior art. Obviously, the appendix in the following description The drawings are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

Fig. 1 is a flowchart of a training method for a vehicle model recognition model according to an embodiment of the present invention;

2 is a complete flowchart of a training method for a vehicle model recognition model according to an embodiment of the present invention;

Fig. 3 is a flowchart of a method for recognizing the year of a vehicle according to an embodiment of the present invention;

4 is a structural block diagram of a training device for a vehicle model recognition model according to an embodiment of the present invention;

Fig. 5 is a structural block diagram of a vehicle year model recognition device according to an embodiment of the present invention;

6 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of the composition of a vehicle model year recognition model according to an embodiment of the present invention.

Detailed ways

In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments These are a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present invention.

According to the embodiment of the present invention, a training method of a vehicle model year recognition model and an embodiment of a vehicle model recognition method are provided. The instructions are executed in a computer system that executes the instructions, and, although a logical sequence is shown in the flowchart, in some cases, the steps shown or described may be executed in a different order than here.

In this embodiment, a method for training a vehicle year recognition model is provided, which can be used in the above-mentioned electronic equipment. FIG. 1 is a flowchart of a training method for a vehicle year recognition model according to an embodiment of the present invention, as shown in FIG. 1 , The process includes the following steps:

S11: Obtain a vehicle sample image with annotated information.

Wherein, the labeling information includes the vehicle brand and year model in the vehicle sample image.

Specifically, about 1.2 million vehicle images of 10,216 year types are collected from vehicle bayonet surveillance videos and highway cameras, and information is marked on the vehicle images, where the vehicle images include cars, trucks, and buses. Three types, the label information includes the front and back of the vehicle in the image, the major brand, the sub-brand, the manufacturer, and the model year; the sample image is scaled to 256*256, then cropped to 224*224, and the mean variance is processed As the vehicle sample image with the labeled information.

S12: Input the vehicle sample image into a feature extraction module to obtain at least two sets of features of the vehicle sample image.

In a specific embodiment, three sets of features of the vehicle sample images are extracted; as shown in Figure 7, the lightweight neural network SqueezeNet is selected as the feature extraction module, and the features are extracted from the Fire2 module of the SqueezeNet network (SqueezeNet network Three sets of features of different scales are extracted from the Fire5 module (the fifth module of the SqueezeNet network), and the Fire9 module (the ninth module of the SqueezeNet network); among them, due to the large number of vehicle sample images, In order to save time, the output size of the last convolutional layer of the SqueezeNet network is changed from 512*13*13 to 1024*7*7.

Optionally, the three sets of features with different scales can also be extracted from other Fire modules of the lightweight neural network SqueezeNet, preferably from the Fire2 module, Fire5 module, and Fire9 module; optionally, you can also select Back Propagation (abbreviated as BP) neural network or Learning Vector Quantization (abbreviated as LVQ) neural network or Hopfield neural network is used as the feature extraction module to perform feature extraction on the vehicle sample image; optionally, the feature extraction of the vehicle sample image The number of groups can also be selected according to actual needs, such as 4 groups, 5 groups, etc. In specific embodiments, 3 groups are preferred.

S13, based on the at least two sets of features, obtain a region of interest corresponding to each set of features and a score value thereof.

In a specific embodiment, as shown in FIG. 7, the three sets of features with different scales are input into the Region Proposal Network (RPN), and a series of sizes of 24*24, 32*32, 86*86 rectangular frame, the rectangular frame is scaled according to the ratio of 1:3, 2:3, 1:1, that is, 9 rectangular frames can be obtained for each group of features, and the rectangular frame contains the corresponding corresponding to each group of features. Information volume and information volume score; the 9 rectangular boxes corresponding to each group of features are used non-maximum suppression algorithm (NMS) to retain the rectangular box with the highest information volume score of each group and its score value as The region of interest corresponding to each set of features and its score value.

Optionally, the information volume scores corresponding to the rectangular boxes can also be sorted by the method of sorting and screening, so as to obtain the rectangular box with the highest information volume score; optionally, you can also select Region-CNN (referred to as R The CNN) network generates the region of interest for each set of features.

S14. The fusion of the region of interest and the vehicle sample image is input into a classification module to obtain the entire image classification feature of the sample image, the feature of the region of interest, and the fusion of the entire image and the region of interest After the characteristics.

Wherein, the vehicle year model recognition model includes the feature extraction module and the classification module.

In a specific embodiment, as shown in FIG. 7, the deep residual network Resnet50 is selected as the classification module; the region of interest, that is, the rectangular box with the highest score corresponding to the three groups of features is bilinearly interpolated To 224*224 size, and input the deep residual network Resnet50 together with the sample image; acquire the sample image features and the interest from the fully connected (FC) layer of the deep residual network Resnet50 The overall feature of the region, and cut out from the overall feature the entire image classification feature of the sample image, the feature of each region of interest, and the feature after the entire image is fused with the region of interest. Optionally, ResNeXt network or Resnet101 or other residual networks of the same type can also be selected as the classification module.

S15: Calculate a loss function value according to the entire image classification feature of the sample image, the feature of the region of interest, the feature after the entire image is fused with the region of interest, and the score value of each region of interest .

In a specific embodiment, as shown in FIG. 7, the overall features obtained before the fully connected (FC) layer are segmented into the classification features of the entire image, and the overall features of the region of interest are fused and input into the fully connected (FC) layer. And calculate the loss Loss ₂ to obtain the fusion loss function value after the fusion of the overall image classification feature and the overall feature of the region of interest; the overall image feature of the sample image is obtained from the fully connected (FC) layer , And by accessing the softmax layer, the overall image loss function value Loss ₃ corresponding to the sample image is obtained; the components of the region of interest segmented from the overall features obtained before the fully connected (FC) layer After the features are input to the fully connected (FC) layer and the softmax layer, the component loss function value Loss ₁ corresponding to the region of interest is obtained; the overall feature obtained before the fully connected (FC) layer is segmented into the feeling The overall feature of the region of interest is input to the fully connected (FC) layer and processed by log softmax, and the loss function values of the regions of interest (three groups) are obtained respectively, and the level loss is calculated together with the corresponding amount of information to obtain the region of interest (Three groups) Corresponding grade loss function value

Optionally, the log softmax can be replaced by other loss calculation methods, such as NLLLoss or Cross Entropy softmax.

S16, based on the annotation information of the vehicle sample image and the loss function value, update the parameters of the feature extraction module and the classification module to optimize the vehicle year recognition model.

In a specific embodiment, as shown in FIG. 7, the fusion loss function value, the overall image loss function value corresponding to the sample image, the component loss function value corresponding to the overall feature of the region of interest, and the feeling The level loss function corresponding to the region of interest (three groups) is summed with a certain weight, and the parameters of the SqueezeNet network and the Resnet50 network are updated until the number of updates of the SqueezeNet network and the Resnet50 network reaches the threshold. , Or, the loss function value of the SqueezeNet network and the Resnet50 network stops after a certain preset range, so as to obtain the vehicle year recognition model. Wherein, the vehicle model recognition model is composed of SqueezeNet network and Resnet50 network.

In the method for training a vehicle model recognition model provided by the embodiment of the present invention, at least two sets of features of the vehicle sample image are extracted through a feature extraction module, and the regions of interest corresponding to the at least two sets of features and their score values are obtained; The fusion of the region of interest and the vehicle sample image is input into the classification module to obtain the classification feature of the entire image of the sample image, the feature of the region of interest, and the feature of the fusion of the entire image and the region of interest And based on the labeling information of the vehicle sample image and the loss function value, the parameters of the feature extraction module and the classification module are updated to optimize the vehicle year recognition model. The method extracts at least two sets of features, and inputs the region of interest and the vehicle sample image into the classification module, and optimizes the recognition model according to the loss function, which not only improves the sense of hierarchy of feature extraction, but also The corresponding loss function is used to update the parameters of the recognition model, thereby improving the accuracy of recognition.

Fig. 2 is a complete flowchart of a training method for a vehicle model recognition model according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:

S21: Obtain a vehicle sample image with annotated information.

For details, please refer to S11 described in Figure 1, which will not be repeated here.

S22: Input the vehicle sample image into a feature extraction module to obtain at least two sets of features of the vehicle sample image.

For details, please refer to S12 shown in Figure 1, which will not be repeated here.

S23, based on the at least two sets of features, obtain a region of interest corresponding to each set of features and a score value thereof.

For details, please refer to S13 shown in Figure 1, which will not be repeated here.

Optionally, the step S23 may include the following steps:

S231: Use each group of features to generate multiple candidate regions corresponding to each group of features.

Specifically, each set of features is input into the Region Proposal Network (RPN) to obtain multiple rectangular boxes corresponding to each set of features, and each rectangular box corresponds to a score value. The corresponding area is the candidate area.

S232: Based on the multiple candidate regions, generate a region of interest corresponding to each group of features and a score value thereof.

Specifically, the plurality of rectangular boxes and their score values are subjected to non-maximum value suppression (NMS) processing or sorting and screening processing to obtain the rectangular box with the highest value of each component, and the rectangular box with the highest score is Is the region of interest.

Optionally, the step S232 may include:

(1) Calculate the score value of each candidate region.

(2) Determine that the candidate area with the highest score is the region of interest.

S24. The fusion of the region of interest and the vehicle sample image is input into a classification module to obtain the classification feature of the entire image of the sample image, the feature of the region of interest, and the fusion of the entire image and the region of interest After the characteristics.

For details, please refer to S14 shown in Figure 1, which will not be repeated here.

Optionally, the step S24 may include the following steps:

S241: The region of interest and the vehicle sample image are fused and input into a classification module.

Wherein, the output of the classification module is the annual model classification of the vehicle sample image;

Specifically, the region of interest and the vehicle sample image are input into Resnet 50 together.

S242: Extract the output of the last pooling layer of the classification module to obtain the overall characteristics of the vehicle sample image.

Specifically, the characteristics of the sample image and the overall characteristics of the region of interest are obtained from the output of the last pooling layer before the Fully Connected (FC) layer of the deep residual network Resnet50.

S243: Perform segmentation on the overall feature of the vehicle sample image to obtain the overall image classification feature of the sample image and the feature of the region of interest.

Specifically, the entire image classification feature of the sample image and the feature corresponding to each region of interest are cut out from the overall feature.

S25: Calculate a loss function value based on the entire image classification feature of the sample image, the feature of the region of interest, the feature after the entire image is fused with the region of interest, and the score value of each region of interest. .

For details, please refer to S15 shown in Figure 1, which will not be repeated here.

Optionally, the step S25 may include the following steps:

S251: Fusion the whole image classification feature of the sample image with the feature of the region of interest to obtain a fusion feature.

Specifically, the overall feature obtained before the Fully Connected (FC) layer is segmented into the overall image classification feature and the overall feature of the region of interest, and the overall image classification feature is merged with the overall feature to obtain the result. The fusion characteristics.

S252: Calculate a fusion loss function value based on the fusion feature.

Specifically, the fusion feature is input to a fully connected (FC) layer and a loss function is calculated to obtain the fusion loss function value.

S253: Calculate a component loss function value using all the features of the region of interest.

Specifically, after the features of the region of interest segmented from the overall features obtained before the fully connected (FC) layer are input into the fully connected (FC) layer and the softmax layer, the component loss function value corresponding to the region of interest is obtained .

S254: Calculate the overall image loss function value by using the entire image classification feature of the sample image.

Specifically, in order to reduce the complexity of calculation, the overall image feature corresponding to the sample image is segmented from the common classification features of the vehicle sample image and the region of interest obtained after the Fully Connected (FC) layer, and The entire image feature is used as the entire image classification feature, and the entire image loss function corresponding to the entire image classification feature is obtained after accessing the softmax layer.

S255: Calculate the level loss function value corresponding to each region of interest by using the feature of each region of interest and its corresponding score value.

Specifically, the features of the region of interest segmented from the overall features obtained before the fully connected (FC) layer are input to the fully connected (FC) layer and log softmax processing is performed to obtain the regions of interest ( The level loss is calculated together with the loss function values of the three groups) and the corresponding amount of information to obtain the level loss function corresponding to the region of interest (three groups).

S256: Calculate the loss function value based on each of the level loss function value, the component loss function value, the fusion loss function value, and the entire graph loss function value.

Specifically, the value of the level loss function, the value of the component loss function, the value of the fusion loss function, and the value of the entire graph loss function are summed by a certain weight to obtain the loss function value.

As an optional implementation manner of the embodiment of the present invention, the loss function is calculated using the following calculation formula:

Among them, Loss ₁ is the component loss function value, Loss ₂ is the fusion loss function value, and Loss ₃ is the overall image loss function value.

as well as

Respectively are the level loss function values corresponding to the region of interest.

In a specific embodiment, the regions of interest are in three groups. Therefore, the calculation formula of the loss function is:

And points

It is the level loss function value corresponding to the region of interest.

Fig. 3 is a flowchart of a method for recognizing the year of a vehicle according to an embodiment of the present invention. As shown in Fig. 3, the method includes the following steps:

S31: Acquire an image of the target vehicle.

Specifically, the target vehicle image can be obtained from a vehicle bayonet or a road camera, and the vehicle image can be any type of car, truck, and bus.

S32, inputting the target vehicle image into a vehicle year model recognition model to obtain the year model of the target vehicle image.

Specifically, the vehicle model recognition model includes a feature extraction module and a classification module. Preferably, a lightweight neural network SqueezeNet is selected as the feature extraction module, and a deep residual network Resnet50 is selected as the classification module; After the target vehicle image is input to the vehicle model year recognition model, the lightweight neural network SqueezeNet is used to perform feature extraction, and then the deep residual network Resnet50 is used for classification to obtain the model year of the target vehicle image.

Fig. 4 is a training device for a vehicle model recognition model according to an embodiment of the present invention, as shown in Fig. 4, including:

The first obtaining module 41 is configured to obtain a sample vehicle image with label information; wherein the label information includes the vehicle brand and year model in the vehicle sample image;

The first feature extraction module 42 is configured to input the vehicle sample image into the feature extraction module to obtain at least two sets of features of the vehicle sample image;

The scoring module 43 is configured to obtain the region of interest corresponding to each group of features and its score value based on the at least two groups of features;

The second feature extraction module 44 is configured to merge the region of interest and the vehicle sample image and input it into the classification module to obtain the entire image classification feature of the sample image, the feature of the region of interest, and the entire image Features fused with the region of interest; wherein, the vehicle year model recognition model includes the feature extraction module and the classification module;

The calculation module 45 is configured to classify the features of the entire image of the sample image, the features of the region of interest, the features after the entire image is fused with the region of interest, and the score value of each region of interest, Calculate the value of the loss function;

The parameter optimization module 46 is configured to update the parameters of the feature extraction module and the classification module based on the annotation information of the vehicle sample image and the loss function value, so as to optimize the vehicle year recognition model.

The training device for the vehicle model recognition model provided by the embodiment of the present invention extracts at least two sets of features of the vehicle sample image through a feature extraction module, and obtains the region of interest corresponding to the at least two sets of features and the score value thereof; The fusion of the region of interest and the vehicle sample image is input into the classification module to obtain the entire image classification feature of the sample image, the feature of each region of interest, and the fusion feature of the entire image and the region of interest And based on the label information of the vehicle sample image and the loss function value, the parameters of the feature extraction module and the classification module are updated to obtain the vehicle year recognition model. The method extracts at least two sets of features, and inputs the region of interest and the vehicle sample image into the classification module, and optimizes the recognition model according to the loss function, which not only improves the sense of hierarchy of feature extraction, but also The corresponding loss function is used to update the parameters of the recognition model, thereby improving the accuracy of recognition.

Fig. 5 is a vehicle year model recognition device according to an embodiment of the present invention, as shown in Fig. 5, including:

The second acquisition module 51 is used to acquire an image of a target vehicle;

The recognition module 52 is used to input the target vehicle image into a vehicle year recognition model to obtain the year of the target vehicle image; wherein, the vehicle year recognition model is shown in FIG. 1 or FIG. 2 Trained on the training method of the vehicle model year.

The embodiment of the present invention also provides an electronic device having the training device for the vehicle year model recognition model shown in FIG. 4 and the vehicle year model recognition device shown in FIG. 5.

Please refer to FIG. 6. FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 6, the electronic device may include: at least one processor 61, such as a CPU (Central Processing Unit, central processing unit) , At least one communication interface 63, memory 64, and at least one communication bus 62. Among them, the communication bus 62 is used to implement connection and communication between these components. The communication interface 63 may include a display screen (Display) and a keyboard (Keyboard), and the optional communication interface 63 may also include a standard wired interface and a wireless interface. The memory 64 may be a high-speed RAM memory (Random Access Memory, volatile random access memory), or a non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory 64 may also be at least one storage device located far away from the aforementioned processor 61. The processor 61 may be combined with the devices described in FIG. 4 and FIG. 5, the memory 64 stores application programs, and the processor 61 calls the program code stored in the memory 64 to execute any of the above method steps.

The communication bus 62 may be a peripheral component interconnect standard (peripheral component interconnect, PCI for short) bus or an extended industry standard architecture (EISA for short) bus, etc. The communication bus 62 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.

The memory 64 may include a volatile memory (English: volatile memory), such as a random access memory (English: random-access memory, abbreviation: RAM); the memory may also include a non-volatile memory (English: non-volatile memory). memory), such as flash memory (English: flash memory), hard disk (English: hard disk drive, abbreviation: HDD) or solid-state hard disk (English: solid-state drive, abbreviation: SSD); memory 64 may also include the above types The combination of memory.

The processor 61 may be a central processing unit (English: central processing unit, abbreviation: CPU), a network processor (English: network processor, abbreviation: NP), or a combination of CPU and NP.

Wherein, the processor 61 may further include a hardware chip. The aforementioned hardware chip may be an application-specific integrated circuit (English: application-specific integrated circuit, abbreviation: ASIC), a programmable logic device (English: programmable logic device, abbreviation: PLD) or a combination thereof. The above-mentioned PLD can be a complex programmable logic device (English: complex programmable logic device, abbreviation: CPLD), field programmable logic gate array (English: field-programmable gate array, abbreviation: FPGA), general array logic (English: generic array) logic, abbreviation: GAL) or any combination thereof.

Optionally, the memory 64 is also used to store program instructions. The processor 71 can call program instructions to implement the vehicle year model training method shown in the embodiments of FIG. 1 to FIG. 2 of the present application and/or the vehicle year model recognition method shown in FIG. 3.

The embodiment of the present invention also provides a non-transitory computer storage medium, the computer storage medium stores computer-executable instructions, and the computer-executable instructions can execute the training method of the vehicle year recognition model in any of the above-mentioned method embodiments And/or the method of identifying the year of the vehicle. Wherein, the storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (RAM), a flash memory (Flash Memory), a hard disk (Hard Disk Drive, abbreviation: HDD) or solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the foregoing types of memories.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations fall within the scope of the appended claims. Within the limited range.

Claims

A training method for a vehicle model recognition model, which is characterized in that it includes:

Acquiring a sample vehicle image with label information; wherein the label information includes the vehicle brand and year model in the vehicle sample image;

Input the vehicle sample image into a feature extraction module to obtain at least two sets of features of the vehicle sample image;

Based on the at least two sets of features, obtain the region of interest corresponding to each set of features and its score value;

The fusion of the region of interest and the vehicle sample image is input into the classification module to obtain the classification feature of the entire image of the sample image, the feature of the region of interest, and the fusion of the entire image and the region of interest Features; wherein, the vehicle year model recognition model includes the feature extraction module and the classification module;

Calculating a loss function value according to the entire image classification feature of the sample image, the feature of the region of interest, the fused feature of the entire image and the region of interest, and the score value of each region of interest;

Based on the annotation information of the vehicle sample image and the loss function value, the parameters of the feature extraction module and the classification module are updated to optimize the vehicle year recognition model.
The method according to claim 1, wherein the obtaining a region of interest corresponding to each set of features and a score value thereof based on the at least two sets of features comprises:

Use each set of features to generate multiple candidate regions corresponding to each set of features;

Based on the multiple candidate regions, a region of interest corresponding to each set of features and its score value are generated.
The method according to claim 2, wherein the generating the region of interest corresponding to each set of features and its score value based on the plurality of candidate regions comprises:

Calculating the score value of each candidate region;

It is determined that the candidate region with the highest score value is the region of interest.
The method according to claim 1, wherein the fusion of the region of interest and the vehicle sample image is input into a classification module to obtain the whole image classification feature of the sample image and the interest The characteristics of the region and the characteristics of the fusion of the whole image and the region of interest include:

The fusion of the region of interest and the vehicle sample image is input into a classification module; wherein the output of the classification module is the annual model classification of the vehicle sample image;

Extracting the output of the last pooling layer of the classification module to obtain the overall characteristics of the vehicle sample image;

The overall feature of the vehicle sample image is segmented to obtain the overall image classification feature of the sample image and the feature of the region of interest.
The method according to claim 1, wherein the classification feature of the entire image according to the sample image, the feature of the region of interest, the fused feature of the entire image and the region of interest, and each The score value of the region of interest and the calculation of the loss function value include:

Fusing the entire image feature of the sample image with the feature of the region of interest to obtain a fusion feature;

Calculating the value of the fusion loss function based on the fusion feature;

Use the features of the region of interest to calculate the component loss function value;

Calculate the value of the overall image loss function by using the entire image classification feature of the sample image;

Calculating the level loss function value corresponding to each region of interest by using the feature of each region of interest and its corresponding score value;

Calculate the loss function value based on each of the level loss function value, the component loss function value, the fusion loss function value, and the entire graph loss function value.
The method according to claim 5, wherein the loss function value is calculated using the following formula:

Among them, Loss 1 is the component loss function value, Loss 2 is the fusion loss function value, and Loss 3 is the overall image loss function value.
as well as
Respectively are the level loss function values corresponding to the region of interest.
A method for identifying the year of a vehicle, which is characterized in that it includes:

Obtain an image of the target vehicle;

The target vehicle image is input into the vehicle year recognition model to obtain the year of the target vehicle image; wherein the vehicle year recognition model is the vehicle year recognition model according to any one of claims 1-6. The training method of this model is obtained.
A training device for a vehicle model recognition model, which is characterized in that it comprises:

The first acquisition module is configured to acquire a sample vehicle image with label information; wherein the label information includes the vehicle brand and year model in the vehicle sample image;

The first feature extraction module is configured to input the vehicle sample image into the feature extraction module to obtain at least two sets of features of the vehicle sample image;

The scoring module is configured to obtain the region of interest corresponding to each group of features and its score value based on the at least two groups of features;

The second feature extraction module is used to merge the region of interest and the vehicle sample image and input it into the classification module to obtain the entire image classification feature of the sample image, the feature of the region of interest, and the entire image and The features after the fusion of the region of interest; wherein, the vehicle year recognition model includes the feature extraction module and the classification module;

The calculation module is used to calculate the entire image classification feature of the sample image, the feature of the region of interest, the feature after the entire image is fused with the region of interest, and the score value of each region of interest. Loss function value;

The parameter optimization module is configured to update the parameters of the feature extraction module and the classification module based on the annotation information of the vehicle sample image and the loss function value, so as to optimize the vehicle year recognition model.
A vehicle year model recognition device, which is characterized in that it comprises:

The second acquisition module is used to acquire the target vehicle image;

The recognition module is configured to input the target vehicle image into a vehicle year recognition model to obtain the year payment of the target vehicle image; wherein the vehicle year recognition model is according to any one of claims 1-6 It is obtained by training the training method of the vehicle model year model.
An electronic device, characterized in that it comprises:

A memory and a processor, the memory and the processor are in communication connection with each other, the memory is stored with computer instructions, and the processor executes any one of claims 1-6 by executing the computer instructions The training method of the vehicle year recognition model or the vehicle year recognition method of claim 7.