CN117765485A - Vehicle type recognition method, device and equipment based on improved deep residual network - Google Patents

Vehicle type recognition method, device and equipment based on improved deep residual network Download PDF

Info

Publication number
CN117765485A
Authority
CN
China
Prior art keywords
vehicle
image
model
feature
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311777578.3A
Other languages
Chinese (zh)
Inventor
王向辉 (Wang Xianghui)
陈楠 (Chen Nan)
宋梦雄 (Song Mengxiong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Digital City Technology Co ltd
Original Assignee
China Telecom Digital City Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Digital City Technology Co ltd filed Critical China Telecom Digital City Technology Co ltd
Priority to CN202311777578.3A priority Critical patent/CN117765485A/en
Publication of CN117765485A publication Critical patent/CN117765485A/en
Pending legal-status Critical Current

Links

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a vehicle type recognition method, device and equipment based on an improved deep residual network, relating to the technical field of vehicle type recognition. The method comprises the following steps: acquiring a vehicle image to be identified, and performing image enhancement on it through a pre-trained image quality enhancement model to obtain a target vehicle image; performing vehicle detection and positioning on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain the vehicle sub-images in the target vehicle image; and classifying each vehicle sub-image through a vehicle type classification model to determine the vehicle type. The classification model is an improved deep residual network containing a target residual fusion module, which splices the feature maps before and after convolution along the channel dimension. The method solves the technical problem in the prior art that poor image quality and low resolution reduce the accuracy of vehicle type classification, and improves the accuracy of vehicle type recognition.

Description

Vehicle type recognition method, device and equipment based on improved deep residual network
Technical Field
The application relates to the technical field of vehicle type recognition, in particular to a vehicle type recognition method, device and equipment based on an improved deep residual network.
Background
With the development and application of deep learning in intelligent transportation, deep-learning-based vehicle type recognition has become an important part of intelligent traffic systems. Existing deep-learning-based vehicle type recognition methods generally extract features directly from the acquired vehicle image to classify and determine the vehicle type. However, the prior art ignores the influence of the acquired image quality on recognition accuracy: in practice, factors such as hardware performance, weather and illumination intensity often degrade the acquired images, resulting in low vehicle type recognition accuracy.
Disclosure of Invention
The purpose of the application is to provide a vehicle type recognition method, device and equipment based on an improved deep residual network, so as to solve the technical problem in the prior art that poor image quality and low resolution reduce vehicle type classification accuracy, and to improve the recognition accuracy of the vehicle type.
In a first aspect, the present invention provides a vehicle type recognition method based on an improved deep residual network, the method comprising:
acquiring a vehicle image to be identified, and performing image enhancement processing on the vehicle image to be identified through a pre-trained image quality enhancement model to obtain a target vehicle image;
performing vehicle detection and positioning processing on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain a vehicle sub-image in the target vehicle image;
classifying and identifying each vehicle sub-image in the image through a vehicle type classification model, and determining the vehicle type; the vehicle type classification model is an improved deep residual network, and the improved deep residual network comprises a target residual fusion module which is used for performing channel splicing on the feature maps before and after convolution in the channel dimension.
In an alternative embodiment, the pre-trained image quality enhancement model includes a low-level feature information extraction layer, a high-level feature information extraction layer, and an upsampling layer;
the low-level feature information extraction layer comprises a convolution layer with a kernel size of 1×1, a stride of 1 and padding of 1, and is used for extracting the low-frequency information in an image; the high-level feature extraction layer comprises a plurality of residual block structures with the BN layer removed, where each residual block consists of two convolution layers, a ReLU layer and a skip connection, and is used for extracting the high-frequency information in an image; the upsampling layer includes a deconvolution.
In an alternative embodiment, the training step of the pre-trained vehicle detection and positioning model includes:
labeling the target vehicle images, marking the vehicle position in each target vehicle image, and generating annotation files in XML format;
dividing the annotated training samples into a training set and a test set according to a preset ratio;
and inputting the vehicle images in the training set into the vehicle detection and positioning model for training, continuously updating the model parameters to minimize the loss function, then inputting the vehicle images in the test set into the model with the updated parameters, and completing the training of the vehicle detection and positioning model when a preset recognition accuracy is reached.
In an alternative embodiment, the vehicle detection and positioning model comprises a feature extraction network, a region suggestion network, a region-of-interest pooling module and a prediction module;
the region suggestion network is a dual-branch structure formed by convolutions; after the feature map extracted by the feature extraction network is obtained, a feature extraction operation is performed on it through a convolution sliding window with a preset kernel size to obtain a feature map of a preset dimension, a plurality of anchor boxes are generated on the feature map with the center point of each window as the coordinate, and convolution operations are then performed on the feature map of the preset dimension to obtain classification scores and coordinate parameters; the generated anchor boxes are judged according to the classification scores, determining whether each anchor box contains a vehicle and the confidence with which it does, so that the anchor boxes are screened and candidate boxes containing vehicles are finally obtained;
the input of the region-of-interest pooling module comprises a first input unit and a second input unit, wherein the first input unit receives the feature map of the whole image obtained through the feature extraction network, and the second input unit receives the candidate boxes obtained after the processing of the region suggestion network;
the prediction module is composed of a fully connected layer.
In an alternative embodiment, the target residual fusion module comprises a preset number of first convolution layers, a channel splicing layer and a preset number of second convolution layers;
when the input of the target residual fusion module is a feature map x ∈ R^(W×H×C), where W, H and C denote the width, height and number of channels of the feature map respectively, the output of the residual fusion module is:

y = K_m((K_1(x), K_2(K_1(x)), …, K_n(K_{n-1}(…(K_1(x)))))) + x

where K_1(·), K_2(·), …, K_n(·) denote the n first convolution layers, K_m(·) denotes the m second convolution layers, and (·, ·) denotes channel splicing.
In an alternative embodiment, the improved deep residual network further comprises a target self-attention module, and the target self-attention module comprises a skip connection structure.
In an alternative embodiment, the target self-attention module is configured to:
after a feature map is input, the channels are expanded through a convolution layer; the expanded feature map is then weighted by two parallel multi-head self-attention modules, the weighted feature maps computed by the two branches are added at corresponding positions, the channels are compressed through a convolution layer, and finally the channel-compressed feature map and the input feature map are summed through the skip connection structure as the output, as shown in the following formula:

y = K_2(f_1(K_1(x)) + f_2(K_1(x))) + x

where x denotes the input of the self-attention module, y denotes its output, f_1(·) and f_2(·) denote the multi-head self-attention operations of the two branches, and K_1(·) and K_2(·) denote the two convolution layers respectively.
In a second aspect, the present invention provides a vehicle type recognition device based on an improved deep residual network, the device comprising:
the image enhancement processing module is used for acquiring a vehicle image to be identified, and carrying out image enhancement processing on the vehicle image to be identified through a pre-trained image quality enhancement model to obtain a target vehicle image;
the vehicle detection and positioning module is used for performing vehicle detection and positioning processing on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain the vehicle sub-images in the target vehicle image;
the vehicle type recognition module is used for classifying and identifying each vehicle sub-image in the image through the vehicle type classification model to determine the vehicle type; the vehicle type classification model is an improved deep residual network, and the improved deep residual network comprises a target residual fusion module which is used for performing channel splicing on the feature maps before and after convolution in the channel dimension.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor to implement the vehicle type recognition method based on the improved deep residual network of any of the preceding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the vehicle type recognition method based on the improved deep residual network of any of the preceding embodiments.
According to the vehicle type recognition method, device and equipment based on the improved deep residual network, the vehicle image to be identified is first acquired; a high-quality vehicle image is obtained by performing image enhancement on it; the high-quality vehicle image is input into the vehicle detection and positioning model to locate the vehicles in the image, and the sub-image corresponding to each vehicle is cropped out; finally, the cropped vehicle images are classified through the vehicle type classification model to determine the vehicle type. The method converts the acquired low-quality image into a high-quality image through a deep-learning super-resolution reconstruction algorithm before positioning and recognition, which improves the precision of the subsequent positioning and classification models and reduces the vehicle misrecognition rate. Because the vehicle type classification model contains the target residual fusion module, the features extracted by each convolution layer can be used more fully, improving the utilization of image features and the accuracy of recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a vehicle type recognition method based on an improved deep residual network according to an embodiment of the present application;
Fig. 2 is a flowchart of a specific vehicle type recognition method provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a residual fusion module provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a self-attention module according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an improved deep residual network according to an embodiment of the present application;
Fig. 6 is a block diagram of a vehicle type recognition device based on an improved deep residual network according to an embodiment of the present application;
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The embodiment of the application provides a vehicle type recognition method based on an improved deep residual network, which, as shown in fig. 1, mainly comprises the following steps:
Step S110, a vehicle image to be identified is obtained, and image enhancement processing is carried out on the vehicle image to be identified through a pre-trained image quality enhancement model, so that a target vehicle image is obtained.
The vehicle image to be identified can be acquired through an imaging device (generally a camera) arranged at an intersection; the camera can provide a real-time video stream or offline picture data. In one embodiment, the camera outputs three-channel RGB color images at a resolution of 1920×1080 with a video stream frame rate of 25 fps. If a video stream is acquired from the camera, the video is converted into an image format, for example by capturing a picture every several frames, so as to obtain the vehicle images to be identified.
In the image enhancement processing of the vehicle image to be identified, the image enhancement function can be performed by a deep-learning super-resolution reconstruction algorithm, such as the typical EDSR algorithm. In an alternative embodiment, the pre-trained image quality enhancement model includes a low-level feature information extraction layer, a high-level feature information extraction layer, and an upsampling layer. The low-level feature information extraction layer comprises a convolution layer with a kernel size of 1×1, a stride of 1 and padding of 1, and is used for extracting the low-frequency information in an image; the high-level feature extraction layer comprises a plurality of residual block structures with the BN layer removed, where each residual block consists of two convolution layers, a ReLU layer and a skip connection, and is used for extracting the high-frequency information in an image; the upsampling layer includes a deconvolution.
Step S120, performing vehicle detection and positioning processing on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain the vehicle sub-images in the target vehicle image.
In the above vehicle detection and positioning model, the vehicle positioning function is performed by a deep-learning object detection algorithm. In an alternative embodiment, the vehicle detection and positioning model may employ, for example, Faster RCNN, comprising a feature extraction network, a region suggestion network, a region-of-interest pooling module, and a prediction module; wherein,
the region suggestion network is a dual-branch structure formed by convolutions; after the feature map extracted by the feature extraction network is obtained, a feature extraction operation is performed on it through a convolution sliding window with a preset kernel size to obtain a feature map of a preset dimension, a plurality of anchor boxes are generated on the feature map with the center point of each window as the coordinate, and convolution operations are then performed on the feature map of the preset dimension to obtain classification scores and coordinate parameters; the generated anchor boxes are judged according to the classification scores, determining whether each anchor box contains a vehicle and the confidence with which it does, so that the anchor boxes are screened and candidate boxes containing vehicles are finally obtained;
the input of the region-of-interest pooling module comprises a first input unit and a second input unit, wherein the first input unit receives the feature map of the whole image obtained through the feature extraction network, and the second input unit receives the candidate boxes obtained after the processing of the region suggestion network;
the prediction module is composed of a fully connected layer.
In one embodiment, the training step of the vehicle detection and positioning model may include:
labeling the target vehicle images, marking the vehicle position in each target vehicle image, and generating annotation files in XML format;
dividing the annotated training samples into a training set and a test set according to a preset ratio;
and inputting the vehicle images in the training set into the vehicle detection and positioning model for training, continuously updating the model parameters to minimize the loss function, then inputting the vehicle images in the test set into the model with the updated parameters, and completing the training of the vehicle detection and positioning model when a preset recognition accuracy is reached.
Step S130, classifying and identifying each vehicle sub-image in the image through a vehicle type classification model to determine the vehicle type; the vehicle type classification model is an improved deep residual network, and the improved deep residual network comprises a target residual fusion module which is used for performing channel splicing on the feature maps before and after convolution in the channel dimension.
In one embodiment, the target residual fusion module of the present application improves on the conventional residual module and includes a preset number of first convolution layers, a channel splicing layer, and a preset number of second convolution layers. In one embodiment, the preset number of first convolution layers may be 2 and the number of second convolution layers may be 1. In practical applications, the numbers of first and second convolution layers can be adjusted adaptively.
When the input of the target residual fusion module is a feature map x ∈ R^(W×H×C), where W, H and C denote the width, height and number of channels of the feature map respectively, the output of the residual fusion module is:

y = K_m((K_1(x), K_2(K_1(x)), …, K_n(K_{n-1}(…(K_1(x)))))) + x

where K_1(·), K_2(·), …, K_n(·) denote the n first convolution layers, K_m(·) denotes the m second convolution layers, and (·, ·) denotes channel splicing.
In one example, when the preset number of first convolution layers is 2 and the number of second convolution layers is 1, the output of the residual fusion module is:

y = K_3((K_1(x), K_2(K_1(x)))) + x

where K_1(·) and K_2(·) denote the 2 first convolution layers (e.g., two 3×3 convolutions) and K_3(·) denotes the second convolution layer (e.g., one 1×1 convolution).
Further, in order to acquire more global features of the vehicle image and improve the effectiveness of the attention weights, the improved deep residual network further comprises a target self-attention module, and the target self-attention module comprises a skip connection structure. The target self-attention module is used for the following:
After a feature map is input, the channels are expanded through a convolution layer; the expanded feature map is then weighted by two parallel multi-head self-attention modules, the weighted feature maps computed by the two branches are added at corresponding positions, the channels are compressed through a convolution layer, and finally the channel-compressed feature map and the input feature map are summed through the skip connection structure as the output, as shown in the following formula:

y = K_2(f_1(K_1(x)) + f_2(K_1(x))) + x

where x denotes the input of the self-attention module, y denotes its output, f_1(·) and f_2(·) denote the multi-head self-attention operations of the two branches, and K_1(·) and K_2(·) denote the two convolution layers respectively.
Further, the present application also provides a specific vehicle type recognition method, as shown in fig. 2, including the following steps S1 to S4:
step S1: the image acquisition device acquires the current vehicle image to be identified.
The imaging device is generally a camera arranged at an intersection; the camera can provide a real-time video stream or offline picture data, and outputs three-channel RGB color images at a resolution of 1920×1080 with a video stream frame rate of 25 fps. If a video stream is provided, the video is converted into an image format, for example by capturing a picture every 5 frames, so as to obtain the vehicle images, as sketched below.
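The frame-capture step can be implemented in a few lines with OpenCV; the sketch below is illustrative only, with the paths and the helper name extract_frames being assumptions, while the 5-frame interval follows the description above.

```python
# A minimal sketch: convert a camera video stream into still vehicle images
# by saving one frame out of every 5 (paths and function name are assumed).
import cv2

def extract_frames(video_path: str, out_dir: str, step: int = 5) -> int:
    cap = cv2.VideoCapture(video_path)
    saved, idx = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:           # end of the stream
            break
        if idx % step == 0:  # capture a picture every `step` frames
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```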
Step S2: the vehicle image is subjected to image enhancement processing by the image quality enhancement model to obtain a high-quality vehicle image.
1) Constructing a data set (including a training set and a test set):
A public vehicle recognition dataset may be adopted, wherein the training set comprises two types of vehicle images, namely low-quality images and high-quality images in one-to-one correspondence with them, while the test set contains only low-quality vehicle images.
2) Building an image quality enhancement model:
The image quality enhancement module performs the image enhancement function with a deep-learning super-resolution reconstruction algorithm; for example, the typical EDSR algorithm can be adopted, which comprises low-level feature information extraction, high-level feature information extraction, and an upsampling layer. The low-level feature information extraction consists of a convolution layer with a kernel size of 1×1, a stride of 1 and padding of 1, and its main function is to extract the low-frequency information in the image.
The high-level feature extraction consists of a plurality of residual block structures with the BN layer removed; each block consists of two convolution layers, one ReLU layer and a skip connection, and is mainly used for extracting the high-frequency information in the image.
The up-sampling layer is formed by deconvolution.
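For illustration, a minimal PyTorch sketch of such an enhancement model is given below. The 1×1 low-level convolution, the BN-free residual blocks and the deconvolution upsampling follow the description above; the channel width, block count, ×2 scale and the class names are assumptions, and the paddings are chosen so that tensor shapes align.

```python
# A minimal sketch of the EDSR-style enhancement model described above
# (channel width, block count and x2 upscaling are assumptions).
import torch
import torch.nn as nn

class ResBlockNoBN(nn.Module):
    """Residual block with the BN layer removed: two convs, ReLU, skip."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))  # skip connection

class EnhanceNet(nn.Module):
    def __init__(self, ch: int = 64, n_blocks: int = 16, scale: int = 2):
        super().__init__()
        self.low = nn.Conv2d(3, ch, kernel_size=1)   # low-frequency information
        self.high = nn.Sequential(*[ResBlockNoBN(ch) for _ in range(n_blocks)])
        self.up = nn.ConvTranspose2d(ch, 3, kernel_size=scale, stride=scale)

    def forward(self, x):
        return self.up(self.high(self.low(x)))

# EnhanceNet()(torch.randn(1, 3, 96, 96)).shape -> torch.Size([1, 3, 192, 192])
```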
3) Network model training
The low-quality images and the corresponding high-quality images in the training set are input into the model for training, and the model parameters are continuously updated to minimize the loss function; when the low-quality images in the test set are input into the network and the model can automatically enhance them to obtain images close to the high-quality ones, the training of the model is completed.
4) Vehicle image enhancement
The low-quality image obtained in step S1 is input into the image quality enhancement module and processed by the trained deep-learning super-resolution reconstruction algorithm, so as to obtain a high-quality vehicle image.
Step S3: and inputting the high-quality vehicle image into a vehicle detection positioning model to position the vehicle in the image and intercepting the image corresponding to each vehicle in the image.
1) Constructing a data set: the method comprises the steps of constructing a training set and constructing a testing set:
and (3) marking the high-quality images obtained through the processing in the step (S2), marking the positions of vehicles in each image, generating an XML format marking file, and enabling a marking tool to use LabelImg. And then randomly dividing the data into a training set and a testing set according to the proportion of 3:1, and expanding the data of the training set, wherein the expansion modes comprise horizontal overturning, vertical overturning, rotating and the like.
2) Building a vehicle detection and positioning model:
The vehicle detection and positioning module performs the vehicle positioning function with a deep-learning object detection algorithm; for example, the Faster RCNN algorithm can be adopted, which comprises a feature extraction network, an RPN, an ROI Pooling module, and a prediction module. After a high-quality vehicle image is input into Faster RCNN, the feature extraction network extracts image features to generate a feature map; the feature map is input into the RPN, which generates suggestion boxes; the feature map and the suggestion boxes are then input together into ROI Pooling, which extracts the features of each region of interest and fixes them to a uniform size; finally, the prediction module determines the vehicle locations, and the sub-image corresponding to each vehicle in the image is cropped out.
The feature extraction network adopts VGG16; the VGG16 network comprises thirteen 3×3 convolution layers and three fully connected layers.
The RPN is a dual-branch structure formed by convolutions. When the feature map extracted by the feature extraction network is input into the RPN, the RPN performs a feature extraction operation on it through a 3×3 convolution sliding window to obtain a 256-channel feature map, generates 9 anchors with the center point of each window as the coordinate (k anchors in total for the feature map), and then applies 1×1 convolutions to the 256-dimensional feature map to obtain 2k classification scores and 4k coordinate parameters. The 2k classification scores are used to judge the k generated anchors, determining whether each anchor box contains a vehicle and with what confidence, so that the anchors are screened and the anchors containing vehicles are finally obtained, as sketched below.
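A minimal sketch of this dual-branch RPN head under the stated shapes (512 input channels from VGG16, a 256-channel intermediate map, k = 9 anchors per location); the class name and ReLU placement are assumptions.

```python
# A minimal sketch of the RPN head: a 3x3 sliding-window conv to 256
# channels, then parallel 1x1 convs for 2k scores and 4k coordinates.
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_ch: int = 512, mid_ch: int = 256, k: int = 9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
        self.cls = nn.Conv2d(mid_ch, 2 * k, 1)   # vehicle / background scores
        self.reg = nn.Conv2d(mid_ch, 4 * k, 1)   # anchor coordinate offsets
        self.relu = nn.ReLU(inplace=True)

    def forward(self, feat):
        t = self.relu(self.conv(feat))
        return self.cls(t), self.reg(t)
```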
The input of the ROI Pooling layer consists of two parts: the feature map of the whole image obtained through the feature extraction network, and the candidate box positions obtained from the RPN. The ROI Pooling layer can be divided into 3 steps. In the first step, the candidate box positions are mapped onto the feature map of the whole image, and the features at the corresponding positions inside each candidate box are extracted. In the second step, each extracted candidate-box feature map is divided into rectangular blocks of the same size. In the third step, a max pooling operation is performed on each rectangular block, converting the candidate-box feature maps of different sizes input to the ROI Pooling layer into a uniform size, as sketched below.
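This fixed-size pooling is available directly in torchvision; the snippet below is a usage sketch under assumed shapes (the 7×7 output size and the 1/16 spatial scale matching VGG16's downsampling are common choices, not values given in the text).

```python
# A minimal sketch: pool an RPN candidate box on the shared feature map to
# a fixed 7x7 size with torchvision's roi_pool (shapes/scale are assumed).
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 512, 38, 50)                 # whole-image feature map
boxes = torch.tensor([[0, 48., 64., 320., 256.]])  # (batch_idx, x1, y1, x2, y2)
pooled = roi_pool(feat, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)                                # torch.Size([1, 512, 7, 7])
```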
The prediction module is composed of a fully connected layer.
3) Training a network model:
The vehicle images in the training set are input into the model for training, and the model parameters are continuously updated to minimize the loss function; when the vehicle images in the test set are input into the network and the model can automatically locate the vehicles, the training of the model is completed.
4) Vehicle positioning:
and (2) inputting the high-quality image obtained in the step (S2) into a vehicle positioning module, positioning the vehicle through a trained deep learning target detection algorithm, and intercepting the image according to the detected vehicle position so as to obtain an image only comprising an independent vehicle.
Step S4: and classifying and identifying the cut-out vehicle images through the vehicle type classification model, so as to determine the vehicle type.
1) Constructing a data set (including a training set and a test set):
The images containing single vehicles obtained in step S3 are randomly divided into a training set and a test set at a ratio of 3:1, and the training set is augmented by horizontal flipping, vertical flipping, rotation and the like.
2) Building a vehicle type classification model:
The vehicle type classification module performs the vehicle type classification function with a deep-learning classification algorithm; for example, the ResNet34 algorithm can be adopted. The conventional ResNet34 comprises 5 stages (Stage 0 to Stage 4), a global average pooling layer and a fully connected layer, where Stage 0 comprises one 7×7 convolution layer and one 3×3 max pooling layer, Stages 1-4 consist of 3, 4, 6 and 3 residual block structures respectively, and each residual block consists of two 3×3 convolutions and a skip connection.
In order to further alleviate the influence of image quality and improve the accuracy of vehicle type recognition, the invention improves the conventional ResNet34 algorithm and proposes a residual fusion module. It differs from the traditional multi-scale residual module, which extracts features from the same feature map through convolutions with different kernel sizes to enlarge the receptive field of the model. Compared with the residual block structure before the improvement, the residual fusion module adds one channel splicing operation and one 1×1 convolution layer, so that the feature maps extracted by the two 3×3 convolutions in the residual block are spliced and fused in the channel dimension; the 1×1 convolution layer then compresses the number of channels so that the fused feature map has the same size as the original input, and the two are added through the skip connection, as shown in fig. 3.
Assuming that the input of the residual fusion module is a feature map x ∈ R^(W×H×C), where W, H and C denote the width, height and number of channels of the feature map respectively, the output of the residual fusion module is:

y = K_3((K_1(x), K_2(K_1(x)))) + x

where K_1(·) and K_2(·) denote the two 3×3 convolutions, K_3(·) denotes the 1×1 convolution, and (·, ·) denotes channel splicing.
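A minimal PyTorch sketch of this residual fusion block follows. The concatenate-then-compress structure matches fig. 3 and the formula above; the class name and the placement of the ReLU activations are assumptions.

```python
# A minimal sketch of the residual fusion module: the outputs of the two
# 3x3 convs are spliced along the channel dimension, compressed back to C
# channels by a 1x1 conv, and added to the input via the skip connection.
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.k1 = nn.Conv2d(ch, ch, 3, padding=1)  # first 3x3 convolution
        self.k2 = nn.Conv2d(ch, ch, 3, padding=1)  # second 3x3 convolution
        self.k3 = nn.Conv2d(2 * ch, ch, 1)         # 1x1 channel compression
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.relu(self.k1(x))
        f2 = self.relu(self.k2(f1))
        fused = torch.cat([f1, f2], dim=1)         # channel splicing
        return self.k3(fused) + x                  # skip connection
```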
In addition, the invention designs a self-attention module to acquire more global features of the vehicle image and improve the effectiveness of the attention weights; a skip connection structure is added to this module, as shown in fig. 4. After a feature map is input, the channels are expanded through a 1×1 convolution; the expanded feature map is then weighted by two parallel MHSA modules, the weighted feature maps computed by the two branches are added at corresponding positions, the channels are compressed through a 1×1 convolution, and finally the channel-compressed feature map and the input feature map are summed through the skip connection as the output, as shown in the following formula:

y = K_2(f_1(K_1(x)) + f_2(K_1(x))) + x

where x denotes the input of the self-attention module, y denotes its output, f_1(·) and f_2(·) denote the MHSA operations of the two branches, and K_1(·) and K_2(·) denote the two 1×1 convolutions respectively.
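The following PyTorch sketch illustrates the dual-branch MHSA block, treating each spatial position of the expanded feature map as a token. The head count, channel expansion ratio and class name are assumptions.

```python
# A minimal sketch of the self-attention module of fig. 4: 1x1 expansion,
# two parallel MHSA branches, summation, 1x1 compression, skip connection.
import torch
import torch.nn as nn

class DualMHSA(nn.Module):
    def __init__(self, ch: int, expand: int = 2, heads: int = 4):
        super().__init__()
        e = ch * expand
        self.k1 = nn.Conv2d(ch, e, 1)              # channel expansion
        self.f1 = nn.MultiheadAttention(e, heads, batch_first=True)
        self.f2 = nn.MultiheadAttention(e, heads, batch_first=True)
        self.k2 = nn.Conv2d(e, ch, 1)              # channel compression

    def forward(self, x):
        b, c, h, w = x.shape
        t = self.k1(x).flatten(2).transpose(1, 2)  # tokens: (B, H*W, E)
        a1, _ = self.f1(t, t, t)                   # branch 1 weighting
        a2, _ = self.f2(t, t, t)                   # branch 2 weighting
        s = (a1 + a2).transpose(1, 2).reshape(b, -1, h, w)
        return self.k2(s) + x                      # skip connection
```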
3) Network model training
The images in the training set are input into the model for training, and the model parameters are continuously updated to minimize the loss function; when the vehicle images in the test set are input into the network and the model can automatically classify the vehicle type, the training of the model is completed.
4) Vehicle type recognition
The images containing single vehicles obtained in step S3 are input into the vehicle type classification module, and the vehicle type is classified by the trained improved ResNet34 algorithm, thereby completing vehicle type recognition.
Based on the improved residual fusion module and the self-attention module, fig. 5 shows the overall structure of the deep residual network improved in the present application.
In practical applications, the training in all of the above steps can be performed on a server with an NVIDIA RTX 2080 Ti; the deep-learning networks are implemented on the PyTorch framework, and all networks are trained with the Adam optimizer, as sketched below.
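A minimal sketch of the shared training loop under these settings is given below. The cross-entropy loss shown applies to the vehicle type classifier; the detection and enhancement models would substitute their own losses, and the learning rate, epoch count and function name are assumptions.

```python
# A minimal sketch of the Adam-based training loop on the PyTorch framework
# (hyperparameters assumed; other models would use their own loss terms).
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 50,
          lr: float = 1e-4, device: str = "cuda") -> nn.Module:
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()  # continuously update parameters to minimize the loss
    return model
```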
In summary, step S4 improves the conventional ResNet34 algorithm with the proposed residual fusion module, so that the features extracted by each convolution layer can be used more fully and the utilization of image features is improved; the designed self-attention module acquires more global features of the vehicle image and improves the effectiveness of the attention weights, further alleviating the influence of image quality and improving the accuracy of vehicle type recognition.
In step S2, the acquired vehicle image is preprocessed: the acquired low-quality image is converted into a high-quality image through a deep-learning super-resolution reconstruction algorithm before the vehicles are located and their types identified, which improves the accuracy of the subsequent positioning and classification models and reduces the vehicle type misrecognition rate.
Based on the above method embodiment, the embodiment of the present application further provides a vehicle type recognition device based on an improved deep residual network; as shown in fig. 6, the device mainly includes the following parts:
the image enhancement processing module 610 is configured to obtain an image of a vehicle to be identified, and perform image enhancement processing on the image of the vehicle to be identified through a pre-trained image quality enhancement model to obtain an image of a target vehicle;
the vehicle detection and positioning module 620 is configured to perform vehicle detection and positioning processing on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain the vehicle sub-images in the target vehicle image;
the vehicle type recognition module 630 is configured to classify and identify each vehicle sub-image in the image through a vehicle type classification model and determine the vehicle type; the vehicle type classification model is an improved deep residual network, and the improved deep residual network comprises a target residual fusion module which is used for performing channel splicing on the feature maps before and after convolution in the channel dimension.
In a possible implementation, the pre-trained image quality enhancement model includes a low-level feature information extraction layer, a high-level feature information extraction layer, and an upsampling layer;
the low-level feature information extraction layer comprises a convolution layer with a kernel size of 1×1, a stride of 1 and padding of 1, and is used for extracting the low-frequency information in an image; the high-level feature extraction layer comprises a plurality of residual block structures with the BN layer removed, where each residual block consists of two convolution layers, a ReLU layer and a skip connection, and is used for extracting the high-frequency information in an image; the upsampling layer includes a deconvolution.
In a possible embodiment, the above device further comprises a training module configured to:
label the target vehicle images, mark the vehicle position in each target vehicle image, and generate annotation files in XML format;
divide the annotated training samples into a training set and a test set according to a preset ratio;
and input the vehicle images in the training set into the vehicle detection and positioning model for training, continuously update the model parameters to minimize the loss function, then input the vehicle images in the test set into the model with the updated parameters, and complete the training of the vehicle detection and positioning model when a preset recognition accuracy is reached.
In a possible implementation, the vehicle detection and positioning model comprises a feature extraction network, a region suggestion network, a region-of-interest pooling module and a prediction module;
the region suggestion network is a dual-branch structure formed by convolutions; after the feature map extracted by the feature extraction network is obtained, a feature extraction operation is performed on it through a convolution sliding window with a preset kernel size to obtain a feature map of a preset dimension, a plurality of anchor boxes are generated on the feature map with the center point of each window as the coordinate, and convolution operations are then performed on the feature map of the preset dimension to obtain classification scores and coordinate parameters; the generated anchor boxes are judged according to the classification scores, determining whether each anchor box contains a vehicle and the confidence with which it does, so that the anchor boxes are screened and candidate boxes containing vehicles are finally obtained;
the input of the region-of-interest pooling module comprises a first input unit and a second input unit, wherein the first input unit receives the feature map of the whole image obtained through the feature extraction network, and the second input unit receives the candidate boxes obtained after the processing of the region suggestion network;
the prediction module is composed of a fully connected layer.
In a possible implementation, the target residual fusion module comprises a preset number of first convolution layers, a channel splicing layer and a preset number of second convolution layers;
when the input of the target residual fusion module is a feature map x ∈ R^(W×H×C), where W, H and C denote the width, height and number of channels of the feature map respectively, the output of the residual fusion module is:

y = K_m((K_1(x), K_2(K_1(x)), …, K_n(K_{n-1}(…(K_1(x)))))) + x

where K_1(·), K_2(·), …, K_n(·) denote the n first convolution layers, K_m(·) denotes the m second convolution layers, and (·, ·) denotes channel splicing.
In a possible embodiment, the improved deep residual network further comprises a target self-attention module, and the target self-attention module comprises a skip connection structure.
In one possible implementation, the target self-attention module is configured to:
after a feature map is input, expand the channels through a convolution layer; the expanded feature map is then weighted by two parallel multi-head self-attention modules, the weighted feature maps computed by the two branches are added at corresponding positions, the channels are compressed through a convolution layer, and finally the channel-compressed feature map and the input feature map are summed through the skip connection structure as the output, as shown in the following formula:

y = K_2(f_1(K_1(x)) + f_2(K_1(x))) + x

where x denotes the input of the self-attention module, y denotes its output, f_1(·) and f_2(·) denote the multi-head self-attention operations of the two branches, and K_1(·) and K_2(·) denote the two convolution layers respectively.
The vehicle type recognition device based on the improved deep residual network provided in the embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiment; for brevity, for any part of the device embodiment not mentioned here, reference may be made to the corresponding contents of the foregoing method embodiment.
The embodiment of the present application further provides an electronic device. As shown in fig. 7, which is a schematic structural diagram of the electronic device, the electronic device 100 includes a processor 71 and a memory 70, where the memory 70 stores computer-executable instructions that can be executed by the processor 71, and the processor 71 executes them to implement any one of the above vehicle type recognition methods based on the improved deep residual network.
In the embodiment shown in fig. 7, the electronic device further comprises a bus 72 and a communication interface 73, wherein the processor 71, the communication interface 73 and the memory 70 are connected by the bus 72.
The memory 70 may include a high-speed random access memory (RAM), and may further include non-volatile memory, such as at least one magnetic disk memory. The communication connection between this system network element and at least one other network element is achieved via at least one communication interface 73 (which may be wired or wireless), which may use the Internet, a wide area network, a local network, a metropolitan area network, etc. The bus 72 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus and a control bus. For ease of illustration, only one bi-directional arrow is shown in fig. 7, but this does not mean there is only one bus or one type of bus.
The processor 71 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits in hardware or by software instructions in the processor 71. The processor 71 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory 70, and the processor 71 reads the information in the memory and, in combination with its hardware, performs the steps of the vehicle type recognition method based on the improved deep residual network of the foregoing embodiments.
The embodiment of the application further provides a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the above vehicle type recognition method based on the improved deep residual network; for the specific implementation, reference may be made to the foregoing method embodiment, which will not be repeated here.
The computer program product of the vehicle type recognition method, device and equipment based on the improved deep residual network provided in the embodiments of the present application includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and for the specific implementation, reference may be made to that embodiment, which will not be repeated here.
The relative steps, numerical expressions and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In the description of the present application, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or replace some or all of their technical features with equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A vehicle type recognition method based on an improved deep residual network, the method comprising:
acquiring a vehicle image to be identified, and performing image enhancement processing on the vehicle image to be identified through a pre-trained image quality enhancement model to obtain a target vehicle image;
performing vehicle detection and positioning processing on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain the vehicle sub-images in the target vehicle image;
classifying and identifying each vehicle sub-image in the image through a vehicle type classification model, and determining the vehicle type; wherein the vehicle type classification model is an improved deep residual network, the improved deep residual network comprises a target residual fusion module, and the target residual fusion module is used for performing channel splicing on the feature maps before and after convolution in the channel dimension.
2. The vehicle type recognition method based on an improved deep residual network according to claim 1, wherein the pre-trained image quality enhancement model comprises a low-level feature information extraction layer, a high-level feature information extraction layer, and an upsampling layer;
the low-level feature information extraction layer comprises a convolution layer with a kernel size of 1×1, a stride of 1 and padding of 1, and is used for extracting the low-frequency information in an image; the high-level feature extraction layer comprises a plurality of residual block structures with the BN layer removed, where each residual block consists of two convolution layers, a ReLU layer and a skip connection, and is used for extracting the high-frequency information in an image; the upsampling layer includes a deconvolution.
3. The vehicle type recognition method based on an improved deep residual network according to claim 1, wherein the training step of the pre-trained vehicle detection and positioning model comprises:
labeling the target vehicle images, marking the vehicle position in each target vehicle image, and generating annotation files in XML format;
dividing the annotated training samples into a training set and a test set according to a preset ratio;
and inputting the vehicle images in the training set into the vehicle detection and positioning model for training, continuously updating the model parameters to minimize the loss function, then inputting the vehicle images in the test set into the model with the updated parameters, and completing the training of the vehicle detection and positioning model when a preset recognition accuracy is reached.
4. The vehicle type recognition method based on an improved deep residual network according to claim 3, wherein the vehicle detection and positioning model comprises a feature extraction network, a region suggestion network, a region-of-interest pooling module and a prediction module;
the region suggestion network is a dual-branch structure formed by convolutions; after the feature map extracted by the feature extraction network is obtained, a feature extraction operation is performed on it through a convolution sliding window with a preset kernel size to obtain a feature map of a preset dimension, a plurality of anchor boxes are generated on the feature map with the center point of each window as the coordinate, and convolution operations are then performed on the feature map of the preset dimension to obtain classification scores and coordinate parameters; the generated anchor boxes are judged according to the classification scores, determining whether each anchor box contains a vehicle and the confidence with which it does, so that the anchor boxes are screened and candidate boxes containing vehicles are finally obtained;
the input of the region-of-interest pooling module comprises a first input unit and a second input unit, wherein the first input unit receives the feature map of the whole image obtained through the feature extraction network, and the second input unit receives the candidate boxes obtained after the processing of the region suggestion network;
the prediction module is composed of a fully connected layer.
5. The vehicle type recognition method based on an improved deep residual network according to claim 1, wherein the target residual fusion module comprises a preset number of first convolution layers, a channel splicing layer and a preset number of second convolution layers;
when the input of the target residual fusion module is a feature map x ∈ R^(W×H×C), where W, H and C denote the width, height and number of channels of the feature map respectively, the output of the residual fusion module is:

y = K_m((K_1(x), K_2(K_1(x)), …, K_n(K_{n-1}(…(K_1(x)))))) + x

where K_1(·), K_2(·), …, K_n(·) denote the n first convolution layers, K_m(·) denotes the m second convolution layers, and (·, ·) denotes channel splicing.
6. The vehicle type recognition method based on an improved deep residual network according to claim 1, wherein the improved deep residual network further comprises a target self-attention module, and the target self-attention module comprises a skip connection structure.
7. The vehicle type recognition method based on an improved deep residual network according to claim 6, wherein the target self-attention module is configured to:
after a feature map is input, expand the channels through a convolution layer; the expanded feature map is then weighted by two parallel multi-head self-attention modules, the weighted feature maps computed by the two branches are added at corresponding positions, the channels are compressed through a convolution layer, and finally the channel-compressed feature map and the input feature map are summed through the skip connection structure as the output, as shown in the following formula:

y = K_2(f_1(K_1(x)) + f_2(K_1(x))) + x

where x denotes the input of the self-attention module, y denotes its output, f_1(·) and f_2(·) denote the multi-head self-attention operations of the two branches, and K_1(·) and K_2(·) denote the two convolution layers respectively.
8. A vehicle type recognition device based on an improved deep residual network, the device comprising:
the image enhancement processing module, which is used for acquiring a vehicle image to be identified and performing image enhancement processing on it through a pre-trained image quality enhancement model to obtain a target vehicle image;
the vehicle detection and positioning module, which is used for performing vehicle detection and positioning processing on the target vehicle image through a pre-trained vehicle detection and positioning model to obtain the vehicle sub-images in the target vehicle image;
the vehicle type recognition module, which is used for classifying and identifying each vehicle sub-image in the image through the vehicle type classification model to determine the vehicle type; the vehicle type classification model is an improved deep residual network, the improved deep residual network comprises a target residual fusion module, and the target residual fusion module is used for performing channel splicing on the feature maps before and after convolution in the channel dimension.
9. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor to implement the vehicle type recognition method based on an improved deep residual network of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-executable instructions which, when invoked and executed by a processor, cause the processor to implement the vehicle type recognition method based on an improved deep residual network of any one of claims 1 to 7.
CN202311777578.3A 2023-12-21 2023-12-21 Vehicle type recognition method, device and equipment based on improved deep residual network Pending CN117765485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311777578.3A CN117765485A (en) Vehicle type recognition method, device and equipment based on improved deep residual network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311777578.3A CN117765485A (en) Vehicle type recognition method, device and equipment based on improved deep residual network

Publications (1)

Publication Number Publication Date
CN117765485A true CN117765485A (en) 2024-03-26

Family

ID=90319493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311777578.3A Pending Vehicle type recognition method, device and equipment based on improved deep residual network

Country Status (1)

Country Link
CN (1) CN117765485A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097321A (en) * 2024-04-29 2024-05-28 济南大学 (University of Jinan) Vehicle image enhancement method and system based on CNN and Transformer


Similar Documents

Publication Publication Date Title
CN110738125B (en) Method, device and storage medium for selecting detection frame by Mask R-CNN
CN109840556B (en) Image classification and identification method based on twin network
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
JP6129987B2 (en) Text quality based feedback to improve OCR
CN112348787B (en) Training method of object defect detection model, object defect detection method and device
CN112329881B (en) License plate recognition model training method, license plate recognition method and device
CN110781980B (en) Training method of target detection model, target detection method and device
CN114266794B (en) Pathological section image cancer region segmentation system based on full convolution neural network
CN111783819B (en) Improved target detection method based on region of interest training on small-scale data set
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN113052170B (en) Small target license plate recognition method under unconstrained scene
CN113343985B (en) License plate recognition method and device
CN113807361B (en) Neural network, target detection method, neural network training method and related products
CN112364873A (en) Character recognition method and device for curved text image and computer equipment
CN117765485A Vehicle type recognition method, device and equipment based on improved deep residual network
WO2021147055A1 (en) Systems and methods for video anomaly detection using multi-scale image frame prediction network
CN115131797A (en) Scene text detection method based on feature enhancement pyramid network
CN111724342A (en) Method for detecting thyroid nodule in ultrasonic image
CN112597918A (en) Text detection method and device, electronic equipment and storage medium
CN112800978A (en) Attribute recognition method, and training method and device for part attribute extraction network
CN112052907A (en) Target detection method and device based on image edge information and storage medium
CN112183542A (en) Text image-based recognition method, device, equipment and medium
CN111435445A (en) Training method and device of character recognition model and character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination