CN114298909A - Super-resolution network model and application thereof - Google Patents


Info

Publication number
CN114298909A
CN114298909A
Authority
CN
China
Prior art keywords
resolution
super
model
network
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111654294.6A
Other languages
Chinese (zh)
Inventor
李波
杭陶阳
赵齐贤
周丹
周鑫烨
高陈诚
朱芸海
倪曦
吴凡
贝绍轶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Technology
Original Assignee
Jiangsu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Technology
Priority to CN202111654294.6A
Publication of CN114298909A
Legal status: Pending

Abstract

The invention belongs to the field of intelligent traffic automatic driving and discloses a super-resolution network model together with a method for semantic segmentation of roads using that network. The model comprises two modules: a first module, a low-resolution convolution model, and a second module, a high-resolution convolution model. The first module convolves low-resolution images; the second convolves high-resolution images to compensate for the information lost at low resolution. The result is a network model for segmenting roads. The model is then trained, with the optimizer, learning rate, number of training rounds, number of iterations, and loss function specified. The proposed super-resolution network performs semantic segmentation on pictures, innovates on previous network structures, and effectively improves prediction accuracy.

Description

Super-resolution network model and application thereof
Technical Field
The invention belongs to the field of intelligent traffic automatic driving, and particularly relates to a super-resolution network model and application thereof.
Background
Semantic segmentation is a basic computer-vision task that aims to classify every pixel in a picture; it is widely applied in intelligent driving, medical imaging, pose analysis, and other fields. In intelligent driving, semantic segmentation must combine high accuracy with real-time detection, yet high-precision networks cannot be used where hardware is limited because their recognition latency is large. Classical semantic segmentation networks usually need high-resolution image sets for training to reach high accuracy: high-resolution pictures convey image features effectively and ease network learning, so high-resolution features are very important in high-precision networks. Currently there are two main lines for maintaining a high-resolution representation: one uses dilated (atrous) convolution to preserve high-resolution features, the other combines top-down paths with cross connections. Both, however, consume considerable computing resources, and taking high-resolution pictures as input further increases the network's computation and the picture segmentation time.
Disclosure of Invention
To solve these problems, the invention provides a super-resolution network model and an application thereof. A high-resolution convolution network is added to the super-resolution network: convolution of the high-resolution image compensates for the semantic information missing from the low-resolution image, and a Transform structure is used to obtain a receptive field larger than that of ordinary convolution.
The invention adopts the following specific technical scheme:
a super-resolution network model based on is specifically designed to include: the convolution model comprises a low-resolution convolution model and a high-resolution convolution model, wherein the low-resolution convolution model consists of a CSPDarknet53 backbone network, a Transform structure and a super-resolution network, and the high-resolution convolution model consists of a small number of simple convolution layers and is then transmitted into the super-resolution network of the low-resolution convolution model.
In a further improvement of the invention, the low-resolution convolution model takes as input the RGB image down-sampled 2× from the original resolution: the image is first fed into the CSPDarknet53 backbone network to generate a feature map of dimension 512 and resolution 45 × 60; after the Transform structure this is converted into a feature layer of dimension 32 and resolution 45 × 60; meanwhile the high-resolution convolution model generates, by convolution, a feature map of dimension 32 and resolution 720 × 960.
The application of the super-resolution network model is a road-picture semantic segmentation method, which specifically comprises the following steps:
s1: designing and generating a super-resolution network model;
s2: generating processing data and inputting the data into a designed super-resolution network model;
s3: training by utilizing input data to generate a super-resolution network;
s4: verifying the super-resolution network model trained in the step S3 by using verification data; if the verification result is ideal, this network model may be used for road semantic segmentation, otherwise S3 is continued until the verification result is ideal.
The specific implementation of the step S2 includes the following steps:
s2-1: collecting a batch of high-resolution atlas;
s2-2: the target object is labeled on the atlas and a specific category is given.
The specific implementation of S3 includes the following steps:
S3-1: the original-resolution picture is input to the high-resolution convolution model, and the picture down-sampled 2× is input to the low-resolution convolution model;
S3-2: training the super-resolution network model, built with the PyTorch framework, in the following specific steps:
(1) data acquisition: after the data are collected and processed in S2, they are fed into the model by the DataLoader in PyTorch, with a batch size of 16 and 4 worker threads reading in parallel;
(2) optimizer and hyper-parameter design: the Adam optimizer is selected as the model optimizer, training is set to 30 rounds, the model parameters are saved once per round, and the learning rate decays exponentially from an initial value of 0.002, the learning rate in each round being:
lr(epoch) = 0.002 × 0.8^(epoch-1)
(3) designing a loss function: BCE loss and CE loss functions are adopted as network loss functions.
The invention has the beneficial effects that:
(1) the invention uses a super-resolution network for semantic segmentation of road images, innovating on previous network structures and effectively improving prediction accuracy;
(2) a high-resolution convolution module is added to the original super-resolution network, exploiting the strength of convolutional neural networks at extracting planar features, and a Transform mechanism is added to enlarge the receptive field and compensate for the semantic information that disappears when the resolution is reduced;
(3) the invention provides a training method for the neural network model, supplying the hyper-parameters needed for training, so that the model obtains more accurate results while its training speed is preserved.
Drawings
FIG. 1 is a road semantic segmentation model based on a super-resolution network according to the present invention.
Detailed Description
For the purpose of enhancing the understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the present invention.
The invention will be further explained with reference to the drawings.
Example: a road semantic segmentation method based on a super-resolution network, realized by the following flow:
S1: designing and generating the super-resolution network model;
S2: generating the processed data and inputting them into the super-resolution network model;
S3: training the super-resolution network model with the input data;
S4: verifying the super-resolution network model trained in step S3 with the verification data.
The embodiments will be specifically described below.
The super-resolution network model designed in step S1 is shown in fig. 1, and the network model is specifically designed as follows:
the super-resolution network model overall structure is shown in fig. 1 and comprises two modules. A first module: a low resolution convolution model; and a second module: and (4) obtaining a semantic segmentation result which is similar to the real segmentation height for a high-resolution convolution model. And splicing the outputs of the two networks and finally outputting. The following is a detailed description of each module.
1) Low resolution convolution model
The low-resolution convolution model consists of a CSPDarknet53 backbone network, a Transform structure and a super-resolution network; its input is the RGB image down-sampled 2× from the original resolution.
The down-sampled RGB image is first fed into the CSPDarknet53 backbone network to generate a feature map of dimension 512 and resolution 45 × 60; after the Transform structure this is converted into a feature layer of dimension 32 and resolution 45 × 60.
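The stated sizes are mutually consistent if the original resolution is 720 × 960 and the backbone feature is taken at stride 8 (360 // 8 = 45, 480 // 8 = 60); the patent does not state either number explicitly, so both are inferred assumptions. A quick Python check:

```python
def down(h, w, factor):
    # integer down-sampling of an (h, w) resolution
    return h // factor, w // factor

orig = (720, 960)              # assumed original resolution (inferred)
low_input = down(*orig, 2)     # 2x down-sampled input: (360, 480)
feat_hw = down(*low_input, 8)  # assumed stride-8 backbone feature: (45, 60)
feat_dim = 512                 # per the text; the Transform reduces it to 32
```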
2) High resolution convolution model
The high-resolution convolution model consists of a small number of simple convolution layers and feeds into the super-resolution network of the low-resolution convolution model; its input is the RGB image at the original resolution, and convolution produces a feature map of dimension 32 and resolution 720 × 960.
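That a stack of plain convolutions can keep the full 720 × 960 resolution follows from the standard output-size formula; a sketch (the two-layer, stride-1, padding-1 configuration is an illustrative assumption, not taken from the patent):

```python
def conv2d_out(h, w, k=3, s=1, p=1):
    """Convolution output size: floor((x + 2p - k) / s) + 1 per axis."""
    return (h + 2 * p - k) // s + 1, (w + 2 * p - k) // s + 1

h, w = 720, 960              # original-resolution RGB input
for _ in range(2):           # two hypothetical 3x3, stride-1, pad-1 layers
    h, w = conv2d_out(h, w)  # spatial size is preserved each time
# (h, w) is still (720, 960); the final layer would leave 32 channels
```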
3) Decoder
The decoder restores the output of the low-resolution convolution module to the resolution of the original image using super-resolution. During training, two super-resolution modules train the network parameters jointly; at test time the network is pruned, leaving one branch.
4) argmax layer design:
and splicing the output of the low-resolution convolution network and the output of the high-resolution convolution network, adjusting the number of channels through a 1 x 1 convolution kernel, and outputting the output through the category corresponding to the argmax output pixel point.
The specific implementation steps of S2 are as follows:
S2-1: collecting a batch of high-resolution atlas images;
S2-2: labeling the target objects on the atlas and assigning each a specific category;
S2-3: down-sampling the original-resolution picture as the input of the low-resolution convolution network, while the original-resolution picture itself is the input of the high-resolution convolution network.
The specific implementation steps of S3 are as follows:
S3-1: the original-resolution picture is input to the high-resolution convolution model, and the picture down-sampled 2× is input to the low-resolution convolution model;
S3-2: training the super-resolution network model, built with the PyTorch framework, in the following specific steps:
(1) data acquisition: after the data are collected and processed in S2, they are fed into the model by the DataLoader in PyTorch, with a batch size of 16 and 4 worker threads reading in parallel;
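In PyTorch step (1) corresponds to `DataLoader(dataset, batch_size=16, num_workers=4)`. The batching itself reduces to slicing, sketched framework-free below (the 4 parallel reader threads are a DataLoader concern and are not modeled):

```python
def batches(samples, batch_size=16):
    """Yield consecutive mini-batches of at most `batch_size` samples,
    mirroring the patent's batch size of 16."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]
```

For 100 samples this yields six full batches of 16 followed by one final batch of 4.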
(2) optimizer and hyper-parameter design: the Adam optimizer is selected as the model optimizer, training is set to 30 rounds, the model parameters are saved once per round, and the learning rate decays exponentially from an initial value of 0.002, the learning rate in each round being:
lr(epoch) = 0.002 × 0.8^(epoch-1)
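Assuming epochs are counted from 1, the schedule is a plain exponential decay; in PyTorch the same effect could be obtained with `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)`. A direct sketch:

```python
def lr(epoch, base=0.002, gamma=0.8):
    """Per-epoch learning rate: lr(epoch) = 0.002 * 0.8**(epoch - 1)."""
    return base * gamma ** (epoch - 1)

# epoch 1 starts at the initial rate 0.002; each later epoch is 0.8x the last
```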
(3) loss function design: the BCE loss and CE loss functions are used as the network loss functions.
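Both losses named in step (3) have standard forms; the patent does not say how they are combined, so a NumPy sketch of each loss on its own:

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy over predicted probabilities p and targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def ce_loss(logits, targets):
    """Multi-class cross-entropy: logits (N, C), integer targets (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(targets)), targets].mean())
```

With a prediction of 0.5 for a positive target, BCE gives ln 2; with uniform logits over C classes, CE gives ln C.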
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. A super-resolution network model, characterized by comprising a low-resolution convolution model and a high-resolution convolution model, wherein the low-resolution convolution model is composed of a CSPDarknet53 backbone network, a Transform structure and a super-resolution network, and the high-resolution convolution model is composed of a small number of simple convolution layers whose output is fed into the super-resolution network of the low-resolution convolution model.
2. The super resolution network model of claim 1, wherein the low resolution convolution model inputs RGB images after 2 times down-sampling of original resolution: firstly, an RGB image with the original resolution subjected to 2-time down-sampling is transmitted into a CSPDarknet53 backbone network to generate a feature map with the dimension of 512 and the resolution of 45 x 60, the feature map is converted into a feature layer with the dimension of 32 and the resolution of 45 x 60 again after a Transform structure, and the feature map with the dimension of 32 and the resolution of 720 x 960 is generated through convolution of the high-resolution convolution model.
3. A road picture semantic segmentation method based on a super-resolution network model is characterized by comprising the following steps:
s1: designing and generating a super-resolution network model;
s2: generating processing data and inputting the data into a designed super-resolution network model;
s3: training by utilizing input data to generate a super-resolution network;
s4: verifying the super-resolution network model trained in the step S3 by using verification data; if the verification result is ideal, this network model may be used for road semantic segmentation, otherwise S3 is continued until the verification result is ideal.
4. The road picture semantic segmentation method based on the super-resolution network model of claim 3, wherein the specific implementation of the step S2 includes the following steps:
s2-1: collecting a batch of high-resolution atlas;
s2-2: the target object is labeled on the atlas and a specific category is given.
5. The road picture semantic segmentation method based on the super-resolution network model according to claim 3, wherein the specific implementation of S3 includes the following steps:
S3-1: the original-resolution picture is input to the high-resolution convolution model, and the picture down-sampled 2× is input to the low-resolution convolution model;
S3-2: training the super-resolution network model, built with the PyTorch framework, in the following specific steps:
(1) data acquisition: after the data are collected and processed in S2, they are fed into the model by the DataLoader in PyTorch, with a batch size of 16 and 4 worker threads reading in parallel;
(2) optimizer and hyper-parameter design: the Adam optimizer is selected as the model optimizer, training is set to 30 rounds, the model parameters are saved once per round, and the learning rate decays exponentially from an initial value of 0.002, the learning rate in each round being:
lr(epoch) = 0.002 × 0.8^(epoch-1)
(3) designing a loss function: BCE loss and CE loss functions are adopted as network loss functions.
Application CN202111654294.6A, priority date 2021-12-31, filing date 2021-12-31: Super-resolution network model and application thereof. Status: Pending. Publication: CN114298909A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111654294.6A CN114298909A (en) 2021-12-31 2021-12-31 Super-resolution network model and application thereof


Publications (1)

Publication Number Publication Date
CN114298909A 2022-04-08

Family

ID=80973967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111654294.6A Pending CN114298909A 2021-12-31 2021-12-31 Super-resolution network model and application thereof

Country Status (1)

Country Link
CN: CN114298909A


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196957A (en) * 2023-11-03 2023-12-08 广东省电信规划设计院有限公司 Image resolution conversion method and device based on artificial intelligence
CN117196957B (en) * 2023-11-03 2024-03-22 广东省电信规划设计院有限公司 Image resolution conversion method and device based on artificial intelligence


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination