CN114298909A - Super-resolution network model and application thereof - Google Patents


Info

Publication number
CN114298909A
CN114298909A
Authority
CN
China
Prior art keywords
resolution
super
model
network
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111654294.6A
Other languages
Chinese (zh)
Inventor
李波
杭陶阳
赵齐贤
周丹
周鑫烨
高陈诚
朱芸海
倪曦
吴凡
贝绍轶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Technology
Original Assignee
Jiangsu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Technology
Priority to CN202111654294.6A
Publication of CN114298909A
Legal status: Pending

Abstract

The invention belongs to the field of intelligent traffic automatic driving and discloses a super-resolution network model together with a method for semantic segmentation of roads using that network. The model comprises two modules: a first module, a low-resolution convolution model, and a second module, a high-resolution convolution model. The first module convolves low-resolution images; the second convolves high-resolution images to compensate for the information lost at low resolution. The result is a network model for segmenting roads. The model is then trained, with the optimizer, learning rate, number of training rounds, number of iterations, and loss function specified. The proposed super-resolution network performs semantic segmentation on pictures, innovates on previous network structures, and effectively improves prediction accuracy.

Description

Super-resolution network model and application thereof
Technical Field
The invention belongs to the field of intelligent traffic automatic driving, and particularly relates to a super-resolution network model and application thereof.
Background
Semantic segmentation is a basic computer-vision task that aims to classify every pixel in a picture; it is widely applied in intelligent driving, medical imaging, pose analysis, and other fields. In intelligent driving, semantic segmentation must combine high accuracy with real-time detection, yet high-precision networks cannot be used where hardware is limited because their recognition latency is large. Classical semantic segmentation networks usually need high-resolution image sets for training to reach high accuracy: high-resolution pictures convey image features effectively and ease network learning, so high-resolution features are very important in high-precision networks. Currently there are two main lines for maintaining a high-resolution representation: one uses dilated (atrous) convolution to preserve high-resolution features, the other combines top-down paths with cross connections. Both, however, consume considerable computing resources, and taking high-resolution pictures as input further increases the network's computation and the picture segmentation time.
Disclosure of Invention
To solve these problems, the invention provides a super-resolution network model and an application thereof. A high-resolution convolution network is added to the super-resolution network: convolution of the high-resolution image compensates for the semantic information missing from the low-resolution image, and a Transform structure is used to obtain a receptive field larger than that of ordinary convolution.
The invention adopts the following specific technical scheme:
a super-resolution network model based on is specifically designed to include: the convolution model comprises a low-resolution convolution model and a high-resolution convolution model, wherein the low-resolution convolution model consists of a CSPDarknet53 backbone network, a Transform structure and a super-resolution network, and the high-resolution convolution model consists of a small number of simple convolution layers and is then transmitted into the super-resolution network of the low-resolution convolution model.
In a further improvement of the invention, the low-resolution convolution model takes as input the RGB image down-sampled 2× from the original resolution: the image is first fed into the CSPDarknet53 backbone network to generate a feature map of dimension 512 and resolution 45 × 60; after the Transform structure this is converted into a feature layer of dimension 32 and resolution 45 × 60; meanwhile the high-resolution convolution model generates, by convolution, a feature map of dimension 32 and resolution 720 × 960.
The application of the super-resolution network model is a road-picture semantic segmentation method, which specifically comprises the following steps:
s1: designing and generating a super-resolution network model;
s2: generating processing data and inputting the data into a designed super-resolution network model;
s3: training by utilizing input data to generate a super-resolution network;
s4: verifying the super-resolution network model trained in the step S3 by using verification data; if the verification result is ideal, this network model may be used for road semantic segmentation, otherwise S3 is continued until the verification result is ideal.
The specific implementation of the step S2 includes the following steps:
s2-1: collecting a batch of high-resolution atlas;
s2-2: the target object is labeled on the atlas and a specific category is given.
The specific implementation of S3 includes the following steps:
S3-1: the original-resolution picture is input to the high-resolution convolution model, and the picture down-sampled 2× is input to the low-resolution convolution model;
S3-2: training the super-resolution network model, built with the PyTorch framework, in the following specific steps:
(1) data acquisition: after the data are collected and processed in S2, they are fed into the model by the DataLoader in PyTorch, with a batch size of 16 and 4 worker threads reading in parallel;
(2) optimizer and hyper-parameter design: the Adam optimizer is selected as the model optimizer, training is set to 30 rounds, the model parameters are saved once per round, and the learning rate decays exponentially from an initial value of 0.002, the learning rate in each round being:
lr(epoch) = 0.002 × 0.8^(epoch-1)
(3) designing a loss function: BCE loss and CE loss functions are adopted as network loss functions.
The invention has the beneficial effects that:
(1) the invention uses a super-resolution network for semantic segmentation of road images, innovating on previous network structures and effectively improving prediction accuracy;
(2) a high-resolution convolution module is added to the original super-resolution network, exploiting the strength of convolutional neural networks at extracting planar features, and a Transform mechanism is added to enlarge the receptive field and compensate for the semantic information that disappears when the resolution is reduced;
(3) the invention provides a training method for the neural network model, supplying the hyper-parameters needed for training, so that the model obtains more accurate results while its training speed is preserved.
Drawings
FIG. 1 is a road semantic segmentation model based on a super-resolution network according to the present invention.
Detailed Description
For the purpose of enhancing the understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the present invention.
The invention will be further explained with reference to the drawings.
Example: a road semantic segmentation method based on a super-resolution network, realized by the following flow:
S1: designing and generating the super-resolution network model;
S2: generating the processed data and inputting them into the super-resolution network model;
S3: training the super-resolution network model with the input data;
S4: verifying the super-resolution network model trained in step S3 with the verification data.
The embodiments will be specifically described below.
The super-resolution network model designed in step S1 is shown in fig. 1, and the network model is specifically designed as follows:
the super-resolution network model overall structure is shown in fig. 1 and comprises two modules. A first module: a low resolution convolution model; and a second module: and (4) obtaining a semantic segmentation result which is similar to the real segmentation height for a high-resolution convolution model. And splicing the outputs of the two networks and finally outputting. The following is a detailed description of each module.
1) Low resolution convolution model
The low-resolution convolution model consists of a CSPDarknet53 backbone network, a Transform structure and a super-resolution network; its input is the RGB image down-sampled 2× from the original resolution.
The down-sampled RGB image is first fed into the CSPDarknet53 backbone network to generate a feature map of dimension 512 and resolution 45 × 60; after the Transform structure this is converted into a feature layer of dimension 32 and resolution 45 × 60.
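The stated sizes are mutually consistent if the original resolution is 720 × 960 and the backbone feature is taken at stride 8 (360 // 8 = 45, 480 // 8 = 60); the patent does not state either number explicitly, so both are inferred assumptions. A quick Python check:

```python
def down(h, w, factor):
    # integer down-sampling of an (h, w) resolution
    return h // factor, w // factor

orig = (720, 960)              # assumed original resolution (inferred)
low_input = down(*orig, 2)     # 2x down-sampled input: (360, 480)
feat_hw = down(*low_input, 8)  # assumed stride-8 backbone feature: (45, 60)
feat_dim = 512                 # per the text; the Transform reduces it to 32
```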
2) High resolution convolution model
The high-resolution convolution model consists of a small number of simple convolution layers and feeds into the super-resolution network of the low-resolution convolution model; its input is the RGB image at the original resolution, and convolution produces a feature map of dimension 32 and resolution 720 × 960.
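That a stack of plain convolutions can keep the full 720 × 960 resolution follows from the standard output-size formula; a sketch (the two-layer, stride-1, padding-1 configuration is an illustrative assumption, not taken from the patent):

```python
def conv2d_out(h, w, k=3, s=1, p=1):
    """Convolution output size: floor((x + 2p - k) / s) + 1 per axis."""
    return (h + 2 * p - k) // s + 1, (w + 2 * p - k) // s + 1

h, w = 720, 960              # original-resolution RGB input
for _ in range(2):           # two hypothetical 3x3, stride-1, pad-1 layers
    h, w = conv2d_out(h, w)  # spatial size is preserved each time
# (h, w) is still (720, 960); the final layer would leave 32 channels
```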
3) Decoder
The decoder restores the output of the low-resolution convolution module to the resolution of the original image using super-resolution. During training, two super-resolution modules train the network parameters jointly; at test time the network is pruned, leaving one branch.
4) argmax layer design:
and splicing the output of the low-resolution convolution network and the output of the high-resolution convolution network, adjusting the number of channels through a 1 x 1 convolution kernel, and outputting the output through the category corresponding to the argmax output pixel point.
The specific implementation steps of S2 are as follows:
S2-1: collecting a batch of high-resolution atlas images;
S2-2: labeling the target objects on the atlas and assigning each a specific category;
S2-3: down-sampling the original-resolution picture as the input of the low-resolution convolution network, while the original-resolution picture itself is the input of the high-resolution convolution network.
The specific implementation steps of S3 are as follows:
S3-1: the original-resolution picture is input to the high-resolution convolution model, and the picture down-sampled 2× is input to the low-resolution convolution model;
S3-2: training the super-resolution network model, built with the PyTorch framework, in the following specific steps:
(1) data acquisition: after the data are collected and processed in S2, they are fed into the model by the DataLoader in PyTorch, with a batch size of 16 and 4 worker threads reading in parallel;
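In PyTorch step (1) corresponds to `DataLoader(dataset, batch_size=16, num_workers=4)`. The batching itself reduces to slicing, sketched framework-free below (the 4 parallel reader threads are a DataLoader concern and are not modeled):

```python
def batches(samples, batch_size=16):
    """Yield consecutive mini-batches of at most `batch_size` samples,
    mirroring the patent's batch size of 16."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]
```

For 100 samples this yields six full batches of 16 followed by one final batch of 4.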
(2) optimizer and hyper-parameter design: the Adam optimizer is selected as the model optimizer, training is set to 30 rounds, the model parameters are saved once per round, and the learning rate decays exponentially from an initial value of 0.002, the learning rate in each round being:
lr(epoch) = 0.002 × 0.8^(epoch-1)
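Assuming epochs are counted from 1, the schedule is a plain exponential decay; in PyTorch the same effect could be obtained with `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)`. A direct sketch:

```python
def lr(epoch, base=0.002, gamma=0.8):
    """Per-epoch learning rate: lr(epoch) = 0.002 * 0.8**(epoch - 1)."""
    return base * gamma ** (epoch - 1)

# epoch 1 starts at the initial rate 0.002; each later epoch is 0.8x the last
```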
(3) loss function design: the BCE loss and CE loss functions are used as the network loss functions.
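Both losses named in step (3) have standard forms; the patent does not say how they are combined, so a NumPy sketch of each loss on its own:

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy over predicted probabilities p and targets y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def ce_loss(logits, targets):
    """Multi-class cross-entropy: logits (N, C), integer targets (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(targets)), targets].mean())
```

With a prediction of 0.5 for a positive target, BCE gives ln 2; with uniform logits over C classes, CE gives ln C.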
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. A super-resolution network model, characterized by comprising a low-resolution convolution model and a high-resolution convolution model, wherein the low-resolution convolution model is composed of a CSPDarknet53 backbone network, a Transform structure and a super-resolution network, and the high-resolution convolution model is composed of a small number of simple convolution layers whose output is fed into the super-resolution network of the low-resolution convolution model.
2. The super resolution network model of claim 1, wherein the low resolution convolution model inputs RGB images after 2 times down-sampling of original resolution: firstly, an RGB image with the original resolution subjected to 2-time down-sampling is transmitted into a CSPDarknet53 backbone network to generate a feature map with the dimension of 512 and the resolution of 45 x 60, the feature map is converted into a feature layer with the dimension of 32 and the resolution of 45 x 60 again after a Transform structure, and the feature map with the dimension of 32 and the resolution of 720 x 960 is generated through convolution of the high-resolution convolution model.
3. A road picture semantic segmentation method based on a super-resolution network model is characterized by comprising the following steps:
s1: designing and generating a super-resolution network model;
s2: generating processing data and inputting the data into a designed super-resolution network model;
s3: training by utilizing input data to generate a super-resolution network;
s4: verifying the super-resolution network model trained in the step S3 by using verification data; if the verification result is ideal, this network model may be used for road semantic segmentation, otherwise S3 is continued until the verification result is ideal.
4. The road picture semantic segmentation method based on the super-resolution network model of claim 3, wherein the specific implementation of the step S2 includes the following steps:
s2-1: collecting a batch of high-resolution atlas;
s2-2: the target object is labeled on the atlas and a specific category is given.
5. The road picture semantic segmentation method based on the super-resolution network model according to claim 3, wherein the specific implementation of S3 includes the following steps:
S3-1: the original-resolution picture is input to the high-resolution convolution model, and the picture down-sampled 2× is input to the low-resolution convolution model;
S3-2: training the super-resolution network model, built with the PyTorch framework, in the following specific steps:
(1) data acquisition: after the data are collected and processed in S2, they are fed into the model by the DataLoader in PyTorch, with a batch size of 16 and 4 worker threads reading in parallel;
(2) optimizer and hyper-parameter design: the Adam optimizer is selected as the model optimizer, training is set to 30 rounds, the model parameters are saved once per round, and the learning rate decays exponentially from an initial value of 0.002, the learning rate in each round being:
lr(epoch) = 0.002 × 0.8^(epoch-1)
(3) designing a loss function: BCE loss and CE loss functions are adopted as network loss functions.
Application CN202111654294.6A, priority date 2021-12-31, filing date 2021-12-31: Super-resolution network model and application thereof. Status: Pending. Publication: CN114298909A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111654294.6A CN114298909A (en) 2021-12-31 2021-12-31 Super-resolution network model and application thereof


Publications (1)

Publication Number Publication Date
CN114298909A 2022-04-08

Family

ID=80973967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111654294.6A Pending CN114298909A 2021-12-31 2021-12-31 Super-resolution network model and application thereof

Country Status (1)

Country Link
CN: CN114298909A


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196957A (en) * 2023-11-03 2023-12-08 广东省电信规划设计院有限公司 Image resolution conversion method and device based on artificial intelligence
CN117196957B (en) * 2023-11-03 2024-03-22 广东省电信规划设计院有限公司 Image resolution conversion method and device based on artificial intelligence


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination