CN112132746A - Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment

Info

Publication number: CN112132746A (application CN202010982493.9A); granted as CN112132746B
Authority: CN (China)
Prior art keywords: layer, resolution, network, output
Legal status: Granted
Application number: CN202010982493.9A
Other languages: Chinese (zh)
Other versions: CN112132746B
Inventors: 李旭 (Li Xu), 朱建潇 (Zhu Jianxiao), 赵琬婷 (Zhao Wanting), 徐启敏 (Xu Qimin)
Assignee (original and current): Southeast University
Application filed by Southeast University
Priority to CN202010982493.9A (filed 2020-09-17)
Publication of CN112132746A: 2020-12-25
Application granted; publication of CN112132746B: 2022-11-11
Legal status: Active


Classifications

    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution (G06T3/40 Scaling the whole image or part thereof; G06T3/00 Geometric image transformation in the plane of the image)
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T2207/10004 Still image; Photographic image (G06T2207/10 Image acquisition modality)
    • G06T2207/20081 Training; Learning (G06T2207/20 Special algorithmic details)
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a rapid super-resolution method for small-scale pedestrian targets oriented to intelligent roadside equipment, comprising the following steps: collecting and constructing a training set of paired high- and low-resolution small-scale pedestrian images; based on the adversarial concept, building a lightweight generation network for low-resolution small-scale pedestrian images, which first extracts primary image features with separable convolutions, then fits high-frequency information with residual modules, and finally reconstructs a high-resolution image from the low-resolution pedestrian image through a pixel-shuffle module; building a discrimination network and adversarially training the parameters of the generation network to obtain the optimal generation network; and applying the optimal generation network to low-resolution small-scale pedestrian pictures to obtain high-resolution pedestrian targets. The lightweight super-resolution generation network designed by the invention has the notable advantages of short training time and low inference latency, and fills the gap in real-time super-resolution of small-scale pedestrians in the intelligent roadside field.

Description

Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment
Technical Field
The invention belongs to the fields of computer vision and intelligent transportation, and relates to a super-resolution method for small-scale pedestrian targets in intelligent-traffic roadside scene images, in particular to a rapid super-resolution method for small-scale pedestrian targets oriented to intelligent roadside equipment.
Background
With the rapid growth of road traffic in China, traffic accidents between pedestrians and vehicles occur frequently; vehicle safety performance keeps improving, yet dedicated safety equipment for pedestrians remains lacking. To ensure basic pedestrian safety, intelligent roadside systems that use electronic information technology to warn surrounding drivers or intelligent vehicles about pedestrians have become a research focus at home and abroad. Accurate recognition of small-scale pedestrians is a precondition for the rapid response of such safety early-warning systems; however, the visual features of small-scale pedestrian targets are sparse, and pedestrian detection algorithms alone can hardly guarantee effective detection, so super-resolution methods have attracted wide attention as a way to improve the recognition of small-scale targets.
General super-resolution methods fall into three main categories. The first is interpolation-based methods, such as nearest-neighbor, bilinear, and bicubic interpolation; they are computationally cheap but ignore problems common in the traffic field, such as motion blur. The second is reconstruction-based methods, mainly represented by iterative back-projection, projection onto convex sets, and maximum a posteriori estimation; they improve considerably on interpolation for single-scene reconstruction, but for complex backgrounds the peak signal-to-noise ratio can still be too low. The third is learning-based methods, mainly represented by deep learning and sparse coding; they adapt well to varied scenes and extract robust features, which fits the super-resolution requirements of small-scale pedestrian targets under intelligent roadside viewing angles.
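For scale, the interpolation baseline of the first category is a one-liner in modern frameworks. The sketch below up-samples an illustrative tensor by bicubic interpolation at the 4x factor used later in this document (the tensor name and shape are illustrative only):

```python
import torch
import torch.nn.functional as F

lr = torch.rand(1, 3, 128, 128)  # illustrative low-resolution pedestrian crop
# Bicubic x4 baseline: cheap, but it blurs the high-frequency detail that the
# learning-based network described below is designed to restore.
hr_bicubic = F.interpolate(lr, scale_factor=4, mode="bicubic", align_corners=False)
```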
However, existing deep-learning super-resolution methods all suffer from excessively long execution times caused by complex network structures, and are difficult to use directly in intelligent roadside equipment with strict real-time requirements. Designing a rapid super-resolution method for small-scale pedestrian targets has therefore become a core link in advancing intelligent roadside equipment and safeguarding pedestrian lives.
Disclosure of Invention
To solve the above problems, the invention discloses a rapid super-resolution method for small-scale pedestrian targets oriented to intelligent roadside equipment, which effectively fills the current gap in real-time super-resolution of small-scale pedestrians in the intelligent roadside field and further improves the functionality and intelligence of intelligent roadside equipment.
In order to achieve the purpose, the invention provides the following technical scheme:
a small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment comprises the following steps:
(1) Collect high-resolution small-scale pedestrian images covering various intelligent roadside scenes, obtain a low-resolution small-scale pedestrian image set by down-sampling, and use the correspondence between the high- and low-resolution images to construct a small-scale pedestrian multi-resolution image training data set with sample size N.
(2) Design the generation network of the rapid super-resolution network for small-scale pedestrian targets based on the adversarial concept. First, the semantic features of small-scale targets in the low-resolution sample picture are preliminarily extracted through lightweight convolution structures such as separable convolution and feature-map compression. Second, residual structures are stacked into residual blocks that serve as estimation units for the high-frequency information of the high-resolution sample, so that the high-frequency difference between low- and high-resolution samples can be fitted without introducing excessive parameters. The output of the residual blocks then feeds into a feature-compression convolution that reduces the feature dimensionality and preserves the real-time performance of the algorithm. Next, element-wise addition introduces a lateral connection, forming a double feed-forward structure; this effectively avoids gradient vanishing during back-propagation and markedly reduces the number of training iterations of the whole generation network. Third, the feature map is up-sampled with pixel-shuffle layers, avoiding the jagged edges produced by linear or bilinear sampling and yielding a higher-quality high-resolution feature map. Finally, a high-resolution pedestrian picture is generated through a separable convolution structure. The network structure of this part is designed as follows:
Layer 1, input layer: 3 input channels, resolution A × A; output an A × A × 3 feature map.
Layer 2, feature extraction layer: 64 convolution kernels of size 7 × 1, stride 1; output an A × A × 64 feature map.
Layer 3, feature extraction layer: 64 convolution kernels of size 1 × 7, stride 1; output an A × A × 64 feature map.
Layer 4, feature extraction layer: 256 convolution kernels of size 3 × 3, stride 1; output an A × A × 256 feature map.
Layer 5, feature extraction layer: 128 convolution kernels of size 3 × 3, stride 1; output an A × A × 128 feature map.
Layer 6, generator residual block: a convolution layer with 256 kernels of size 3 × 3, stride 1, output A × A × 256; a batch normalization layer, output A × A × 256; a PReLU activation layer, output A × A × 256; a convolution layer with 128 kernels of size 1 × 1, stride 1, output A × A × 128; a batch normalization layer, output A × A × 128; element-wise addition of the residual block's input feature map and the output of its last batch normalization layer, output A × A × 128.
Layer 7, generator residual block (identical in structure to layer 6): a convolution layer with 256 kernels of size 3 × 3, stride 1, output A × A × 256; a batch normalization layer, output A × A × 256; a PReLU activation layer, output A × A × 256; a convolution layer with 128 kernels of size 1 × 1, stride 1, output A × A × 128; a batch normalization layer, output A × A × 128; element-wise addition of the block's input feature map and the output of its last batch normalization layer, output A × A × 128.
Layer 8, generator convolution layer: 128 convolution kernels of size 3 × 3, stride 1; output an A × A × 128 feature map.
Layer 9, generator lateral connection layer: element-wise addition of the output feature map of the layer-8 convolution and the input feature map of the layer-6 residual block; output an A × A × 128 feature map.
Layer 10, generator up-sampling layer: up-sampling of the feature map is realized with a pixel-shuffle layer (PixelShuffle); output a 2A × 2A × 128 feature map.
Layer 11, generator up-sampling layer: up-sampling of the feature map is realized with a pixel-shuffle layer (PixelShuffle); output a 4A × 4A × 128 feature map.
Layer 12, generator convolution layer: 3 separable convolution kernels of size 9 × 9, stride 1; output a 4A × 4A × 3 high-resolution pedestrian picture.
(3) Based on the adversarial concept, the invention designs the discrimination network of the rapid super-resolution network for small-scale pedestrian targets. It extracts multiple semantic features of the target by means of the feature extraction structure of the InceptionV2 network, then introduces a fully connected layer with 2 output classes to further extract the semantic features, and finally normalizes the output of the fully connected layer to 0-1 with a sigmoid activation function, outputting a truthfulness estimate p for a given input picture. These structures are integrated into a discrimination network $D_\theta$ that judges whether a generated high-resolution sample estimate is real or fake, where $\theta$ denotes the parameters of the discrimination network. The network structure is as follows:
Layer 1, input layer: 3 input channels, resolution 4A × 4A; output a 4A × 4A × 3 feature map.
Layer 2, feature extraction layer: the InceptionV2 feature extraction layers are selected as the structure of this layer; output an (A/4) × (A/4) × 256 feature map.
Layer 3, fully connected layer: the three-dimensional input is flattened to one dimension; 2 output classes.
Layer 4, normalization layer: the output of the layer-3 fully connected layer is normalized with a sigmoid function; 2 output classes.
(4) Train the network model for the designed generation network and discrimination network. First, a low-resolution sample $p_k^{LD}$ is used as the input of the generation network $G_\omega$, and the high-resolution sample estimate $\hat{p}_k^{HD}$ is obtained by forward propagation through the successive convolutions; the content loss $L_{con}$ between the high-resolution sample estimate $\hat{p}_k^{HD}$ and the high-resolution sample $p_k^{HD}$ is computed. Next, the real/fake label of the high-resolution sample estimate is set to 0 and that of the high-resolution sample $p_k^{HD}$ is set to 1, giving the label value $y_k$; the discrimination network estimates the truthfulness of the inferred high-resolution picture and the original high-resolution picture, yielding the estimate $\hat{y}_k$, and the binary cross-entropy loss $L_{cro}$ between the estimate $\hat{y}_k$ and the label value $y_k$ is computed. Finally, the gradients of the two loss values are back-propagated. The detailed steps of this section are as follows:
Substep 1: compute the forward propagation. A low-resolution sample $p_k^{LD}$ is used as the input of the generation network, and the high-resolution sample estimate $\hat{p}_k^{HD}$ is obtained through the successive convolution operations. The real/fake label $y_k$ of the high-resolution sample estimate $\hat{p}_k^{HD}$ is set to 0 and that of the corresponding high-resolution sample $p_k^{HD}$ is set to 1. The discrimination network performs truthfulness estimation on the high-resolution sample estimate $\hat{p}_k^{HD}$ and the high-resolution sample $p_k^{HD}$, yielding the estimate $\hat{y}_k$.
Substep 2: compute the loss values. The loss value of the discrimination network is the binary cross-entropy $L_{cro}$ between the truthfulness estimate $\hat{y}_k$ and the real/fake label value $y_k$; the specific formula is:
$$L_{cro} = -\left[ y_k \log \hat{y}_k + (1 - y_k)\log(1 - \hat{y}_k) \right]$$
The loss value of the generation network is the mean squared error $L_{con}$ between the tail-layer feature maps, extracted by the feature extraction layers of the discriminator $D_\theta$, of the high-resolution sample estimate $\hat{p}_k^{HD}$ and the corresponding high-resolution sample $p_k^{HD}$; the specific formula is:
$$L_{con} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\left( \Phi(p_k^{HD})_{x,y} - \Phi(\hat{p}_k^{HD})_{x,y} \right)^2$$
where $\Phi(x)$ denotes the tail-layer feature map extracted by the feature extraction layers of the discriminator $D_\theta$ for a given input sample $x$, and $W$ and $H$ denote the width and height of the extracted feature map, respectively.
Substep 3: perform gradient back-propagation, and save the parameter values of the generation network and the discrimination network at each iteration.
Substep 4: select the network parameters with the lowest sum of generation-network and discrimination-network losses over the iterations of substep 3 as the optimal network parameters; the network model corresponding to the optimal network parameters is the optimal model.
(5) Perform the super-resolution operation on small-scale pedestrian targets under the intelligent roadside viewing angle using the generation network in the optimal model output by step (4).
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides a small-scale pedestrian image target super-resolution method suited to the intelligent roadside field.
2. The small-scale pedestrian super-resolution method based on the generative adversarial network structure effectively overcomes the drawbacks of general-purpose super-resolution neural networks, namely difficult training and heavy inference time cost; it has the notable advantages of low computation latency and high output frequency, and the output super-resolution results retain good peak signal-to-noise ratio and structural similarity.
Drawings
Fig. 1 is a schematic diagram of a generation network structure of a small-scale pedestrian target rapid super-resolution method designed by the invention.
Fig. 2 is a schematic diagram of a discrimination network structure of the small-scale pedestrian target rapid super-resolution method designed by the invention.
FIG. 3 shows the general steps of the rapid super-resolution method for small-scale pedestrian targets designed by the present invention.
Fig. 4 compares the effect of the method designed by the present invention with that of the typical super-resolution network SRGAN.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
In the rapid super-resolution method for small-scale pedestrian targets oriented to intelligent roadside equipment, a small-scale pedestrian multi-resolution training data set of intelligent roadside scenes is constructed to provide data support for the network model. Combining the generative adversarial network concept, a lightweight super-resolution generation network and a discrimination network suited to real-time roadside scenes are designed to address the specific problems of traditional super-resolution network models, such as long execution time and heavy training cost. Finally, the generation network is trained with a combination of a feature-map loss function and a real/fake discrimination loss function to obtain the optimal model of the rapid super-resolution network for small-scale pedestrian targets. Compared with other traditional super-resolution networks, the method designed by the invention has core advantages in two respects: overall model characteristics and scene adaptability. In terms of model characteristics, the main body of the method consists of lightweight structures such as residual blocks and separable convolutions, with the notable features of few parameters and short execution time. In terms of scene adaptability, traditional pedestrian super-resolution network models are difficult to apply directly to intelligent roadside scenes; by constructing a targeted small-scale roadside pedestrian data set, the method lowers the transfer-learning cost of the algorithm model and strengthens its scene adaptability. The specific process of the invention comprises the following steps:
(1) Construct a small-scale pedestrian multi-resolution image training data set covering various intelligent roadside scenes. Compared with traditional super-resolution data sets, the roadside-scene small-scale pedestrian multi-resolution training data set has more complex constraints. First, a pedestrian target whose pixel height is smaller than a pixel threshold $H$ is defined as a small-scale pedestrian; for actual traffic scenes, the threshold $H$ is chosen as 100 pixels. Then, pedestrian target pictures with a resolution of 1920 × 1080 covering different intelligent roadside scenes are collected, and regions containing small-target pedestrians are cropped as high-resolution samples; the cropped region resolution is set to 512 × 512 and the cropping is centered on the target, forming a high-resolution sample set $\{p_k^{HD}\}_{k=1}^{N}$ of size $N$. Fully considering the influence of the data scale on the designed method, the picture acquisition cost, and other factors, $N$ is chosen as 5000. After max-value down-sampling with stride 4 is applied to the samples of the high-resolution set, a low-resolution sample set $\{p_k^{LD}\}_{k=1}^{N}$ with one-to-one sample correspondence is formed, where $p_k$ refers to the $k$-th of the $N$ sample pictures, HD refers to a sample resolution of 512 × 512, and LD refers to a sample resolution of 128 × 128. Using the down-sampling correspondence between arbitrary samples $p_k^{HD}$ and $p_k^{LD}$ of the high- and low-resolution sample sets, sample pairs $(p_k^{HD}, p_k^{LD})$ are formed; the $N$ sample pairs together form the small-scale pedestrian multi-resolution image training data set $S_{Train}$.
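A minimal sketch of this pairing procedure follows, assuming frames are loaded as 3 × 1080 × 1920 tensors and pedestrian centers (cx, cy) come from annotation; the function and variable names are illustrative, not prescribed by the patent:

```python
import torch
import torch.nn.functional as F

def make_sample_pair(frame: torch.Tensor, cx: int, cy: int, hd: int = 512):
    """Crop a target-centered HR sample around a small-scale pedestrian at
    (cx, cy) and build its LR counterpart; frame is a 3 x 1080 x 1920 tensor."""
    # Clamp the crop window so the 512 x 512 region stays inside the frame.
    x0 = max(0, min(cx - hd // 2, frame.shape[2] - hd))
    y0 = max(0, min(cy - hd // 2, frame.shape[1] - hd))
    p_hd = frame[:, y0:y0 + hd, x0:x0 + hd]                 # 3 x 512 x 512
    # Max-value down-sampling with stride 4 -> 3 x 128 x 128 LR sample.
    p_ld = F.max_pool2d(p_hd.unsqueeze(0), kernel_size=4, stride=4).squeeze(0)
    return p_hd, p_ld
```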
(2) Design the generation network of the rapid super-resolution network for small-scale pedestrian targets based on the adversarial concept. The role of the generation network is, for a given low-resolution sample $p_k^{LD}$, to generate a high-quality estimate $\hat{p}_k^{HD}$ of the corresponding high-resolution sample $p_k^{HD}$. This requires the designed generation network to fully retain the semantic information of the low-resolution sample while being able to supplement high-frequency information such as brightness, texture, and edges. Because of these complex requirements, most existing super-resolution networks rely on enormous parameter counts and are hard to deploy in the intelligent roadside field. Considering the above requirements and practical problems, the invention weighs factors such as generated-picture similarity, feature-map similarity, and algorithm execution time, and designs the following generation network $G_\omega$, where $\omega$ denotes the parameters of the generation network. First, the semantic features of the target are preliminarily extracted through lightweight convolution structures such as separable convolution and feature-map compression. Second, residual structures are stacked into residual blocks that serve as estimation units for the high-frequency information of the image, fitting the high-frequency difference between low- and high-resolution samples without introducing excessive parameters. The output of the residual blocks then feeds into a feature-compression convolution that reduces the feature dimensionality and preserves the real-time performance of the algorithm. Next, element-wise addition introduces a lateral connection, forming a double feed-forward structure; this effectively avoids gradient vanishing during back-propagation and markedly reduces the number of training iterations of the whole generation network. The feature map is then up-sampled with pixel-shuffle layers, and finally a high-resolution pedestrian picture is generated through a separable convolution structure. The network structure of this part is designed as follows (a code sketch follows the layer list):
Layer 1, input layer: 3 input channels, resolution 128 × 128; output a 128 × 128 × 3 feature map.
Layer 2, feature extraction layer: 64 convolution kernels of size 7 × 1, stride 1; output a 128 × 128 × 64 feature map.
Layer 3, feature extraction layer: 64 convolution kernels of size 1 × 7, stride 1; output a 128 × 128 × 64 feature map.
Layer 4, feature extraction layer: 256 convolution kernels of size 3 × 3, stride 1; output a 128 × 128 × 256 feature map.
Layer 5, feature extraction layer: 128 convolution kernels of size 3 × 3, stride 1; output a 128 × 128 × 128 feature map.
Layer 6, generator residual block: a convolution layer with 256 kernels of size 3 × 3, stride 1, output 128 × 128 × 256; a batch normalization layer, output 128 × 128 × 256; a PReLU activation layer, output 128 × 128 × 256; a convolution layer with 128 kernels of size 1 × 1, stride 1, output 128 × 128 × 128; a batch normalization layer, output 128 × 128 × 128; element-wise addition of the residual block's input feature map and the output of its last batch normalization layer, output 128 × 128 × 128.
Layer 7, generator residual block (identical in structure to layer 6): a convolution layer with 256 kernels of size 3 × 3, stride 1, output 128 × 128 × 256; a batch normalization layer, output 128 × 128 × 256; a PReLU activation layer, output 128 × 128 × 256; a convolution layer with 128 kernels of size 1 × 1, stride 1, output 128 × 128 × 128; a batch normalization layer, output 128 × 128 × 128; element-wise addition of the block's input feature map and the output of its last batch normalization layer, output 128 × 128 × 128.
Layer 8, generator convolution layer: 128 convolution kernels of size 3 × 3, stride 1; output a 128 × 128 × 128 feature map.
Layer 9, generator lateral connection layer: element-wise addition of the output feature map of the layer-8 convolution and the input feature map of the layer-6 residual block; output a 128 × 128 × 128 feature map.
Layer 10, generator up-sampling layer: up-sampling of the feature map is realized with a pixel-shuffle layer (PixelShuffle); output a 256 × 256 × 128 feature map.
Layer 11, generator up-sampling layer: up-sampling of the feature map is realized with a pixel-shuffle layer (PixelShuffle); output a 512 × 512 × 128 feature map.
Layer 12, generator convolution layer: 3 separable convolution kernels of size 9 × 9, stride 1; output a 512 × 512 × 3 high-resolution pedestrian picture.
(3) Based on the adversarial concept, design the discrimination network of the rapid super-resolution network for small-scale pedestrian targets to judge whether a high-resolution sample $p_k^{HD}$ or the high-resolution sample estimate $\hat{p}_k^{HD}$ generated in step (2) is real or fake. The role of the discrimination network is to introduce a real/fake supervision signal into the generation network in the form of an additional label, guiding it to generate more realistic high-resolution pictures; a traditional large-scale discrimination network brings better supervision performance, but also enormous parameters and training time, and easily causes model divergence. Compared with other typical backbone networks (such as ResNet and DenseNet), the InceptionV2 network has the particular advantages of a moderate parameter count and strong extraction of low-level image features. Therefore, after extracting multiple semantic features of the target by means of the feature extraction structure of the InceptionV2 network, a fully connected layer with 2 output classes is introduced to further condense the semantic features with high precision; finally, the output of the fully connected layer is normalized to 0-1 with a sigmoid activation function, outputting a truthfulness estimate p for a given input picture. These structures are integrated into a discrimination network $D_\theta$ that judges whether a generated high-resolution sample estimate is real or fake, where $\theta$ denotes the parameters of the discrimination network. The network structure is as follows (a code sketch follows the layer list):
Layer 1, input layer: 3 input channels, resolution 512 × 512; output a 512 × 512 × 3 feature map.
Layer 2, feature extraction layer: the InceptionV2 feature extraction layers are selected as the structure of this layer; output a 32 × 32 × 256 feature map.
Layer 3, fully connected layer: the three-dimensional input is flattened to one dimension, i.e. 262144 inputs; 2 output classes.
Layer 4, normalization layer: the output of the layer-3 fully connected layer is normalized with a sigmoid function; 2 output classes.
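The discriminator head reduces to a few lines once the trunk is fixed. In the sketch below the InceptionV2 trunk is passed in as a constructor argument, since common frameworks do not ship a module under that exact name; any feature extractor mapping a 3 × 512 × 512 input to a 256 × 32 × 32 map (the shape stated above) can stand in. A sigmoid over 2 output classes follows the patent's description, although a softmax would be the more conventional choice for a two-class head:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Layers 3-4 of the patent's discriminator on top of a pluggable trunk."""
    def __init__(self, trunk: nn.Module):
        super().__init__()
        self.trunk = trunk                       # maps 3x512x512 -> 256x32x32
        self.fc = nn.Linear(256 * 32 * 32, 2)    # 262144 inputs, 2 output classes
        self.sigmoid = nn.Sigmoid()              # 0-1 truthfulness normalization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.trunk(x)
        return self.sigmoid(self.fc(feat.flatten(1)))
```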
(4) Train the network model for the designed generation network $G_\omega$ and discrimination network $D_\theta$. The training loss functions of existing super-resolution methods based on generative adversarial networks emphasize hand-chosen characteristics of the generated and original targets across all dimensions, and such manual feature selection is unsuited to complex intelligent roadside scenes; for this problem, the invention designs a loss function based on feature-map differences to train the network model. Specifically, first, a low-resolution sample $p_k^{LD}$ is used as the input of the generation network $G_\omega$, and the high-resolution sample estimate $\hat{p}_k^{HD}$ is obtained by forward propagation through the successive convolutions; the content loss $L_{con}$ between the high-resolution sample estimate $\hat{p}_k^{HD}$ and the high-resolution sample $p_k^{HD}$ is computed. Next, the real/fake label of the high-resolution sample estimate is set to 0 and that of the high-resolution sample $p_k^{HD}$ is set to 1, giving the label value $y_k$; the discrimination network estimates the truthfulness of the inferred high-resolution picture and the original high-resolution picture, yielding the estimate $\hat{y}_k$, and the binary cross-entropy loss $L_{cro}$ between the estimate $\hat{y}_k$ and the label value $y_k$ is computed. Finally, the gradients of the two loss values are back-propagated. The detailed steps of this section are as follows:
Substep 1: compute the forward propagation. A low-resolution sample $p_k^{LD}$ is used as the input of the generation network, and the high-resolution sample estimate $\hat{p}_k^{HD}$ is obtained through the successive convolution operations. The real/fake label $y_k$ of the high-resolution sample estimate $\hat{p}_k^{HD}$ is set to 0 and that of the corresponding high-resolution sample $p_k^{HD}$ is set to 1. The discrimination network performs truthfulness estimation on the high-resolution sample estimate $\hat{p}_k^{HD}$ and the high-resolution sample $p_k^{HD}$, yielding the estimate $\hat{y}_k$.
Substep 2: compute the loss values. The loss value of the discrimination network is the binary cross-entropy $L_{cro}$ between the truthfulness estimate $\hat{y}_k$ and the real/fake label value $y_k$; the specific formula is:
$$L_{cro} = -\left[ y_k \log \hat{y}_k + (1 - y_k)\log(1 - \hat{y}_k) \right]$$
The loss value of the generation network is the mean squared error $L_{con}$ between the tail-layer feature maps, extracted by the feature extraction layers of the discriminator $D_\theta$, of the high-resolution sample estimate $\hat{p}_k^{HD}$ and the corresponding high-resolution sample $p_k^{HD}$; the specific formula is:
$$L_{con} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\left( \Phi(p_k^{HD})_{x,y} - \Phi(\hat{p}_k^{HD})_{x,y} \right)^2$$
where $\Phi(x)$ denotes the tail-layer feature map extracted by the feature extraction layers of the discriminator $D_\theta$ for a given input sample $x$, and $W$ and $H$ denote the width and height of the extracted feature map, respectively.
Substep 3: perform gradient back-propagation. Back-propagation uses stochastic gradient descent with a learning rate of 0.0001, and the number of iterations C of the iterative process is 10000. During the iterations, the training parameters of the generation network and the discrimination network are corrected by gradient back-propagation, and the network parameters at each iteration are stored as $Net_e$, where $Net$ denotes the parameters comprising the generation and discrimination networks and $e$ is the current iteration number.
Substep 4: select the network parameters with the lowest sum of generation-network and discrimination-network losses over the iterations of substep 3 as the optimal network parameters; the network model corresponding to the optimal network parameters is the optimal model.
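Substeps 1-4 combine into the short training sketch below, using the loss forms written above. Here `phi` denotes the discriminator's feature-extraction trunk, per-sample SGD at learning rate 0.0001 follows substep 3, and the best-model bookkeeping follows substep 4; batching and data ordering are assumptions, as the patent does not specify them:

```python
import copy
import torch
import torch.nn.functional as F

def discrimination_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L_cro: binary cross-entropy between truthfulness estimates and labels y_k
    return F.binary_cross_entropy(y_hat, y)

def content_loss(phi, p_hd: torch.Tensor, p_hd_hat: torch.Tensor) -> torch.Tensor:
    # L_con: MSE between the tail-layer feature maps Phi(x) of the trunk
    return F.mse_loss(phi(p_hd_hat), phi(p_hd))

def train(gen, disc, phi, pairs, iters=10000, lr=1e-4):
    opt_g = torch.optim.SGD(gen.parameters(), lr=lr)   # substep 3: SGD, lr 0.0001
    opt_d = torch.optim.SGD(disc.parameters(), lr=lr)
    best_loss, best_state = float("inf"), None
    for e in range(iters):                             # C = 10000 iterations
        p_hd, p_ld = pairs[e % len(pairs)]
        p_hd, p_ld = p_hd.unsqueeze(0), p_ld.unsqueeze(0)
        # Substep 1: forward propagation and truthfulness estimation.
        p_hd_hat = gen(p_ld)
        y_hat_fake = disc(p_hd_hat.detach())           # label y_k = 0
        y_hat_real = disc(p_hd)                        # label y_k = 1
        # Substep 2: the two loss values.
        l_cro = discrimination_loss(y_hat_fake, torch.zeros_like(y_hat_fake)) \
              + discrimination_loss(y_hat_real, torch.ones_like(y_hat_real))
        l_con = content_loss(phi, p_hd, p_hd_hat)
        # Substep 3: gradient back-propagation for both networks.
        opt_d.zero_grad(); l_cro.backward(); opt_d.step()
        opt_g.zero_grad(); l_con.backward(); opt_g.step()
        # Substep 4: keep the parameters with the lowest summed loss.
        total = (l_cro + l_con).item()
        if total < best_loss:
            best_loss, best_state = total, copy.deepcopy(gen.state_dict())
    return best_state
```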
(5) Perform the super-resolution operation on small-scale pedestrian targets under the intelligent roadside viewing angle using the generation network in the optimal model.
(6) To verify the effectiveness of the designed network, the typical super-resolution network SRGAN (Takano N., Alaghband G., "SRGAN: Training Dataset Matters," 2019) is selected as the reference and compared with the network model designed by the invention on the two most typical metrics, peak signal-to-noise ratio and structural similarity, under the same training data and training conditions; the comparison is shown in Table 1. A higher peak signal-to-noise ratio means less image distortion, and a higher structural similarity means the image is closer to the ground truth. The method designed by the invention greatly shortens the running time at the cost of a slight loss in peak signal-to-noise ratio and structural similarity, and fully meets the real-time application requirements of the intelligent roadside field. Fig. 4 illustrates the super-resolution comparison between the invention and SRGAN.
Table 1. Evaluation results for pedestrian images generated by the small-scale pedestrian super-resolution network designed by the invention and by SRGAN. (The table is reproduced only as an image in the original publication.)
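For reference, the peak signal-to-noise ratio used in Table 1 can be computed as follows (a sketch assuming images normalized to [0, 1]; structural similarity requires a windowed computation and is omitted here):

```python
import torch

def psnr(img: torch.Tensor, ref: torch.Tensor, peak: float = 1.0) -> torch.Tensor:
    """PSNR in dB; higher values mean less distortion relative to the ground truth."""
    mse = torch.mean((img - ref) ** 2)
    return 10.0 * torch.log10(peak ** 2 / mse)
```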

Claims (1)

1. A small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment is characterized by comprising the following steps:
(1) collecting high-resolution small-scale pedestrian images containing various intelligent roadside scenes, obtaining a low-resolution small-scale pedestrian image set by using a down-sampling method, and constructing a small-scale pedestrian multi-resolution image training data set with the sample size of N by using the corresponding relation of high-resolution images and low-resolution images;
(2) designing a generation network of the small-scale pedestrian target rapid super-resolution network based on the adversarial concept; firstly, preliminarily extracting the semantic features of small-scale targets in the low-resolution sample picture through lightweight convolution structures such as separable convolution and feature-map compression; secondly, stacking residual structures to form residual blocks used as estimation units for the high-frequency information of the high-resolution sample; further, feeding the output of the residual blocks into a feature-compression convolution; then, introducing a lateral connection by element-wise addition to form a double feed-forward structure; thirdly, up-sampling the feature map with pixel-shuffle layers, thereby obtaining a higher-quality high-resolution feature map; and finally, generating a high-resolution pedestrian picture through a separable convolution structure, wherein the network structure of this part is designed as follows:
layer 1, input layer: 3 input channels, resolution A × A; output an A × A × 3 feature map;
layer 2, feature extraction layer: 64 convolution kernels of size 7 × 1, stride 1; output an A × A × 64 feature map;
layer 3, feature extraction layer: 64 convolution kernels of size 1 × 7, stride 1; output an A × A × 64 feature map;
layer 4, feature extraction layer: 256 convolution kernels of size 3 × 3, stride 1; output an A × A × 256 feature map;
layer 5, feature extraction layer: 128 convolution kernels of size 3 × 3, stride 1; output an A × A × 128 feature map;
layer 6, generator residual block: a convolution layer with 256 kernels of size 3 × 3, stride 1, output A × A × 256; a batch normalization layer, output A × A × 256; a PReLU activation layer, output A × A × 256; a convolution layer with 128 kernels of size 1 × 1, stride 1, output A × A × 128; a batch normalization layer, output A × A × 128; element-wise addition of the residual block's input feature map and the output of its last batch normalization layer, output A × A × 128;
layer 7, generator residual block, identical in structure to layer 6: a convolution layer with 256 kernels of size 3 × 3, stride 1, output A × A × 256; a batch normalization layer, output A × A × 256; a PReLU activation layer, output A × A × 256; a convolution layer with 128 kernels of size 1 × 1, stride 1, output A × A × 128; a batch normalization layer, output A × A × 128; element-wise addition of the block's input feature map and the output of its last batch normalization layer, output A × A × 128;
layer 8, generator convolution layer: 128 convolution kernels of size 3 × 3, stride 1; output an A × A × 128 feature map;
layer 9, generator lateral connection layer: element-wise addition of the output feature map of the layer-8 convolution and the input feature map of the layer-6 residual block; output an A × A × 128 feature map;
layer 10, generator up-sampling layer: up-sampling of the feature map is realized with a pixel-shuffle layer; output a 2A × 2A × 128 feature map;
layer 11, generator up-sampling layer: up-sampling of the feature map is realized with a pixel-shuffle layer; output a 4A × 4A × 128 feature map;
layer 12, generator convolution layer: 3 separable convolution kernels of size 9 × 9, stride 1; output a 4A × 4A × 3 high-resolution pedestrian picture;
(3) based on the adversarial concept, designing a discrimination network of the small-scale pedestrian target rapid super-resolution network: extracting multiple semantic features of the target with the feature extraction structure of the InceptionV2 network, introducing a fully connected layer with 2 output classes to further extract the semantic features, and finally normalizing the output of the fully connected layer to 0-1 with a sigmoid activation function, outputting a truthfulness estimate p for a given input picture; these structures are integrated into a discrimination network $D_\theta$ that judges whether a generated high-resolution sample estimate is real or fake, where $\theta$ denotes the parameters of the discrimination network; the network structure is as follows:
layer 1, input layer: 3 input channels, resolution 4A × 4A; output a 4A × 4A × 3 feature map;
layer 2, feature extraction layer: the InceptionV2 feature extraction layers are selected as the structure of this layer; output an (A/4) × (A/4) × 256 feature map;
layer 3, fully connected layer: the three-dimensional input is flattened to one dimension; 2 output classes;
layer 4, normalization layer: the output of the layer-3 fully connected layer is normalized with a sigmoid function; 2 output classes;
(4) training the network model for the designed generation network and discrimination network; first, a low-resolution sample $p_k^{LD}$ is used as the input of the generation network $G_\omega$, and the high-resolution sample estimate $\hat{p}_k^{HD}$ is obtained by forward propagation through the successive convolutions; the content loss $L_{con}$ between the high-resolution sample estimate $\hat{p}_k^{HD}$ and the high-resolution sample $p_k^{HD}$ is computed; next, the real/fake label of the high-resolution sample estimate is set to 0 and that of the high-resolution sample $p_k^{HD}$ is set to 1, giving the label value $y_k$; the discrimination network estimates the truthfulness of the inferred high-resolution picture and the original high-resolution picture, yielding the estimate $\hat{y}_k$; the binary cross-entropy loss $L_{cro}$ between the estimate $\hat{y}_k$ and the label value $y_k$ is computed; finally, the gradients of the two loss values are back-propagated; the detailed steps of this part are as follows:
substep 1: computing the forward propagation; a low-resolution sample $p_k^{LD}$ is used as the input of the generation network, and the high-resolution sample estimate $\hat{p}_k^{HD}$ is obtained through the successive convolution operations; the real/fake label $y_k$ of the high-resolution sample estimate $\hat{p}_k^{HD}$ is set to 0 and that of the corresponding high-resolution sample $p_k^{HD}$ is set to 1; the discrimination network performs truthfulness estimation on the high-resolution sample estimate $\hat{p}_k^{HD}$ and the high-resolution sample $p_k^{HD}$, yielding the estimate $\hat{y}_k$;
substep 2: computing the loss values; the loss value of the discrimination network is the binary cross-entropy $L_{cro}$ between the truthfulness estimate $\hat{y}_k$ and the real/fake label value $y_k$, with the specific formula:
$$L_{cro} = -\left[ y_k \log \hat{y}_k + (1 - y_k)\log(1 - \hat{y}_k) \right]$$
the loss value of the generation network is the mean squared error $L_{con}$ between the tail-layer feature maps, extracted by the feature extraction layers of the discriminator $D_\theta$, of the high-resolution sample estimate $\hat{p}_k^{HD}$ and the corresponding high-resolution sample $p_k^{HD}$, with the specific formula:
$$L_{con} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\left( \Phi(p_k^{HD})_{x,y} - \Phi(\hat{p}_k^{HD})_{x,y} \right)^2$$
where $\Phi(x)$ denotes the tail-layer feature map extracted by the feature extraction layers of the discriminator $D_\theta$ for a given input sample $x$, and $W$ and $H$ denote the width and height of the extracted feature map, respectively;
substep 3: performing gradient back-propagation, and saving the parameter values of the generation network and the discrimination network at each iteration;
substep 4: selecting the network parameters with the lowest sum of generation-network and discrimination-network losses over the iterations of substep 3 as the optimal network parameters, the network model corresponding to the optimal network parameters being the optimal model;
(5) performing the super-resolution operation on small-scale pedestrian targets under the intelligent roadside viewing angle using the generation network in the optimal model output by step (4).
CN202010982493.9A 2020-09-17 2020-09-17 Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment Active CN112132746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010982493.9A 2020-09-17 2020-09-17 Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment

Publications (2)

Publication Number Publication Date
CN112132746A (en) 2020-12-25
CN112132746B (en) 2022-11-11

Family

ID=73841758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010982493.9A Active CN112132746B (en) 2020-09-17 2020-09-17 Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment

Country Status (1)

Country Link
CN (1) CN112132746B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554872A (en) * 2021-07-19 2021-10-26 昭通亮风台信息科技有限公司 Detection early warning method and system for traffic intersection and curve


Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN110136063A (en) * 2019-05-13 2019-08-16 南京信息工程大学 A kind of single image super resolution ratio reconstruction method generating confrontation network based on condition
CN110706157A (en) * 2019-09-18 2020-01-17 中国科学技术大学 Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN111583109A (en) * 2020-04-23 2020-08-25 华南理工大学 Image super-resolution method based on generation countermeasure network

Non-Patent Citations (1)

Title
王志强 (Wang Zhiqiang) et al., "Image super-resolution reconstruction with generative adversarial networks" (生成式对抗网络的图像超分辨率重建), Journal of Xi'an Technological University (《西安工业大学学报》) *


Also Published As

Publication number Publication date
CN112132746B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN110706157B (en) Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN109035149B (en) License plate image motion blur removing method based on deep learning
CN110009095B (en) Road driving area efficient segmentation method based on depth feature compressed convolutional network
CN111639692A (en) Shadow detection method based on attention mechanism
CN113642634A (en) Shadow detection method based on mixed attention
CN110689482A (en) Face super-resolution method based on supervised pixel-by-pixel generation countermeasure network
CN111612008A (en) Image segmentation method based on convolution network
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN113743269B (en) Method for recognizing human body gesture of video in lightweight manner
CN113538457B (en) Video semantic segmentation method utilizing multi-frequency dynamic hole convolution
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
CN112966747A (en) Improved vehicle detection method based on anchor-frame-free detection network
CN113255837A (en) Improved CenterNet network-based target detection method in industrial environment
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN115482518A (en) Extensible multitask visual perception method for traffic scene
CN115115831A (en) Attention-guided multi-scale context information interaction semantic segmentation method
CN112132746B (en) Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment
CN113392728B (en) Target detection method based on SSA sharpening attention mechanism
US20230154157A1 (en) Saliency-based input resampling for efficient object detection
CN114612456B (en) Billet automatic semantic segmentation recognition method based on deep learning
CN112686233B (en) Lane line identification method and device based on lightweight edge calculation
CN112446292B (en) 2D image salient object detection method and system
CN114972851A (en) Remote sensing image-based ship target intelligent detection method
CN110827238A (en) Improved side-scan sonar image feature extraction method of full convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant