CN116524432A - Application of small target detection algorithm in traffic monitoring - Google Patents


Info

Publication number
CN116524432A
Authority
CN
China
Prior art keywords
branch
module
image
super
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310412581.9A
Other languages
Chinese (zh)
Inventor
吴璨
范海连
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Communications Huakong Tianjin Construction Group Co ltd
Original Assignee
China Communications Huakong Tianjin Construction Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Communications Huakong Tianjin Construction Group Co ltd filed Critical China Communications Huakong Tianjin Construction Group Co ltd
Priority to CN202310412581.9A priority Critical patent/CN116524432A/en
Publication of CN116524432A publication Critical patent/CN116524432A/en
Pending legal-status Critical Current


Classifications

    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06T 3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4076: Super-resolution using the original low-resolution images to iteratively correct the high-resolution images
    • G06T 5/73: Deblurring; Sharpening
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 2201/07: Target detection
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to an application of a small target detection algorithm in traffic monitoring, comprising the following steps: performing preliminary processing on the picture to be detected; constructing a lightweight image super-resolution reconstruction network and completing network training; performing edge sharpening on the super-resolved image; inputting the edge-sharpened image into a detection module and obtaining the small traffic-target detection result. The lightweight image super-resolution reconstruction network structure comprises a shallow feature extraction module, a deep feature extraction module, an attention module and a reconstruction module. The invention realizes small target detection based on the super-resolution network, effectively enlarging the resolution of small targets and increasing the amount of feature information; compared with traditional optimization methods based on mainstream target detection algorithms, the accuracy improvement is larger and more effective.

Description

Application of small target detection algorithm in traffic monitoring
Technical Field
The invention relates to the technical field of traffic monitoring, in particular to application of a small target detection algorithm in traffic monitoring.
Background
With China's economic growth shifting from high speed to high-quality development, the transportation industry has reached the stage of basically adapting to economic and social development demands; national road mileage and traffic scale have grown rapidly, and by the end of 2022 the country had 348 million motor vehicles and 435 million licensed motor-vehicle drivers. However, because traffic conditions in China are complex and the safety-guarantee capability of road infrastructure is weak, current traffic management and control capability cannot meet the rapidly growing traffic demand; conflicts among pedestrians, vehicles and roads are increasingly prominent, traffic accidents are frequent, and their proportion of national work-safety accidents is markedly higher than that of other industries.
Detecting and predicting traffic incidents with an intelligent monitoring system can greatly reduce the damage caused by traffic accidents. The core of such a system is the target detection algorithm, and the detection of small targets is its chief difficulty: targets such as pedestrians and small vehicles are highly concentrated below 50 pixels in the image, their colors, edges and other appearance cues are blurred, and they are hard to distinguish in a complex traffic environment containing a large number of negative samples such as electric bicycles. As a result, detection accuracy for small-scale pedestrians and vehicles is low and the miss rate is high. Improving the detection accuracy of such small targets is therefore vital for traffic safety.
Existing small target detection is generally built by optimizing a mainstream target detection algorithm, using methods such as small-target sample enhancement, optimized training, anchor-free mechanisms and feature fusion. Among these, in 2018 Bai et al. proposed an end-to-end Multi-Task Generative Adversarial Network (MTGAN) to address small-target detection accuracy. The method comprises the following steps:
1) Cropping the input image as required;
2) Inputting it into a baseline target detector with Faster R-CNN or Mask R-CNN as the backbone network to preliminarily identify objects and background;
3) Inputting the preliminarily recognized picture into the generator, a super-resolution network that upsamples the small blurred image to a fine image and recovers its detail information for more accurate detection;
4) Inputting the super-resolved image into the discriminator, a multi-task network that describes each super-resolved image patch with a real/fake score, an object category score and bounding-box regression; so that the generator recovers more small-target details for detection, the discriminator back-propagates the classification and regression losses to the generator during training, improving the generator's output.
Disadvantages of the prior art:
1) Methods based on optimizing mainstream target detection algorithms, such as small-target sample enhancement, optimized training, anchor-free mechanisms and feature fusion, do not fundamentally solve the problem of missing detail in small target objects; although small-target detection accuracy improves overall, the improvement is limited, and many such algorithms may produce artifacts in practical applications.
2) Introducing a GAN (generative adversarial network)-based super-resolution algorithm into target detection effectively improves the detection accuracy of small targets, enlarging their resolution and increasing the amount of feature information; however, because the super-resolution algorithm greatly increases the number of network layers, GAN training is relatively difficult, real-time performance is hard to achieve, and application to specific scenes is difficult.
Disclosure of Invention
The present invention addresses the above-mentioned shortcomings and provides an application of a small target detection algorithm in traffic monitoring.
The invention adopts the following technical scheme to achieve this aim:
a light-weight image super-resolution reconstruction network structure consists of a shallow feature extraction module, a deep feature extraction module, an attention module and a reconstruction module;
the shallow feature extraction module maps the input image to a high-dimensional feature space through a convolution layer with a kernel size of 3×3, expressed as x_0 = f_ext(I_LR), where I_LR is the low-resolution input image;
The deep feature extraction module is composed of multiple large-receptive-field information distillation blocks (Vast-receptive-field Information Distillation Block, VIDB) and performs deep feature extraction on x_0; the stacked VIDBs progressively refine the extracted features.
An attention module, which consists of an ESA module (Efficient Channel Attention) and a CCA module (Coordinate Attention);
the reconstruction module completes reconstruction with the PixelShuffle algorithm, rearranging tensors of shape (C×r², H, W) into tensors of shape (C, r×H, r×W).
Specifically, the VIDB block performs a convolution with a kernel size of 1×1 on the input image and then splits into two branches, a first branch and a second branch; the results of the two branches are summed, pixel-normalized, and output.
Specifically, the first branch is a direct (identity) path; the second branch is activated by a gate-function-based activation function, passes through a channel attention module based on information distillation and large-kernel depthwise-separable convolution for feature-weight assignment, then through a convolution layer with a kernel size of 1×1 for feature fusion between feature maps, and is finally added to the direct path.
Specifically, the gate-function-based activation function splits an input feature map of size C×H×W into two feature maps of size C/2×H×W along the channel dimension, multiplies them element-wise, and outputs the result; the channel attention module splits the activated feature map into two branches, a third branch and a fourth branch, sums their results, and applies a convolution with a kernel size of 1×1.
Specifically, the third branch first performs a convolution with a kernel size of 1×1 and then a depthwise convolution with a kernel size of 9×9, a stride of 1 and a padding of 4; the fourth branch first performs a convolution with a kernel size of 1×1 and is then activated by a GELU activation function.
Specifically, the CCA module applies contrast calculation and adaptive global pooling to the input picture, sums the two results, passes the sum sequentially through a convolution with a kernel size of 1×1, a ReLU activation, and another convolution with a kernel size of 1×1, and finally multiplies the result with the input picture to obtain the output, completing feature learning based on position information.
Specifically, the ESA module performs a convolution with a kernel size of 1×1 on the input picture and then splits into two branches, a fourth branch and a fifth branch; the results of the two branches are summed, a convolution with a kernel size of 1×1 is applied to the summed feature map to fuse features and restore the channel count, the map is activated by a sigmoid activation function and multiplied with the input picture to obtain the output, learning cross-channel interaction relations without dimensionality reduction.
In particular, the fourth branch performs a convolution with a kernel size of 1×1; the fifth branch sequentially performs a convolution with a kernel size of 3×3, a stride of 2 and a padding of 1, a max-pooling layer with a kernel size of 7×7 and a stride of 7, a depthwise convolution with a kernel size of 3×3, activation by a GELU function, and bilinear interpolation that restores the original image size.
An application of a small target detection algorithm in traffic monitoring, based on the above lightweight image super-resolution reconstruction network structure, comprises the following steps:
s1, performing preliminary processing on a picture to be detected, wherein the specific steps are as follows:
s11, performing format conversion on a low-resolution image to be processed to obtain a low-resolution YCbCr image;
s12, equally dividing the low-resolution YCbCr image into a plurality of sub-images according to rows and columns, wherein the size of the sub-images after dividing is 480 pixels;
s13, randomly rotating the sub-images by 90 degrees or 180 degrees to enhance data so as to provide more data samples and reduce the storage space required by the feature map in network propagation;
s2, constructing a lightweight image super-resolution reconstruction network, and completing network training, wherein the training loss function adopts L2 loss;
s3, carrying out edge sharpening on the image subjected to super-resolution processing;
s4, inputting the image with the sharpened edges into a detection module for detection and obtaining a small traffic target detection result.
In particular, the detection module employs the YOLOv3 algorithm, which divides the image into a plurality of regions and predicts bounding boxes and class probabilities for each region.
The beneficial effects of the invention are as follows:
1. The invention sets up a lightweight image super-resolution reconstruction network structure comprising a shallow feature extraction module, a deep feature extraction module (VIDB blocks), an attention module (ESA module and CCA module) and a reconstruction module; the super-resolution algorithm effectively enlarges the resolution of small targets and increases the amount of feature information, and the accuracy improvement is larger and more effective than that of traditional optimization methods based on mainstream target detection algorithms.
2. According to the invention, the GAN-based super-resolution network is replaced by the lightweight image super-resolution reconstruction network, so that the detection precision of a small target object is improved, the parameter quantity of a network model is greatly reduced, and the training and the deployment are easier.
3. The invention uses YOLOv3 as the detection module of the system, ensuring a good compromise between detection accuracy and detection speed.
Drawings
FIG. 1 is a block diagram of an application system of the present invention in traffic monitoring;
FIG. 2 is a schematic diagram of a VIDB block structure according to the present invention;
FIG. 3 is a schematic diagram of a lightweight image super-resolution reconstruction network structure according to the present invention;
fig. 4 is a schematic structural view of a CCA module of the present invention;
FIG. 5 is a schematic diagram of an ESA module structure according to the present invention;
FIG. 6 is a schematic diagram of the YOLOv3 algorithm of the present invention;
the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Detailed Description
The invention is further illustrated by the following examples:
as shown in fig. 3, a super-resolution reconstruction network structure of a lightweight image is composed of a shallow layer feature extraction module, a deep layer feature extraction module, an attention module and a reconstruction module;
the shallow feature extraction module maps the input image to a high-dimensional feature space through a convolution layer with a kernel size of 3×3, expressed as x_0 = f_ext(I_LR), where I_LR is the low-resolution input image;
The deep feature extraction module is composed of multiple large-receptive-field information distillation blocks (Vast-receptive-field Information Distillation Block, VIDB) and performs deep feature extraction on x_0; the stacked VIDBs progressively refine the extracted features.
Specifically, as shown in fig. 2, the VIDB block performs a convolution with a kernel size of 1×1 on the input image and then splits into two branches, a first branch and a second branch; the results of the two branches are summed, pixel-normalized, and output.
The first branch is a direct (identity) path; the second branch is activated by a gate-function-based activation function, passes through a channel attention module based on information distillation and large-kernel depthwise-separable convolution for feature-weight assignment, then through a convolution layer with a kernel size of 1×1 for feature fusion between feature maps, and is finally added to the direct path.
The key innovation of the invention lies in the gate-function-based activation function and the channel attention module based on information distillation and large-kernel depthwise-separable convolution, namely:
The gate-function-based activation function splits an input feature map of size C×H×W (where the channel number C is 64 and H×W is the size of the cropped picture, 480×480 pixels) into two feature maps of size C/2×H×W along the channel dimension, multiplies them element-wise, and outputs the result, achieving an effect similar to a traditional activation function while greatly reducing the parameter count.
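The gate-function activation just described can be sketched in a few lines; the function name and the shapes used here are illustrative, since the patent provides no reference code:

```python
import numpy as np

def gate_activation(x: np.ndarray) -> np.ndarray:
    """Split a (C, H, W) feature map into two (C/2, H, W) halves along
    the channel axis and multiply them element-wise, as described above.
    Illustrative sketch only."""
    c = x.shape[0]
    assert c % 2 == 0, "channel count must be even"
    a, b = x[: c // 2], x[c // 2:]
    return a * b

# the 64-channel, 480x480 feature map mentioned in the description
x = np.random.rand(64, 480, 480).astype(np.float32)
print(gate_activation(x).shape)  # (32, 480, 480)
```

Note that the operation itself has no learnable parameters, consistent with the text's claim that it greatly reduces the parameter count relative to a conventional activation.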
The channel attention module splits the activated feature map into two branches, a third branch and a fourth branch; the results of the two branches are summed and a convolution with a kernel size of 1×1 is applied to produce the output.
The third branch first performs a convolution with a kernel size of 1×1, and then a depthwise convolution with a kernel size of 9×9, a stride of 1 and a padding of 4. This branch carries out a large-kernel depthwise-separable convolution: the 9×9 kernel benefits the extraction of picture feature information, while the depthwise-separable decomposition splits one convolution layer into a pointwise (1×1) convolution and a depthwise convolution (stride 1, padding 4), greatly reducing the number of parameters. The fourth branch first performs a convolution with a kernel size of 1×1 and is then activated by a GELU activation function. The reduced parameter count makes training and deployment easier, gives good real-time performance, and makes the method easy to apply to specific scenes.
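The parameter saving from the large-kernel depthwise-separable decomposition is easy to verify by counting weights (bias terms omitted; the 64-channel width is the one stated elsewhere in the description):

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    # weights of a standard k x k convolution (no bias)
    return in_ch * out_ch * k * k

def separable_params(in_ch: int, out_ch: int, k: int) -> int:
    # 1x1 pointwise convolution followed by a k x k depthwise convolution
    pointwise = in_ch * out_ch
    depthwise = out_ch * k * k  # one k x k filter per output channel
    return pointwise + depthwise

standard = conv_params(64, 64, 9)        # 331,776 weights
separable = separable_params(64, 64, 9)  # 4,096 + 5,184 = 9,280 weights
print(standard, separable, round(standard / separable, 1))  # roughly 35.8x fewer
```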
An attention module, which consists of an ESA module (Efficient Channel Attention) and a CCA module (Coordinate Attention);
Specifically, the attention module is added after the deep feature extraction module to further improve the representational power of the neural network. The ESA module is a lightweight channel attention module that learns cross-channel interaction relations through a one-dimensional convolution layer without dimensionality reduction; the CCA module embeds location information into the channel attention and can generate attention maps with spatial selectivity.
As shown in fig. 4, the CCA module applies contrast calculation and adaptive global pooling to the input picture, sums the two results, passes the sum sequentially through a convolution with a kernel size of 1×1, a ReLU activation, and another convolution with a kernel size of 1×1, and finally multiplies the result with the input picture to obtain the output, completing feature learning based on position information.
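The CCA data flow just described can be sketched as follows; the contrast statistic (here the per-channel standard deviation), the channel-reduction ratio, and the random stand-in weights are assumptions, since the patent does not specify them:

```python
import numpy as np

def cca_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Contrast statistic plus global average pooling, summed, passed
    through two 1x1 convolutions (plain matrices here) with a ReLU in
    between, then used to rescale the input channel-wise."""
    c = x.shape[0]
    flat = x.reshape(c, -1)
    s = flat.std(axis=1) + flat.mean(axis=1)  # contrast + adaptive pooling, added
    s = np.maximum(w1 @ s, 0.0)               # 1x1 conv + ReLU
    s = w2 @ s                                # second 1x1 conv
    return x * s[:, None, None]               # multiply with the input

rng = np.random.default_rng(0)
c = 8
x = rng.standard_normal((c, 16, 16))
w1 = rng.standard_normal((c // 4, c))  # reduction ratio 4 is an assumption
w2 = rng.standard_normal((c, c // 4))
print(cca_attention(x, w1, w2).shape)  # (8, 16, 16)
```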
As shown in fig. 5, the ESA module performs a convolution with a kernel size of 1×1 on the input picture and then splits into two branches, a fourth branch and a fifth branch; the results of the two branches are summed, a convolution with a kernel size of 1×1 is applied to the summed feature map to fuse features and restore the channel count, the map is activated by a sigmoid activation function and multiplied with the input picture to obtain the output, learning cross-channel interaction relations without dimensionality reduction.
The fourth branch performs a convolution with a kernel size of 1×1; the fifth branch sequentially performs a convolution with a kernel size of 3×3, a stride of 2 and a padding of 1, a max-pooling layer with a kernel size of 7×7 and a stride of 7, a depthwise convolution with a kernel size of 3×3, activation by a GELU function, and bilinear interpolation that restores the original image size.
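The spatial sizes along the fifth branch can be checked with the standard output-size formula; the 480×480 input is the cropped picture size stated earlier in the description:

```python
def out_size(h: int, k: int, s: int, p: int) -> int:
    # output size of a convolution or pooling layer
    return (h + 2 * p - k) // s + 1

h = 480
h1 = out_size(h, k=3, s=2, p=1)   # stride-2 3x3 convolution: 480 -> 240
h2 = out_size(h1, k=7, s=7, p=0)  # 7x7 max pooling with stride 7: 240 -> 34
print(h1, h2)  # bilinear interpolation later restores 34 back to 480
```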
The reconstruction module is an up-sampling module; reconstruction is completed with the PixelShuffle algorithm, which realizes efficient sub-pixel convolution (equivalent to a convolution with a stride of 1/r) and rearranges tensors of shape (C×r², H, W) into tensors of shape (C, r×H, r×W).
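The PixelShuffle rearrangement can be reproduced with reshapes and a transpose (a sketch matching the shape convention above, batch dimension omitted):

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r^2, H, W) tensor into (C, r*H, r*W),
    following the sub-pixel convolution layout used by PixelShuffle."""
    cr2, h, w = x.shape
    c = cr2 // (r * r)
    x = x.reshape(c, r, r, h, w)    # split out the two upscaling factors
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(8 * 3 * 3, dtype=np.float32).reshape(8, 3, 3)
print(pixel_shuffle(x, 2).shape)  # (2, 6, 6): 8 channels -> 2, 3x3 -> 6x6
```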
As shown in fig. 1, an application of a small target detection algorithm in traffic monitoring, based on the lightweight image super-resolution reconstruction network structure, comprises the following steps:
s1, performing preliminary processing on a picture to be detected, wherein the specific steps are as follows:
s11, performing format conversion on a low-resolution image to be processed to obtain a low-resolution YCbCr (Y represents a brightness component, cb represents a blue chrominance component, and Cr represents a red chrominance component) image; compared with RGB image, YCbCr image only occupies little bandwidth in transmission process, so the invention carries out format conversion;
s12, equally dividing the low-resolution YCbCr image into a plurality of sub-images according to rows and columns, wherein the size of the sub-images after dividing is 480 pixels;
s13, randomly rotating the sub-images by 90 degrees or 180 degrees to enhance data so as to provide more data samples, and greatly reducing the storage space required by the feature map in network propagation;
s2, constructing a lightweight image super-resolution reconstruction network, and completing network training, wherein the training loss function adopts L2 loss;
s3, carrying out edge sharpening on the image subjected to super-resolution processing;
s4, inputting the image with the sharpened edges into a detection module for detection and obtaining a small traffic target detection result; the detection module of the invention adopts the YOLOv3 algorithm to divide the image into a plurality of areas and predicts the probability of the boundary box and each area.
Specifically, as shown in fig. 6, the present invention uses YOLOv3 (You Only Look Once) as the detection module; although not the most accurate algorithm, it offers a compromise between accuracy and speed that suits deployment in practical applications. The YOLOv3 algorithm applies a single neural network to the image, dividing it into a plurality of regions and predicting bounding boxes and class probabilities for each region; with FPN-style feature pyramids and multi-level detection, it has good small-target detection capability.
YOLOv3 uses only convolutional layers, with Darknet-53 as the backbone network: it contains 53 convolutional layers, each followed by a batch normalization layer and a Leaky ReLU activation. The whole framework can be divided into 3 parts: the image x is input into the Darknet-53 structure, and a series of convolutions with residual connections produce feature maps at 1/8, 1/16 and 1/32 of the original resolution (feature maps 1, 2 and 3 in the figure); this is the feature extraction process. Feature maps of different sizes are fused during feature extraction to obtain stronger feature expressiveness; because their sizes differ, an up-sampling operation is needed to bring them to the same size before stacking, fusion and the corresponding convolution operations. Finally, convolution operations with kernel sizes of 3×3 and 1×1 produce a 75-channel prediction map containing 3 × (4 + 1 + 20) values, i.e. 3 prediction boxes (bounding boxes) per grid cell encoding target category and position in the original image, each box consisting of 25 parameters: 4 position coordinates, 1 confidence score and 20 class-prediction values.
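The channel arithmetic of the prediction map follows directly from the box encoding; a quick check (the 3 boxes per cell and 20 classes are the figures given in the text, the 480×480 input is the sub-image size from step S12):

```python
def yolo_head_channels(num_classes: int, boxes_per_cell: int = 3) -> int:
    # each predicted box: 4 coordinates + 1 confidence + class scores
    return boxes_per_cell * (4 + 1 + num_classes)

print(yolo_head_channels(20))  # 75, matching the 3 x (4 + 1 + 20) in the text
print(yolo_head_channels(80))  # 255, the corresponding figure for 80 classes

# grid sizes at the three detection scales (strides 8, 16, 32) for 480x480 input
print([480 // s for s in (8, 16, 32)])  # [60, 30, 15]
```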
By setting up the lightweight image super-resolution reconstruction network structure, the invention effectively enlarges the resolution of small targets and increases the amount of feature information; compared with traditional optimization methods based on mainstream target detection algorithms, the accuracy improvement is larger and more effective.
According to the invention, the GAN-based super-resolution network is replaced by the lightweight image super-resolution reconstruction network, so that the detection precision of a small target object is improved, the parameter quantity of a network model is greatly reduced, and the training and the deployment are easier.
The invention uses YOLOv3 as the detection module of the system, ensuring a good compromise between detection accuracy and detection speed.
In the present invention, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
While the invention has been described above by way of example, it will be apparent that the invention is not limited to the above embodiments; various modifications made using the method concepts and technical solutions of the invention, or direct applications thereof to other fields without modification, all fall within the scope of protection of the invention.

Claims (10)

1. The light-weight image super-resolution reconstruction network structure is characterized by comprising a shallow layer feature extraction module, a deep layer feature extraction module, an attention module and a reconstruction module;
the shallow feature extraction module maps the input image to a high-dimensional feature space and comprises a convolution layer with kernel size 3*3, expressed as x_0 = f_ext(I_LR), where I_LR is the input low-resolution image;
the deep feature extraction module is composed of multiple vast-receptive-field information distillation blocks (Vast-receptive-field Information Distillation Block, VIDB) and performs deep feature extraction on x_0, the extracted features being progressively refined by the stacked VIDBs;
the attention module consists of two parts, an ESA module (Enhanced Spatial Attention) and a CCA module (Contrast-aware Channel Attention);
the reconstruction module completes reconstruction using the PixelShuffle algorithm, rearranging a tensor of shape (*, C x r^2, H, W) into a tensor of shape (*, C, H x r, W x r).
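PixelShuffle is a pure rearrangement of channels into spatial positions; a minimal NumPy sketch of the shape transform described above (in practice the module would typically be `torch.nn.PixelShuffle`):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (N, C*r*r, H, W) tensor into (N, C, H*r, W*r),
    as done by the reconstruction module's PixelShuffle step."""
    n, c, h, w = x.shape
    oc = c // (r * r)
    x = x.reshape(n, oc, r, r, h, w)
    # Interleave the r*r channel groups into the spatial dimensions.
    x = x.transpose(0, 1, 4, 2, 5, 3)
    return x.reshape(n, oc, h * r, w * r)

x = np.random.rand(1, 3 * 2 * 2, 8, 8)   # upscale factor r = 2
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 3, 16, 16)
```

Each output pixel block of size r x r is filled from r^2 consecutive channel groups, which is why the channel count must be divisible by r^2.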
2. The lightweight image super-resolution reconstruction network structure according to claim 1, wherein the VIDB block first performs a convolution operation with kernel size 1*1 on its input, then splits the result into two branches, a first branch and a second branch; the processing results of the first branch and the second branch are added, a pixel normalization operation is performed, and the result is output.
3. The lightweight image super-resolution reconstruction network structure according to claim 2, wherein the first branch is a direct (identity) path; the second branch is first activated by a gate-function-based activation function, then feature weights are assigned by a channel attention module based on information distillation and large-kernel depthwise separable convolution, then features from different feature maps are fused by a convolution layer with kernel size 1*1, and the fused result is added to the direct path.
4. The lightweight image super-resolution reconstruction network structure according to claim 3, wherein the gate-function-based activation function splits an input feature map of size C x H x W along the channel dimension into two feature maps of size C/2 x H x W, multiplies them element-wise, and outputs the product; the channel attention module splits the activated feature map into two branches, a third branch and a fourth branch, adds the processing results of the two branches, performs a convolution operation with kernel size 1*1, and outputs the result.
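The gate-function activation in claim 4 is a channel-split gate: one half of the channels modulates the other half. A minimal NumPy sketch (names are illustrative, not from the patent):

```python
import numpy as np

def channel_split_gate(x):
    """Split a (C, H, W) feature map into two (C/2, H, W) halves along
    the channel axis and multiply them element-wise, as in claim 4."""
    c = x.shape[0]
    a, b = x[: c // 2], x[c // 2 :]
    return a * b

x = np.random.rand(8, 16, 16)      # C=8, H=W=16
y = channel_split_gate(x)
print(y.shape)  # (4, 16, 16)
```

The gate needs no learned parameters of its own, which fits the lightweight design goal of the network.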
5. The lightweight image super-resolution reconstruction network structure according to claim 4, wherein the third branch first performs a convolution operation with kernel size 1*1, then a depthwise convolution operation with kernel size 9*9, stride 1 and padding 4; the fourth branch first performs a convolution operation with kernel size 1*1 and is then activated by the GELU activation function.
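With kernel 9*9, stride 1 and padding 4, the depthwise convolution in claim 5 preserves the spatial size, since (H + 2*4 - 9)/1 + 1 = H. A naive NumPy sketch of such a per-channel (depthwise) convolution, for illustration only:

```python
import numpy as np

def depthwise_conv(x, w, pad):
    """Naive depthwise convolution: x is (C, H, W), w is (C, k, k);
    each channel is convolved with its own k*k kernel at stride 1."""
    c, h, wd = x.shape
    k = w.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    oh, ow = h + 2 * pad - k + 1, wd + 2 * pad - k + 1
    out = np.zeros((c, oh, ow))
    for ci in range(c):
        for i in range(oh):
            for j in range(ow):
                out[ci, i, j] = np.sum(xp[ci, i:i + k, j:j + k] * w[ci])
    return out

x = np.random.rand(2, 12, 12)
w = np.random.rand(2, 9, 9)       # 9x9 kernel; padding 4 preserves size
y = depthwise_conv(x, w, pad=4)
print(y.shape)  # (2, 12, 12)
```

Because each channel uses its own kernel, the parameter count is C * k^2 rather than C^2 * k^2, which is what makes large 9*9 kernels affordable in a lightweight network.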
6. The lightweight image super-resolution reconstruction network structure according to claim 5, wherein the CCA module performs a contrast calculation and an adaptive global pooling on the input feature map, adds the two results, then sequentially applies a convolution operation with kernel size 1*1, a ReLU activation, and another convolution operation with kernel size 1*1, and finally multiplies the result with the input feature map to obtain the output, thereby completing feature learning based on the contrast information.
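A minimal sketch of the contrast-plus-mean channel pooling that drives such a contrast-aware channel attention; the two 1*1 convolutions and the ReLU of claim 6 are reduced to a single sigmoid gate here, and all names are illustrative:

```python
import numpy as np

def cca_pooling(x):
    """Per-channel contrast (standard deviation) plus adaptive global
    average pooling, the two statistics added as in claim 6. x is (C, H, W)."""
    mean = x.mean(axis=(1, 2))       # adaptive global average pooling
    contrast = x.std(axis=(1, 2))    # per-channel contrast
    return contrast + mean

def cca_attention(x):
    w = 1.0 / (1.0 + np.exp(-cca_pooling(x)))   # sigmoid gate per channel
    return x * w[:, None, None]                  # rescale the input map

x = np.random.rand(4, 8, 8)
print(cca_attention(x).shape)  # (4, 8, 8)
```

Adding the contrast term lets the attention weights react to channel activity (texture, edges) rather than to mean intensity alone.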
7. The lightweight image super-resolution reconstruction network structure according to claim 6, wherein the ESA module first performs a convolution operation with kernel size 1*1 on the input feature map, then splits it into two branches, a fourth branch and a fifth branch; the processing results of the two branches are added, a convolution operation with kernel size 1*1 is performed on the added feature map to fuse features and restore the number of channels, the result is activated by a sigmoid function and multiplied with the input feature map to obtain the output, learning cross-channel interactions without dimensionality reduction.
8. The lightweight image super-resolution reconstruction network structure according to claim 7, wherein the fourth branch performs a convolution operation with kernel size 1*1; the fifth branch sequentially performs a convolution operation with kernel size 3*3, stride 2 and padding 1; a max pooling layer with kernel size 7*7 and stride 7; a depthwise convolution operation with kernel size 3*3; a GELU activation; and a bilinear interpolation that restores the original image size.
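The fifth branch's downsample-then-restore shape arithmetic can be checked directly. A small sketch under the parameters stated in claim 8 (stride-2 3*3 convolution with padding 1, then 7*7 max pooling with stride 7, then bilinear upsampling back; the input height of 48 is an illustrative example):

```python
def conv_out(size, k, s, p):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * p - k) // s + 1

h = 48                               # example input height
h1 = conv_out(h, k=3, s=2, p=1)      # strided 3x3 conv -> 24
h2 = conv_out(h1, k=7, s=7, p=0)     # 7x7 max pool, stride 7 -> 3
# Bilinear interpolation then restores the map from h2 back to h, so the
# spatial attention mask matches the input feature map's size.
print(h1, h2)  # 24 3
```

The aggressive downsampling (roughly 16x here) gives the spatial attention a large effective receptive field at very low cost, which is the usual rationale for this ESA design.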
9. Use of a small object detection algorithm in traffic monitoring based on the lightweight image super-resolution reconstruction network structure of claim 8, characterized by the following steps:
s1, performing preliminary processing on a picture to be detected, wherein the specific steps are as follows:
s11, performing format conversion on a low-resolution image to be processed to obtain a low-resolution YCbCr image;
s12, equally dividing the low-resolution YCbCr image into a plurality of sub-images according to rows and columns, wherein the size of the sub-images after dividing is 480 pixels;
s13, randomly rotating the sub-images by 90 degrees or 180 degrees to enhance data so as to provide more data samples and reduce the storage space required by the feature map in network propagation;
s2, constructing a lightweight image super-resolution reconstruction network, and completing network training, wherein the training loss function adopts L2 loss;
s3, carrying out edge sharpening on the image subjected to super-resolution processing;
s4, inputting the image with the sharpened edges into a detection module for detection and obtaining a small traffic target detection result.
10. The use of a small target detection algorithm in traffic monitoring according to claim 9, wherein the detection module employs the YOLOv3 algorithm to divide the image into a plurality of regions and predict bounding boxes and class probabilities for each region.
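The preliminary processing in steps S12 and S13 (tiling into 480-pixel sub-images, then random 90- or 180-degree rotation) can be sketched as follows; the helper names are illustrative, and the sketch assumes image dimensions divisible by the tile size, as the equal division in S12 implies:

```python
import numpy as np

TILE = 480  # sub-image size from step S12

def tile_image(img, tile=TILE):
    """Split an (H, W, C) image into tile x tile sub-images by rows and
    columns, as in step S12 (assumes H and W are multiples of `tile`)."""
    h, w = img.shape[:2]
    return [img[i:i + tile, j:j + tile]
            for i in range(0, h, tile)
            for j in range(0, w, tile)]

def augment(tiles, rng):
    """Randomly rotate each sub-image by 90 or 180 degrees (step S13)."""
    return [np.rot90(t, k=rng.choice([1, 2])) for t in tiles]

img = np.zeros((960, 1920, 3), dtype=np.uint8)
tiles = augment(tile_image(img), np.random.default_rng(0))
print(len(tiles))  # 8 sub-images for a 1920x960 input
```

Working on fixed-size square tiles keeps the intermediate feature maps of the super-resolution network bounded in size regardless of the camera resolution.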
CN202310412581.9A 2023-04-18 2023-04-18 Application of small target detection algorithm in traffic monitoring Pending CN116524432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310412581.9A CN116524432A (en) 2023-04-18 2023-04-18 Application of small target detection algorithm in traffic monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310412581.9A CN116524432A (en) 2023-04-18 2023-04-18 Application of small target detection algorithm in traffic monitoring

Publications (1)

Publication Number Publication Date
CN116524432A true CN116524432A (en) 2023-08-01

Family

ID=87407521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310412581.9A Pending CN116524432A (en) 2023-04-18 2023-04-18 Application of small target detection algorithm in traffic monitoring

Country Status (1)

Country Link
CN (1) CN116524432A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721302A (en) * 2023-08-10 2023-09-08 成都信息工程大学 Ice and snow crystal particle image classification method based on lightweight network
CN116721302B (en) * 2023-08-10 2024-01-12 成都信息工程大学 Ice and snow crystal particle image classification method based on lightweight network

Similar Documents

Publication Publication Date Title
US11651477B2 (en) Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks
CN111104903B (en) Depth perception traffic scene multi-target detection method and system
US11393100B2 (en) Automatically generating a trimap segmentation for a digital image by utilizing a trimap generation neural network
CN112232349A (en) Model training method, image segmentation method and device
CN111696110B (en) Scene segmentation method and system
CN110390314B (en) Visual perception method and equipment
CN114549563A (en) Real-time composite insulator segmentation method and system based on deep LabV3+
CN112215074A (en) Real-time target identification and detection tracking system and method based on unmanned aerial vehicle vision
CN110570402B (en) Binocular salient object detection method based on boundary perception neural network
CN114612456B (en) Billet automatic semantic segmentation recognition method based on deep learning
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN116524432A (en) Application of small target detection algorithm in traffic monitoring
CN110503049B (en) Satellite video vehicle number estimation method based on generation countermeasure network
CN112613434A (en) Road target detection method, device and storage medium
CN116189191A (en) Variable-length license plate recognition method based on yolov5
CN112989919B (en) Method and system for extracting target object from image
CN117789077A (en) Method for predicting people and vehicles for video structuring in general scene
CN115761552B (en) Target detection method, device and medium for unmanned aerial vehicle carrying platform
CN112487911A (en) Real-time pedestrian detection method and device based on improved yolov3 in intelligent monitoring environment
CN114663654B (en) Improved YOLOv4 network model and small target detection method
CN116311154A (en) Vehicle detection and identification method based on YOLOv5 model optimization
CN113255646B (en) Real-time scene text detection method
CN111047571A (en) Image salient target detection method with self-adaptive selection training process
CN117291802B (en) Image super-resolution reconstruction method and system based on composite network structure
CN112926588B (en) Large-angle license plate detection method based on convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination