CN113837058A - Lightweight rainwater grate detection method coupled with context aggregation network - Google Patents

Lightweight rainwater grate detection method coupled with context aggregation network

Info

Publication number
CN113837058A
Authority
CN
China
Prior art keywords
image
module
rainwater grate
model
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111102992.5A
Other languages
Chinese (zh)
Other versions
CN113837058B (en)
Inventor
车明亮
曹鑫亮
杨帆
郭有志
李凯隆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202111102992.5A priority Critical patent/CN113837058B/en
Publication of CN113837058A publication Critical patent/CN113837058A/en
Application granted granted Critical
Publication of CN113837058B publication Critical patent/CN113837058B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lightweight rainwater grate detection method coupled with a context aggregation network, comprising seven steps: image preprocessing, parameter initialization, data set generation, rainwater grate detection model construction, model training and prediction, mask processing, and post-processing. The prior probability of the spatial distribution of rainwater grates is incorporated into model prediction to further improve detection accuracy. The method has high real-time performance: its lightweight network architecture reduces the consumption of computing resources, shortens model loading and feed-forward time, and increases operating speed.

Description

Lightweight rainwater grate detection method coupled with context aggregation network
Technical Field
The invention relates to the field of image target detection, and in particular to a lightweight small-target detection method coupling attention and context.
Background
A rainwater grate is a rectangular drainage fixture set into an impervious surface; it drains rainwater while intercepting bulky debris such as leaves, plastic bags, cardboard, and food waste. After long service, however, a grate may be damaged or deformed to varying degrees, creating safety hazards; during sustained heavy rain, intercepted road runoff increases and the risk of waterlogging grows, so grates require regular maintenance and replacement. On urban roads, grate placement depends on local rainfall, road width, road area, and other factors. Later maintenance requires periodic checks of the positions, number, and condition of grates on both existing and newly built roads. For large-scale urban road networks, target detection technology is the principal means of acquiring basic rainwater grate information quickly and at low cost.
At present, deep learning is a major direction in target detection, giving rise to two-stage and single-stage models. The former complete the detection task by extracting and classifying proposed regions; representative methods include the R-CNN series, such as R-CNN and Fast R-CNN (Girshick et al., 2014; Girshick et al., 2015; Ren et al., 2015), and the SPP-Net model (He et al., 2015). The latter use a single network to output target categories and corresponding positions directly and quickly, without extracting proposal regions; representative models include the Single Shot MultiBox Detector (SSD) model (Liu et al., 2016) and the YOLO (You Only Look Once) series V1-V5 (Redmon et al., 2016; Redmon et al., 2017; Redmon et al., 2018; Bochkovskiy et al., 2020). Two-stage models usually achieve high detection accuracy but poor real-time performance; single-stage models are just the opposite. However, neither type performs well on small-size targets, with a large gap relative to their performance on large targets.
The rainwater grate is a small target: in street-view images captured by drones or street-view cars, it usually occupies a very small fraction of a single image. Taking Baidu Maps street-view images as an example, the average area fraction of a rainwater grate is below 8‰, far smaller than the smallest target in the VOC 2007 data set, the bottle (average area fraction about 5%). Existing deep learning models are therefore severely limited when detecting targets of such small size. In addition, grates degrade to varying degrees after being put into service, and in real scenes they may be shaded by shadows, covered by leaves and garbage, encroached on by cracked stones and weeds, or painted over by road traffic markings. These phenomena weaken the features that distinguish a grate from background objects and further increase detection difficulty. Advanced techniques such as feature pyramids (Lin et al., 2017), attention mechanisms (Wang et al., 2017; Woo et al., 2018), and context information (Lin et al., 2019) can be coupled into a model to improve detection accuracy, but their use generally increases the computational complexity and parameter count of the model and extends training and running time. To reduce computing resources while maintaining detection accuracy, especially on embedded devices, a lightweight detection method offers higher efficiency. The prior art therefore requires further technical optimization.
Disclosure of Invention
The purpose of the invention is as follows: the invention aims to overcome the defects of the prior art by providing a lightweight rainwater grate detection method coupled with a context aggregation network, so as to solve technical problems of existing rainwater grate detection methods such as high missed-detection and false-detection rates, large model parameter counts, and low operating efficiency.
The technical scheme is as follows: the lightweight rainwater grate detection method coupled with a context aggregation network according to the invention comprises the following steps:
(1) image preprocessing: according to the obtained street-view image data, mark the positions of rainwater grates in the street-view images with an image annotation tool to generate rainwater grate image-label data; enhance the street-view images using image processing techniques;
(2) parameter initialization: initialize the parameters involved in the detection method;
(3) data set generation: screen the image-label data from step 1 so that image data and label data correspond one to one; according to the initial parameters set in step 2, resize and channel-normalize the images and convert the label data to grid data; divide the data set into training data and test data according to an empirical ratio of training to test samples; generate the rainwater grate spatial distribution prior probability map from the training data;
(4) rainwater grate detection model construction: the model framework mainly comprises a backbone network and a context aggregation network; the backbone network is composed of a lightweight Conv1 module, a series of serially connected Block modules, and a Regressor module; the context aggregation network performs channel aggregation of shallow context feature maps with the deep target feature map, fusing shallow target detail information with deep target semantic information and thereby reducing the attenuation of target information, especially small-target information, during network propagation;
(5) model training and prediction: according to the initial parameters set in step 2, train the detection model with the training data generated in step 3 until convergence, recording and saving the optimal model weights; after training, load the optimal model weights and use the detection model constructed in step 4 for rainwater grate prediction;
(6) mask processing: using the rainwater grate spatial distribution prior probability map, mask the bounding-box results predicted by the model to obtain masked bounding-box results;
(7) post-processing: according to the initial parameters set in step 2, perform non-maximum suppression de-duplication on the boxes from step 6, spatially transform the box coordinates back to absolute image coordinates, and test using the test data obtained in step 3.
Preferably, in step (1), the obtained street-view image data comprise nearly 2,000 high-definition color images with a resolution of 1024 × 512 pixels; scenes in the images mainly include roads, sidewalks, street trees, road median strips, rainwater grates, and buildings; the rainwater grates are mainly distributed along both sides of roads and upstream of pedestrian crossings, and their area fraction in a whole image is very small, on average only about 5‰. The open-source annotation tool LabelMe is used for image annotation, with label data stored in JSON or XML format. Image enhancement mainly involves denoising, sharpening, and equalization: denoising uses a 3 × 3 median filter to remove salt-and-pepper noise; sharpening uses the Laplacian operator on the 4-neighborhood to highlight object contours; and equalization uses global histogram equalization to keep brightness consistent across the image and improve clarity in some regions.
Preferably, in step (2), the parameters to be initialized mainly include: number of classes CLS_NUM = 1, grid size S = 7, predicted number of boxes per grid cell B = 2, image size IMG_SIZE = 448, batch size BATCH_SIZE = 8, learning rate LR = 2 × 10⁻⁴, loss threshold LOSS_THR = 50, confidence threshold CONF_THR = 0.8, and intersection-over-union threshold IOU_THR = 0.5.
Preferably, in step (3), the image data are resampled by the nearest-neighbor method to the set image size IMG_SIZE and the label data are size-transformed accordingly; after channel normalization the image data become a tensor with values in [-1, 1]; the label data are converted to grid data according to the set grid number S, and the label box coordinates are converted from absolute image coordinates to grid-relative coordinates; the spatial positions of the rainwater grates in the training images are counted and the spatial distribution prior probability map is drawn.
Preferably, in step (4), the backbone network is mainly formed by connecting the Conv1, Block1, Block2, Block3, Block4, and Regressor modules in series; context aggregation network Feature Fusion modules are attached after the Conv1, Block1, and Block4 modules;
in the Conv1 module, a basic convolution operation with kernel size 3 and a max-pooling operation with stride 2 down-sample the input feature map by a factor of 4;
in the basic convolution module Conv, a convolution operation, BatchNorm batch normalization, and ReLU activation form a three-layer network module;
the Block module takes the Fire module of the SqueezeNet model as its basic unit and builds an effective Block by stacking different Fire modules and appending pooling layers; the Fire module comprises a squeeze layer and an expand layer; the squeeze layer is a group of successive 1 × 1 convolutions, and the expand layer is the concatenation of a group of successive 1 × 1 convolutions and a group of successive 3 × 3 convolutions; this particular combination of 1 × 1 and 3 × 3 convolutions greatly reduces the number of parameters; Block1 and Block2 have the same structure, each containing 2 Fire modules and 1 max-pooling layer, and differ only in channel count; after the Block1 and Block2 operations, the spatial size of the input image is reduced to 1/8 and 1/16 of the original size, respectively; Block3 contains 4 Fire modules and 1 max-pooling layer to further increase network depth; after the Block3 operation, the spatial size of the input image is reduced to 1/32 of the original size; Block4 contains 2 Fire modules and 1 global average pooling layer for reducing the feature map to the target feature map size;
in the context aggregation network, the context feature map passes through a 1 × 1 convolution module and a 3 × 3 convolution module, followed by global average pooling and channel aggregation, to form an aggregated feature map; the convolution module again consists of a convolution operation, BatchNorm batch normalization, and ReLU activation; the 1 × 1 convolution reduces the channel count of the context feature map to r times that of the target feature map, with r taking a value in [0.1, 0.5], so that the context information does not overwhelm the target feature map itself; the 3 × 3 convolution performs a first down-sampling; global average pooling performs a direct down-sampling; finally, the context feature maps are channel-aggregated into an aggregated feature map, which is channel-aggregated again with the target feature map and fed to the Regressor for processing;
the Regressor module consists of three serially connected 1 × 1 convolution modules and a Sigmoid function, where each convolution block again consists of a convolution operation, BatchNorm batch normalization, and ReLU activation; after the target feature map is processed by the Regressor, a feature map vector with spatial size S × S and C channels is obtained; similar to the design of the YOLO model, each feature vector contains B predicted boxes and one classification probability; the B predicted boxes are used to predict B targets respectively, overlapping boxes being treated as one target, and each box is a 5-dimensional vector comprising a target existence probability conf, top-left coordinates (x, y), and target size (w, h); if the detection targets comprise CLS_NUM classes, the channel count C is B × 5 + CLS_NUM.
Preferably, in step (5), the optimizer for training the model uses the RMSprop algorithm, and the learning rate decays with the equal-interval StepLR schedule; when the loss function value of the detection model falls below the loss threshold LOSS_THR, training ends and the model parameters are saved.
Preferably, in step (6), the rainwater grate spatial distribution prior probability map is read and binarized to generate a mask map; the mask map is applied to the model-predicted bounding-box results, i.e., the feature map vectors, through the product operation of the M operator, finally obtaining the masked bounding-box results; the M operator is defined as follows:
V'_{i,j}(b, conf) = V_{i,j}(b, conf) × P_{i,j},  b = 1, …, B
where V_{i,j} is the feature vector at row i, column j of the feature map, b is the bounding-box index, conf is the confidence probability of the bounding box, and P_{i,j} is the prior probability value at row i, column j of the mask map.
Preferably, in step (7), the NMS de-duplication first keeps the boxes whose confidence exceeds the threshold CONF_THR, then removes highly overlapping boxes according to the intersection-over-union threshold IOU_THR. The box coordinates are transformed from grid-relative coordinates to absolute image coordinates.
Compared with existing detection methods, the lightweight rainwater grate detection method coupled with a context aggregation network disclosed by the invention has the following beneficial effects:
1) Based on a lightweight convolution module and the Fire module, the invention designs a model structure in which the backbone network is coupled with context information; meanwhile, the prior probability of the spatial distribution of rainwater grates is incorporated into model prediction, significantly improving target detection accuracy. The Average Precision (AP) of the method for rainwater grate detection reaches 0.79, with a Recall of 0.87 and a Precision of 0.91; the accuracy is 12% higher than the YOLO model, 1% higher than the VGG-YOLO model, and 15% higher than the SSD model.
2) The model framework designed by the invention mainly consists of 1 × 1 and 3 × 3 convolution modules, so the model parameters are light: the model weight file is only 13.98 MB, which is only 6% of the YOLO model, 23% of the VGG-YOLO model, and 15% of the SSD model.
3) The detection method designed by the invention has high real-time performance: the lightweight network architecture reduces the consumption of computing resources, shortens model loading and feed-forward time, and increases operating speed. The frame rate (FPS) of the method reaches 56, which is 11 higher than the YOLO model, 45 higher than the VGG-YOLO model, and 22 higher than the SSD model.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is the rainwater grate spatial distribution prior probability map in the embodiment of the present invention;
FIG. 3 is a structural diagram of the rainwater grate detection model designed by the invention;
FIG. 4 shows the structures of the blocks included in the rainwater grate detection model of the present invention;
FIG. 5 is a structural diagram of the context aggregation network included in the rainwater grate detection model of the present invention;
FIG. 6 illustrates the mask processing designed in accordance with the present invention;
FIG. 7 shows actual detection results of the rainwater grate detection method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A light rainwater grate detection method coupled with a context aggregation network is shown in figure 1, and comprises the following specific steps:
(1) Image preprocessing: according to the obtained street-view image data, mark the positions of the rainwater grates in the street-view images with an image annotation tool to generate rainwater grate image-label data. Enhance the street-view images using image processing techniques.
(2) Parameter initialization: initialize the parameters involved in the detection method;
(3) Data set generation: screen the image-label data from step (1) so that image data and label data correspond one to one. According to the initial parameters set in step (2), resize and channel-normalize the images and convert the label data to grid data. Divide the data set into training data and test data according to an empirical ratio of training to test samples. Generate the rainwater grate spatial distribution prior probability map from the training data;
(4) Rainwater grate detection model construction: the model framework mainly comprises a backbone network and a context aggregation network. The backbone network is composed of a lightweight Conv1 module, a series of serially connected Block modules, and a Regressor module. The context aggregation network performs channel aggregation of shallow context feature maps with the deep target feature map, fusing shallow target detail information with deep target semantic information and thereby reducing the attenuation of target information, especially small-target information, during network propagation.
(5) Model training and prediction: according to the initial parameters set in step (2), train the detection model with the training data generated in step (3) until convergence, recording and saving the optimal model weights. After training, load the optimal model weights and use the constructed detection model for rainwater grate prediction;
(6) Mask processing: using the rainwater grate spatial distribution prior probability map, mask the bounding-box results predicted by the model to obtain masked bounding-box results.
(7) Post-processing: according to the initial parameters set in step (2), perform Non-Maximum Suppression (NMS) de-duplication on the boxes from step (6), spatially transform the box coordinates back to absolute image coordinates, and test using the test data obtained in step (3).
Further preferably, in step (1), the obtained street-view image data comprise nearly 2,000 high-definition color images with a resolution of 1024 × 512 pixels. Scenes in the images mainly include roads, sidewalks, street trees, road median strips, buildings, and the like. The rainwater grates are mainly distributed along both sides of roads and upstream of pedestrian crossings, and their area fraction in a whole image is very small (on average only about 5‰). The image annotation tool is not unique; this embodiment uses the open-source tool LabelMe, with label data stored in JSON or XML format. Image enhancement mainly involves denoising, sharpening, and equalization. Denoising uses a 3 × 3 median filter to remove salt-and-pepper noise. Sharpening uses the Laplacian operator on the 4-neighborhood to highlight object contours. Equalization uses global histogram equalization to keep brightness consistent across the image and improve clarity in some regions;
further preferred, inIn the step (2), the parameters to be initialized mainly include: number of classifications CLS _ NUM =1, grid SIZE S =7, predicted number of frames per grid B =2, image SIZE IMG _ SIZE =448, BATCH SIZE BATCH _ SIZE =8, and learning rate LR =2 × 10-4LOSS threshold LOSS _ THR =50, confidence threshold CONF _ THR =0.8, intersection ratio threshold IOU _ THR =0.5, and so on;
further preferably, in the step (3), the image data is resampled by a nearest neighbor method according to the set image SIZE IMG _ SIZE and the label data is SIZE-converted. The image data can be converted into a tensor with a value range of [ -1,1] after being normalized by a channel. And converting the label data into grid data according to the set grid number S, and simultaneously converting the frame coordinates of the label from image absolute coordinates into grid relative coordinates. The training sample to test sample ratio was maintained at 8: 2. According to the training data, the spatial position information of the rain grate in the image is counted, and a spatial distribution prior probability graph of the rain grate is drawn, wherein the spatial distribution prior probability graph is shown in figure 2;
further preferably, in the step (4), the backbone network is composed of a Conv1 module, a Block1 module, a Block2 module, a Block3 module, a Block4 module, a Regressor module, and the like, which are connected in series, as shown in fig. 3. The context aggregation network Feature Fusion is connected behind the Conv1 module, the Block1 module and the Block4 module, as shown in fig. 3.
In the Conv1 module, a basic convolution (Conv) operation with kernel size 3 and a max-pooling operation with stride 2 down-sample the input feature map by a factor of 4, as shown in FIG. 4a.
In the basic convolution module Conv, a convolution operation, BatchNorm batch normalization, and ReLU activation form a three-layer network module, as shown in FIG. 4f.
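In PyTorch terms, this basic convolution module can be sketched as follows; the class name and default arguments are illustrative (the code sketches below reuse this module):

```python
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    """Basic Conv module of FIG. 4f: convolution -> BatchNorm -> ReLU (a sketch)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
```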
The Block module takes the Fire module of the SqueezeNet model (Iandola et al., 2017) as its basic unit and builds an effective Block by stacking different Fire modules and appending pooling layers. The Fire module comprises a squeeze layer and an expand layer: the squeeze layer is a group of successive 1 × 1 convolutions, and the expand layer is the concatenation of a group of successive 1 × 1 convolutions and a group of successive 3 × 3 convolutions. This particular combination of 1 × 1 and 3 × 3 convolutions considerably reduces the number of parameters. In the present invention, Block1 and Block2 have the same structure, each containing 2 Fire modules and 1 max-pooling layer; they differ only in channel count, as shown in FIG. 4b. After the Block1 and Block2 operations, the spatial size of the input image is reduced to 1/8 and 1/16 of its original size, respectively. Block3 contains 4 Fire modules and 1 max-pooling layer to further increase network depth, as shown in FIG. 4c; after the Block3 operation, the spatial size is reduced to 1/32 of the original. Block4 contains 2 Fire modules and 1 global average pooling layer to reduce the feature map to the target feature map size, as shown in FIG. 4d.
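A sketch of the Fire module, reusing the ConvBNReLU module above; the channel widths are not fixed by the patent and are left as parameters:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style Fire module (a sketch): a 1x1 squeeze followed by parallel
    1x1 and 3x3 expands whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = ConvBNReLU(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = ConvBNReLU(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x3 = ConvBNReLU(squeeze_ch, expand_ch, kernel_size=3)

    def forward(self, x):
        x = self.squeeze(x)
        return torch.cat([self.expand1x1(x), self.expand3x3(x)], dim=1)
```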
In the context aggregation network, the context feature map passes through a 1 × 1 convolution module and a 3 × 3 convolution module, followed by global average pooling and channel aggregation, to form an aggregated feature map, as shown in FIG. 5. The convolution modules again consist of a convolution operation, BatchNorm batch normalization, and ReLU activation, as shown in FIG. 4f. The 1 × 1 convolution reduces the channel count of the context feature map to r times that of the target feature map, with r taking a value in [0.1, 0.5]; this ensures that the context information does not overwhelm the target feature map itself. The 3 × 3 convolution performs a first down-sampling, and global average pooling performs a direct down-sampling. Finally, the context feature maps are channel-aggregated into an aggregated feature map, which is channel-aggregated again with the target feature map and fed to the Regressor for processing.
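The fusion path can be sketched as follows, again reusing ConvBNReLU; the stride of the 3 × 3 convolution, the default ratio r = 0.25, and the use of adaptive average pooling to match the target spatial size are assumptions consistent with the description above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Context aggregation sketch: reduce a shallow context map to r times the
    target channel count, down-sample it to the target spatial size, concatenate."""
    def __init__(self, ctx_ch, tgt_ch, r=0.25):
        super().__init__()
        reduced = max(1, int(tgt_ch * r))   # r in [0.1, 0.5] per the description
        self.reduce = ConvBNReLU(ctx_ch, reduced, kernel_size=1)
        self.down = ConvBNReLU(reduced, reduced, kernel_size=3, stride=2)

    def forward(self, context, target):
        ctx = self.down(self.reduce(context))               # 1x1 reduce, 3x3 down-sample
        ctx = F.adaptive_avg_pool2d(ctx, target.shape[2:])  # average-pool to target size
        return torch.cat([ctx, target], dim=1)              # channel aggregation
```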
The Regressor module consists of three serially connected 1 × 1 convolution modules and a Sigmoid function, as shown in FIG. 4e, where each convolution block again consists of a convolution operation, BatchNorm batch normalization, and ReLU activation, as shown in FIG. 4f. After the target feature map is processed by the Regressor, a feature map vector with spatial size S × S and C channels is obtained. Similar to the design of the YOLO model (Redmon et al., 2016), each feature vector contains B predicted boxes and one classification probability. The B predicted boxes are used to predict B targets (overlapping boxes are treated as one target), and each box is a 5-dimensional vector comprising a target existence probability conf, top-left coordinates (x, y), and target size (w, h). If the detection targets comprise CLS_NUM classes, the channel count C is B × 5 + CLS_NUM;
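A sketch of the Regressor head; the hidden channel width is an assumption, and the final 1 × 1 convolution here omits BatchNorm/ReLU before the Sigmoid, a common practical choice rather than something the patent prescribes:

```python
import torch.nn as nn

class Regressor(nn.Module):
    """Regression head sketch: serial 1x1 conv modules and a final Sigmoid,
    producing an S x S map with C = B*5 + CLS_NUM channels."""
    def __init__(self, in_ch, b=2, cls_num=1, hidden=256):
        super().__init__()
        c = b * 5 + cls_num                  # conf, x, y, w, h per box + class probs
        self.head = nn.Sequential(
            ConvBNReLU(in_ch, hidden, kernel_size=1),
            ConvBNReLU(hidden, hidden, kernel_size=1),
            nn.Conv2d(hidden, c, kernel_size=1),
            nn.Sigmoid(),                    # squash all outputs to (0, 1)
        )

    def forward(self, x):
        return self.head(x)                  # shape: (N, C, S, S)
```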
further preferably, in the step (5), the RMSprop algorithm is used by the optimizer for training the model, and the StepLR strategy is adjusted at equal intervals for learning rate attenuation. And when the LOSS function value of the detection model is lower than the LOSS threshold LOSS _ THR, finishing training and storing the model parameters.
Further preferably, in step (6), the rainwater grate spatial distribution prior probability map is read and binarized to generate the mask map. The mask map is applied to the model-predicted bounding-box results (i.e., the feature map vectors) through the product operation of the M operator, yielding the masked bounding-box results, as shown in FIG. 6. The M operator is defined as follows:
Figure 94184DEST_PATH_IMAGE001
where V_{i,j} is the feature vector at row i, column j of the feature map, b is the bounding-box index, conf is the confidence probability of the bounding box, and P_{i,j} is the prior probability value at row i, column j of the mask map.
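A sketch of the mask step, under the assumption that each of the B boxes occupies five consecutive channels laid out as [conf, x, y, w, h]:

```python
import torch

def apply_prior_mask(pred, prior, b=2, thr=0.5):
    """Scale each box confidence by the binarized spatial prior (a sketch).

    pred:  (S, S, C) feature map, boxes laid out as [conf, x, y, w, h] per box
    prior: (S, S) rainwater grate spatial distribution prior probability map
    """
    mask = (prior >= thr).float()                   # image binarization of the prior
    for k in range(b):
        pred[..., k * 5] = pred[..., k * 5] * mask  # conf channel of box k
    return pred
```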
Further preferably, in step (7), the NMS de-duplication first keeps the boxes whose confidence exceeds the threshold CONF_THR, then removes highly overlapping boxes according to the intersection-over-union threshold IOU_THR. The box coordinates are transformed from grid-relative coordinates to absolute image coordinates.
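A sketch of this post-processing using torchvision's NMS operator; the tensor layout is an assumption:

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, conf_thr=0.8, iou_thr=0.5):
    """Confidence screening followed by NMS de-duplication (a sketch).

    boxes:  (N, 4) absolute image coordinates (x1, y1, x2, y2)
    scores: (N,) box confidence probabilities
    """
    keep = scores >= conf_thr                # screen boxes above CONF_THR
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thr)        # drop highly overlapping boxes
    return boxes[idx], scores[idx]
```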
After the above steps, the actual effect of the detection method on rainwater grates is shown in FIG. 7 (black boxes are detected values, white boxes are ground-truth values). It can be seen that in both scenes the detection method designed by the invention detects the rainwater grates well.

Claims (8)

1. A lightweight rainwater grate detection method coupled with a context aggregation network, characterized by comprising the following steps:
(1) image preprocessing: according to the obtained street-view image data, mark the positions of rainwater grates in the street-view images with an image annotation tool to generate rainwater grate image-label data; enhance the street-view images using image processing techniques;
(2) parameter initialization: initialize the parameters involved in the detection method;
(3) data set generation: screen the image-label data from step 1 so that image data and label data correspond one to one; according to the initial parameters set in step 2, resize and channel-normalize the images and convert the label data to grid data; divide the data set into training data and test data according to an empirical ratio of training to test samples; generate the rainwater grate spatial distribution prior probability map from the training data;
(4) rainwater grate detection model construction: the model framework mainly comprises a backbone network and a context aggregation network, the backbone network being composed of a lightweight Conv1 module, a series of serially connected Block modules, and a Regressor module; the context aggregation network performs channel aggregation of shallow context feature maps with the deep target feature map, fusing shallow target detail information with deep target semantic information and thereby reducing the attenuation of target information, especially small-target information, during network propagation;
(5) model training and prediction: according to the initial parameters set in step 2, train the detection model with the training data generated in step 3 until convergence, recording and saving the optimal model weights; after training, load the optimal model weights and use the detection model constructed in step 4 for rainwater grate prediction;
(6) mask processing: using the rainwater grate spatial distribution prior probability map, mask the bounding-box results predicted by the model to obtain masked bounding-box results;
(7) post-processing: according to the initial parameters set in step 2, perform non-maximum suppression de-duplication on the boxes from step 6, spatially transform the box coordinates back to absolute image coordinates, and test using the test data obtained in step 3.
2. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that: in step (1), the obtained street-view image data comprise nearly 2,000 high-definition color images with a resolution of 1024 × 512 pixels; scenes in the images mainly include roads, sidewalks, street trees, road median strips, rainwater grates, and buildings; the rainwater grates are mainly distributed along both sides of roads and upstream of pedestrian crossings, and their area fraction in a whole image is very small, on average only about 5‰; the open-source annotation tool LabelMe is used for image annotation, with label data stored in JSON or XML format; image enhancement mainly involves denoising, sharpening, and equalization: denoising uses a 3 × 3 median filter to remove salt-and-pepper noise, sharpening uses the Laplacian operator on the 4-neighborhood to highlight object contours, and equalization uses global histogram equalization to keep brightness consistent across the image and improve clarity in some regions.
3. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that: in step (2), the parameters to be initialized mainly include: number of classes CLS_NUM = 1, grid size S = 7, predicted number of boxes per grid cell B = 2, image size IMG_SIZE = 448, batch size BATCH_SIZE = 8, learning rate LR = 2 × 10⁻⁴, loss threshold LOSS_THR = 50, confidence threshold CONF_THR = 0.8, and intersection-over-union threshold IOU_THR = 0.5.
4. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that: in step (3), the image data are resampled by the nearest-neighbor method to the set image size IMG_SIZE and the label data are size-transformed accordingly; after channel normalization the image data become a tensor with values in [-1, 1]; the label data are converted to grid data according to the set grid number S, and the label box coordinates are converted from absolute image coordinates to grid-relative coordinates; the spatial positions of the rainwater grates in the training images are counted and the spatial distribution prior probability map is drawn.
5. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that: in step (4), the backbone network is mainly formed by connecting the Conv1, Block1, Block2, Block3, Block4, and Regressor modules in series; context aggregation network Feature Fusion modules are attached after the Conv1, Block1, and Block4 modules;
in the Conv1 module, a basic convolution operation with kernel size 3 and a max-pooling operation with stride 2 down-sample the input feature map by a factor of 4;
in the basic convolution module Conv, a convolution operation, BatchNorm batch normalization, and ReLU activation form a three-layer network module;
the Block module takes the Fire module of the SqueezeNet model as its basic unit and builds an effective Block by stacking different Fire modules and appending pooling layers; the Fire module comprises a squeeze layer and an expand layer; the squeeze layer is a group of successive 1 × 1 convolutions, and the expand layer is the concatenation of a group of successive 1 × 1 convolutions and a group of successive 3 × 3 convolutions; this particular combination of 1 × 1 and 3 × 3 convolutions greatly reduces the number of parameters; Block1 and Block2 have the same structure, each containing 2 Fire modules and 1 max-pooling layer, and differ only in channel count; after the Block1 and Block2 operations, the spatial size of the input image is reduced to 1/8 and 1/16 of the original size, respectively; Block3 contains 4 Fire modules and 1 max-pooling layer to further increase network depth; after the Block3 operation, the spatial size of the input image is reduced to 1/32 of the original size; Block4 contains 2 Fire modules and 1 global average pooling layer for reducing the feature map to the target feature map size;
in the context aggregation network, the context feature map passes through a 1 × 1 convolution module and a 3 × 3 convolution module, followed by global average pooling and channel aggregation, to form an aggregated feature map; the convolution module again consists of a convolution operation, BatchNorm batch normalization, and ReLU activation; the 1 × 1 convolution reduces the channel count of the context feature map to r times that of the target feature map, with r taking a value in [0.1, 0.5], so that the context information does not overwhelm the target feature map itself; the 3 × 3 convolution performs a first down-sampling; global average pooling performs a direct down-sampling; finally, the context feature maps are channel-aggregated into an aggregated feature map, which is channel-aggregated again with the target feature map and fed to the Regressor for processing;
the Regressor module consists of three serially connected 1 × 1 convolution modules and a Sigmoid function, where each convolution block again consists of a convolution operation, BatchNorm batch normalization, and ReLU activation; after the target feature map is processed by the Regressor, a feature map vector with spatial size S × S and C channels is obtained; similar to the design of the YOLO model, each feature vector contains B predicted boxes and one classification probability; the B predicted boxes are used to predict B targets respectively, overlapping boxes being treated as one target, and each box is a 5-dimensional vector comprising a target existence probability conf, top-left coordinates (x, y), and target size (w, h); if the detection targets comprise CLS_NUM classes, the channel count C is B × 5 + CLS_NUM.
6. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that: in step (5), the optimizer for training the model uses the RMSprop algorithm, and the learning rate decays with the equal-interval StepLR schedule; when the loss function value of the detection model falls below the loss threshold LOSS_THR, training ends and the model parameters are saved.
7. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that: in step (6), the rainwater grate spatial distribution prior probability map is read and binarized to generate a mask map; the mask map is applied to the model-predicted bounding-box results, i.e., the feature map vectors, through the product operation of the M operator, finally obtaining the masked bounding-box results; the M operator is defined as follows:
V'_{i,j}(b, conf) = V_{i,j}(b, conf) × P_{i,j},  b = 1, …, B
where V_{i,j} is the feature vector at row i, column j of the feature map, b is the bounding-box index, conf is the confidence probability of the bounding box, and P_{i,j} is the prior probability value at row i, column j of the mask map.
8. The lightweight rainwater grate detection method coupled with a context aggregation network according to claim 1, characterized in that:
in step (7), the NMS de-duplication first keeps the boxes whose confidence exceeds the threshold CONF_THR, then removes highly overlapping boxes according to the intersection-over-union threshold IOU_THR; the box coordinates are transformed from grid-relative coordinates to absolute image coordinates.
CN202111102992.5A 2021-09-17 2021-09-17 Lightweight rainwater grate detection method coupled with context aggregation network Active CN113837058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111102992.5A CN113837058B (en) 2021-09-17 2021-09-17 Lightweight rainwater grate detection method coupled with context aggregation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111102992.5A CN113837058B (en) 2021-09-17 2021-09-17 Lightweight rainwater grate detection method coupled with context aggregation network

Publications (2)

Publication Number Publication Date
CN113837058A true CN113837058A (en) 2021-12-24
CN113837058B CN113837058B (en) 2022-09-30

Family

ID=78960079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111102992.5A Active CN113837058B (en) 2021-09-17 2021-09-17 Lightweight rainwater grate detection method coupled with context aggregation network

Country Status (1)

Country Link
CN (1) CN113837058B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800628A (en) * 2018-12-04 2019-05-24 华南理工大学 A kind of network structure and detection method for reinforcing SSD Small object pedestrian detection performance
CN110956119A (en) * 2019-11-26 2020-04-03 大连理工大学 Accurate and rapid target detection method in image
CN111144376A (en) * 2019-12-31 2020-05-12 华南理工大学 Video target detection feature extraction method
CN111666836A (en) * 2020-05-22 2020-09-15 北京工业大学 High-resolution remote sensing image target detection method of M-F-Y type lightweight convolutional neural network
CN111898410A (en) * 2020-06-11 2020-11-06 东南大学 Face detection method based on context reasoning under unconstrained scene
CN112232232A (en) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 Target detection method
CN112329658A (en) * 2020-11-10 2021-02-05 江苏科技大学 Method for improving detection algorithm of YOLOV3 network
CN112396002A (en) * 2020-11-20 2021-02-23 重庆邮电大学 Lightweight remote sensing target detection method based on SE-YOLOv3
CN112418117A (en) * 2020-11-27 2021-02-26 北京工商大学 Small target detection method based on unmanned aerial vehicle image
CN112446388A (en) * 2020-12-05 2021-03-05 天津职业技术师范大学(中国职业培训指导教师进修中心) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
CN112818862A (en) * 2021-02-02 2021-05-18 南京邮电大学 Face tampering detection method and system based on multi-source clues and mixed attention
CN113011336A (en) * 2021-03-19 2021-06-22 厦门大学 Real-time street view image semantic segmentation method based on deep multi-branch aggregation
CN113191296A (en) * 2021-05-13 2021-07-30 中国人民解放军陆军炮兵防空兵学院 Method for detecting five parameters of target in any orientation based on YOLOV5
CN113378890A (en) * 2021-05-17 2021-09-10 浙江工业大学 Lightweight pedestrian and vehicle detection method based on improved YOLO v4

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIAHUI SUN et al.: "AS-YOLO: An Improved YOLOv4 based on Attention Mechanism and SqueezeNet for Person Detection", 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) *
YAZHOU LIU et al.: "Modular Lightweight Network for Road Object Detection Using a Feature Fusion Approach", IEEE Transactions on Systems, Man, and Cybernetics: Systems *
WANG Bing et al.: "Mask detection algorithm based on an improved lightweight YOLO network", Computer Engineering and Applications *
CHEN Chunlin: "Research on object detection algorithms in complex scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN113837058B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
Chen et al. Distribution line pole detection and counting based on YOLO using UAV inspection line video
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
CN110263706B (en) Method for detecting and identifying dynamic target of vehicle-mounted video in haze weather
Han et al. A point-based deep learning network for semantic segmentation of MLS point clouds
CN111899227A (en) Automatic railway fastener defect acquisition and identification method based on unmanned aerial vehicle operation
CN111527467A (en) Method and apparatus for automatically defining computer-aided design files using machine learning, image analysis, and/or computer vision
Jiang et al. Deep neural networks-based vehicle detection in satellite images
Dai et al. Residential building facade segmentation in the urban environment
CN109902676B (en) Dynamic background-based violation detection algorithm
CN104036323A (en) Vehicle detection method based on convolutional neural network
CN105405138A (en) Water surface target tracking method based on saliency detection
CN105574545A (en) Environment image multi-view-angle meaning cutting method and device
CN114708566A (en) Improved YOLOv 4-based automatic driving target detection method
CN116597411A (en) Method and system for identifying traffic sign by unmanned vehicle in extreme weather
Cheng et al. Semantic segmentation of road profiles for efficient sensing in autonomous driving
Jiang et al. Remote sensing object detection based on convolution and Swin transformer
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
CN114399734A (en) Forest fire early warning method based on visual information
CN113673616B (en) Light-weight small target detection method coupling attention and context
CN113837058B (en) Lightweight rainwater grate detection method coupled with context aggregation network
CN116958911A (en) Traffic monitoring image target detection method oriented to severe weather
Huang et al. Detection of river floating debris in uav images based on improved yolov5
CN116152696A (en) Intelligent security image identification method and system for industrial control system
CN115546667A (en) Real-time lane line detection method for unmanned aerial vehicle scene
Samadzadegan et al. Automatic Road Crack Recognition Based on Deep Learning Networks from UAV Imagery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant