CN117115616A - Real-time low-illumination image target detection method based on convolutional neural network - Google Patents
- Publication number
- CN117115616A (Application CN202310940678.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- low
- network
- enhancement
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a real-time low-illumination image target detection method based on a convolutional neural network, applied in the technical field of computer vision, comprising the following steps: constructing a real low-illumination image dataset; downsampling the high-resolution image during image enhancement to reduce computation cost; restoring the contrast of the low-illumination image through depth curve estimation to improve image quality; using a lightweight network in target detection to meet the real-time requirement of the overall detection model; using an efficient coordinate attention mechanism to focus on channel and spatial position information, strengthening the feature learning capability of the network; taking image enhancement as the preprocessing part of target detection to form an "enhancement + detection" model; adding a weight to each feature channel, learning the importance of each channel in the feature map, fusing the original image and the enhanced image through two channels, suppressing the noise amplification caused by image enhancement, and establishing a complementary relation between the original image and the enhanced image.
Description
Technical Field
The invention belongs to the field of deep learning and computer vision, and particularly relates to a real-time low-illumination image target detection method based on a convolutional neural network.
Background
With the development of deep learning in computer vision, target detection algorithms have gradually evolved into two branches: one-stage and two-stage. One-stage algorithms treat the detection task as a regression problem of localization and classification, while two-stage algorithms first select candidate regions and then classify them. Compared with one-stage algorithms, two-stage algorithms usually need to be deployed on platforms with greater computing power and take longer to run, which does not meet the real-time requirement of target detection. Existing detection algorithms such as R-CNN, SSD and YOLO achieve good detection results on general-purpose datasets such as ImageNet, COCO and VOC, and are widely applied in intelligent traffic, face recognition, pathological analysis, industrial inspection and other fields. However, real-world imaging is affected by illumination and equipment, and captured images suffer from insufficient contrast and low signal-to-noise ratio. Such low-quality images not only degrade the visual effect but also make downstream vision tasks harder, seriously reducing the detection accuracy of these algorithms.
Researchers generally take two approaches to detection in low-illumination images. One acquires images with devices such as thermal imaging or infrared sensors, but this places high demands on physical equipment and is costly. The other restores image quality through image enhancement, but traditional histogram equalization and Retinex-based methods focus on restoring contrast and fail to recover the true colors of the image. With the application of deep learning to image processing, convolutional neural networks can extract high-level semantics, learn characteristics such as image contrast and illumination color, and produce more expressive results. Rather than judging image quality by visual appearance alone, this approach is combined with the downstream vision task: image enhancement is used as a preprocessing operation for target detection, and the enhancer and detector are cascaded into an "enhancement + detection" strategy, reducing the impact of low-illumination images on the target detection algorithm.
Disclosure of Invention
To solve the problem of low target detection accuracy in low-illumination environments, the invention provides a real-time low-illumination image target detection method based on a convolutional neural network. Addressing the insufficient contrast and low signal-to-noise ratio of low-illumination images, the method uses a deep network to enhance and restore the image without relying on physical equipment such as supplementary lighting or infrared sensors, and optimizes the dataset used to train the neural network through data augmentation. The invention mainly solves two technical problems: first, image enhancement restores the image but also amplifies noise, leaving the detection model with a weak ability to recognize blurred objects; second, a plain cascade of enhancer and detector has too many parameters and too high a computation cost in practice, and cannot achieve real-time detection on embedded platforms with limited computing power.
For the first problem, the invention designs a feature fusion module based on a channel attention mechanism, forming a feature extraction network that attends to pixel-level information; this fusion module fuses the low-level features of the enhanced image and the original image to strengthen the recognition of blurred objects. For the second problem, the enhancer is lightweighted: the ordinary convolutions in the deep network are replaced with depth separable convolutions, greatly reducing the enhancer's parameter count and improving its processing speed and inference capability. The technical scheme adopted by the invention is as follows:
a real-time low-illumination image target detection method based on a convolutional neural network comprises the following steps:
step one, configuring a deep learning software environment, and configuring an image enhancement algorithm and a target detection algorithm environment based on a convolutional neural network;
step two, constructing a low-illumination image data set, acquiring a real low-illumination image, marking the image, and summarizing the image into a tag data set;
step three, an enhancement recovery module is established, and a weight mechanism is used for inhibiting noise amplification caused by image enhancement;
step four, constructing a deep neural network and establishing a cascaded enhancer-and-detector detection mode: the dataset first undergoes image enhancement, which serves as the preprocessing part of the detection network, followed by feature extraction and detection;
step five, adopting a lightweight network: the image is downsampled during preprocessing, and standard convolution is replaced with depth separable convolution to meet the network's real-time requirement;
step six, optimizing the network: an attention mechanism is added to the feature extraction network to compensate for the accuracy loss caused by lightweighting, so that the overall model balances accuracy and speed;
step seven, training the neural network model and verifying the detection effect in low-light environments.
Specifically:
In step one, the target detection algorithm is a two-step algorithm based on candidate regions or a single-step algorithm based on regression, and the image enhancement algorithm is a histogram image enhancement algorithm, a tone-mapping image enhancement algorithm or a Retinex image enhancement algorithm;
In step two, the low-illumination image dataset is a composite dataset: real low-illumination images are first collected from the Internet, and low-illumination images screened from public datasets are added for expansion; the images are then labeled and summarized into a tag dataset; finally, the tag dataset is randomly divided into a training set, a verification set and a test set;
In step three, the enhancement recovery module takes the enhanced image and the original image as two-channel input for fusion and assigns a weight to each feature channel using a channel attention mechanism; the importance of each channel in the feature map is then learned through the neural network; finally, the two-channel input feature channels are aggregated according to these weights, which raises the module's attention to channels carrying target feature information and suppresses the noise amplified during image enhancement;
In step four, the cascade network of enhancer and detector consists of a low-illumination image enhancement algorithm, a cascade module and a target detection algorithm, and comprises an image preprocessing layer, a feature fusion layer, a feature extraction layer and a prediction layer. The low-illumination image enhancement algorithm serves as the preprocessing layer of the network to enhance image quality and comprises histogram equalization and methods based on Retinex theory or curve mapping; the cascade module is the enhancement recovery module of step three; the target detection algorithm comprises a two-step algorithm based on candidate regions or a single-step algorithm based on classification regression. Feature fusion is carried out through a network model such as CSPDarknet, VGG or MobileNet, feature extraction through a network structure such as the feature pyramid FPN, PANet or BiFPN, and classification regression through 3×3 and 1×1 convolution modules, which calculate the intersection-over-union and predict the probability that a target appears in each prior box; a minimal sketch of this cascade follows this paragraph;
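As an illustration, the cascade can be sketched in PyTorch; the class name and constructor arguments are placeholders, and any enhancer, fusion module and detector matching the description above could be substituted.

```python
import torch.nn as nn

class EnhanceDetect(nn.Module):
    """"Enhancement + detection" cascade: the enhancer acts as the
    preprocessing layer, the enhancement recovery module fuses the
    enhanced and original images, and the detector predicts targets."""
    def __init__(self, enhancer: nn.Module, fusion: nn.Module, detector: nn.Module):
        super().__init__()
        self.enhancer = enhancer    # e.g. a ZeroDCE-style curve-estimation network
        self.fusion = fusion        # enhancement recovery module (step three)
        self.detector = detector    # e.g. a YOLO-style one-stage detector

    def forward(self, x):
        enhanced = self.enhancer(x)         # image preprocessing layer
        fused = self.fusion(enhanced, x)    # feature fusion layer
        return self.detector(fused)         # feature extraction + prediction layers
```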
In step five, for model lightweighting, the preprocessing part takes the downsampled small-scale image as the input of the preprocessing layer, reducing the computation cost of convolutional-layer learning; the enhanced result is then restored to the original resolution through upsampling and carried into subsequent operations, as in the sketch after this paragraph; finally, ordinary convolution is replaced with depth separable convolution, reducing the parameter count to about one tenth of the original;
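A minimal sketch of this downsample-enhance-upsample scheme follows, assuming a ZeroDCE-style curve-estimation network whose output has the same number of channels as the image; the scale factor, iteration count and function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def enhance_with_downsampling(image: torch.Tensor, curve_net,
                              scale: int = 4, iterations: int = 8) -> torch.Tensor:
    """Estimate enhancement curves on a downsampled copy of the image,
    upsample the curve parameters back to full resolution, then apply the
    iterative quadratic curve LE(x) = x + a*x*(1-x) to the original image."""
    h, w = image.shape[-2:]
    small = F.interpolate(image, scale_factor=1.0 / scale,
                          mode='bilinear', align_corners=False)
    curves = curve_net(small)                     # cheap low-resolution pass
    curves = F.interpolate(curves, size=(h, w),
                           mode='bilinear', align_corners=False)
    x = image
    for _ in range(iterations):                   # iterative enhancement
        x = x + curves * x * (1 - x)
    return x
```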
In step six, an attention mechanism is added to the feature extraction network to compensate for the accuracy loss caused by network lightweighting. Under limited computing capacity, the attention mechanism allocates computing resources to the more important tasks, giving the neural network a feature extraction capability that concentrates on spatial and channel information, so that the overall model achieves a balance between accuracy and speed.
In step seven, the model training part randomly divides the low-illumination image dataset into a training set, a verification set and a test set at a ratio of 8:1:1 and generates the low-illumination image target detection model. The detection effect of the model is then verified: real images are captured under low-illumination conditions and fed into a conventional target detection model and into the low-illumination target detection model of the invention, respectively, to compare detection effects;
the invention has the beneficial effects that:
firstly, the invention uses the image enhancement algorithm as a preprocessing step of target detection, is more suitable for extracting the characteristics of the low-illumination image, and can improve the accuracy of the neural network on the low-illumination image identification. And secondly, the invention designs an enhancement recovery module based on a channel attention mechanism, and weight regulation is carried out on the image noise amplification problem caused by image enhancement, so that an image with higher quality is obtained. And thirdly, the invention pre-processes the downsampled image, reduces the requirement of the model on the calculation force, and can apply the model to a platform with lower calculation force such as a mobile terminal or embedded equipment. Then, the method replaces standard convolution with depth separable convolution in the feature extraction stage, so that the overall model is improved in detection speed, and the requirement of real-time detection is met. Finally, the invention does not need to use hardware equipment such as infrared imaging and the like to process the image, and has lower cost.
Drawings
FIG. 1 is a flow chart of real-time low-light level target detection based on convolutional neural networks;
FIG. 2 is a block diagram of an enhanced recovery module;
FIG. 3 is a block diagram of a depth separable convolution;
FIG. 4 is a block diagram of the coordinate attention mechanism.
Detailed description of the preferred embodiments
To present the technical solution and features of the present invention more clearly, the invention is explained below with reference to the accompanying drawings, but is not limited to these examples.
Example 1:
a method for detecting a real-time low-illuminance image target based on a convolutional neural network, the method comprising:
step one, configuring an environment: and configuring an image enhancement algorithm and a target detection algorithm environment based on deep learning. The required development environment is configured under the window system, wherein the computer graphics card used is RTX3060, and each application environment is python 3.9.7,anconda 4.11.0,cuda11.0. The present example obtains the open source procedure of the object detection algorithm YOLOX and the image enhancement algorithm ZeroDCE on the gitsub.
Step two, collecting data: construct the low-illumination image dataset, acquire real low-illumination images, label them and summarize them into a tag dataset. This example uses the open-source real low-light dataset ExDark, which covers 10 low-light conditions of different degrees and contains 7363 low-light images in 12 categories such as people, bicycles, boats and chairs. Since the PASCAL VOC dataset and the real dim-light detection dataset ExDark share 10 object classes, 2760 low-light images were screened from the VOC2007 dataset for expansion, forming a new dataset A. To facilitate YOLOX training, the labels in dataset A are converted to VOC2007 format and the image resolution is adjusted to fit the network input.
Step three, establishing the enhancement recovery module and using a weighting mechanism to suppress the noise amplification caused by image enhancement. Referring to the network structure of SKNet, this embodiment proposes a new cascade module. The enhancement recovery module, shown in FIG. 2, consists of an input layer, a feature fusion layer and a feature aggregation layer. The input layer takes the enhanced image features and the original image features as the input of the module. Considering that pixel-wise addition (Point-Wise Addition) would let the fused features be affected by the noise amplified in the enhanced image, a vector splicing (Concatenate) method is chosen to fuse the two input channels, giving a fused feature of size 2C×H×W, i.e. U ∈ R^(2C×H×W). The calculation can be expressed as U = Concat(U_1, U_2), where U denotes the fused image features, U_1 the enhanced image features and U_2 the original image features. Next, to characterize the importance of each channel's information, the feature fusion layer encodes the feature channels over the H and W dimensions by global average pooling, reducing each channel of U to a single number; the channel statistic M is computed as

M_c = F_gap(U_c) = (1 / (W×H)) Σ_{i=1..W} Σ_{j=1..H} U_c(i, j),

where W and H are the width and height of the feature map and (i, j) is a spatial location. To learn the correlation between feature channels, the module feeds M through two fully connected branches F_fc that first reduce and then restore the dimension, producing the weight vectors

Z_a = F_fc(M, W) = σ(W_a δ(W M)), Z_b = F_fc(M, W) = σ(W_b δ(W M)),

where Z_a and Z_b are the two output weight vectors; W is the parameter of the first (shared) fully connected layer, of dimension C/γ × C, with γ a scaling factor that reduces the vector dimension and the amount of computation; W_a and W_b are the parameters of the second fully connected layer in the two branches, each of dimension C × C/γ, used to generate the weight vectors corresponding to the input features; δ is the ReLU activation function and σ is the Sigmoid layer. Finally, the feature aggregation layer normalizes the channel weights Z_a, Z_b of the enhanced and original image features with a softmax function and combines the extracted features by weighted addition to obtain the feature map

U⁺ = Z_a · U_1 + Z_b · U_2.

Compared with connecting the enhancer directly to the detector, the enhancement recovery module proposed by the invention first aggregates the enhanced image with the original image, which improves image quality and reduces the influence of the noise amplified by enhancement.
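For illustration, the module can be sketched in PyTorch as follows. This is a minimal sketch, assuming the 2C-channel statistic feeds a shared reduction layer and applying the softmax directly across the two branch outputs (the intermediate Sigmoid is folded into it); the class name and reduction factor are placeholders, not taken from the patent.

```python
import torch
import torch.nn as nn

class EnhancementRecoveryModule(nn.Module):
    """Two-branch channel-attention fusion of enhanced and original features."""
    def __init__(self, channels: int, gamma: int = 16):
        super().__init__()
        reduced = max(channels // gamma, 8)
        self.gap = nn.AdaptiveAvgPool2d(1)                 # M = F_gap(U)
        self.fc_reduce = nn.Linear(2 * channels, reduced)  # W: shared reduction layer
        self.relu = nn.ReLU(inplace=True)                  # delta
        self.fc_a = nn.Linear(reduced, channels)           # W_a branch
        self.fc_b = nn.Linear(reduced, channels)           # W_b branch

    def forward(self, enhanced: torch.Tensor, original: torch.Tensor) -> torch.Tensor:
        u1, u2 = enhanced, original
        u = torch.cat([u1, u2], dim=1)                     # U = Concat(U1, U2): 2C channels
        m = self.gap(u).flatten(1)                         # per-channel statistic M
        z = self.relu(self.fc_reduce(m))                   # reduce, then branch
        logits = torch.stack([self.fc_a(z), self.fc_b(z)], dim=1)
        weights = torch.softmax(logits, dim=1)             # Z_a, Z_b sum to 1 per channel
        z_a = weights[:, 0].unsqueeze(-1).unsqueeze(-1)
        z_b = weights[:, 1].unsqueeze(-1).unsqueeze(-1)
        return z_a * u1 + z_b * u2                         # U+ = Z_a*U1 + Z_b*U2
```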
Step four, model lightweighting, ensuring the model meets the real-time detection requirement. ZeroDCE and YOLOX are chosen as the image enhancement and target detection models, respectively. First, for the image enhancement part, the downsampled small-scale image is taken as the input of the deep curve-estimation network DCE-Net; the estimated curve parameter maps are upsampled back to the original resolution and the subsequent iterative enhancement is then performed. Taking a low-resolution image as input in this way significantly reduces the computation cost. Second, for the target detection part, the ordinary convolutions used by the feature extraction network are replaced with more efficient depth separable convolutions. FIG. 3 shows how a standard convolution (a) is decomposed into a depth-wise convolution (b) and a point-wise convolution (c). A standard convolution layer takes an input feature map of size M×D_F×D_F and applies N kernels of size M×D_K×D_K, producing an output feature map of size N×D_G×D_G, where D_F is the width and height of the input feature map, D_G the width and height of the output feature map, D_K the spatial dimension of the convolution kernel, M the number of input channels and N the number of output channels. The parameter counts and computation costs of the standard and depth separable convolutions are then: standard convolution parameters, D_K×D_K×M×N; standard convolution computation, D_K×D_K×M×N×D_F×D_F; depth separable convolution parameters, D_K×D_K×M + M×N; depth separable convolution computation, D_K×D_K×M×D_F×D_F + M×N×D_F×D_F. Taking the ratio of the parameter counts and of the computation costs shows that the lightweight feature extraction network needs only about one ninth of the original parameters and computation, which greatly shrinks the model and improves its inference speed.
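The parameter comparison above can be checked with a short sketch. The layer composition here (BatchNorm and ReLU after the point-wise convolution) follows the common MobileNet-style pattern and is an assumption; the patent gives only the counting formulas.

```python
import torch.nn as nn

def conv_params(d_k: int, m: int, n: int) -> tuple:
    """Return (standard, depth-separable) parameter counts for one conv layer."""
    standard = d_k * d_k * m * n                # D_K x D_K x M x N
    separable = d_k * d_k * m + m * n           # depth-wise + point-wise
    return standard, separable

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise convolution (one filter per channel) followed by a
    point-wise 1x1 convolution that mixes channels."""
    def __init__(self, in_ch: int, out_ch: int, kernel: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel, padding=kernel // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Ratio = 1/N + 1/D_K^2; for D_K = 3, M = N = 256 it is about 0.115, i.e. roughly one ninth:
std, sep = conv_params(3, 256, 256)
print(std, sep, round(sep / std, 3))            # 589824 67840 0.115
```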
Step five, model optimization: an attention mechanism is added to the feature extraction network to compensate for the accuracy loss caused by lightweighting. This embodiment adds the efficient coordinate attention (CA) mechanism, designed for mobile networks, to the feature extraction layer. It encodes horizontal and vertical position information into the feature channels, allowing a mobile network to attend to position information over a large range and to locate and recognize targets better without adding excessive computation. The CA module aims to strengthen the expressive power of the features learned by a mobile network; its structure is shown in FIG. 4. To capture attention along both the width and the height of the image and encode precise position information, CA first performs global average pooling on the input feature map separately along the width and height directions, obtaining one feature map for each direction. The two directional feature maps, which carry a global receptive field, are concatenated and passed through a shared 1×1 convolution module that reduces their channel dimension to C/r; the batch-normalized feature map F1 is then passed through a Sigmoid activation function, giving a feature map of shape 1×(W+H)×C/r. This feature map is split back according to the original height and width, and each part goes through a 1×1 convolution to obtain feature maps F_h and F_w with the same number of channels as the original feature map; after another Sigmoid activation, these yield the attention weights of the feature map along the height and the width. Finally, multiplying the original feature map by both weights produces the feature map carrying attention weights in the width and height directions.
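A sketch of the CA module as described above, in PyTorch; the reduction ratio r and layer names are assumptions, and the intermediate Sigmoid follows the text above (the published CA design uses an h-swish there instead).

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention (CA): directional pooling along height and width,
    a shared 1x1 convolution, then per-direction attention weights."""
    def __init__(self, channels: int, r: int = 32):
        super().__init__()
        mid = max(channels // r, 8)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool along height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, 1, bias=False)  # shared 1x1, reduce to C/r
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Sigmoid()    # follows the text above; CA reference code uses h-swish
        self.conv_h = nn.Conv2d(mid, channels, 1)       # restore channels for F_h
        self.conv_w = nn.Conv2d(mid, channels, 1)       # restore channels for F_w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.size()
        xh = self.pool_h(x)                             # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)         # (B, C, W, 1)
        y = torch.cat([xh, xw], dim=2)                  # concatenate both directions
        y = self.act(self.bn(self.conv1(y)))            # F1: shape (B, C/r, H+W, 1)
        yh, yw = torch.split(y, [h, w], dim=2)          # split back by height / width
        a_h = torch.sigmoid(self.conv_h(yh))                      # attention over height
        a_w = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # attention over width
        return x * a_h * a_w                            # weighted original feature map
```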
Step six, model training and testing of the detection effect in low-illumination environments. Dataset A from step two is randomly divided into a training set, a verification set and a test set at a ratio of 8:1:1, and the low-illumination target detection model is finally generated after training.
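The 8:1:1 random split can be sketched as follows; the function name and seed are illustrative, and the sample count matches the 7363 + 2760 images of dataset A described above.

```python
import random

def split_dataset(samples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Randomly split a list of labeled images into train/val/test (8:1:1)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split_dataset(list(range(7363 + 2760)))
print(len(train), len(val), len(test))   # 8098 1012 1013
```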
Experimental results:
and (3) simultaneously sending the low-illumination image into a common target detection model and a model of the embodiment to verify the weak light detection effect. The experimental results are shown in table 1,
table 1 comparison of results
The model of the invention exceeds YOLOX in both detection speed and accuracy, with a smaller model size. In addition, the invention reduces the resolution of the original image by downsampling and restores the enhanced result by upsampling, which lowers the model's demand for computing power without affecting the enhancement effect, makes the model run faster, and meets the real-time detection requirement.
While the invention has been described in detail in connection with specific embodiments, it will be understood that it is not limited to the particular details described above, and that various changes and modifications may be made by one skilled in the art without departing from the spirit of the invention, within the scope of the appended claims.
Claims (8)
1. A real-time low-illumination image target detection method based on a convolutional neural network comprises the following steps:
step one, configuring a deep learning software environment, and configuring an image enhancement algorithm and a target detection algorithm environment based on a convolutional neural network;
step two, constructing a low-illumination image data set, acquiring a real low-illumination image, marking the image, and summarizing the image into a tag data set;
step three, an enhancement recovery module is established, and a weight mechanism is used for inhibiting noise amplification caused by image enhancement;
step four, constructing a deep neural network and establishing an "enhancement+detection" mode, wherein the dataset first undergoes image enhancement, serving as the preprocessing part of the detection network, and feature extraction and detection are then performed;
step five, a lightweight network is adopted, the image is subjected to downsampling treatment in the preprocessing process, and standard convolution is replaced by depth separable convolution, so that the real-time requirement of the network is ensured;
step six, optimizing the network, focusing on the channel and space position information by using a high-efficiency coordinate attention mechanism, and enhancing the characteristic learning capability of the network;
and step seven, training a neural network model, and verifying the detection effect of the low-light environment.
2. The method according to claim 1, wherein the target detection algorithm in the first step is a two-step algorithm based on candidate regions or a single-step algorithm based on regression; the image enhancement algorithm is a histogram image enhancement algorithm, a tone mapped image enhancement algorithm, or a Retinex image enhancement algorithm.
3. The method for detecting real-time low-illuminance images according to claim 1, wherein in step two the low-illuminance image dataset is a composite dataset: real low-illuminance images are first obtained from the Internet, and low-illuminance images screened from public datasets are added for expansion; secondly, the images are labeled and summarized into a tag dataset; finally, the tag dataset is divided into a training set, a verification set and a test set.
4. The method according to claim 1, wherein the enhancement recovery module first fuses the enhanced image with the original image as input in a dual-channel manner, and adds a weight to each feature channel by using a channel attention mechanism; secondly, learning the importance of each channel in the feature map through a neural network; and finally, aggregating the input characteristic channels of the two channels according to the weights, improving the attention of the module to the target characteristic information channel, and inhibiting the influence of noise amplification during image enhancement.
5. The method according to claim 1, wherein the "enhancement+detection" network in step four consists of a low-illumination image enhancement algorithm, a cascade module and a target detection algorithm, and comprises an image preprocessing layer, a feature fusion layer, a feature extraction layer and a prediction layer; the low-illumination image enhancement algorithm serves as the preprocessing layer of the network to enhance image quality and comprises a histogram equalization method and a method based on Retinex theory or depth curve estimation; the cascade module is the enhancement recovery module of claim 4; the target detection algorithm comprises a two-step algorithm based on candidate regions or a single-step algorithm based on classification regression; feature fusion is carried out through a network model such as CSPDarknet, VGG or MobileNet, feature extraction through a network structure such as the feature pyramid FPN, PANet or BiFPN, and classification regression through 3×3 and 1×1 convolution modules, which calculate the intersection-over-union and predict the probability that a target appears in a prior box.
6. The method for detecting a real-time low-illuminance image according to claim 1, wherein for model lightweighting in step five the preprocessing part uses the downsampled small-scale image as the input of the preprocessing layer, reducing the computation cost of convolutional-layer learning; secondly, the enhanced image is restored to the original resolution through upsampling and used in subsequent operations; finally, ordinary convolution is replaced with depth separable convolution, reducing the parameter count to about one tenth of the original.
7. The method for detecting a real-time low-illuminance image according to claim 1, wherein in step six, an attention mechanism is added to the feature extraction network by the optimization model to compensate for the problem of reduced accuracy caused by light weight of the network, so that the overall model is balanced in accuracy and speed.
8. The method according to claim 1, wherein the model training section first randomly divides the low-illuminance image dataset into a training set, a verification set and a test set at a ratio of 8:1:1 to generate a low-illuminance image target detection model; secondly, the detection effect of the model is verified: real images captured under low-illumination conditions are fed separately into a target detection model without image enhancement and into the low-illumination target detection model of claim 1, and the detection effects are compared.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310940678.7A CN117115616A (en) | 2023-07-28 | 2023-07-28 | Real-time low-illumination image target detection method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310940678.7A CN117115616A (en) | 2023-07-28 | 2023-07-28 | Real-time low-illumination image target detection method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117115616A true CN117115616A (en) | 2023-11-24 |
Family
ID=88811871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310940678.7A Pending CN117115616A (en) | 2023-07-28 | 2023-07-28 | Real-time low-illumination image target detection method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117115616A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118071752A (en) * | 2024-04-24 | 2024-05-24 | 中铁电气化局集团有限公司 | Contact net detection method |
CN118071752B (en) * | 2024-04-24 | 2024-07-19 | 中铁电气化局集团有限公司 | Contact net detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |