CN116129375B - Weak light vehicle detection method based on multi-exposure generation fusion - Google Patents

Weak light vehicle detection method based on multi-exposure generation fusion

Info

Publication number
CN116129375B
CN116129375B (application number CN202310410770.2A)
Authority
CN
China
Prior art keywords
exposure
network
fusion
vehicle detection
light vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310410770.2A
Other languages
Chinese (zh)
Other versions
CN116129375A (en)
Inventor
喻莉 (Li Yu)
杜博阳 (Boyang Du)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202310410770.2A priority Critical patent/CN116129375B/en
Publication of CN116129375A publication Critical patent/CN116129375A/en
Application granted granted Critical
Publication of CN116129375B publication Critical patent/CN116129375B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a weak light vehicle detection method based on multi-exposure generation fusion, belonging to the technical field of computer vision. The invention designs a multi-exposure generation-fusion network that realizes end-to-end image enhancement and vehicle detection and makes any detection framework compatible with the invention; a dual-input single-output convolutional recurrent module is used in the multi-exposure generation network to avoid over-smoothing and over-exposure, implemented with a purpose-designed single-gate memory unit; for pseudo-supervised pre-training, a loss function combining SMAE and SSIM is designed, computed from both overall visual quality and pixel-value error; a double-balance-factor loss function is added to the end-to-end training loss so that it focuses on vehicle samples that are hard to detect under dim light; and a squeeze-and-excitation network is added between the multi-exposure generation network and the multi-exposure fusion network, with the fusion connection of the two networks completed by a convolution-kernel channel-dimension expansion method.

Description

Weak light vehicle detection method based on multi-exposure generation fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a weak light vehicle detection method based on multi-exposure generation fusion.
Background
Intelligent Transportation Systems (ITS) are an important component of the smart city and the main direction of future traffic system development. An ITS is a real-time, accurate and convenient integrated traffic management system, interwoven with work and daily travel, in which it plays a key role. Within ITS research, vehicle target detection is the most important topic; from a practical standpoint, however, roughly half of each day is affected by darkness at night, when the vehicle detection rate drops. For an ITS, relying only on manual means to monitor vehicles in dim light in real time is impractical. Therefore, to improve the accuracy of vehicle detection in dim light, it is necessary to build an efficient and accurate dim-light vehicle detection system for real-time monitoring.
Images captured in weak light suffer from low brightness and low contrast, which hampers feature extraction and degrades vehicle detection performance; low light also introduces extra noise that further corrupts the information structure of the vehicle image. Some conventional image enhancement methods can address this problem to an extent, but these non-deep-learning methods aim to improve the visual quality of the entire image, which does not fully align with the goals of vehicle detection. For example, common smoothing operations may destroy features necessary for vehicle detection, and enhancement at very low brightness inevitably produces some noise. An end-to-end deep-learning brightness-enhancement and vehicle-detection scheme can solve this problem effectively. In computer vision, a common way to generate a target image is to take the previous-stage image as input and generate the next-stage image, forming a single-input single-output pipeline through a convolutional neural network (CNN). However, because of vehicle lamps, street lamps and other factors in real road traffic, illumination is severely uneven, and with this strategy parts of the picture are very likely to become over-smoothed or over-exposed, destroying detail information needed for the vehicle detection task and making target detection in dim light difficult.
Disclosure of Invention
Aiming at the defects or improvement demands of the prior art, the invention provides a weak light vehicle detection method based on multi-exposure generation fusion, which aims to improve the recall rate of vehicle detection under dark light.
In order to achieve the above object, the present invention provides a weak light vehicle detection method based on multi-exposure generation fusion, comprising:
s1, constructing a weak light vehicle detection network; the low-light vehicle detection network comprises a multi-exposure generation network and a multi-exposure fusion network; the multi-exposure generation network comprises an encoder and a decoder for generating images of different exposure degrees; wherein the encoder has a plurality of stages; each stage comprises a plurality of layers of serially connected convolutional recurrent modules; each convolutional recurrent module has a dual-input single-output structure; one input is the parameter output of the same-layer convolutional recurrent module in the previous stage; the other input is the encoding output of the previous layer in the same stage;
the multi-exposure fusion network is realized with a target detection framework and performs complementary learning on the difference information among images of different exposure degrees to obtain the vehicle detection result; the parameter weights of the first-layer convolution kernels in the target detection framework are copied T times and normalized, then connected with the output of the multi-exposure generation network; T represents the number of encoding stages;
s2, generating a plurality of pseudo exposure images with different exposure degrees from the low exposure image;
s3, performing pseudo-supervision pre-training on the multi-exposure generating network by using the low-exposure image and the multiple pseudo-exposure images in the step S2 to obtain pre-training parameters of the multi-exposure generating network; then, performing end-to-end joint training on the multi-exposure generating network and the multi-exposure fusion network by adopting the low-exposure image, so that the multi-exposure fusion network provides vehicle detection information to guide the multi-exposure generating network to perform parameter fine adjustment, and the multi-exposure generating network provides a pre-trained enhanced image for the multi-exposure fusion network to perform self-adaptive fusion detection, thereby finally obtaining a weak light vehicle detection network;
s4, inputting the weak light image to be detected into a weak light vehicle detection network to obtain a vehicle detection result.
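For illustration, the following is a minimal PyTorch sketch of the overall s1-s4 pipeline; all class and argument names (MultiExposureDetector, generator, se_block, detector) are assumptions for exposition, not identifiers from the patent:

```python
import torch
import torch.nn as nn

class MultiExposureDetector(nn.Module):
    """High-level sketch of steps s1-s4: generate T pseudo-exposures,
    reweight the concatenated 3T channels with a squeeze-excitation
    block, then run the fusion detector."""
    def __init__(self, generator: nn.Module, se_block: nn.Module,
                 detector: nn.Module):
        super().__init__()
        self.generator = generator  # multi-exposure generation network
        self.se = se_block          # squeeze-and-excitation over 3T channels
        self.detector = detector    # detection framework with expanded first conv

    def forward(self, low_light: torch.Tensor):
        exposures = self.generator(low_light)   # list of T RGB tensors
        stacked = torch.cat(exposures, dim=1)   # (B, 3T, H, W)
        return self.detector(self.se(stacked))  # vehicle detection result
```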
Further, a plurality of pseudo-exposure maps with different exposure degrees are generated from the low-exposure image using
$P_x = g(P_0, k_x) = e^{\,b(1-k_x^{a})}\cdot P_0^{\,k_x^{a}}$
where P_x and P_0 are the required image pixel value and the original image pixel value in the database, respectively, k_x is the exposure ratio, and a and b are internal parameters related to the camera.
Further, the encoder has four stages; each stage includes four layers of serially connected convolutional recurrent modules.
Further, the low-light vehicle detection network further comprises a squeeze-and-excitation network; the squeeze-and-excitation network is connected between the multi-exposure generation network and the multi-exposure fusion network.
Further, the target detection framework adopts a one-stage feature pyramid network; the batch normalization (BN) and nonlinear activation (ReLU) operations in the feature pyramid's ResNet backbone are located before the convolution layers.
Further, the convolutional recurrent module comprises a single-gate memory unit, a channel attention mechanism and a spatial attention mechanism.
Further, the loss function adopted for pseudo-supervised pre-training of the multi-exposure generation network is:
$L_{pre} = \frac{1}{N}\sum_{p=1}^{N} H\big(I_t(p), \hat{I}_t(p)\big) + 1 - \mathrm{SSIM}(I_t, \hat{I}_t)$
where I_t is the image generated during network training, \hat{I}_t is the pseudo-exposure map, N is the number of pixels in the image, SSIM is the structural similarity index, and H is the SMAE loss.
Further, the confidence loss adopted in the weak light vehicle detection network training process is:
$L_{conf} = -\alpha\,(1-\hat{y})^{\beta}\, y \log \hat{y} - (1-\alpha)\,\hat{y}^{\beta}\,(1-y)\log(1-\hat{y})$
where \alpha and \beta are the double balance factors, y = 1 denotes a vehicle, y = 0 denotes background, and \hat{y} is a confidence score between 0 and 1.
In general, the above technical solution conceived by the present invention can achieve the following advantageous effects compared to the prior art.
(1) The invention applies a dual-input single-output RNN module in the multi-exposure generation network: on one hand, history information is used to preserve key-region details; on the other hand, images of progressively higher exposure are generated. This design avoids the over-smoothing and over-exposure caused by the single-input single-output structure of conventional convolutional neural networks, and effectively handles the uneven brightness that vehicle lamps and street lamps impose on dark-scene vehicle detection. Meanwhile, the invention adopts a multi-exposure enhanced detection scheme that realizes detail interaction and feature fusion under different exposure conditions, effectively improving vehicle detection accuracy under low illumination.
The invention is an end-to-end image enhancement and vehicle detection network: it not only performs detail-information interaction and feature fusion under different illumination, but also effectively optimizes the generation network for the vehicle detection problem through end-to-end learning; owing to the lightweight design, it adds only a few parameters.
(2) A CNN-based vehicle detection algorithm is selected as the multi-exposure fusion framework, so the multi-exposure generation network can be placed in front of the multi-exposure fusion network; only the channel dimension at the junction of the two networks needs processing, making the method compatible with any target detection framework and giving it a wide application range.
(3) Before information is input into the multi-exposure fusion network, the generated multiple exposure pictures are fed into a squeeze-and-excitation network (SENet), so the learning process can use global information to strengthen useful channel features and weaken useless ones, yielding channel-dimension weights for fusing the 3T channels.
(4) The target detection framework is trained with a one-stage feature pyramid network, achieving accuracy not inferior to two-stage methods while keeping one-stage speed. The invention adopts a variant of the residual network as the backbone of the feature pyramid. Compared with the most commonly used ResNet, this variant moves the Batch Normalization (BN) and nonlinear activation (ReLU) operations in front of the convolution layers; this pre-activation design makes the entire network easier to train.
(5) The invention adds two dynamic scaling factors to the binary cross-entropy, dynamically and sharply reducing the loss contributed by easily distinguished samples during learning and shifting the center of gravity of the loss onto the hard-to-distinguish positive and negative samples, which benefits the detection of difficult targets under weak illumination.
Drawings
FIG. 1 is a diagram of a multiple exposure generation and fusion network.
Fig. 2 is a schematic diagram of the convolutional recurrent module.
In fig. 3, (a) is a feature pyramid network and (b) is a network schematic diagram of a Resnet variant backbone.
FIG. 4 is a graph of the accuracy results of different numbers of stages and different numbers of coding layers of a multi-exposure generation network.
Fig. 5 is a statistical histogram of the required exposure for the dataset.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
S1, constructing a weak light vehicle detection network. Referring to fig. 1, the low-light vehicle detection network includes a multi-exposure generation network and a multi-exposure fusion network; the multi-exposure generation network includes an encoder and a decoder for generating images of different exposure degrees. The encoder has a plurality of stages, and each stage comprises a plurality of layers of serially connected convolutional recurrent modules. In the invention, each convolutional recurrent module has a dual-input single-output structure; as shown in fig. 1, starting from a single low-light image, this is realized through a series of dual-branch convolution training. The input in one direction drives the network to encode and decode, producing images of different exposure levels:
$I_t = f_{dec}\big(H_t^{L};\ \theta_{dec}\big)$
The input in the other direction connects the encoding modules of the same layer in series across stages, the output of one stage serving as one of the inputs of the next stage:
$H_t^{l} = f_{enc}\big(H_t^{l-1},\ H_{t-1}^{l};\ \theta_{enc}\big)$
where I_t denotes the picture of stage t, $H_t^{l}$ denotes the l-th layer of the t-th stage in the hidden layers, and $f_{dec}$ and $f_{enc}$ denote the decoder and encoder in the module, with corresponding parameters $\theta_{dec}$ and $\theta_{enc}$. Multiple RNN-based modules are cascaded to form an encoder that encodes the input image into a multi-layer feature map; the input of each encoder layer consists of the previous layer (same stage) and the previous stage (same layer).
On one hand, taking information from the previous stage as input realizes the memory function of the neural network: history information preserves key-region details, prevents the network from losing information from stages further back, and reduces the probability of extreme values at key positions in the image. On the other hand, the input from the previous layer of the same stage completes the encoding process: the features of the stage's input image are extracted by increasing the number of channels, and decoding then restores the input image size to generate an image of progressively higher exposure. This design avoids the over-smoothing and over-exposure caused by the single-input single-output structure of conventional convolutional neural networks, and effectively handles the uneven brightness that vehicle lamps and street lamps impose on dark-scene vehicle detection.
Vehicle detection based on the multi-exposure generation network is the first stage in the ITS algorithm architecture, and its running speed affects subsequent ITS algorithms such as license plate detection and driver behavior recognition. A convolutional recurrent module (ConvRNN), formed by replacing an RNN's matrix operations with convolutions, carries a non-trivial number of parameters, and the multi-exposure generation network needs several encoding modules; it is therefore important to design a lightweight ConvRNN structure with fewer parameters without losing accuracy.
In a typical ConvRNN, the information stored at each stage is usually short-term, and information from earlier stages may decay exponentially. During back-propagation, the chain rule applied to this composite function then leaves the convolutional recurrent units of earlier stages with very small parameter updates, defeating the purpose of using deep learning to avoid over-smoothing and over-exposure. Therefore, according to the number of stages required by the experiments, the invention designs a special RNN memory unit that adds only a small number of parameters while ensuring long-term key information is learned.
The structure of the memory unit is shown in fig. 2. In terms of sequence length, the number of stages of the multi-exposure generation network is far smaller than the sequence lengths typical of natural language processing, so the memory unit can use fewer parameters than the common ConvLSTM and ConvGRU. First, the two inputs pass through a dilated convolution W and an ordinary convolution U respectively and are combined by a sigmoid gating operation, yielding the network's single gating module. The gating module is then multiplied with the input from the previous stage and convolved to obtain the hidden layer of the convolutional recurrent module; the gating module controls how much effective information of the previous stage is contained in the hidden layer, avoiding gradient vanishing and gradient explosion. Next, the gating parameters dynamically adjust the previous-stage effective information to be remembered, producing the output of the unit. The total parameter count of the convolutional recurrent module is only about half that of a ConvLSTM, so it trains and runs faster. The memory unit operates as follows:
$G_t^{l} = \sigma\big(W \ast H_t^{l-1} + U \ast H_{t-1}^{l}\big)$
$\tilde{H}_t^{l} = V \ast \big(G_t^{l} \odot H_{t-1}^{l}\big)$
$H_t^{l} = G_t^{l} \odot \tilde{H}_t^{l} + \big(1 - G_t^{l}\big) \odot H_t^{l-1}$
where $\sigma$ is the sigmoid function, $\ast$ denotes convolution, and $\odot$ denotes element-wise multiplication.
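As a concrete reference, here is a minimal PyTorch sketch of such a single-gate convolutional recurrent cell under the equations above; the class name, kernel sizes and dilation rate are assumptions, not values specified by the patent:

```python
import torch
import torch.nn as nn

class SingleGateConvRNNCell(nn.Module):
    """Single-gate ConvRNN cell: one sigmoid gate (dilated conv W on the
    same-stage input, ordinary conv U on the previous-stage state) decides
    how much previous-stage information enters the output."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.w = nn.Conv2d(channels, channels, 3,
                           padding=dilation, dilation=dilation)  # dilated conv W
        self.u = nn.Conv2d(channels, channels, 3, padding=1)     # ordinary conv U
        self.v = nn.Conv2d(channels, channels, 3, padding=1)     # conv V on gated history

    def forward(self, x: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        # x: encoding from the previous layer of the same stage
        # h_prev: hidden state of the same layer at the previous stage
        g = torch.sigmoid(self.w(x) + self.u(h_prev))  # single gate G
        h_tilde = self.v(g * h_prev)                   # gated previous-stage info
        return g * h_tilde + (1.0 - g) * x             # dynamically mixed output
```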
in addition, a lightweight channel attention mechanism A is also designed in the circular convolution module c Module and spatial attention mechanism A s The module is placed behind the memory cell to focus on important features and suppress unnecessary features. In the channel dimension, compressing the length and the width of the image by using global maximum pooling and global average pooling in parallel, and then obtaining the weight calibration of the channel dimension by sharing a multi-layer perceptron; in the space dimension, the average value and the maximum value are needed to be respectively taken in the channel dimension to extract information, and then the compression of the number of channels is continuously carried out on the two obtained results through a convolution kernel with a self-defined size. Finally, weighting the weight coefficient of the original characteristic diagram by using a nonlinear function, thereby completing the whole attention module.
In the invention, a CNN-based target detection framework is selected as the multi-exposure fusion network, so the multi-exposure generation network can be placed in front of the fusion network; only the channel dimension at the junction of the two networks needs processing for the algorithm to be compatible with any vehicle detection framework.
In deep learning, target detection frameworks are divided into single-stage (one-stage) and double-stage (two-stage) types. Compared with a two-stage network, a one-stage network has no region proposal network and is therefore faster, meeting the requirements of in-camera chips. The object detection framework of the invention therefore trains a feature pyramid network under a one-stage network; the structure of the feature pyramid is shown in fig. 3 (a). This achieves accuracy not inferior to two-stage methods while keeping one-stage speed. Meanwhile, a variant of the residual network is adopted as the backbone of the feature pyramid; compared with the most commonly used ResNet, this variant moves the Batch Normalization (BN) and nonlinear activation (ReLU) operations in front of the convolution layers, and this pre-activation design makes the entire network easier to train.
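For reference, a minimal sketch of a pre-activation residual block of the kind described (channel counts and kernel sizes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual block: BN and ReLU precede each convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(F.relu(self.bn1(x)))    # BN -> ReLU -> conv
        out = self.conv2(F.relu(self.bn2(out)))  # pre-activation again
        return out + x                           # identity shortcut
```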
Because the selected detectors are all based on convolutional neural networks, the invention takes the first layer of the chosen detector framework and applies a convolution-kernel expansion method in the channel dimension, so the detector can process several images at once and fuse them adaptively. Specifically, for a pre-trained vehicle detector, the parameter weights of the first-layer convolution kernels are copied T times and normalized, guaranteeing that the multi-exposure fusion network and the generation network have matching channel dimensions while preserving well-distinguished, complementary region information across the different pseudo-exposures. In summary, after the first convolution layer in fig. 3 (b) undergoes convolution-kernel channel-dimension expansion, it can be connected to the overall network framework shown in fig. 1, realizing the multi-exposure fusion function as a whole.
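The channel-dimension expansion can be sketched as follows, assuming a PyTorch detector whose first layer is an nn.Conv2d; the function name and the divide-by-T normalization are illustrative readings of "copied and normalized T times":

```python
import torch
import torch.nn as nn

def expand_first_conv(conv: nn.Conv2d, t: int) -> nn.Conv2d:
    """Replicate a pretrained first conv layer's kernels t times along the
    input-channel axis and normalize, so a 3-channel detector accepts the
    3*t channels of t generated RGB exposures."""
    new = nn.Conv2d(conv.in_channels * t, conv.out_channels,
                    conv.kernel_size, conv.stride, conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.copy_(conv.weight.repeat(1, t, 1, 1) / t)  # copy and normalize
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new
```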
In addition, since the final output of the multi-exposure generation network is T RGB pictures, the fusion network has 3T input channels. Therefore, before the information enters the vehicle detector, the generated pictures are passed through a squeeze-and-excitation network (SENet), so the learning process can use global information to strengthen useful channel features and weaken useless ones, yielding the channel-dimension weights used when fusing the 3T channels.
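A standard squeeze-and-excitation block over the 3T fused channels might look like this (the reduction ratio is an assumption):

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """SE block: global average pooling (squeeze), two-layer bottleneck with
    sigmoid (excitation), per-channel reweighting of the 3T inputs."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global channel statistics
        return x * w.view(b, c, 1, 1)     # excite: strengthen useful channels
```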
For the multi-exposure generation network, 4 stages are selected as the number of exposures, and each stage uses 4 convolutional recurrent modules as its encoder. Generally, increasing the number of stages not only raises the parameter count and computation cost of the model but also requires more space to pre-generate additional pseudo-exposure images to support training; increasing the number of encoding layers per stage can cause over-fitting from excessive depth. The invention ran a comparative experiment on these choices, with results shown in fig. 4. Trading off accuracy against efficiency, a 4-by-4 encoding structure was finally chosen. In addition, when the number of stages is 1, the network degenerates into a single-exposure enhanced detection network, and the results in fig. 4 also verify that the multi-exposure method achieves higher accuracy than the single-exposure algorithm.
S2, generating a plurality of pseudo exposure images with different exposure degrees from the low exposure image;
the vehicle type detection data set acquired by the camera is difficult to collect exposure maps of different levels of exposure of the same scene. Therefore, a multi-exposure generating network is needed before a multi-exposure fusion network, images in a data set are generated into pseudo-exposure images with different levels through deep learning, and in order to complete the supervision training task of the multi-exposure generating network, a plurality of pseudo-exposure images with different exposure degrees are firstly generated from an original low-exposure input image, so that a database is not needed to depend on low-illumination and normal-illumination image pairs for research, and the multi-exposure generating network has good expandability. Because of the nonlinear functional relationship between camera sensor irradiance and image pixel values, the two relationships can be roughly expressed as:
wherein P is x And P 0 The exposure rate k is the pixel value of the required image and the original image in the database x The function g is the exposure transform function, which is the ratio of their irradiance. Parameters a and b are internal parameters of the camera used, and are related only to the camera. And obtaining the value by fitting the response curves of all bayonets and the electric police camera in the database.
The parameter k corresponds to the exposure ratio required by the dataset photos, so its required values are computed and their distribution tallied; the statistical histogram is plotted in fig. 5. The first quartile, second quartile, third quartile and maximum of the distribution are selected as the four required exposure levels, so the exposure ratios of the four stages are, in turn: k_1 = 3.8328, k_2 = 4.0223, k_3 = 4.3367, k_4 = 6.9852. Finally, these k values are substituted into the formula above to generate the pseudo-exposure maps.
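A sketch of the pseudo-exposure generation; note the camera parameters a and b below are placeholder values from a published camera-response model (Ying et al.'s beta-gamma model), not the values fitted from this patent's database:

```python
import numpy as np

A, B = -0.3293, 1.1258                       # assumed camera parameters
K_VALUES = [3.8328, 4.0223, 4.3367, 6.9852]  # quartile exposure ratios from fig. 5

def pseudo_exposure(img: np.ndarray, k: float) -> np.ndarray:
    """Apply P_x = e^{b(1-k^a)} * P_0^{k^a} to a float image in [0, 1]."""
    gamma = k ** A
    beta = np.exp(B * (1.0 - gamma))
    return np.clip(beta * np.power(img, gamma), 0.0, 1.0)

# stand-in low-light image; in practice, a dataset photo
low = np.random.rand(256, 256, 3).astype(np.float32)
stack = [pseudo_exposure(low, k) for k in K_VALUES]  # four pseudo-exposure maps
```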
S3, performing pseudo-supervision pre-training on the multi-exposure generating network by using the low-exposure image and the multiple pseudo-exposure images in the step S2 to obtain pre-training parameters of the multi-exposure generating network; then, carrying out joint training on the multi-exposure generating network and the multi-exposure fusion network by adopting a low-exposure image to obtain a weak light vehicle detection network;
specifically, the pre-training of the part can not only reduce the time for the subsequent end-to-end combined training, but also improve the performance of vehicle detection under the final dim light. Image I generated for measuring network training t And the pseudo-exposure map generated from step S2The present invention requires constructing a corresponding loss function. Since the result is a problem of uneven brightness caused by lamps or streetlamps, if only common MSE losses are adopted, the outliers of the overexposure may have a great influence on the result, so that the loss function is larger. Therefore, the invention combines MSE and MAE to form SMAE, and for the position with small error, MSE is used to make the loss easy to converge and more stable; and MAE is used to reduce the effect for outliers with large errors. In addition to the regression loss due to pixel errors, the overall quality of the generated image needs to be considered and the evaluation index added to the loss function. The structural similarity index SSIM can well reflect the quality feeling in the visual field, so the invention designs the loss function of the combination of the SSIM and the SMAE, the formula is shown as follows,
where H represents SMAE, the mean μ and variance δ are determined by applying the pixel p to the image t Calculated by applying a gaussian filter, N represents the number of pixels in the image.
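Reading SMAE as an MSE-near-zero, MAE-for-outliers loss makes it Huber-like, so a minimal sketch of the pre-training loss could be as follows (the delta switch point and the torchmetrics SSIM are assumptions):

```python
import torch
import torch.nn.functional as F
from torchmetrics.functional import structural_similarity_index_measure as ssim

def smae(pred: torch.Tensor, target: torch.Tensor, delta: float = 1.0):
    """SMAE as described: quadratic for small errors, linear for large ones."""
    return F.huber_loss(pred, target, delta=delta)

def pretrain_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pseudo-supervised pre-training loss: SMAE + (1 - SSIM)."""
    return smae(pred, target) + (1.0 - ssim(pred, target))
```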
Meanwhile, because the generated pictures inevitably contain some noise and artifacts, the invention uses the common deep-learning early-stopping method to keep pre-training from over-fitting to these adverse factors, preparing for the subsequent end-to-end generation-fusion training.
In the vehicle detection phase, the loss function typically includes two parts, a confidence loss and a regression loss:
$L = \frac{1}{N_a}\sum L_{conf} + \lambda\,\frac{1}{N_p}\sum L_{reg}$
where N_a denotes the total number of samples, N_p the number of positive samples, and the parameter \lambda balances the confidence loss L_{conf} against the regression loss L_{reg}.
Vehicle target detection must distinguish vehicles from the background, so the confidence loss L_{conf} is a binary classification loss whose inputs lie in [0,1]. For ordinary binary cross-entropy, the larger a positive sample's output probability (the closer to 1), the smaller the loss; likewise, the smaller a negative sample's output probability (the closer to 0), the smaller the loss. In road datasets captured by traffic cameras, vehicle targets occupy far less area than background, so anchors mostly match negative samples. Moreover, because one-stage detectors have no region proposal network, typically hundreds of thousands of anchors are generated per picture, and during late training most of them are easily classified negatives. Although each produces a small loss, the sheer number of such simple samples still accumulates into a large total loss, and iterating over many simple samples is slow and harmful to convergence. The invention therefore adds two dynamic scaling factors to the binary cross-entropy, dynamically and sharply reducing the loss contributed by easily distinguished samples, thereby shifting the center of gravity of the loss onto the hard-to-distinguish positive and negative samples, which is critical for detecting difficult targets under weak illumination. In summary, the confidence loss is:
$L_{conf} = -\alpha\,(1-\hat{y})^{\beta}\, y \log \hat{y} - (1-\alpha)\,\hat{y}^{\beta}\,(1-y)\log(1-\hat{y})$
where the parameters \alpha and \beta are the designed double balance factors, which smoothly down-weight simple samples so the loss function focuses on vehicle samples that are hard to detect under dim light, forming the double-balance-factor loss function; y = 1 denotes a vehicle, y = 0 denotes background, and \hat{y} is a confidence score between 0 and 1.
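A sketch of this double-balance-factor confidence loss in the focal-loss family; the alpha and beta defaults below are common focal-loss settings, assumed rather than taken from the patent:

```python
import torch

def dual_balance_conf_loss(y_hat: torch.Tensor, y: torch.Tensor,
                           alpha: float = 0.25, beta: float = 2.0) -> torch.Tensor:
    """Binary cross-entropy with two balancing factors that down-weight
    easily classified samples (y = 1 vehicle, y = 0 background)."""
    eps = 1e-7
    y_hat = y_hat.clamp(eps, 1.0 - eps)
    pos = -alpha * (1.0 - y_hat) ** beta * y * torch.log(y_hat)
    neg = -(1.0 - alpha) * y_hat ** beta * (1.0 - y) * torch.log(1.0 - y_hat)
    return (pos + neg).mean()
```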
The regression loss is the other part of the loss function and is computed from positive samples only. The invention uses the SMAE loss to measure the distance between the prediction box \hat{b} and the ground-truth box b:
$L_{reg} = H\big(\hat{b},\, b\big)$
the multi-exposure generating network and the multi-exposure fusion network are connected in series for training, so that an end-to-end learning system is formed by the two parts. From two-part functional cut-in analysis: the multi-exposure fusion network provides vehicle detection information to guide the generation network to conduct parameter fine adjustment, so that the vehicle area can be specially enhanced for detection; the multi-exposure generating network provides the enhanced image after supervision and pre-training for the fusion network, and performs self-adaptive fusion detection after a channel dimension attention mechanism. According to the invention, the two parts are coupled and then trained, so that not only can the detail interaction and feature fusion under different illumination conditions be realized, but also the end-to-end learning of the vehicle detection under abnormal illumination can be realized.
S4, inputting the weak light image to be detected into a weak light vehicle detection network to obtain a vehicle detection result.
The effect of the invention is verified experimentally as follows. First, the proposed multi-exposure generation-fusion method is analyzed quantitatively. Table 1 compares the network of the invention with a standalone target detection network in parameters, speed and overall accuracy. After prepending the low-light enhancement stage, the AP on the special-vehicle dataset improves markedly by 7.3%, and AIoU increases by 4.7%. In terms of parameters, thanks to the lightweight attention mechanism and single-gate memory unit, only 0.49M computation parameters are added over the existing algorithm, an increment within the acceptable range for practical special-vehicle detection.
Table 1 multidimensional comparison of the invention with the original detection algorithm
For target detection in dim light, recall, the performance index that measures missed detections, is especially important; the key parameter IoU is therefore modified across multiple experiments to test recall under different conditions. The quantitative results are shown in Table 2. Recall clearly decreases as IoU increases, and at every IoU the recall of the proposed method is far higher than that of the baseline algorithm.
TABLE 2 comparison of recall at different cross ratios
Finally, with IoU fixed at 0.5, conventional dark-light image enhancement algorithms are compared with the algorithm of the invention; the results are shown in Table 3. The proposed dim-light special-vehicle detection algorithm outperforms the other mainstream schemes.
Table 3 comparison of different dim light detection algorithm performances
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

1. A weak light vehicle detection method based on multi-exposure generation fusion is characterized by comprising the following steps:
s1, constructing a weak light vehicle detection network; the low-light vehicle detection network comprises a multi-exposure generation network and a multi-exposure fusion network; the multi-exposure generation network comprises an encoder and a decoder for generating images of different exposure degrees; wherein the encoder has a plurality of stages; each stage comprises a plurality of layers of serially connected convolutional recurrent modules; each convolutional recurrent module has a dual-input single-output structure, one input being the parameter output of the same-layer convolutional recurrent module in the previous stage and the other input being the encoding output of the previous layer in the same stage;
the multi-exposure fusion network is realized by adopting a target detection frame and is used for carrying out complementary learning on difference information among images with different exposure degrees to obtain a vehicle detection result; the parameter weight of the first layer convolution kernel in the target detection frame is copied and normalized for T times and then is connected with the output of the multi-exposure generation network; t represents the number of encoding stages;
s2, generating a plurality of pseudo exposure images with different exposure degrees from the low exposure image;
s3, performing pseudo-supervision pre-training on the multi-exposure generating network by using the low-exposure image and the multiple pseudo-exposure images in the step S2 to obtain pre-training parameters of the multi-exposure generating network; then, performing end-to-end joint training on the multi-exposure generating network and the multi-exposure fusion network by adopting the low-exposure image, so that the multi-exposure fusion network provides vehicle detection information to guide the multi-exposure generating network to perform parameter fine adjustment, and the multi-exposure generating network provides a pre-trained enhanced image for the multi-exposure fusion network to perform self-adaptive fusion detection, thereby finally obtaining a weak light vehicle detection network;
s4, inputting the weak light image to be detected into a weak light vehicle detection network to obtain a vehicle detection result;
the low light vehicle detection network further comprises a compression excitation network; the compression excitation network is connected between the multi-exposure generation network and the multi-exposure fusion network and is used for enabling the learning process to strengthen useful channel characteristics by using global information and fade useless channel characteristics so as to obtain channel dimension weights when 3T channels are fused;
the convolution circulation module comprises a single-gate memory unit, a channel attention mechanism and a space attention mechanism; the single-gating memory unit firstly carries out sigmoid gating operation after two inputs are respectively subjected to cavity convolution W and common convolution U to obtain a gating moduleThe method comprises the steps of carrying out a first treatment on the surface of the Then the gating module is multiplied by the input of the previous stage and convolved to obtain a hidden layer +.>The method comprises the steps of carrying out a first treatment on the surface of the Next, dynamically adjusting the effective information of the last stage of the required memory by using the gating parameters to obtain the output +_of the single gating memory unit>
The calculation formula of the single gate control memory unit is as follows:
wherein, the liquid crystal display device comprises a liquid crystal display device,is a sigmoid gating operation.
2. The method for detecting a weak light vehicle based on multi-exposure generation fusion according to claim 1, wherein a plurality of pseudo-exposure maps with different exposure degrees are generated from the low-exposure image using
$P_x = g(P_0, k_x) = e^{\,b(1-k_x^{a})}\cdot P_0^{\,k_x^{a}}$
where P_x and P_0 are the required image pixel value and the original image pixel value in the database respectively, k_x is the exposure ratio, and a and b are internal parameters related to the camera.
3. The method for detecting a low-light vehicle based on multi-exposure generation fusion according to claim 1, wherein the encoder has four stages; each stage includes four layers of serially connected convolutional recurrent modules.
4. The method for detecting a weak light vehicle based on multi-exposure generation fusion according to claim 3, wherein the target detection framework adopts a one-stage feature pyramid network; the batch normalization (BN) and nonlinear activation (ReLU) operations in the feature pyramid's ResNet backbone are located before the convolution layers.
5. The method for detecting a weak light vehicle based on multi-exposure generation fusion according to any one of claims 1 to 4, wherein the loss function adopted by the pseudo-supervised pre-training of the multi-exposure generation network is:
$L_{pre} = \frac{1}{N}\sum_{p=1}^{N} H\big(I_t(p), \hat{I}_t(p)\big) + 1 - \mathrm{SSIM}(I_t, \hat{I}_t)$
where I_t is the image generated during network training, \hat{I}_t is the pseudo-exposure map, N is the number of pixels in the image, SSIM is the structural similarity index, and H is the SMAE loss.
6. The method for detecting a low-light vehicle based on multi-exposure generation fusion according to any one of claims 1 to 4, wherein the confidence loss used in the training process of the low-light vehicle detection network is:
$L_{conf} = -\alpha\,(1-\hat{y})^{\beta}\, y \log \hat{y} - (1-\alpha)\,\hat{y}^{\beta}\,(1-y)\log(1-\hat{y})$
where \alpha and \beta are the double balance factors, y = 1 denotes a vehicle, y = 0 denotes background, and \hat{y} is a confidence score between 0 and 1.
7. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for detecting a low-light vehicle based on multi-exposure generation fusion according to any one of claims 1 to 6.
CN202310410770.2A 2023-04-18 2023-04-18 Weak light vehicle detection method based on multi-exposure generation fusion Active CN116129375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310410770.2A CN116129375B (en) 2023-04-18 2023-04-18 Weak light vehicle detection method based on multi-exposure generation fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310410770.2A CN116129375B (en) 2023-04-18 2023-04-18 Weak light vehicle detection method based on multi-exposure generation fusion

Publications (2)

Publication Number Publication Date
CN116129375A CN116129375A (en) 2023-05-16
CN116129375B true CN116129375B (en) 2023-07-21

Family

ID=86312166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310410770.2A Active CN116129375B (en) 2023-04-18 2023-04-18 Weak light vehicle detection method based on multi-exposure generation fusion

Country Status (1)

Country Link
CN (1) CN116129375B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163234B (en) * 2018-10-10 2023-04-18 腾讯科技(深圳)有限公司 Model training method and device and storage medium
CN112464806A (en) * 2020-11-27 2021-03-09 山东交通学院 Low-illumination vehicle detection and identification method and system based on artificial intelligence

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875352A (en) * 2017-01-17 2017-06-20 北京大学深圳研究生院 A kind of enhancement method of low-illumination image
JP2020027659A (en) * 2018-08-10 2020-02-20 ネイバー コーポレーションNAVER Corporation Method for training convolutional recurrent neural network, and inputted video semantic segmentation method using trained convolutional recurrent neural network
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
KR20220031249A (en) * 2020-09-04 2022-03-11 인하대학교 산학협력단 Lightweight Driver Behavior Identification Model with Sparse Learning on In-vehicle CAN-BUS Sensor Data
CN113516124A (en) * 2021-05-29 2021-10-19 大连民族大学 Electric energy meter electricity consumption information identification algorithm based on computer vision technology
CN113641722A (en) * 2021-07-20 2021-11-12 西安理工大学 Long-term time series data prediction method based on variant LSTM
CN113971811A (en) * 2021-11-16 2022-01-25 北京国泰星云科技有限公司 Intelligent container feature identification method based on machine vision and deep learning
CN115643660A (en) * 2022-09-09 2023-01-24 桂林电子科技大学 Intelligent energy-saving lamp pole group control method based on recurrent neural network algorithm
CN115511760A (en) * 2022-09-26 2022-12-23 上海交通大学 Pyramid convolution-based multi-exposure image fusion method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Hyperspectral image classification method using a 3-D convolutional recurrent neural network; Guan Shihao; Yang Guang; Li Hao; Fu Yanyu; Laser Technology; Vol. 44, No. 4; pp. 485-491 *
A Real-Time Speech Enhancement Algorithm Based on Convolutional Recurrent Network and Wiener Filter;Jingyu Hou et al.;《2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS)》;第683-688页 *
Recurrent Exposure Generation for Low-Light Face Detection;Jinxiu Liang et al.;《IEEE Transactions on Multimedia》;第24卷;第1609-1621页 *

Also Published As

Publication number Publication date
CN116129375A (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN112308200B (en) Searching method and device for neural network
CN112150493B (en) Semantic guidance-based screen area detection method in natural scene
CN110443763B (en) Convolutional neural network-based image shadow removing method
Onzon et al. Neural auto-exposure for high-dynamic range object detection
WO2020001196A1 (en) Image processing method, electronic device, and computer readable storage medium
CN110807384A (en) Small target detection method and system under low visibility
CN114842503A (en) Helmet detection method based on YOLOv5 network
CN112084897A (en) Rapid traffic large-scene vehicle target detection method of GS-SSD
CN115115973A (en) Weak and small target detection method based on multiple receptive fields and depth characteristics
CN116129375B (en) Weak light vehicle detection method based on multi-exposure generation fusion
Ren et al. A lightweight object detection network in low-light conditions based on depthwise separable pyramid network and attention mechanism on embedded platforms
CN116597144A (en) Image semantic segmentation method based on event camera
Du et al. MEGF-Net: multi-exposure generation and fusion network for vehicle detection under dim light conditions
CN116452472A (en) Low-illumination image enhancement method based on semantic knowledge guidance
CN115359376A (en) Pedestrian detection method of lightweight YOLOv4 under view angle of unmanned aerial vehicle
Kocdemir et al. TMO-Det: Deep tone-mapping optimized with and for object detection
CN114998930A (en) Heavy-shielding image set generation and heavy-shielding human body target model training method
CN115171001A (en) Method and system for detecting vehicle on enhanced thermal infrared image based on improved SSD
Zhao et al. Object detection based on multi-channel deep CNN
Gao et al. YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5
Li Low-light image enhancement with contrast regularization
CN113518210B (en) Method and device for automatic white balance of image
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
WO2022242175A1 (en) Data processing method and apparatus, and terminal

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant