CN109389569B - Monitoring video real-time defogging method based on improved DehazeNet


Info

Publication number: CN109389569B
Application number: CN201811261910.XA
Authority: CN (China)
Prior art keywords: image, transmittance, video, atmospheric, defogging
Inventor: 陈天悦
Applicant and assignee: Daxiang Intelligent Technology Nanjing Co., Ltd.
Other versions: CN109389569A (Chinese)
Legal status: Expired - Fee Related

Classifications

    • G06T5/73: Image enhancement or restoration; deblurring, sharpening
    • G06N3/02, G06N3/04, G06N3/045: Neural networks; architectures, combinations of networks
    • G06T3/40, G06T3/4038: Scaling of whole images or parts thereof; image mosaicing
    • G06T2207/10016: Image acquisition modality; video, image sequence
    • G06T2207/30232: Subject of image; surveillance


Abstract

A monitoring video real-time defogging method based on improved DehazeNet comprises the following steps: 1) acquiring a video through video acquisition equipment, cutting the video frame by frame into single pictures, and providing them to a neural network for processing; 2) using a trained improved DehazeNet neural network, i.e., the weights of all layers are known, and processing the input picture block by block to obtain the transmittance t(x) and the atmospheric light constant of each part, finally forming a transmittance distribution map and an atmospheric light constant distribution map; 3) acquiring the output of the neural network, solving the fog-free image according to the defogging algorithm based on the atmospheric scattering model, and splicing the fog-free images back into a video. Compared with traditional single-image defogging methods, the invention realizes real-time defogging of video while guaranteeing the defogging effect, and overcomes the supersaturation and halo-blurring problems of traditional defogging methods.

Description

Monitoring video real-time defogging method based on improved DehazeNet
Technical Field
The invention belongs to the field of computer-based video processing, and particularly relates to a monitoring video real-time defogging method based on improved DehazeNet.
Background
Fog is a common atmospheric phenomenon. Dust, smoke and other particles that are ubiquitous in the air reduce the clarity of the atmosphere. Because these particles scatter light, the contrast of objects is reduced at imaging time, so fog causes many problems for photographic imaging. Fog essentially changes the atmospheric transmittance and alters the contrast and color of outdoor scene images; many features contained in the image are thereby masked or blurred, video monitoring products cannot acquire clear live images, and the security of important urban sites is seriously compromised.
The transmittance of fog in an image is depth dependent: the farther an object is from the image capture device, the lower the transmittance. Many defogging methods exist for a single image; they can be divided into two main categories, image restoration and image enhancement. In addition to methods based on histograms, contrast and saturation respectively, methods have also been proposed that defog using multiple images of the same scene under different atmospheric conditions, or using image depth information. In practical applications, however, the depth information of a picture and multiple fog images of the same scene are not easy to acquire.
In recent years, more reasonable assumptions and prior knowledge have been proposed, and on this basis the defogging effect for a single image has greatly improved, but problems remain in image quality and speed, such as:
the precision is lower. A defogging method based on the maximization of local contrast of a Markov Random Field (MRF) is likely to cause the supersaturation of an image although it can achieve a good defogging effect, according to the assumption that the local contrast in a fog-free image is higher than that of a fog-containing image.
② Low speed. Independent component analysis (ICA) on minimal input has been used for defogging, but the processing time of this method is very long, and it cannot cope with images containing dense fog.
Disclosure of Invention
In order to solve the low precision and low speed of traditional single-image algorithms, the overall scheme of a new video defogging algorithm is designed by combining a convolutional neural network with a parallel computing structure. The invention provides an improved defogging network based on the convolutional neural network DehazeNet; applying the atmospheric scattering model and several kinds of prior knowledge effectively improves the defogging precision of the image. The design and computational cost of each part of the original DehazeNet are analyzed, and the network is adjusted and optimized on the premise of preserving precision and defogging effect, reducing the time needed to restore an image. Meanwhile, the pixel recovery process is parallelized to accelerate fog image processing, so that real-time defogging of high-definition video is achieved. The method has the advantages of good effect, adaptive defogging, and the like.
The specific technical scheme of the invention is illustrated as follows:
a monitoring video real-time defogging method based on improved DehazeNet comprises the following steps:
1) acquiring a video through video acquisition equipment, cutting the video into a single picture frame by frame, and providing the single picture for a neural network to process;
2) using a trained neural network, namely the weights of all layers are known, and carrying out blocking processing on an input picture to obtain the transmittance and the atmospheric light constant of each part so as to finally form a transmittance distribution map and an atmospheric light constant distribution map;
3) and acquiring the output of the neural network, solving the fog-free image according to the atmospheric scattering model, and splicing the fog-free image into a video again.
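The sketch below, in Python with OpenCV (the toolchain used later in this document), illustrates the frame-split, process and re-splice loop; dehaze_frame is a hypothetical placeholder standing in for the network inference of step 2) and the restoration of step 3):

    import cv2

    def defog_video(src_path, dst_path, dehaze_frame):
        # dehaze_frame is a hypothetical callable: network inference + restoration
        cap = cv2.VideoCapture(src_path)
        fps = cap.get(cv2.CAP_PROP_FPS)
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*'XVID'), fps, (w, h))
        while True:
            ok, frame = cap.read()           # step 1): cut the video frame by frame
            if not ok:
                break
            out.write(dehaze_frame(frame))   # steps 2)-3): defog, then re-splice
        cap.release()
        out.release()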
Specifically, the method comprises the following steps:
(I) Defogging algorithm theoretical model
1. Atmospheric scattering model
In order to describe the formation of a fog image, the prior art proposed the atmospheric scattering model, which is the basic model for image defogging; the model was subsequently improved several times.
The atmospheric scattering model is expressed by the following formula (1):
I(x)=J(x)t(x)+α(1-t(x)), (1)
where I(x) is the observed fog image element, J(x) is the restored real image element, t(x) is the atmospheric transmittance, and α is the global atmospheric light constant. Formula (1) contains three unknown parameters, J(x), t(x) and α; after t(x) and α are estimated, the real scene image J(x) can be recovered.
The atmospheric transmittance t (x) is used to describe the proportion of light that is not scattered and reaches the camera, and is defined by formula (2):
t(x) = e^(-βd(x)),     (2)
where d(x) is the distance from the scene point corresponding to the image element to the camera, i.e., the depth of field, and β is the atmospheric scattering coefficient; the formula indicates that t(x) tends to 0 as d(x) tends to infinity. Combining (1) and (2) gives
α=I(x),d(x)→∞, (3)
In practical imaging, the depth of field d(x) cannot be infinite; instead, a small transmittance threshold t0 can be defined for very distant points. In this case, rather than obtaining the atmospheric light with equation (3), it is more accurate to estimate it according to the following equation (4):
α = max I(y), over y ∈ { x | t(x) ≤ t0 },     (4)
based on the above discussion, it can be concluded that: accurate estimation of the atmospheric transmittance is key to the recovery of sharp images.
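To make the roles of t(x) and α concrete, here is a minimal numpy sketch of the per-pixel recovery implied by formulas (1)-(4); the lower bound t0 = 0.1 is an assumed illustrative value, not one specified by the source:

    import numpy as np

    def recover_scene(I, t, alpha, t0=0.1):
        # J(x) = (I(x) - alpha) / max(t(x), t0) + alpha, inverted from formula (1)
        I = I.astype(np.float32) / 255.0
        t = np.maximum(t, t0)[..., np.newaxis]   # clamp t(x), broadcast over channels
        J = (I - alpha) / t + alpha
        return np.clip(J * 255.0, 0.0, 255.0).astype(np.uint8)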
2. Defogging prior knowledge
Based on empirical observations, existing methods propose a variety of assumptions or a priori knowledge for calculating the fog-related characteristics. Defogging of an image can be achieved using these associated features.
① Dark channel prior:
The dark channel prior is based on extensive observation of outdoor fog-free images: in most fog-free image blocks, at least one color channel contains pixels with very low intensity values, even close to 0. The dark channel is defined as the minimum over all pixels of the image block and all color channels:
D(x) = min_{y ∈ Ωr(x)} min_{c ∈ {r,g,b}} I^c(y),     (5)
where I^c is a color channel of I and Ωr(x) is an r × r image block centered at x. The dark channel feature is closely related to the fog concentration in the image and can be used directly to estimate the atmospheric transmittance: t(x) ∝ 1 − D(x).
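As a sketch, the dark channel of formula (5) can be computed as a channel-wise minimum followed by a minimum filter, with OpenCV erosion serving as the minimum filter; the block size r = 7 is an assumed value:

    import cv2
    import numpy as np

    def dark_channel(I, r=7):
        min_channel = np.min(I, axis=2)            # minimum over the RGB channels
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (r, r))
        return cv2.erode(min_channel, kernel)      # minimum over the r x r block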
② Maximum contrast:
According to the atmospheric scattering model, the haze-induced reduction of transmittance lowers the contrast of the image:
Σx ||∇I(x)|| = t(x) Σx ||∇J(x)|| ≤ Σx ||∇J(x)||,     (6)
based on this observation, local contrast (in an s × s sized image block Ω) is useds(x) The difference between each pixel) and in the r × r size region ΩrThe local contrast in the inner region is maximized and defogging is performed. The definition is as follows:
C(x) = max_{y ∈ Ωr(x)} sqrt( (1/|Ωs(y)|) Σ_{z ∈ Ωs(y)} ||I(z) − I(y)||² ),     (7)
where |Ωs(y)| is the cardinality of the local neighborhood. The relationship between the contrast feature and the transmittance t is evident, so maximizing the local contrast defined by this formula enhances the visual effect of the image.
③ Color attenuation:
saturation I of an image blocks(x) Will be provided withAffected by the fog and thus sharply reduced, while the brightness value I is at the same timev(x) Will increase significantly causing an increase in the difference between the two. The difference between brightness and saturation can be used to estimate the fog concentration from the color decay prior:
A(x)=Iv(x)-Is(x), (8)
where Iv(x) and Is(x) can be expressed in the HSV color space as
Iv(x) = max_{c ∈ {r,g,b}} I^c(x),     (9)
Is(x) = ( max_{c ∈ {r,g,b}} I^c(x) − min_{c ∈ {r,g,b}} I^c(x) ) / max_{c ∈ {r,g,b}} I^c(x),     (10)
The color attenuation feature is proportional to the depth of field d(x) and can easily be used for transmittance estimation.
④ Hue disparity:
The hue difference between the original image I(x) and its corresponding semi-inverse image Isi(x) can be used to detect fog in the image. The semi-inverse image is defined as follows:
Isi(x) = max_{c ∈ {r,g,b}} [ I^c(x), 1 − I^c(x) ],     (11)
for fog-free images, the pixel values of the three channels in the half-inverse image are not all inverted, resulting in Isi(x) And i (x), a large hue difference is generated, which is defined as:
H(x) = | Isi^h(x) − I^h(x) |,     (12)
where the superscript 'h' denotes the hue channel in the HSV color space. According to formula (12), the atmospheric transmittance t(x) varies inversely with H(x).
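The color attenuation and hue disparity features of formulas (8)-(12) reduce to a few array operations; a brief sketch, assuming an RGB input scaled to [0, 1]:

    import cv2
    import numpy as np

    def color_attenuation(I):
        # A(x) = V(x) - S(x), formula (8)
        hsv = cv2.cvtColor((I * 255).astype(np.uint8), cv2.COLOR_RGB2HSV)
        v = hsv[..., 2].astype(np.float32) / 255.0
        s = hsv[..., 1].astype(np.float32) / 255.0
        return v - s

    def hue_disparity(I):
        # H(x) = |hue of semi-inverse - hue of original|, formulas (11)-(12)
        I_si = np.maximum(I, 1.0 - I)              # semi-inverse image, formula (11)
        h = cv2.cvtColor((I * 255).astype(np.uint8), cv2.COLOR_RGB2HSV)[..., 0]
        h_si = cv2.cvtColor((I_si * 255).astype(np.uint8), cv2.COLOR_RGB2HSV)[..., 0]
        return np.abs(h_si.astype(np.float32) - h.astype(np.float32))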
(II) DehazeNet network structure analysis
The existing DehazeNet network consists of convolutional and pooling layers and uses a bilateral rectified linear unit as the activation function at the end of the network. Following the general structure of neural networks, the network can be divided into four major components: feature extraction, multi-scale mapping, local extremum, and nonlinear convergence. The overall structure of the network is shown in Fig. 1.
① Feature extraction:
according to the defogging prior knowledge, the characteristics related to the fog in the image comprise a dark channel, color tone inconsistency, maximum contrast, color attenuation and the like of the image. The extraction of picture features is essentially a convolution operation with the picture using appropriate filters, often accompanied by a non-linear mapping. The first part of the network is thus made up of the convolutional layer conv1 and the reshaped layer reshape 1. Where the convolutional layer is used to implement the filter function and the re-shaping layer is to provide a suitable data input form to the subsequent pooling layer. The present invention adds a pooling layer pool1 for further feature extraction.
The effective information of an RGB fog image (each pixel of a color fog image is represented by different proportions of red, green and blue) is often not uniformly distributed across the R, G and B channels, so the invention uses the MAX function of the pooling layer, mainly to extract image features more fully while reasonably simplifying the data. Meanwhile, to better meet the speed required for real-time fog penetration, the pooling layer is designed to slide not one pixel at a time but by the pooling window size per step, extracting image features more quickly. Further, to meet the processing requirements of the next part, a reshaping layer reshape2 is placed after the pooling layer.
② Multi-scale mapping:
the feature mapping of different scales of the existing DehazeNet network plays an important role in the defogging process. Meanwhile, the mapping of different scales is beneficial to further compressing the data structure, reducing the operation burden of the network and meeting the requirement of rapidity. In order to realize multi-scale mapping, the invention proposes a parallel convolution structure as a second part of the neural network. The convolution kernel sizes chosen by the present invention are 1 × 1, 3 × 3, 5 × 5, and 7 × 7, according to general experience in image processing. Through the conv layers conv2/1 × 1, conv2/3 × 3, conv2/5 × 5 and conv2/7 × 7 with different sizes, the corresponding features of four scales in the image are extracted. And then splicing between the four input data is realized through one convolution splicing layer conv 2/output. The splice layer input dimension uses a default setting, i.e., splices with all channel data.
③ Local extremum:
in the evaluation criteria for image restoration, spatial invariance is a very important index. The spatial invariance feature may be implemented using a series of pooling operations. In convolutional neural networks, local extrema are the classical methods used to overcome the spatial disparity caused by local sensitivity. Meanwhile, as the image acquisition equipment aims at outdoor road monitoring and security monitoring, the transmittance of the air medium in a small range should not generate huge sudden change, and the operation of local extreme values can effectively eliminate the white noise which possibly exists. The third part of the overall network is thus constituted by a pooling layer pool2 set to MAX, which differs from the pooling layer of the first part in that, given the local sensitivity, it tends to be caused by a single pixel, the pooling layer is here set with a sliding step size of 1 to better preserve the spatial invariance of the whole image.
④ Nonlinear convergence:
the activation function used for the earliest deep learning is the S-type activation function, as shown in the following equation (13)
f(x) = 1 / (1 + e^(−x)),     (13)
Later, to increase the convergence rate and in view of the gradient attenuation problem of the sigmoid function, most neural network training began to use the rectified linear unit as the activation function, shown in equation (14):
f(x) = max(0, x),     (14)
The problem with the rectified linear function is that its output ranges not from 0 to 1, as with the sigmoid function, but from 0 to infinity. Such an output clearly violates the physical fact that the transmittance cannot exceed 1. Therefore, to keep the final output between 0 and 1, the invention selects the bilateral rectified linear unit as the activation function, shown in equation (15):
f(x) = min(tmax, max(tmin, x)), with tmin = 0 and tmax = 1,     (15)
In the Caffe framework, the conventional activation layers include the rectified linear unit (ReLU), the sigmoid function (Sigmoid), the hyperbolic tangent function (TanH), the absolute value function (AbsVal), the power function (Power), and the binomial normal log-likelihood function (BNLL). The bilateral rectified linear unit is not included, so bilateral rectification must be realized in the Caffe framework by combining a fully connected layer with a rectified linear unit, finally outputting the transmittance required by the invention.
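The bilateral rectified linear unit of equation (15) is a simple clipping; a numpy sketch with tmin = 0 and tmax = 1:

    import numpy as np

    def brelu(x, t_min=0.0, t_max=1.0):
        # clamp activations into [t_min, t_max] so the predicted
        # transmittance stays physically meaningful (equation (15))
        return np.minimum(t_max, np.maximum(t_min, x))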
(III) image restoration acceleration
To achieve real-time defogging and accelerate image processing, and since Python can invoke kernel functions to run parallel operations on the GPU, the invention uses Python library calls for multithreaded parallel computation and realizes acceleration in the following two respects:
(1) Estimating the global atmospheric light constant α. When the atmospheric parameters are obtained with the initially improved DehazeNet network, which cannot train the atmospheric light constant, the global atmospheric light constant α must be solved after the transmittance distribution is obtained. The invention merges the search for the minimum transmittance within a single block through shared memory, storing intermediate results in shared memory to reduce accesses to main memory and thus optimize the computation; a single block is likewise used for the reduction across rows, the per-row extrema are merged, and finally the intensity value of the point with the minimum transmittance is taken as the global atmospheric light constant α.
(2) Restoring the image. For the improved Multitask network, which trains the transmittance and the atmospheric light constant simultaneously, the invention computes the corresponding J(x) with maximal efficiency through parallel computation. The clear image J(x) can be restored from t(x) and α by the following formula:
J(x) = (I(x) − α) / max(t(x), t0) + α,     (16)
when parallel calculation is realized on the GPU, the number of parallel threads is the number of pixel points, and each thread calculates a clear image J (x) corresponding to the pixel point x to realize parallel acceleration.
(IV) evaluation criteria determination
1. Neural network convergence criteria
Mean square error (MSE) is the function used to evaluate network performance. Suppose there are n pairs of input and output data, and the network output after training is denoted ŷi. The statistic is the mean of the weighted squared errors between the predicted data and the original data at corresponding points, calculated as

MSE = (1/n) Σ_{i=1}^{n} wi (yi − ŷi)²,

where n is the number of samples, yi is the real data, ŷi is the fitted data, and the weights satisfy wi > 0. From the definition of MSE, the closer its value is to 0, the better the model selection and fitting, the more successful the data estimation, and the more convincing the network output.
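A one-line numpy equivalent (uniform weights assumed when none are given):

    import numpy as np

    def mse(y_true, y_pred, w=None):
        # mean of weighted squared errors; weights w_i > 0 default to 1
        w = 1.0 if w is None else w
        return np.mean(w * (y_true - y_pred) ** 2)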
2. Evaluation of the image restoration effect
Information entropy: the richness of color can be quantified by probability. If no pixel value in the picture has zero probability, the picture can be regarded as colorful. The final quantification of this probability is the information entropy value. The larger the entropy, the better the fog image has been processed.
Average gradient: gray levels near a boundary, or on the two sides of a shadow line in the image, differ noticeably, i.e., the gray-level change rate is large. The magnitude of this rate of change can represent image sharpness: it reflects the rate of contrast change in the fine details of the image and characterizes its relative clarity.
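Both criteria are straightforward to compute for a grayscale image; a numpy sketch (the 256-bin histogram and the root-mean-square gradient form are common conventions assumed here, not specified by the source):

    import numpy as np

    def information_entropy(img):
        # Shannon entropy of the 256-level gray histogram
        hist, _ = np.histogram(img, bins=256, range=(0, 256))
        p = hist / hist.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def average_gradient(img):
        # mean magnitude of the horizontal/vertical gray-level differences
        img = img.astype(np.float32)
        gx = np.diff(img, axis=1)[:-1, :]
        gy = np.diff(img, axis=0)[:, :-1]
        return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))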
Drawings
FIG. 1 is a diagram of a DehazeNet network architecture;
FIG. 2 is a flow chart of the present defogging method;
FIG. 3 is a structural diagram of a modified DehazeNet of the present defogging method;
FIG. 4 is a schematic diagram of a fogged tile dataset;
FIG. 5-1 is a haze image of a city (not treated by the present invention);
FIG. 5-2 is a haze image of a city (processed using the present invention).
Detailed Description
The present disclosure is further described with reference to the following drawings and detailed description:
(I) overall design of the defogging method
Based on the atmospheric scattering model, video defogging means estimating the transmittance and the atmospheric light constant from the known fog images and solving for the original image. The invention divides the whole real-time defogging process into three parts (the overall block diagram is shown in Fig. 2):
The first part collects video with the video acquisition equipment, cuts the video frame by frame into single pictures, and provides them to the neural network for processing.
The second part uses the trained neural network, i.e., the weights of all layers are known, and processes the input picture block by block to obtain the transmittance and the atmospheric light constant of each part, finally forming a transmittance distribution map and an atmospheric light constant distribution map.
The third part takes the output of the neural network, solves the fog-free image according to the atmospheric scattering model, and splices the fog-free images back into a video.
(II) hardware design and environment construction of defogging system applying the method
1. Image acquisition device invocation
The defogging system of the method selects the network camera as the image acquisition equipment.
A traditional analog camera generally uses a CMOS sensor, with low resolution and poor low-light sensitivity, and is suitable for video phones and conference phones with low image-quality requirements. Moreover, an analog camera can only transmit signals in one direction: the video signal is sent out, and it can be read only by connecting a hard disk video recorder or a monitor. In contrast, a network camera adds a module for compressing and processing video while retaining the functions of a camera and a video server. A network camera therefore needs only one IP network interface to handle the data, meeting the requirement of real-time monitoring.
Considering the requirement of real-time video processing, the network camera can transmit video directly to a computer without a video recorder, which is faster; meanwhile, a third-party device can pull the stream through the camera's standard RTSP link, receiving the code-stream packets on the network in real time and decoding them into video.
For PC-side installation and program invocation, the invention connects the camera to the computer with a network cable. When the network camera is used, check whether the IP address of the local computer and that of the network camera are in the same network segment, and change the camera's IP address, subnet mask and default gateway as needed so that the local computer can identify it; real-time display of the monitoring video on the computer is the criterion for successful camera configuration. The invention accesses the IP address through OpenCV library functions and acquires the data stream transmitted by the network camera: the network camera is assigned an IP address, and accessing that address through OpenCV indirectly calls the camera, converting the acquired data into images displayed at the computer end. The cv2 function library is imported into Python, real-time video data transmitted from the network camera is obtained by calling OpenCV's VideoCapture function, and the video image is processed and displayed in real time.
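A minimal sketch of this capture loop (the RTSP URL, user name and password are placeholders; the actual address depends on the camera's configuration):

    import cv2

    # hypothetical RTSP address; substitute the camera's real IP and credentials
    cap = cv2.VideoCapture('rtsp://admin:password@192.168.1.64/stream1')

    while cap.isOpened():
        ok, frame = cap.read()                 # pull one decoded frame from the stream
        if not ok:
            break
        cv2.imshow('surveillance', frame)      # real-time display at the computer end
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()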
2. Configuration of upper computer operating environment
The host system is Ubuntu 16.04 with 7.7 GB of memory and 4 GB of video memory. The host CPU is a 2.80 GHz Intel i7, and the GPU is a GeForce GTX 1050 Ti. The invention uses PyCharm Community Edition, the classic Python integrated development environment. For more convenient matrix operations, the PyCharm interpreter is set to Anaconda3, so that the numpy module bundled with Anaconda can be used. To realize the processing of videos and images, the invention calls the function library of OpenCV 3.4.0 and correspondingly imports the cv2 module in Python. To train and test the neural network, the invention compiles the Caffe module with the CUDA 9.0 interface and the cuDNN network functions, and completes the basic calls in Python.
(III) improved DehazeNet construction
1. Network architecture adjustment and improvement
The adjustment of the network structure is mainly based on two ideas:
first, to achieve speed, the structures and parameters of the four parts of the existing DehazeNet are improved;
second, to achieve a better defogging effect, and considering that shadows possibly present in security monitoring can cause abrupt changes in atmospheric illumination, the invention proposes a multitask structure under the Caffe framework that learns the image transmittance and the atmospheric light constant at the same time.
The optimized network structure is shown in fig. 3.
It can be seen that, to reduce the amount of computation in the final picture processing, the invention sets a parallel convolutional layer conv2/1 × 1 of size 1 × 1 in the multi-scale mapping part. Accordingly, a new dimension reshape_a is split off in the reshaping layer, and the subsequent convolutional concatenation layer accepts a new bottom structure to incorporate the smaller convolution kernel conv2/1x1 into the overall network architecture.
Furthermore, considering that the available training set is limited in size and the existing network weights are known, the invention uses a fine-tuning training method. The invention adopts the Xavier initialization method, which makes the output variance of each layer in the neural network as equal as possible, ensuring better information flow through the network.
The design idea of the network that learns the atmospheric light is similar to that of the transmittance network, i.e., neural network layers are used to emulate the traditional algorithm. The invention still extracts the basic features of the picture with the convolutional layer conv_a and the reshaping layer reshape_a; a maximum pooling layer pool_a is then used to obtain the atmospheric light constant of the image. Finally, the bilateral rectified linear unit is still used to accelerate convergence.
2. Training set creation and retraining
The training set of the neural network uses fog images generated from standard fog-free images by randomly adding fog effects based on the atmospheric scattering model.
This training mode has proven feasible in the prior art. The fog-free images come from the 2014 dataset newly added to the Middlebury Stereo database. The invention divides the pictures into 16x16 image blocks and adds a haze effect to them according to equation (1); the transmittance of each picture is randomly generated by the Monte Carlo method and recorded as its label. The training set in jpg format is shown in Fig. 4; it is then converted to the lmdb format.
during neural network training, the invention selects to use a newly added training set to carry out fine tuning mode training under the existing weight, namely, the existing weight path is added by using the weights command during training, and the training set is newly added, so that the weight is more perfect. The invention adopts the scale of one iteration of 50 pictures for training, and sets the verification interval to be 4. Finally, the network has a certain improvement on the image processing effect, and the evaluation criteria are improved on the basis of the three evaluation criteria selected by the invention.
3. Analysis of treatment effectiveness
After the fine adjustments to the network structure and retraining with the new dataset, the mean square error of the network decreased from the initial 0.0127 to 0.0089. The larger dataset and the fine-tuned training thus compensate for the possible non-convergence caused by the change of network structure. To learn the transmittance and the atmospheric light constant simultaneously, the network was trained again in multitask mode; the mean square error increases as the network structure becomes more complex, but under the information entropy and average gradient criteria proposed by the invention, the processing effect on a single image still improves. The specific effect is shown in Table 1.
TABLE 1 Comparison of single-image defogging effect: DehazeNet, fine-tuned DehazeNet, and Multitask
[Table 1 appears only as an image in the original document; its numerical values are not recoverable from this text.]
In addition to the quantitative analysis, the defogging effect is also visually perceptible; take the haze image of a certain city as an example. As is apparent from comparing Fig. 5-1 and Fig. 5-2, the tower crane in Fig. 5-2 is much clearer.
Then, single images of different sizes were processed with the network inference function, and the processing times were recorded in Table 2 and compared with the time required by dark channel defogging. It can be seen that defogging with the trained DehazeNet markedly improves speed, while the improved DehazeNet with the atmospheric-light learning structure is slightly slower but achieves a better defogging effect (as shown in Table 1).
TABLE 2 Defogging times of the DCP (dark channel prior), DehazeNet and Multitask methods for pictures of different sizes

Picture size    Dark channel prior    DehazeNet    Multitask
800 x 600       0.154 s               0.067 s      0.089 s
1280 x 720      0.298 s               0.138 s      0.177 s
1600 x 900      0.815 s               0.421 s      0.589 s
1920 x 1080     0.926 s               0.679 s      0.790 s
(IV) parallelization of image restoration
The GPU is called to solve the global atmospheric light constant α efficiently. Calculating the global atmospheric light constant requires comparing the transmittance t(x) of every point in the image and finding the point with the minimum transmittance; this comparison process can be accelerated in parallel.
First, a kernel function is defined to find, by comparison, the point in the image with the minimum transmittance. The image collected by the camera is of size 1920 × 1080, i.e., 1080 rows in total. 36 blocks are allocated, each block processing 30 rows of the image, with 30 threads per block and one thread processing one row. Every thread executes the same kernel function, and comparison yields the point with the minimum transmittance in each row of data.
Since the rows are processed in parallel, obtaining the minimum transmittance of the whole image requires a comparison across rows, so the minimum found for each row must be stored temporarily. Considering that calls to main memory consume substantial resources and waste time, the invention uses shared memory, reducing overhead and accelerating the computation.
A 33 × 33 shared memory, 1089 storage units in total, can hold the 1080 row minima obtained. After the in-row comparison is finished, to obtain the comparison result across rows the method allocates one block with 30 threads, each thread responsible for comparing 36 values and finding their extremum; similarly, a 3 × 10 buffer is applied for, using shared memory to record the intermediate results. After this pass, only 30 values remain; the data volume is then so small that a single thread can quickly finish the computation.
Thus the minimum transmittance of an image is found through three rounds of merged comparison, greatly reducing the amount of computation and accelerating the calculation. With the transmittance t(x) and the global atmospheric light constant α obtained, the restored image can be computed.
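A hedged PyCUDA sketch of the first stage of this reduction, matching the 36-block by 30-thread layout above (one thread per row); for brevity the later cross-row stages are collapsed here into a host-side argmin rather than the shared-memory passes described:

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void row_min(const float *t, float *mins, int *args, int width)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;  // 36 blocks x 30 threads = 1080 rows
        const float *r = t + row * width;
        float m = r[0];
        int arg = 0;
        for (int x = 1; x < width; ++x)
            if (r[x] < m) { m = r[x]; arg = x; }
        mins[row] = m;      // per-row minimum transmittance
        args[row] = arg;    // its column, to locate the pixel for alpha
    }
    """)

    def min_transmittance_point(t):            # t: 1080 x 1920 transmittance map, as in the text
        h, w = t.shape
        mins = np.empty(h, dtype=np.float32)
        args = np.empty(h, dtype=np.int32)
        mod.get_function("row_min")(
            cuda.In(t.astype(np.float32)), cuda.Out(mins), cuda.Out(args),
            np.int32(w), block=(30, 1, 1), grid=(36, 1))
        row = int(np.argmin(mins))             # cross-row stages simplified to host code
        return row, int(args[row])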
If serial computation were performed on the CPU, 1920 × 1080 = 2,073,600 loop iterations would be required, wasting time and resources; since the processing of each pixel is not complicated and there is no coupling between pixels, the computation can instead be accelerated on the GPU.
As can be seen from formula (16), the output image is obtained with only one operation, so each thread can output directly without allocating extra memory. 8 blocks are allocated, each block processing 259,200 pixels; each block is assigned 12 × 20 = 240 threads, and each thread processes 1080 pixels. Every thread executes the same kernel function, and the device side schedules warps to compute the corresponding output image J(x) with maximal efficiency. Through this allocation, the computation is accelerated and real-time processing is realized.
However, solving the transmittance is too complex a process for GPU acceleration alone to raise its speed sufficiently; since the change of transmittance over a short time is almost zero under windless conditions, the atmospheric transmittance is computed once every 10 seconds, reducing unnecessary computation and realizing real-time defogging.
The technical difficulties of the invention are mainly reflected in three aspects: first, the choice of defogging algorithm; second, the training and construction of the neural network; third, the construction of the system framework. The surveillance video defogging system combines a traditional defogging algorithm with the novel defogging network, but a pure data-processing algorithm has limited ability to restore dense fog: once the fog is too dense, the result of the defogging algorithm is no longer credible. By comparing a large number of pictures, a suitable transmittance threshold is derived with a binary classification method in the neural network and used as the final basis for judging whether the final result of the fog penetration algorithm is credible.
Advantageous effects:
the invention improves and retrains the DehazeNet system based on the neural convolution network to obtain the picture transmittance graph, and realizes real-time video defogging by utilizing parallelization calculation. In the construction and training of the neural network, the invention uses the bilateral linear correction unit as an activation function and uses Zeville distribution as weight initialization distribution to reduce the search space and improve the convergence speed. The invention provides a method for improving the defogging effect of a single image, which takes the information entropy and the average gradient as defogging effect indexes, and on the basis, adopts a multitask mode under a caffe framework and designs a neural network for simultaneously learning the transmittance and the atmospheric light constant, so that the single image processing effect is further improved under the evaluation system. After the transmittance graph is obtained, kernel functions are called through the CUDA, parallel calculation is achieved on the GPU to accelerate the image restoration speed, and the requirement of real-time video defogging is met.
Compared with traditional single-image defogging methods, the invention realizes real-time video defogging while guaranteeing the defogging effect, and overcomes the supersaturation and halo-blurring problems of traditional defogging methods. The invention uses Python to call the camera's video data directly and then performs software defogging; compared with traditional optical fog penetration, no additional update or upgrade of the image acquisition equipment is needed, greatly saving capital investment.

Claims (5)

1. A monitoring video real-time defogging method based on improved DehazeNet is characterized by comprising the following steps:
1) acquiring a video through video acquisition equipment, cutting the video frame by frame into single pictures, and providing them to a neural network for processing;
2) using a trained improved DehazeNet neural network, i.e., the weights of all layers are known, and processing the input picture block by block to obtain the transmittance t(x) and the atmospheric light constant of each part, finally forming a transmittance distribution map and an atmospheric light constant distribution map;
3) acquiring the output of the neural network, solving the fog-free image according to the defogging algorithm based on the atmospheric scattering model, and splicing the fog-free images back into a video;
the step of step 2) comprises the following steps:
2.1) carrying out feature extraction on the picture obtained in the step 1):
the improved DehazeNet network is that a new convolution layer conv _ a and a reshaping layer reshape _ a are added in the DehazeNet network; convolutional layer conv _ a is parallel to convolutional layer conv1 in the DehazeNet network; the reshaping layer reshape _ a is divided in the reshaping layer reshape1 in the DehazeNet network;
extracting basic features of the picture by the convolution layer conv _ a and the reshaping layer reshape _ a, and extracting an atmospheric light constant of the sampled image by the maximum pooling layer pool _ a;
extracting basic features of the picture by the convolutional layer conv1 and the reshaping layer reshape1, and further extracting the atmospheric transmittance of the image by the pooling layer pool _ 1; a reforming layer reshape2 is arranged behind the pool layer pool _ 1;
2.2) multiscale mapping:
for the output of the reshaping layer reshape2, a parallel convolution structure is adopted, the sizes of selected convolution kernels are 1 × 1, 3 × 3, 5 × 5 and 7 × 7, and the features corresponding to four scales in the image are extracted through conv2/1 × 1, conv2/3 × 3, conv2/5 × 5 and conv2/7 × 7 of the four convolution layers with different sizes; splicing among the four input data is achieved through a convolution splicing layer conv 2/output;
2.3) local extrema:
for the result of step 2.2), further extracting by a pooling layer pool2 set to MAX, the sliding step of pooling layer pool2 is 1;
2.4) non-linear convergence:
for the result of step 2.3), a bilateral linear rectifying unit is used as the activation function, as follows:
f(x) = min(tmax, max(tmin, x)), with tmin = 0 and tmax = 1,
in the step 3):
firstly, solving the global atmospheric light constant α: comparing the transmittance t(x) of each pixel point x in the image, and obtaining the point with the minimum transmittance value, whose intensity value is used as the global atmospheric light constant α;
then, the haze-free image is solved according to the atmosphere scattering model-based defogging algorithm, and the clear image J (x) is formed by a formula according to t (x) and alpha
Figure FDA0003023874110000012
Restoration, where i (x) is the observed fog image element.
2. The real-time defogging method for the surveillance video according to claim 1, wherein in the step 3), the selection process of the fog-free image algorithm comprises the following steps:
the atmospheric scattering model is expressed by the following formula (1):
I(x)=J(x)t(x)+α(1-t(x)), (1)
wherein I(x) is the observed fog image element, J(x) is the recovered defogged image element, t(x) is the atmospheric transmittance, and α is the global atmospheric light constant; formula (1) contains three unknown parameters J(x), t(x) and α, and after t(x) and α are estimated, the real scene image J(x) is recovered;
the atmospheric transmittance t (x) is used to describe the proportion of light that is not scattered and reaches the camera, and is defined by formula (2):
t(x) = e^(-βd(x)),     (2)
wherein d(x) is the distance from the scene point corresponding to the image element to the camera, and β is the atmospheric scattering coefficient;
obtained by combining (1) and (2)
α=I(x),d(x)→∞, (3)
in practical imaging, the depth of field d(x) cannot be infinite, but a small transmittance threshold t0 can be defined at long distance; in this case, instead of obtaining the atmospheric light with equation (3), it is more accurate to estimate the atmospheric light according to the following equation (4):
α = max I(y), over y ∈ { x | t(x) ≤ t0 },     (4)
based on the above analysis, it can be concluded that: accurate estimation of the atmospheric transmittance is key to the recovery of sharp images.
3. The real-time defogging method for a surveillance video according to claim 1, wherein in the step 3), the global atmospheric light constant α is solved by: merging and solving the minimum transmittance within a single block through shared memory, and storing intermediate results in the shared memory; a single block is likewise adopted for the extrema across rows, the extrema of all rows are obtained through merging, and finally the intensity value of the point with the minimum transmittance is taken as the global atmospheric light constant α, specifically as follows:
when calculating the global atmospheric light constant, the transmittance t (x) of each point in the image needs to be compared, and the point with the minimum transmittance is obtained from the transmittance t (x), and the comparison process is accelerated in parallel:
firstly, a kernel function is defined to find by comparison the point with the minimum transmittance in the image; the image to be processed collected by the camera is of size 1920 × 1080, i.e., 1080 rows in total;
allocating 36 blocks, one block processing 30 rows of the image, with 30 threads in one block and one thread processing one row of the image; each thread executes the same kernel function, and the point with the minimum transmittance in each row of data is obtained through comparison;
since the rows are processed in parallel, to obtain the minimum transmittance of the whole image a comparison between rows must be made, so the minimum value obtained by each row needs to be temporarily stored;
applying for a 33 × 33 shared memory, 1089 storage units in total, to store the obtained 1080 row minima of the transmittance; after the comparison within rows is finished, to obtain the comparison result between rows, a block is allocated with 30 threads, each thread responsible for comparing 36 values and obtaining their extremum; similarly, a 3 × 10 buffer is applied for, using the shared memory to record intermediate results; after this calculation, only 30 values remain;
when the transmittance t (x) and the global atmospheric light constant α are obtained, the image is restored.
4. The real-time defogging method for a surveillance video, as recited in claim 3, wherein in said step 3), when the image is restored, the corresponding J (x) is obtained through parallel computation; when parallel calculation is realized on a GPU, the number of parallel threads is the number of pixel points, and each thread calculates a clear image J (x) corresponding to the pixel point to realize parallel acceleration;
from
J(x) = (I(x) − α) / max(t(x), t0) + α,
the output image is obtained with only one operation, so the result is output directly without allocating extra memory;
allocating 8 blocks, wherein each block needs to process 259200 pixels, allocating 12 × 20 to 240 threads for each block, and each thread needs to process 1080 pixels; each thread executes the same kernel function kernel, and the device side schedules warp to calculate the corresponding output image j (x) with the highest efficiency.
5. The real-time defogging method for a surveillance video according to claim 3, wherein in the step 3), the atmospheric transmittance is calculated once every 10 seconds.
CN201811261910.XA 2018-10-26 2018-10-26 Monitoring video real-time defogging method based on improved DehazeNet Expired - Fee Related CN109389569B (en)

Priority Applications (1)

Application Number: CN201811261910.XA; Priority Date: 2018-10-26; Filing Date: 2018-10-26; Title: Monitoring video real-time defogging method based on improved DehazeNet

Publications (2)

Publication Number    Publication Date
CN109389569A (en)     2019-02-26
CN109389569B (en)     2021-06-04

Family ID: 65427927

Family Applications (1): CN201811261910.XA, Monitoring video real-time defogging method based on improved DehazeNet (Expired - Fee Related)

Country Status (1): CN, CN109389569B (en)




Legal Events

Code    Description
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant
CF01    Termination of patent right due to non-payment of annual fee (granted publication date: 20210604; termination date: 20211026)