CN112750093A - Video image defogging method based on time sequence label transmission - Google Patents
- Publication number
- CN112750093A (application CN202110095486.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- defogging
- video
- video image
- foggy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 49
- 230000005540 biological transmission Effects 0.000 title claims abstract description 12
- 238000012549 training Methods 0.000 claims abstract description 40
- 238000012360 testing method Methods 0.000 claims abstract description 15
- 238000000605 extraction Methods 0.000 claims description 24
- 230000008569 process Effects 0.000 claims description 12
- 238000010586 diagram Methods 0.000 claims description 4
- 230000004913 activation Effects 0.000 claims description 3
- 238000013507 mapping Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000008859 change Effects 0.000 abstract description 7
- 230000000694 effects Effects 0.000 description 8
- 230000006870 function Effects 0.000 description 5
- 238000012545 processing Methods 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 239000000443 aerosol Substances 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000006731 degradation reaction Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
- Studio Circuits (AREA)
Abstract
The invention discloses a video image defogging method based on time sequence label transmission, comprising the following steps: first, establish a foggy video image training set; second, design a defogging network to defog the foggy video image in a single group of sample images; third, calculate the pixel-level absolute value loss function of the defogged image; fourth, update the weight parameter set; fifth, take the next group of sample images and repeat steps two to four until training stops; sixth, defog the actual foggy image of the first frame of the test video; seventh, defog the actual foggy image of the current non-first frame of the test video; eighth, repeat step seven to realize continuous defogging of the whole video image sequence. The invention uses temporally adjacent defogging results as a prior to guide the defogging of subsequent video frames and continuously updates the guide images, so that the defogged video changes continuously and naturally in the time dimension while each image remains accurately defogged in the spatial dimension, outputting a clear video sequence that is consistent in both space and time.
Description
Technical Field
The invention belongs to the technical field of video image defogging, and particularly relates to a video image defogging method based on time sequence label transmission.
Background
Video images collected in bad weather such as fog and haze suffer quality degradation due to atmospheric scattering: colors turn grayish-white, contrast drops, and object features become hard to recognize. This not only worsens the visual effect and reduces image legibility but can also cause the image content to be misinterpreted; in autonomous driving, for example, haze may prevent the detection of distant pedestrians or lane lines and thus cause traffic accidents. Video image defogging means using specific methods and means to reduce or even eliminate the adverse effect of airborne aerosols on video images, keeping each image spatially clear while transitioning naturally over time. Unlike single-image defogging, video defogging requires not only that every frame be properly defogged but also that the image sequence show no obvious color jumps. In addition, a hazy video contains more information than a single hazy image: the change across consecutive frames can itself indicate haze, which aids the defogging process in recovering clear images.
Existing video defogging algorithms generally fall into three categories. The first treats the video as a set of single-frame images and defogs each frame independently; single-image defogging algorithms are in turn divided into image-enhancement-based, physical-model-based, and deep-learning-based methods. Because this per-frame approach computes each image independently and ignores inter-frame temporal correlation, the resulting video is easily discontinuous, with abrupt changes in color and brightness in background regions between adjacent frames. The second category achieves video defogging through fusion: the foreground and background regions of each frame are first enhanced separately and then fused into a defogging result. However, the foreground/background separation is unreliable; when haze is severe the two cannot be accurately separated, and enhancing them separately easily distorts the contrast between them. The third category first estimates a generic defogging model and then applies it to the defogging of subsequent frames. This improves the solving efficiency of the video defogging algorithm, but the various defogging models rely on assumptions, such as the background region changing little or the haze being identically distributed across frames, that real videos do not satisfy, so the results of these methods still need improvement.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a video image defogging method based on time sequence label transmission, which uses temporally adjacent defogging results as a prior to guide the defogging of subsequent video frames and continuously updates the guide images, thereby making the defogged video change continuously and naturally in the time dimension while keeping each image accurately defogged in the spatial dimension, outputting a clear video sequence that is consistent in both time and space, and being convenient to popularize and use.
In order to solve the technical problems, the invention adopts the technical scheme that: a video image defogging method based on time sequence label transmission is characterized by comprising the following steps:
step one, establishing a foggy video image training set, wherein the process is as follows:
step 101, using a clear video image data set of known depth {X_i}, synthesize a foggy video image training set {Y_i} with different haze degrees according to the atmospheric scattering model, where i is the clear video image frame number, i = 1, 2, ..., N, N is the total number of frames of the clear video and N is not less than 1000;
step 102, combine (X_j, Y_{j+t}, X_{j+t}) into a group of sample images and construct the foggy video image training set {(X_j, Y_{j+t}, X_{j+t})}, where the number of sample image groups in the training set is not less than 1000, t is a random positive integer not greater than 5, and j is a positive integer not greater than N−t;
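The construction in steps 101 and 102 can be sketched as follows. This is a minimal illustration assuming the standard atmospheric scattering model I = J·t + A·(1−t) with transmission t = exp(−β·d); the function names, the atmospheric light A, and the range of scattering coefficients β are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def synthesize_hazy(clear, depth, beta=1.0, A=0.8):
    # atmospheric scattering model: I = J * t + A * (1 - t), with t = exp(-beta * d)
    t = np.exp(-beta * depth)[..., None]      # per-pixel transmission, broadcast to RGB
    return clear * t + A * (1.0 - t)

def build_training_set(clear_frames, depths, t_max=5, seed=0):
    # form (X_j, Y_{j+t}, X_{j+t}) groups: earlier clear frame,
    # later synthesized hazy frame, later clear frame (the label)
    rng = np.random.default_rng(seed)
    N = len(clear_frames)
    samples = []
    for j in range(N):
        t = int(rng.integers(1, t_max + 1))   # random positive offset, at most 5
        if j + t >= N:
            continue
        beta = float(rng.uniform(0.5, 2.0))   # varied haze density per sample
        Y = synthesize_hazy(clear_frames[j + t], depths[j + t], beta=beta)
        samples.append((clear_frames[j], Y, clear_frames[j + t]))
    return samples
```

With a clear frame J and its depth map d, heavier β produces denser haze; each sample group pairs an earlier clear frame with a later hazy frame and its clear label.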
step two, designing a defogging network to defog the foggy video image in a single group of sample images, wherein the process is as follows:
step 201, extract the first group of sample images from the foggy video image training set of step one, stack the foggy video image in the group with the earlier-frame clear video image, and feed the stacked images into an encoder to obtain a feature map f_{1/8} at 1/8 of the stacked image size; the encoder comprises a plurality of downsampling convolution blocks, each consisting of a convolution layer, batch normalization, and an activation function;
step 202, feed the feature map f_{1/8} into a feature extraction module to obtain an enhanced feature map f̂_{1/8}; the feature extraction module comprises a plurality of feature extraction units, each comprising several residual modules followed by a skip connection;
the residual error module comprises convolution block superposition identity mapping;
step 203, the characteristic diagram f1/8Deconvolution to obtain a deconvolution feature map f of the superimposed image size 1/41/4Will strengthen the feature mapDeconvolution to obtain a deconvolution enhanced feature map of the superimposed image size 1/4Deconvoluting the feature map f1/4And deconvolution enhanced feature mapsDeconvolution is carried out after superposition to obtain a deconvolution enhanced feature map of the superposed image size 1/2
step 204, deconvolve the feature map f_{1/4} to obtain a deconvolved feature map f_{1/2} at 1/2 of the stacked image size; superpose f_{1/2} and f̂_{1/2} and deconvolve the result to obtain the defogged image I_d of the foggy video image in the group of sample images;
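The encoder / feature-extraction / decoder structure of steps 201-204 can be sketched in PyTorch as below. The channel widths, kernel sizes, and number of residual units are illustrative assumptions; the patent only fixes the 1/8-1/4-1/2 resolution ladder, the convolution + batch-norm + activation blocks, the residual modules with identity mappings, and the superposition of plain and enhanced feature maps before each deconvolution.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # downsampling convolution block: convolution + batch normalization + activation
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    # residual module: convolution block superposed with an identity mapping
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c),
                                  nn.ReLU(inplace=True), nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

def up(c_in, c_out):
    # deconvolution that doubles the spatial resolution
    return nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1)

class DehazeNet(nn.Module):
    def __init__(self, c=16, n_res=5):
        super().__init__()
        # encoder: guide frame (3 ch) stacked with hazy frame (3 ch) -> f_{1/8}
        self.enc = nn.Sequential(conv_block(6, c), conv_block(c, 2 * c),
                                 conv_block(2 * c, 4 * c))
        self.extract = nn.Sequential(*[ResBlock(4 * c) for _ in range(n_res)])
        self.up_plain_8to4 = up(4 * c, 2 * c)   # f_{1/8}  -> f_{1/4}
        self.up_enh_8to4 = up(4 * c, 2 * c)     # enhanced f_{1/8} -> enhanced f_{1/4}
        self.up_mix_4to2 = up(4 * c, c)         # superposed 1/4 maps -> enhanced f_{1/2}
        self.up_plain_4to2 = up(2 * c, c)       # f_{1/4}  -> f_{1/2}
        self.up_out = up(2 * c, 3)              # superposed 1/2 maps -> I_d

    def forward(self, guide, hazy):
        f8 = self.enc(torch.cat([guide, hazy], dim=1))    # step 201
        f8e = self.extract(f8)                            # step 202
        f4 = self.up_plain_8to4(f8)                       # step 203
        f4e = self.up_enh_8to4(f8e)
        f2e = self.up_mix_4to2(torch.cat([f4, f4e], dim=1))
        f2 = self.up_plain_4to2(f4)                       # step 204
        return self.up_out(torch.cat([f2, f2e], dim=1))   # defogged image I_d
```

For a stacked input of size H×W with H and W divisible by 8, the encoder yields f_{1/8} with 4c channels and the three deconvolution stages return to full resolution.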
step three, according to the formula L = (1/Q) Σ_{q=1}^{Q} |I_d^q − X_{j+t}^q|, calculate the pixel-level absolute value loss function L of the defogged image I_d, where Q is the total number of pixels of the foggy video image, q is the pixel index with q = 1, 2, ..., Q, I_d^q is the pixel value of the q-th pixel of the defogged image I_d of the foggy video image in the group of sample images, and X_{j+t}^q is the pixel value of the q-th pixel of the later-frame clear video image in the group of sample images;
step four, update the weight parameter set: feed the pixel-level absolute value loss function L of the defogged image I_d into an Adam optimizer, train and optimize the defogging network of step two, and update the weight parameter set of the defogging network;
step five, take the next group of sample images as the current group and repeat steps two to four until the training reaches a preset number of training steps, the loss value stops decreasing, or the loss value falls below 0.001; then stop training, obtain the final weight parameter set of the defogging network, and thereby determine the final defogging network;
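Steps three to five amount to a standard L1-loss / Adam training loop. The sketch below uses a single-convolution stand-in for the full defogging network of step two so that it runs on its own; the learning rate and tensor sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def l1_loss(dehazed, clear_label):
    # pixel-level absolute value loss: L = (1/Q) * sum_q |I_d^q - X_{j+t}^q|
    return (dehazed - clear_label).abs().mean()

# stand-in one-layer "network"; the real model is the encoder /
# feature-extraction / decoder network of step two
net = nn.Conv2d(6, 3, 3, padding=1)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

losses = []
for step in range(50):                    # the patent presets 10000-20000 steps
    guide = torch.rand(2, 3, 16, 16)      # X_j     (earlier clear frame)
    hazy = torch.rand(2, 3, 16, 16)       # Y_{j+t} (later hazy frame)
    label = torch.rand(2, 3, 16, 16)      # X_{j+t} (later clear frame, the label)
    pred = net(torch.cat([guide, hazy], dim=1))
    loss = l1_loss(pred, label)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
    if loss.item() < 1e-3:                # early-stop criterion from step five
        break
```

In real training each iteration would instead draw a (X_j, Y_{j+t}, X_{j+t}) group from the training set of step one.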
step six, defog the actual foggy image of the first frame of the test video;
step seven, defog the actual foggy image of the current non-first frame of the test video: take the defogged image of the previous frame's actual foggy image together with the current frame's actual foggy image as input, and feed them into the final defogging network for forward inference to obtain the defogged image of the actual foggy image of the current non-first frame of the test video;
step eight, repeat step seven until the whole video image sequence has been defogged, thereby realizing continuous defogging of the entire video image sequence.
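Steps six to eight reduce to the propagation loop below. Both dehazing functions are stubs standing in for, respectively, the first-frame single-image defogging model and the trained network's forward inference; only the loop structure reflects the patent.

```python
import numpy as np

def single_image_dehaze(frame):
    # stub for the first-frame dehazer (a single-image model or a manual pass)
    return np.clip(frame * 1.2, 0.0, 1.0)

def guided_dehaze(prev_dehazed, hazy_frame):
    # stub for the trained network's forward inference on a (guide, hazy) pair
    return np.clip(0.5 * prev_dehazed + 0.7 * hazy_frame, 0.0, 1.0)

def dehaze_video(hazy_frames):
    # time sequence label transmission: each defogged frame guides the next one
    guide = single_image_dehaze(hazy_frames[0])   # step six
    out = [guide]
    for frame in hazy_frames[1:]:                 # step seven, repeated (step eight)
        guide = guided_dehaze(guide, frame)       # previous result as the prior
        out.append(guide)
    return out
```

The key design choice this loop shows is that the guide is continuously replaced: frame k is defogged using the defogged result of frame k−1, not a fixed reference frame.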
The video image defogging method based on time sequence label transmission is further characterized in that: in the fifth step, the preset number of training steps is 10000-20000.
The video image defogging method based on time sequence label transmission is further characterized in that: in step 201, the encoder comprises at least three downsampling convolution blocks; in step 202, the feature extraction module comprises 5-10 feature extraction units.
The video image defogging method based on time sequence label transmission is further characterized in that: the clear video image data set of known depth comprises the NYU image data set.
The video image defogging method based on time sequence label transmission is further characterized in that: in step six, the actual foggy image of the first frame of the test video is defogged using manual defogging or a single-image defogging model.
Compared with the prior art, the invention has the following advantages:
1. The invention uses the defogged image of the previous frame of the video sequence as a guide to defog the current frame and continuously updates this guide image, thereby ensuring the continuity of the defogged video in the time dimension and producing a clear video sequence that is consistent in space and time, which is convenient to popularize and use.
2. The method combines the earlier clear image, the current hazy frame, and the current clear frame into a group of sample images, constructs a defogging network composed of an encoder and a feature extraction module using a deep neural network, and obtains the model's weight parameters through optimization training, thereby realizing defogging enhancement of video images; the method is reliable and stable and performs well in use.
3. The method has simple steps. Through the synthesized training video and the construction of time sequence training samples, the deep learning model can effectively use preceding defogging results to guide the defogging of subsequent video frames; by combining temporal prior knowledge, it achieves good image defogging quality, strong scene adaptability, good spatio-temporal continuity of the result, and real-time performance. End-to-end video defogging learning on the constructed training samples lets the network learn the temporal relations within video sequences, making it suitable for videos with dynamically changing scenes, and the temporal range of the label images can be extended to realize continuous video defogging. To further strengthen the encoder's feature extraction capability, features are enhanced by consecutive residual blocks with skip connections, which effectively extracts the features of the hazy image and the time sequence label image, strengthens the model's ability to capture temporal features, improves the video defogging effect, and is convenient to popularize and use.
In summary, the invention uses temporally adjacent defogging results as a prior to guide the defogging of subsequent video frames and continuously updates the guide images, so that the defogged video changes continuously and naturally in the time dimension while each image remains accurately defogged in the spatial dimension, outputting a clear video sequence that is consistent in time and space and convenient to popularize and use.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a block diagram of the process flow of the present invention.
Detailed Description
As shown in fig. 1, the video image defogging method based on time sequence label transmission of the present invention comprises the following steps:
step one, establishing a foggy video image training set, wherein the process is as follows:
step 101, using a clear video image data set of known depth {X_i}, synthesize a foggy video image training set {Y_i} with different haze degrees according to the atmospheric scattering model, where i is the clear video image frame number, i = 1, 2, ..., N, N is the total number of frames of the clear video and N is not less than 1000;
In this embodiment, the clear video image data set of known depth comprises the NYU image data set.
step 102, combine (X_j, Y_{j+t}, X_{j+t}) into a group of sample images and construct the foggy video image training set {(X_j, Y_{j+t}, X_{j+t})}, where the number of sample image groups in the training set is not less than 1000, t is a random positive integer not greater than 5, and j is a positive integer not greater than N−t;
It should be noted that a clear image at a certain time interval is added as a time sequence constraint when constructing a group of sample images, so that during testing the already-defogged preceding image can serve as an additional guide, maintaining the temporal consistency of the video after defogging.
step two, designing a defogging network to defog the foggy video image in a single group of sample images, wherein the process is as follows:
step 201, extract the first group of sample images from the foggy video image training set of step one, stack the foggy video image in the group with the earlier-frame clear video image, and feed the stacked images into an encoder to obtain a feature map f_{1/8} at 1/8 of the stacked image size; the encoder comprises a plurality of downsampling convolution blocks, each consisting of a convolution layer, batch normalization, and an activation function;
step 202, feed the feature map f_{1/8} into a feature extraction module to obtain an enhanced feature map f̂_{1/8}; the feature extraction module comprises a plurality of feature extraction units, each comprising several residual modules followed by a skip connection;
the residual module consists of a convolution block superposed with an identity mapping;
in this embodiment, in step 201, the encoder includes at least three downsampled convolution blocks; in step 202, the feature extraction module comprises 5-10 feature extraction units.
step 203, deconvolve the feature map f_{1/8} to obtain a deconvolved feature map f_{1/4} at 1/4 of the stacked image size; deconvolve the enhanced feature map f̂_{1/8} to obtain a deconvolved enhanced feature map f̂_{1/4} at 1/4 of the stacked image size; superpose f_{1/4} and f̂_{1/4} and deconvolve the result to obtain a deconvolved enhanced feature map f̂_{1/2} at 1/2 of the stacked image size;
step 204, deconvolve the feature map f_{1/4} to obtain a deconvolved feature map f_{1/2} at 1/2 of the stacked image size; superpose f_{1/2} and f̂_{1/2} and deconvolve the result to obtain the defogged image I_d of the foggy video image in the group of sample images;
step three, according to the formula L = (1/Q) Σ_{q=1}^{Q} |I_d^q − X_{j+t}^q|, calculate the pixel-level absolute value loss function L of the defogged image I_d, where Q is the total number of pixels of the foggy video image, q is the pixel index with q = 1, 2, ..., Q, I_d^q is the pixel value of the q-th pixel of the defogged image I_d of the foggy video image in the group of sample images, and X_{j+t}^q is the pixel value of the q-th pixel of the later-frame clear video image in the group of sample images;
step four, update the weight parameter set: feed the pixel-level absolute value loss function L of the defogged image I_d into an Adam optimizer, train and optimize the defogging network of step two, and update the weight parameter set of the defogging network;
step five, take the next group of sample images as the current group and repeat steps two to four until the training reaches a preset number of training steps, the loss value stops decreasing, or the loss value falls below 0.001; then stop training, obtain the final weight parameter set of the defogging network, and thereby determine the final defogging network;
in the fifth step, the number of the preset training steps is 10000-20000.
step six, defog the actual foggy image of the first frame of the test video;
In this embodiment, in step six, the actual foggy image of the first frame of the test video is defogged using manual defogging or a single-image defogging model.
step seven, defog the actual foggy image of the current non-first frame of the test video: take the defogged image of the previous frame's actual foggy image together with the current frame's actual foggy image as input, and feed them into the final defogging network for forward inference to obtain the defogged image of the actual foggy image of the current non-first frame of the test video;
step eight, repeat step seven until the whole video image sequence has been defogged, thereby realizing continuous defogging of the entire video image sequence.
When the method is used, the defogged image of the previous frame of the video sequence serves as a guide to defog the current frame, and this guide image is continuously replaced, ensuring the continuity of the defogged video in the time dimension and producing a clear, spatio-temporally consistent video sequence. The earlier clear image, the current hazy frame, and the current clear frame form a group of sample images; a defogging network composed of an encoder and a feature extraction module is built with a deep neural network, and the model's weight parameters are obtained through optimization training to realize defogging enhancement of the video. Through the synthesized training video and the construction of time sequence training samples, the deep learning model can effectively use preceding defogging results to guide the defogging of subsequent frames; combining temporal prior knowledge yields good defogging quality, strong scene adaptability, good spatio-temporal continuity, and real-time performance. End-to-end video defogging learning on the constructed samples lets the network learn the temporal relations within video sequences, suits dynamically changing scenes, and allows the temporal range of the label images to be extended for continuous video defogging. To further strengthen the encoder's feature extraction, consecutive residual blocks with skip connections enhance the features, effectively extracting the features of the hazy image and the time sequence label image, strengthening the model's capture of temporal features, and improving the video defogging effect; by transmitting the guiding defogged image along the time sequence, the color and contrast of the defogged video remain continuous and natural.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.
Claims (5)
1. A video image defogging method based on time sequence label transmission is characterized by comprising the following steps:
step one, establishing a foggy video image training set, wherein the process is as follows:
step 101, using a clear video image data set of known depth {X_i}, synthesize a foggy video image training set {Y_i} with different haze degrees according to the atmospheric scattering model, where i is the clear video image frame number, i = 1, 2, ..., N, N is the total number of frames of the clear video and N is not less than 1000;
step 102, combine (X_j, Y_{j+t}, X_{j+t}) into a group of sample images and construct the foggy video image training set {(X_j, Y_{j+t}, X_{j+t})}, where the number of sample image groups in the training set is not less than 1000, t is a random positive integer not greater than 5, and j is a positive integer not greater than N−t;
step two, designing a defogging network to defog the foggy video image in a single group of sample images, wherein the process is as follows:
step 201, extract the first group of sample images from the foggy video image training set of step one, stack the foggy video image in the group with the earlier-frame clear video image, and feed the stacked images into an encoder to obtain a feature map f_{1/8} at 1/8 of the stacked image size; the encoder comprises a plurality of downsampling convolution blocks, each consisting of a convolution layer, batch normalization, and an activation function;
step 202, feed the feature map f_{1/8} into a feature extraction module to obtain an enhanced feature map f̂_{1/8}; the feature extraction module comprises a plurality of feature extraction units, each comprising several residual modules followed by a skip connection;
the residual module consists of a convolution block superposed with an identity mapping;
step 203, deconvolve the feature map f_{1/8} to obtain a deconvolved feature map f_{1/4} at 1/4 of the stacked image size; deconvolve the enhanced feature map f̂_{1/8} to obtain a deconvolved enhanced feature map f̂_{1/4} at 1/4 of the stacked image size; superpose f_{1/4} and f̂_{1/4} and deconvolve the result to obtain a deconvolved enhanced feature map f̂_{1/2} at 1/2 of the stacked image size;
step 204, deconvolve the feature map f_{1/4} to obtain a deconvolved feature map f_{1/2} at 1/2 of the stacked image size; superpose f_{1/2} and f̂_{1/2} and deconvolve the result to obtain the defogged image I_d of the foggy video image in the group of sample images;
step three, according to the formula L = (1/Q) Σ_{q=1}^{Q} |I_d^q − X_{j+t}^q|, calculate the pixel-level absolute value loss function L of the defogged image I_d, where Q is the total number of pixels of the foggy video image, q is the pixel index with q = 1, 2, ..., Q, I_d^q is the pixel value of the q-th pixel of the defogged image I_d of the foggy video image in the group of sample images, and X_{j+t}^q is the pixel value of the q-th pixel of the later-frame clear video image in the group of sample images;
step four, update the weight parameter set: feed the pixel-level absolute value loss function L of the defogged image I_d into an Adam optimizer, train and optimize the defogging network of step two, and update the weight parameter set of the defogging network;
step five, take the next group of sample images as the current group and repeat steps two to four until the training reaches a preset number of training steps, the loss value stops decreasing, or the loss value falls below 0.001; then stop training, obtain the final weight parameter set of the defogging network, and thereby determine the final defogging network;
Step six, defogging the actual foggy image of the first frame of the test video;
Step seven, defogging the actual foggy image of the current non-first frame of the test video: the defogged image of the previous frame's actual foggy image and the actual foggy image of the current non-first frame are fed as input into the final defogging network for forward inference, obtaining the defogged image of the actual foggy image of the current non-first frame of the test video;
Step eight, repeating step seven until the defogging of the entire video image sequence is completed, thereby realizing continuous defogging of the whole video image sequence.
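Steps six to eight amount to propagating each frame's defogged result forward as a temporal label for the next frame. A minimal sketch of that inference loop (the callables `single_image_defog` and `defog_net` are placeholders for the single-image model of step six and the trained final defogging network):

```python
def defog_video(frames, single_image_defog, defog_net):
    """Defog a foggy frame sequence: the first frame uses a single-image
    model; every later frame is inferred from the pair
    (previous defogged frame, current foggy frame)."""
    if not frames:
        return []
    outputs = [single_image_defog(frames[0])]          # step six: first frame
    for foggy in frames[1:]:                           # steps seven and eight
        outputs.append(defog_net(outputs[-1], foggy))  # previous result guides the current frame
    return outputs
```

Because each output depends on the previous one, the loop enforces temporal consistency: a defogging error in one frame is carried into the next frame's input rather than being recomputed from scratch.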
2. The video image defogging method based on time sequence label transmission according to claim 1, wherein: in step five, the preset number of training steps is 10000-20000.
3. The video image defogging method based on time sequence label transmission according to claim 1, wherein: in step 201, the encoder comprises at least three downsampling convolution blocks; in step 202, the feature extraction module comprises 5-10 feature extraction units.
4. The video image defogging method based on time sequence label transmission according to claim 1, wherein: the set of clear video image data of known depth comprises the NYU image data set.
5. The video image defogging method based on time sequence label transmission according to claim 1, wherein: in step six, the actual foggy image of the first frame of the test video is defogged using a manual defogging method or a single-image defogging model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110095486.1A CN112750093B (en) | 2021-01-25 | 2021-01-25 | Video image defogging method based on time sequence label transmission |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112750093A true CN112750093A (en) | 2021-05-04 |
CN112750093B CN112750093B (en) | 2021-10-22 |
Family
ID=75653061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110095486.1A Active CN112750093B (en) | 2021-01-25 | 2021-01-25 | Video image defogging method based on time sequence label transmission |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112750093B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101290680A (en) * | 2008-05-20 | 2008-10-22 | 西安理工大学 | Foggy day video frequency image clarification method based on histogram equalization overcorrection restoration |
WO2014081199A1 (en) * | 2012-11-21 | 2014-05-30 | Chung-Ang University Industry-Academic Cooperation Foundation | Apparatus and method for removing haze from single image using alpha matte estimation and image synthesis |
CN106780356A (en) * | 2016-11-15 | 2017-05-31 | 天津大学 | Image defogging method based on convolutional neural networks and prior information |
CN109300090A (en) * | 2018-08-28 | 2019-02-01 | 哈尔滨工业大学(威海) | A kind of single image to the fog method generating network based on sub-pix and condition confrontation |
CN109360171A (en) * | 2018-10-26 | 2019-02-19 | 北京理工大学 | A kind of real-time deblurring method of video image neural network based |
CN109360155A (en) * | 2018-08-17 | 2019-02-19 | 上海交通大学 | Single-frame images rain removing method based on multi-scale feature fusion |
CN110097519A (en) * | 2019-04-28 | 2019-08-06 | 暨南大学 | Double supervision image defogging methods, system, medium and equipment based on deep learning |
CN110288550A (en) * | 2019-06-28 | 2019-09-27 | 中国人民解放军火箭军工程大学 | The single image defogging method of confrontation network is generated based on priori knowledge guiding conditions |
CN111626944A (en) * | 2020-04-21 | 2020-09-04 | 温州大学 | Video deblurring method based on space-time pyramid network and natural prior resistance |
CN111681180A (en) * | 2020-05-25 | 2020-09-18 | 厦门大学 | Priori-driven deep learning image defogging method |
Non-Patent Citations (5)
Title |
---|
HANG DONG et al.: "Multi-Scale Boosted Dehazing Network with Dense Feature Fusion", 《ARXIV》 *
YAN ZHAO SU et al.: "Prior guided conditional generative adversarial network for single image dehazing", 《NEUROCOMPUTING》 *
YIN GAO et al.: "Single Fog Image Restoration via Multi-scale Image Fusion", 《2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC)》 *
WANG XIN et al.: "Research on Fast Defogging Algorithms for Images and Videos", 《Imaging Science and Photochemistry》 *
BAI XUECHUN: "Research on Degraded Video Imaging and Its Applications", 《China Master's Theses Full-Text Database, Information Science and Technology》 *
Also Published As
Publication number | Publication date |
---|---|
CN112750093B (en) | 2021-10-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112884064B (en) | Target detection and identification method based on neural network | |
CN110555465B (en) | Weather image identification method based on CNN and multi-feature fusion | |
CN111915531A (en) | Multi-level feature fusion and attention-guided neural network image defogging method | |
CN110288550B (en) | Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition | |
CN109509156B (en) | Image defogging processing method based on generation countermeasure model | |
CN106295645B (en) | A kind of license plate character recognition method and device | |
CN109829868B (en) | Lightweight deep learning model image defogging method, electronic equipment and medium | |
CN112149535B (en) | Lane line detection method and device combining SegNet and U-Net | |
CN105898111B (en) | A kind of video defogging method based on spectral clustering | |
CN112508814B (en) | Image tone restoration type defogging enhancement method based on unmanned aerial vehicle at low altitude visual angle | |
CN110223251A (en) | Suitable for manually with the convolutional neural networks underwater image restoration method of lamp | |
CN103034983A (en) | Defogging method based on anisotropic filtering | |
CN103260043A (en) | Binocular stereo image matching method and system based on learning | |
CN113269685A (en) | Image defogging method integrating multi-attention machine system | |
CN112784834A (en) | Automatic license plate identification method in natural scene | |
CN115019340A (en) | Night pedestrian detection algorithm based on deep learning | |
CN114596233A (en) | Attention-guiding and multi-scale feature fusion-based low-illumination image enhancement method | |
CN112750093B (en) | Video image defogging method based on time sequence label transmission | |
CN107301625A (en) | Image defogging algorithm based on brightness UNE | |
CN112767275B (en) | Single image defogging method based on artificial sparse annotation information guidance | |
CN108564535A (en) | A kind of image defogging method based on deep learning | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
CN113191971A (en) | Unmanned aerial vehicle image defogging method based on YUV color space | |
CN116311002B (en) | Unsupervised video target segmentation method based on optical flow information | |
CN114972076B (en) | Image defogging method based on layered multi-block convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||