CN117078552A - Mist synthesis method based on scattering model and depth estimation - Google Patents
- Publication number
- CN117078552A (application number CN202311079480.0A)
- Authority
- CN
- China
- Prior art keywords
- depth
- image
- scene
- potential
- fog
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/0464 — Computing arrangements based on biological models; neural networks; architecture: convolutional networks [CNN, ConvNet]
- G06N3/048 — Computing arrangements based on biological models; neural networks; architecture: activation functions
- G06T2207/10024 — Indexing scheme for image analysis or image enhancement; image acquisition modality: color image
- G06T2207/20081 — Indexing scheme for image analysis or image enhancement; special algorithmic details: training; learning
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application relates to a fog synthesis method based on a scattering model and depth estimation, comprising: acquiring an original RGB image, constructing a continuous random field, and modeling the unary and pairwise potentials related to scene depth information; constructing a convolutional neural network and solving the unary and pairwise potentials of the continuous random field to obtain a scene depth map; using the scene depth map as input and generating the scene fog distribution through conversion by an atmospheric scattering model; and superposing and encoding the scene fog distribution with the original RGB image to obtain the final foggy image. The beneficial effects of the application are as follows: the synthesized fog is realistic, the degree of automation is high, no depth camera is required, and the hardware requirements are low. With the method, depth can be inferred from any natural scene image and a corresponding foggy image obtained, so the method can be widely applied to image defogging research and other related applications.
Description
Technical Field
The application relates to the field of image information processing, in particular to a fog synthesis method based on a scattering model and depth estimation.
Background
Image defogging is an important problem in the field of computer vision; its purpose is to recover a clear image from one degraded by fog. In real scenes, fog is a widely occurring phenomenon that distorts images, degrading visual quality and the accuracy of image analysis applications. Image defogging research is therefore of great importance for improving image quality, improving visual effects, and enhancing the accuracy of image analysis applications.
With the development of deep learning technology, neural networks with excellent feature extraction and representation capabilities have become reliable solutions to many complex vision problems. Deep-learning-based methods play an important role in image defogging research, but they face a key problem: the data available for defogging research are scarce and constrained by natural conditions, and paired foggy and clear images of the same scene under the same illumination are difficult to obtain. Artificially synthesizing foggy data is therefore the foundation for developing deep-learning-based image defogging algorithms.
At present, existing work mainly solves for clear images from foggy images using foggy-day imaging models. Koschmieder proposed that the low visibility of fog images is caused by the absorption and scattering of light by airborne particles in the atmosphere. McCartney et al. proposed that particle scattering attenuates light during transmission between the target and the camera and adds a layer of atmospheric scattered light. Addressing the problem of low visibility in foggy weather, Narasimhan et al. explained the imaging process of foggy images and its various influencing factors by establishing a mathematical model, which demonstrates that strong scattering media are the main cause of degraded imaging in detection systems. A great deal of defogging research has been carried out based on this foggy-day imaging model, but because image depth information is difficult to acquire, current fog synthesis methods mostly depend on a dedicated depth camera or on a binocular camera based on the parallax principle, and research on fog synthesis from a single image remains scarce.
CN109410135A proposes an image defogging and fogging method based on a generative adversarial network, which can simulate fog on a clear image; however, because it does not rely on depth information, the resulting foggy image may be inaccurate. CN115100408A proposes a method for constructing a foggy dataset for marine scenes, which estimates depth from two images of different viewpoints according to the parallax principle and then synthesizes fog. Its advantage is more accurate depth information, but it requires left- and right-view images of the same scene, places high demands on image acquisition equipment, and is neither convenient nor efficient.
Disclosure of Invention
The application aims to overcome the defects of the prior art and provides a fog synthesis method based on a scattering model and depth estimation.
In a first aspect, a method for synthesizing fog based on a scattering model and depth estimation is provided, including:
S1, acquiring an original RGB image, constructing a continuous random field, and modeling the unary and pairwise potentials related to scene depth information of the continuous random field;
S2, constructing a convolutional neural network and solving the unary and pairwise potentials of the continuous random field to obtain a scene depth map;
S3, using the scene depth map as input and generating the scene fog distribution through conversion by an atmospheric scattering model;
S4, superposing and encoding the scene fog distribution with the original RGB image to obtain the final foggy image.
Preferably, S1 includes:
S101, for any RGB image I, dividing it into n pixel blocks whose depths are collected in the vector d=(d_1,d_2,…,d_n)∈R^n;
S102, letting W represent an energy function, then modeling the conditional probability distribution of the data using the following density function:
Pr(d∣I)=exp{-W(d,I)}/S(I)
wherein S represents a segmentation function, expressed as follows:
S(I)=∫_d exp{-W(d,I)}dd
the depth estimation problem of the whole graph is converted into a maximum posterior probability reasoning problem:
d*=argmax_d (Pr(d∣I))
where d* is the finally computed pixel-block depth, argmax_d takes the d that maximizes the expression, and Pr denotes probability;
S103, the energy function W is expressed as follows:
W(d,I)=Σ_{a∈N}U(d_a,I)+Σ_{(a,b)∈Q}V(d_a,d_b,I)
where U represents the unary potential, used to regress the depth value of a single pixel block; V represents the pairwise potential, used to assign the same depth to similar adjacent pixel blocks; N represents the set of pixel blocks, a, b represent two adjacent pixel blocks, and Q represents the set of all adjacent pixel-block pairs.
Preferably, in S103, the unary potential U is expressed as follows:
U(d_a,I;θ)=(z_a-d_a)^2
where z_a represents the regressed depth of pixel block a and θ represents the learnable regression parameters;
the pairwise potential V is expressed as follows:
V(d_a,d_b,I;β)=(1/2)R_ab(d_a-d_b)^2
where R_ab is the output of the pairwise part of the convolutional neural network for two adjacent pixel blocks a and b, computed with a fully connected layer whose trainable parameters are β:
R_ab=β^T[M_ab^(1),…,M_ab^(K)]
where M_ab^(k) is the kth similarity value, representing the similarity between the kth pair of features of the image blocks; the similarity function is expressed as follows:
M_ab^(k)=exp{-γ‖s_a^(k)-s_b^(k)‖}, k=1,2,3
where s_a^(k) and s_b^(k) are the values of pixel blocks a and b in the three investigated dimensions (color histogram distribution difference, pixel-level color difference, and texture difference computed from a local binary pattern), and ‖·‖ is the two-norm of the difference between the two pixel blocks.
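As a minimal numerical sketch of the potentials above (the function names, toy R matrix, pair list, and γ value are illustrative assumptions, not the patented implementation):

```python
import numpy as np

def crf_energy(d, z, R, pairs):
    """W(d, I): unary least-squares terms (z_a - d_a)^2 plus pairwise
    smoothness terms (1/2) * R_ab * (d_a - d_b)^2 over adjacent blocks."""
    unary = np.sum((z - d) ** 2)
    pairwise = sum(0.5 * R[a, b] * (d[a] - d[b]) ** 2 for a, b in pairs)
    return float(unary + pairwise)

def similarity(s_a, s_b, gamma=1.0):
    """One similarity value M_ab^(k) = exp(-gamma * ||s_a - s_b||) for a
    single investigation dimension (e.g. a color histogram difference)."""
    return float(np.exp(-gamma * np.linalg.norm(s_a - s_b)))
```

Identical feature vectors give a similarity of 1, and the energy vanishes when every block's depth matches its regressed value and adjacent depths agree.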
Preferably, S2 includes:
S201, constructing a convolutional neural network consisting of 5 convolutional layers and 3 fully connected layers, where a rectified linear unit (ReLU) serves as the activation function of the five convolutional layers and an exponential logistic (sigmoid) function serves as the activation function of the fully connected layers; the last fully connected layer is used for model integration, and the output of the convolutional neural network is the depth value of each pixel block;
S202, performing parameter learning of the convolutional neural network: rewriting the overall energy function and probability distribution function, and minimizing the negative conditional log-likelihood of the training data during network training to obtain the final optimization function;
S203, solving the depth of each pixel from the single RGB image by means of training and optimization.
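A shape-level sketch of such a per-block depth regressor, reduced here to two convolutional stages and one fully connected stage with random stand-in weights (the patented network has 5 convolutional and 3 fully connected layers; all sizes and weight values below are assumptions):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(img, kernel):
    """Naive 'valid' 2-D convolution for a single-channel patch."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def block_depth(patch, kernels, W_fc, b_fc):
    """Regress one pixel block's depth: ReLU convolutional stages,
    then a sigmoid fully connected stage mapping into (0, 1)."""
    x = patch
    for k in kernels:                       # convolutional stages (ReLU)
        x = relu(conv2d_valid(x, k))
    x = x.ravel()
    return float(sigmoid(W_fc @ x + b_fc))  # fully connected stage (sigmoid)
```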
Preferably, in S3, the atmospheric scattering model is expressed as:
H(p)=C(p)r(p)+A(1-r(p))
where p= (x, y) is the coordinate vector of a certain pixel position in the image, H (p) is the haze image obtained by the detection system, C (p) is the haze-free image to be recovered, a represents the atmospheric light, and r (p) is the propagation map corresponding to the scene depth.
Preferably, in S3, the propagation map corresponding to the scene depth is expressed as:
r(p)=e^{-βd(p)}
where e is the base of the natural logarithm, β represents the scattering coefficient, d (p) represents the scene depth at a pixel p;
let c denote the color channel, c∈{r,g,b}; the atmospheric scattering model for each color channel is expressed as follows:
I_c(p)=C_c(p)r(p)+A_c(1-r(p))
With this model, given the clear image and the scene depth, the attenuated reflected irradiance of the target object and the atmospheric scattered light of the scene can be calculated using only the scattering coefficient and the atmospheric light.
Preferably, in S4, the image is encoded in the sRGB color space, and the linear intensity values of the rendering model are mapped to digital image values in a non-linear manner; the per-channel sRGB color-encoding rule is as follows:
c_encoded=12.92·c_linear, if c_linear≤0.0031308
c_encoded=1.055·c_linear^(1/2.4)-0.055, if c_linear>0.0031308
where c_linear is the linear channel intensity value and c_encoded is the encoded sRGB value; encoded values in [0,1] are obtained first and then mapped to 8-bit digital codes to obtain the final hazy image.
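The encoding step can be written directly from the standard sRGB transfer function (the helper names are illustrative):

```python
import numpy as np

def srgb_encode(c_linear):
    """Per-channel sRGB encoding of linear intensities in [0, 1]."""
    c = np.asarray(c_linear, dtype=float)
    return np.where(c <= 0.0031308,
                    12.92 * c,
                    1.055 * np.power(c, 1.0 / 2.4) - 0.055)

def to_8bit(c_encoded):
    """Map encoded [0, 1] values to 8-bit digital codes."""
    return np.clip(np.round(np.asarray(c_encoded) * 255.0), 0, 255).astype(np.uint8)
```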
In a second aspect, a fog synthesis system based on a scattering model and depth estimation is provided, which performs the fog synthesis method based on a scattering model and depth estimation according to any one of the first aspects and comprises:
the acquisition module, used to acquire an original RGB image, construct a continuous random field, and model the unary and pairwise potentials related to scene depth information of the continuous random field;
the construction module, used to construct a convolutional neural network and solve the unary and pairwise potentials of the continuous random field to obtain a scene depth map;
the generating module, used to generate the scene fog distribution through conversion by an atmospheric scattering model with the scene depth map as input;
and the encoding module, used to superpose and encode the scene fog distribution with the original RGB image to obtain the final foggy image.
In a third aspect, a computer storage medium is provided, wherein a computer program is stored in the computer storage medium; the computer program, when run on a computer, causes the computer to perform the method of mist synthesis based on a scattering model and depth estimation as described in any of the first aspects.
The beneficial effects of the application are as follows: the synthesized fog is realistic, the degree of automation is high, no depth camera is required, and the hardware requirements are low. With the method, depth can be inferred from any natural scene image and a corresponding foggy image obtained, so the method can be widely applied to image defogging research and other related applications.
Drawings
FIG. 1 is a flow chart of a method for synthesizing mist based on a scattering model and depth estimation in an example of the application;
FIG. 2 is a visual block diagram of a convolutional neural network used in an example of the present application;
FIG. 3 is a schematic representation of a foggy imaging model used in an example of the application;
FIG. 4 is an original sharp image;
FIG. 5 is a scene depth map calculated according to FIG. 4;
FIG. 6 is a graph of the results of the foggy image synthesized according to FIGS. 4 and 5;
FIG. 7 is another original sharp image;
FIG. 8 is a scene depth map calculated according to FIG. 7;
FIG. 9 is a graph of the results of the foggy image synthesized according to FIGS. 7 and 8;
FIG. 10 is yet another original sharp image;
FIG. 11 is a scene depth map calculated according to FIG. 10;
fig. 12 is a graph showing the result of the foggy image synthesized according to fig. 10 and 11.
Detailed Description
The application is further described below with reference to examples. The following examples are presented only to aid in the understanding of the application. It should be noted that it will be apparent to those skilled in the art that modifications can be made to the present application without departing from the principles of the application, and such modifications and adaptations are intended to be within the scope of the application as defined in the following claims.
Example 1:
In order to solve the problem that paired foggy and fog-free images are difficult to acquire in image defogging research, and to improve the accuracy and realism of foggy image synthesis, the application provides a fog synthesis method based on a scattering model and depth estimation.
Specifically, as shown in fig. 1, the method provided by the present application includes:
S1, acquiring an original RGB image, as shown in FIG. 4, FIG. 7 or FIG. 10, and constructing a continuous random field to model the unary and pairwise potentials related to its scene depth information.
In S1, a continuous random field model is built for the input single RGB image. Pixels that are close in position in the image are merged and regarded as pixel blocks sharing the same depth. All pixels within the same pixel block are treated as approximately lying on a flat surface, and the depth of the block's centroid is taken as the depth of the entire block.
S1 comprises the following steps:
S101, for any RGB image I, dividing it into n pixel blocks whose depths are collected in the vector d=(d_1,d_2,…,d_n)∈R^n; the classical continuous conditional random field model can then be relied upon.
S102, let W represent an energy function, then modeling the conditional probability distribution of the data by using the following density function:
wherein S represents a segmentation function, expressed as follows:
S(I)=∫_d exp{-W(d,I)}dd
the depth estimation problem of the whole graph is converted into a maximum posterior probability reasoning problem:
d*=argmax_d (Pr(d∣I))
S103, the energy function W is expressed as follows:
W(d,I)=Σ_{a∈N}U(d_a,I)+Σ_{(a,b)∈Q}V(d_a,d_b,I)
where U represents the unary potential, used to regress the depth value of a single pixel block and thereby improve the accuracy of depth estimation; V represents the pairwise potential, used to assign the same depth to similar adjacent pixel blocks, which helps reduce the amount of computation; N denotes the set of pixel blocks, a, b denote two adjacent pixel blocks, and Q denotes the set of all adjacent pixel-block pairs.
In S103, in order to solve the random field model with a convolutional neural network, its potential functions need to be analyzed. The output of the convolutional neural network can be used to construct a unary potential with a least-squares loss. Let z_a denote the regressed depth of pixel block a; the unary potential U is then expressed as follows:
U(d_a,I;θ)=(z_a-d_a)^2
In this network, the input image is first divided into irregular pixel blocks; the blocks are then resized about their centroids so that they have a uniform, regular size, and all blocks are fed into the convolutional neural network, which finally outputs the depth value of each pixel block.
For the pairwise potential of the continuous random field, the potential is constructed by observing the similarity between pixel blocks, and consistency information such as the positions and depths of adjacent pixel blocks is used to enhance the smoothness of the data.
The pairwise potential V is expressed as follows:
V(d_a,d_b,I;β)=(1/2)R_ab(d_a-d_b)^2
where R_ab is the output of the pairwise part of the convolutional neural network for two adjacent pixel blocks a and b, computed with a fully connected layer whose trainable parameters are β:
R_ab=β^T[M_ab^(1),…,M_ab^(K)]
where M_ab^(k) is the kth similarity value, representing the similarity between the kth pair of features of image blocks a and b. For the measurement of similarity, image blocks are examined in three ways: color histogram distribution difference, pixel-level color difference, and texture difference computed from a local binary pattern. The similarity function is expressed as follows:
M_ab^(k)=exp{-γ‖s_a^(k)-s_b^(k)‖}, k=1,2,3
where s_a^(k) and s_b^(k) are the values of pixel blocks a and b in the three investigated dimensions, and ‖·‖ is the two-norm of the difference between the two pixel blocks. The application has thus converted the depth estimation problem of an RGB image into the problem of solving the unary and pairwise potentials of a continuous random field.
S2, constructing a convolutional neural network and solving the unary and pairwise potentials of the continuous random field to obtain a scene depth map.
S3, using the scene depth map as input, and generating scene fog distribution through conversion of an atmospheric scattering model.
S4, performing superposition coding by using the scene fog distribution and the original RGB image to obtain a final foggy image.
Example 2:
on the basis of embodiment 1, embodiment 2 of the present application provides a more specific mist synthesis method based on a scattering model and depth estimation, comprising:
S1, acquiring an original RGB image and constructing a continuous random field to model the unary and pairwise potentials related to scene depth information.
S2, constructing a convolutional neural network and solving the unary and pairwise potentials of the continuous random field to obtain a scene depth map.
S2 comprises the following steps:
S201, constructing a convolutional neural network, as shown in FIG. 2, consisting of 5 convolutional layers and 3 fully connected layers, where a rectified linear unit (ReLU) serves as the activation function of the five convolutional layers and an exponential logistic (sigmoid) function serves as the activation function of the fully connected layers; the last fully connected layer is used for model integration, and the output of the convolutional neural network is the depth value of each pixel block;
S202, performing parameter learning of the convolutional neural network: rewriting the overall energy function and probability distribution function, and minimizing the negative conditional log-likelihood of the training data during network training to obtain the final optimization function.
Specifically, after defining the unary and pairwise potentials, the overall energy function can be expressed as follows:
W(d,I)=Σ_{a∈N}(z_a-d_a)^2+Σ_{(a,b)∈Q}(1/2)R_ab(d_a-d_b)^2=d^T Ld-2z^T d+z^T z
The application uses L to denote the regularized Laplacian matrix, I the identity matrix, R the adjacency matrix composed of the R_ab, and D the diagonal matrix obtained by summing the rows of R.
Then there is
L=I+D-R
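A small numerical check of this construction (the R values below are toy assumptions): for a symmetric, nonnegative R with zero diagonal, D - R is diagonally dominant and positive semidefinite, so L = I + D - R is symmetric positive definite.

```python
import numpy as np

def regularized_laplacian(R):
    """L = I + D - R, where R holds the pairwise outputs R_ab
    (symmetric, zero diagonal) and D = diag(row sums of R)."""
    D = np.diag(R.sum(axis=1))
    return np.eye(R.shape[0]) + D - R

# Toy pairwise matrix for 3 pixel blocks (illustrative values).
R = np.array([[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
L = regularized_laplacian(R)
```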
To this end, the energy function can be expressed as:
W(d,I)=d^T Ld-2z^T d+z^T z
Because the energy function is quadratic in d and L is positive definite, the integral in the segmentation function can be computed analytically:
the final segmentation function is:
S(I)=∫_d exp{-W(d,I)}dd=(π^{n/2}/|L|^{1/2})exp{z^T L^{-1}z-z^T z}
the application can rewrite the probability distribution function as
In the training process of the neural network, the aim of the application is to minimize the negative conditional log-likelihood of the training data; the final optimization function can be expressed as:
min_{θ,β≥0} -Σ_{i=1}^{N} log Pr(d^(i)∣I^(i))+(λ_1/2)‖θ‖^2+(λ_2/2)‖β‖^2
where I^(i), d^(i) represent the ith training image and the corresponding depth map, λ_1 and λ_2 are weight decay parameters, and N is the number of training images. The application trains the entire neural network with a back-propagation strategy based on stochastic gradient descent.
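The training objective can be evaluated directly on a toy problem; a sketch of the per-image negative log-likelihood of this Gaussian CRF with additive constants dropped (the toy L and z used for testing are assumptions):

```python
import numpy as np

def neg_log_likelihood(d, z, L):
    """-log Pr(d | I) for the Gaussian CRF, up to additive constants:
    d^T L d - 2 z^T d + z^T L^{-1} z - (1/2) log|L|."""
    Linv_z = np.linalg.solve(L, z)          # L^{-1} z without explicit inverse
    sign, logdet = np.linalg.slogdet(L)     # stable log-determinant
    return float(d @ L @ d - 2.0 * z @ d + z @ Linv_z - 0.5 * logdet)
```

Because the objective is a positive-definite quadratic in d, it is minimized exactly at d = L^{-1} z; any perturbation increases it.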
The method can combine the convolutional neural network and the conditional random field model, and solve the depth of each pixel from a single RGB image by using a training optimization mode, wherein the depth map is shown in figures 5, 8 and 11.
S203, solving the depth of each pixel from the single RGB image by using a training optimization mode.
From the derivation of S202, the application has converted the depth estimation problem of a single RGB image into a maximum a posteriori inference problem, which, in terms of the regularized Laplacian matrix, can be expressed as follows:
d*=argmax_d Pr(d∣I)=argmax_d {-d^T Ld+2z^T d}
Setting the partial derivative of this objective with respect to d to 0 gives:
-2Ld+2z=0
This shows that the maximum a posteriori inference for the pixel-block depth has a closed-form solution, which is finally expressed as follows:
d*=L^{-1}z
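The closed-form solution can be verified on a toy example with three pixel blocks (the z and R values are illustrative): the strongly linked pair is smoothed toward a common depth, while the weakly linked block stays near its regressed value.

```python
import numpy as np

# Toy regressed depths z and pairwise matrix R for 3 pixel blocks.
z = np.array([2.0, 2.2, 5.0])
R = np.array([[0.0, 0.9, 0.0],
              [0.9, 0.0, 0.1],
              [0.0, 0.1, 0.0]])
L = np.eye(3) + np.diag(R.sum(axis=1)) - R

# Closed-form MAP depths, d* = L^{-1} z (solve avoids forming L^{-1}).
d_star = np.linalg.solve(L, z)
```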
the method can combine the convolutional neural network and the conditional random field model, and solve the depth of each pixel from a single RGB image by using a training optimization mode.
S3, using the scene depth map as input, and generating scene fog distribution through conversion of an atmospheric scattering model.
The atmospheric scattering model describes the imaging principle of foggy images, is the theoretical basis of many image defogging algorithms, and can restore the real distribution situation of foggy in a scene through the atmospheric scattering model by using depth information. According to the depth map obtained in the step S2 and the original RGB image, the fog distribution can be calculated.
The scattering of particulate matter in the atmosphere, especially tiny water droplets, is the main cause of fog. Whether observed visually or captured in an image, hazy scenes always suffer from reduced contrast and a shortened visual range. In 1925, Koschmieder proposed that the low visibility of fog images is caused by absorption and scattering of light by airborne particles in the atmosphere. In 1976, McCartney et al. proposed that particle scattering attenuates light during transmission between the target and the camera and adds a layer of atmospheric scattered light. In 1999, Narasimhan et al., addressing the problem of low visibility in foggy weather, explained the imaging process of fog images and its various influencing factors by establishing a mathematical model. The model demonstrates that strong scattering media are the main reason for the degraded imaging of detection systems: ambient light such as sunlight is scattered by the scattering medium in the atmosphere to form background light whose intensity exceeds that of the target light, blurring the imaging result of the detection system.
The application regards a foggy image as the superposition of the reflected light of a photographed object subjected to foggy attenuation and the atmospheric light subjected to foggy scattering, and therefore an imaging model of foggy days can be represented by the following formula:
H(p)=C(p)r(p)+A(1-r(p))
where p= (x, y) is the coordinate vector of a certain pixel position in the image, H (p) is the haze image obtained by the detection system, C (p) is the haze-free image to be recovered, a represents the atmospheric light, and r (p) is the propagation map corresponding to the scene depth.
The propagation map corresponding to the scene depth is expressed as:
r(p)=e^{-βd(p)}
where, under moderate or thicker fog conditions, the scattering coefficient can be regarded as a constant over the visible spectrum and is denoted β, and d(p) denotes the scene depth at pixel p;
let c denote the color channel, c∈{r,g,b}; the atmospheric scattering model for each color channel is expressed as follows:
I_c(p)=C_c(p)r(p)+A_c(1-r(p))
With this model, given the clear image and the scene depth, the attenuated reflected irradiance of the target object and the atmospheric scattered light of the scene can be calculated using only the scattering coefficient and the atmospheric light.
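The whole of S3 then reduces to a few lines; a sketch of per-channel fog synthesis under this model (β, the airlight A, and the toy images are illustrative choices):

```python
import numpy as np

def synthesize_fog(clear_rgb, depth, beta=1.0, A=(0.9, 0.9, 0.9)):
    """Apply the atmospheric scattering model per color channel:
    I_c(p) = C_c(p) * r(p) + A_c * (1 - r(p)), with r(p) = exp(-beta * d(p)).
    clear_rgb: HxWx3 floats in [0, 1]; depth: HxW scene depths."""
    r = np.exp(-beta * depth)[..., None]   # propagation (transmission) map
    A = np.asarray(A, dtype=float)
    return clear_rgb * r + A * (1.0 - r)

# Toy usage: constant-depth scenes; distant pixels approach the airlight A.
img = np.full((2, 2, 3), 0.2)
fog_near = synthesize_fog(img, np.full((2, 2), 0.1))
fog_far = synthesize_fog(img, np.full((2, 2), 5.0))
```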
S4, performing superposition coding by using the scene fog distribution and the original RGB image to obtain a final foggy image.
In S4, the image is encoded in the sRGB color space, and the linear intensity values of the rendering model are mapped to digital image values in a non-linear manner; the per-channel sRGB color-encoding rule is as follows:
c_encoded=12.92·c_linear, if c_linear≤0.0031308
c_encoded=1.055·c_linear^(1/2.4)-0.055, if c_linear>0.0031308
where c_linear is the linear channel intensity value and c_encoded is the encoded sRGB value; encoded values in [0,1] are obtained first and then mapped to 8-bit digital codes to obtain the final hazy image.
In this embodiment, the same or similar parts as those in embodiment 1 may be referred to each other, and will not be described in detail in the present disclosure.
Example 3:
on the basis of embodiments 1 and 2, embodiment 3 of the present application provides a mist synthesis system based on a scattering model and depth estimation, including:
the acquisition module, used to acquire an original RGB image, construct a continuous random field, and model the unary and pairwise potentials related to scene depth information of the continuous random field;
the construction module, used to construct a convolutional neural network and solve the unary and pairwise potentials of the continuous random field to obtain a scene depth map;
the generating module, used to generate the scene fog distribution through conversion by an atmospheric scattering model with the scene depth map as input;
and the encoding module, used to superpose and encode the scene fog distribution with the original RGB image to obtain the final foggy image.
Specifically, the system provided in this embodiment corresponds to the method provided in Embodiment 1; parts of this embodiment that are the same as or similar to those of Embodiment 1 may be cross-referenced and are not described again in this disclosure.
Claims (9)
1. A fog synthesis method based on a scattering model and depth estimation, comprising:
s1, acquiring an original RGB image, constructing a continuous random field and modeling unitary potential and paired potential related to scene depth information of the continuous random field;
s2, constructing a convolutional neural network, and solving unitary potential and paired potential of the continuous random field to obtain a scene depth map;
s3, using the scene depth map as input, and generating scene fog distribution through conversion of an atmospheric scattering model;
s4, performing superposition coding by using the scene fog distribution and the original RGB image to obtain a final foggy image.
2. The fog synthesis method based on a scattering model and depth estimation according to claim 1, characterized in that S1 comprises:
S101, for any RGB image I, dividing the RGB image I into n pixel blocks, wherein the depths of the n pixel blocks are represented by the vector d = (d_1, d_2, …, d_n);
S102, let W represent an energy function; the conditional probability distribution of the data is then modeled with the following density function:
Pr(d | I) = exp{−W(d, I)} / S(I)
wherein S represents the partition (normalization) function, expressed as follows:
S(I) = ∫_d exp{−W(d, I)} dd
the depth estimation problem for the whole image is thereby converted into a maximum a posteriori inference problem:
d* = argmax_d Pr(d | I)
wherein d* denotes the finally computed pixel-block depths, argmax_d selects the d that maximizes the probability, and Pr denotes the probability;
S103, the energy function W is expressed as follows:
W(d, I) = Σ_{a∈N} U(d_a, I) + Σ_{(a,b)∈P} V(d_a, d_b, I)
wherein U denotes the unary potential, used to regress the depth value of a single pixel block; V denotes the pairwise potential, used to encourage similar neighbouring pixel blocks to take the same depth; N denotes the set of pixel blocks; a and b denote two neighbouring pixel blocks; and P denotes the set of all pairs of neighbouring pixel blocks.
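For illustration, the energy of claim 2 can be evaluated as below (a minimal sketch: the quadratic forms of the unary and pairwise terms follow the usual continuous-CRF formulation, and the 0.5 factor on the pairwise term is an assumption of this sketch):

```python
import numpy as np

def energy(d, z, R, pairs):
    """W(d, I) = sum_a (d_a - z_a)^2 + sum_{(a,b)} 0.5 * R_ab * (d_a - d_b)^2.

    d     : (n,) candidate depths of the n pixel blocks
    z     : (n,) depths regressed by the unary part of the network
    R     : dict mapping a neighbouring pair (a, b) to its weight R_ab
    pairs : list of neighbouring pixel-block index pairs
    """
    unary = float(np.sum((d - z) ** 2))                                # U terms
    pairwise = sum(0.5 * R[(a, b)] * (d[a] - d[b]) ** 2 for a, b in pairs)  # V terms
    return unary + pairwise
```

Lower energy corresponds to higher conditional probability: depths matching the regressions and agreeing across similar neighbours are preferred.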
3. The fog synthesis method based on a scattering model and depth estimation according to claim 2, characterized in that in S103 the unary potential U is expressed as follows:
U(d_a, I; θ) = (d_a − z_a)²
wherein z_a denotes the depth of pixel block a regressed by the network, and θ denotes the learnable regression parameters;
the pairwise potential V is expressed as follows:
V(d_a, d_b, I) = ½·R_ab·(d_a − d_b)²
wherein R_ab is the output of the pairwise part of the convolutional neural network for the two neighbouring pixel blocks a and b, computed using a fully connected layer with trainable parameters β:
R_ab = β^T [M_ab^1, …, M_ab^K]
wherein M_ab^k is the kth similarity value, representing the similarity of the image-block pair (a, b) under the kth measure; the similarity function is expressed as follows:
M_ab^k = exp{−γ·‖s_a^k − s_b^k‖}, k = 1, 2, 3
wherein s_a^k and s_b^k are the values of pixel blocks a and b in the three investigated dimensions (the color-histogram distribution difference, the pixel-level color difference, and the texture difference computed from the local binary pattern), and ‖s_a^k − s_b^k‖ denotes the two-norm of the difference between the two pixel blocks.
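A sketch of one similarity entry, assuming the common exponentiated-distance form exp(−γ‖s_a − s_b‖) with an illustrative bandwidth γ:

```python
import numpy as np

def similarity(s_a, s_b, gamma=1.0):
    """Similarity exp(-gamma * ||s_a - s_b||_2) for one investigated
    dimension (e.g. color-histogram difference, pixel-level color
    difference, or LBP texture difference). gamma is an assumed
    bandwidth parameter of this sketch."""
    diff = np.asarray(s_a, dtype=float) - np.asarray(s_b, dtype=float)
    return float(np.exp(-gamma * np.linalg.norm(diff)))
```

Identical blocks give similarity 1; the value decays toward 0 as the feature distance grows.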
4. The fog synthesis method based on a scattering model and depth estimation according to claim 2, characterized in that S2 comprises:
S201, constructing a convolutional neural network consisting of 5 convolutional layers and 3 fully connected layers, wherein a rectified linear unit (ReLU) is used as the activation function of the five convolutional layers, and a logistic (sigmoid) function is used as the activation function of the fully connected layers; the last fully connected layer is used for model integration, and the output of the convolutional neural network is the depth value of each pixel block;
S202, performing parameter learning of the convolutional neural network: the overall energy function and probability distribution function are rewritten, and the negative conditional log-likelihood of the training data is minimized during network training, yielding the final optimization objective;
S203, solving for the depth of each pixel from the single RGB image through the trained optimization procedure.
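Because both potentials are quadratic in d, the maximum a posteriori inference of S203 admits a closed-form solution. A minimal sketch follows; the formula d* = (I + D − R)^{-1} z, with D the diagonal matrix of the row sums of R, is the standard result for this quadratic energy and is an assumption of this sketch rather than quoted from the claim:

```python
import numpy as np

def map_depth(z, R):
    """Closed-form MAP estimate for the quadratic CRF energy
    W(d) = sum_a (d_a - z_a)^2 + 0.5 * sum_{a,b} R_ab (d_a - d_b)^2.

    Setting dW/dd = 0 gives (I + D - R) d* = z with D = diag(R.sum(axis=1)).
    z : (n,) unary depth regressions;  R : (n, n) symmetric pairwise weights.
    """
    n = len(z)
    A = np.eye(n) + np.diag(R.sum(axis=1)) - R
    return np.linalg.solve(A, z)
```

With R = 0 the estimate reduces to the unary regressions; strong pairwise weights pull the depths of neighbouring blocks together.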
5. The fog synthesis method based on a scattering model and depth estimation according to claim 4, wherein in S3, the atmospheric scattering model is expressed as:
H(p)=C(p)r(p)+A(1-r(p))
where p = (x, y) is the coordinate vector of a pixel position in the image, H(p) is the hazy image obtained by the detection system, C(p) is the haze-free image to be recovered, A represents the atmospheric light, and r(p) is the propagation map corresponding to the scene depth.
6. The method of claim 5, wherein in S3, the propagation map corresponding to the scene depth is expressed as:
r(p) = e^(−β·d(p))
where e is the base of the natural logarithm, β denotes the scattering coefficient, and d(p) denotes the scene depth at pixel p;
let c denote the color channel, c ∈ {r, g, b}; the atmospheric scattering model for a single color channel is expressed as follows:
I_c(p) = C_c(p)·r(p) + A_c·(1 − r(p))
with this model, given the sharp image and the scene depth, the attenuated reflected irradiance of the target object and the atmospherically scattered light of the scene can be computed using only the scattering coefficient and the atmospheric light.
7. The fog synthesis method based on a scattering model and depth estimation according to claim 6, wherein in S4 the image is encoded in the sRGB color space: the linear intensity values in the rendering model are mapped nonlinearly to digital image values, and the sRGB color-encoding rule for each channel is as follows:
c_encoded = 12.92·c_linear, if c_linear ≤ 0.0031308
c_encoded = 1.055·c_linear^(1/2.4) − 0.055, otherwise
wherein c_linear is the linear channel intensity value and c_encoded is the encoded sRGB value; encoded values in [0, 1] are obtained first and then mapped to 8-bit digital codes to obtain the final hazy image.
8. A fog synthesis system based on a scattering model and depth estimation for performing the fog synthesis method of any one of claims 1 to 7, comprising:
the acquisition module is used for acquiring an original RGB image, constructing a continuous random field and modeling unitary potential and paired potential related to scene depth information of the continuous random field;
the construction module is used for constructing a convolutional neural network, solving the unitary potential and the paired potential of the continuous random field, and obtaining a scene depth map;
the generating module is used for generating scene fog distribution through conversion of an atmospheric scattering model by using the scene depth map as input;
and the encoding module is used for performing superposition encoding by using the scene fog distribution and the original RGB image to obtain a final foggy image.
9. A computer storage medium, wherein a computer program is stored in the computer storage medium; when run on a computer, the computer program causes the computer to perform the fog synthesis method based on a scattering model and depth estimation according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311079480.0A CN117078552A (en) | 2023-08-25 | 2023-08-25 | Mist synthesis method based on scattering model and depth estimation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117078552A true CN117078552A (en) | 2023-11-17 |
Family
ID=88701974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311079480.0A Pending CN117078552A (en) | 2023-08-25 | 2023-08-25 | Mist synthesis method based on scattering model and depth estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117078552A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111079764B (en) | Low-illumination license plate image recognition method and device based on deep learning | |
Wang et al. | Joint iterative color correction and dehazing for underwater image enhancement | |
CN112767279B (en) | Underwater image enhancement method for generating countermeasure network based on discrete wavelet integration | |
CN111986108A (en) | Complex sea-air scene image defogging method based on generation countermeasure network | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN113723317B (en) | Reconstruction method and device of 3D face, electronic equipment and storage medium | |
Huang et al. | Color correction and restoration based on multi-scale recursive network for underwater optical image | |
CN112614070A (en) | DefogNet-based single image defogging method | |
CN112164010A (en) | Multi-scale fusion convolution neural network image defogging method | |
CN113935917A (en) | Optical remote sensing image thin cloud removing method based on cloud picture operation and multi-scale generation countermeasure network | |
CN117252778A (en) | Color constancy method and system based on semantic preservation | |
Si et al. | A novel method for single nighttime image haze removal based on gray space | |
CN116757949A (en) | Atmosphere-ocean scattering environment degradation image restoration method and system | |
CN113989164B (en) | Underwater color image restoration method, system and storage medium | |
CN115937395A (en) | Electrical equipment model rendering method and device, computer equipment and storage medium | |
CN117078552A (en) | Mist synthesis method based on scattering model and depth estimation | |
CN112465736B (en) | Infrared video image enhancement method for port ship monitoring | |
CN115239607A (en) | Method and system for self-adaptive fusion of infrared and visible light images | |
Li et al. | Multi-scale fusion framework via retinex and transmittance optimization for underwater image enhancement | |
Kumar et al. | Underwater Image Enhancement using deep learning | |
Liu et al. | Underwater image enhancement with the low-rank nonnegative matrix factorization method | |
Yu et al. | Single image dehazing based on the fusion of multi-branch and attention mechanism | |
CN110675381A (en) | Intrinsic image decomposition method based on serial structure network | |
Zhao et al. | Nonuniform illumination correction for underwater images through a pseudo-siamese network | |
Guodong et al. | Underwater image enhancement and detection based on convolutional DCP and YOLOv5 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||