CN110349099B - Complex scene video shadow detection and elimination method - Google Patents
- Publication number
- CN110349099B CN110349099B CN201910523329.9A CN201910523329A CN110349099B CN 110349099 B CN110349099 B CN 110349099B CN 201910523329 A CN201910523329 A CN 201910523329A CN 110349099 B CN110349099 B CN 110349099B
- Authority
- CN
- China
- Prior art keywords
- shadow
- confidence
- pixel
- video
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a video shadow detection and elimination method based on depth information. The method first estimates the normal and point-cloud position of each pixel from the depth information of the image, then estimates a shadow confidence value for each pixel by comparing its feature similarity with spatio-temporal local neighborhood pixels in the video stream, refines the shadow confidence with a Laplacian operator to obtain the final shadow detection result, and finally uses the detection result to construct an illumination-recovery optimization equation over the video stream that yields the final shadow elimination result. The invention has the following advantages: texture filtering effectively reduces the interference of texture information with shadow detection; optimizing the initial shadow confidence with the Laplacian operator yields a more complete shadow detection result; and eliminating shadows under chromaticity constraints and the correlation between adjacent frames effectively guarantees the chromaticity invariance and inter-frame continuity of the result.
Description
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a method for detecting and eliminating video shadows of a complex scene.
Background
Shadows are a common natural phenomenon in daily life, and they provide important information for understanding visual scenes, such as the lighting environment and scene geometry. This information plays an important role in illumination analysis, relighting, augmented reality and other applications. Effectively detecting and eliminating shadows is therefore an important topic in computer vision. However, automatic shadow detection and elimination is a very difficult task: it is affected not only by local texture and material information, but must also account for the global structure and illumination environment of the scene. Most existing shadow detection and elimination algorithms detect and classify shadows from local chrominance information, gradient information and the like without considering global structure information, so they cannot effectively handle complex shadows or shadows in complex scenes.
Shadow processing in a complex scene means automatically detecting and eliminating shadows in a complex environment using both global and local information, while preserving the shading gradients of the scene in the elimination result to prevent visual distortion. The difficulty of eliminating complex-scene shadows has two main sources. First, material texture in complex scenes is rich and the shadow distribution is scattered rather than concentrated, which complicates detection; even prior knowledge from manual interaction adds a labelling burden for complex shadow scenes and makes efficient batch processing difficult. Second, because labelling is difficult, complex-scene shadow images lack corresponding datasets, so shadows in large scenes are hard to eliminate with deep learning methods. To address these problems, the invention provides an automatic complex-scene shadow detection and elimination algorithm based on image depth information; it requires neither manual interaction nor depth information captured with the image, and can detect and eliminate shadows in complex scenes using depth information estimated by existing image depth estimation algorithms.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a depth information-based complex scene video shadow detection and elimination method.
The technical scheme of the invention is a complex scene video shadow detection and elimination method, which comprises the following steps:
step 1, for an input video stream V, obtaining depth information of the input video stream V;
step 2, for each input video frame I, performing filtering with a texture filtering operator to reduce the influence of texture while keeping the shadow information in the video frame;
step 3, for each filtered video frame T_i, selecting its adjacent related video streams, finding the initial shadow confidence and luminance confidence of each pixel point in the video frame, and optimizing the shadow confidence of each frame to obtain the final video shadow detection result;
step 4, further calculating the shadow-boundary confidence from the total variation and the inherent variation of the shadow confidence and the luminance confidence;
step 5, after obtaining the shadow detection result of each frame, decomposing the current frame image I into a shadow-free image F and a shadow factor β using the shadow image model β = I/F, and constructing a shadow-removal optimization equation to constrain and optimize each frame;
step 6, solving the shadow-removal optimization equation by iterative optimization to obtain the final video shadow elimination result F and the shadow factor β.
Further, the specific implementation of step 3 includes the following sub-steps:
step 3.1, performing point cloud estimation by using depth information of each video frame and combining camera parameters to obtain point cloud information of each pixel point, constructing a k-d tree by using the point cloud information, finding a plurality of point clouds which are most similar to the point cloud of each pixel point, and calculating normal information of a space tangent plane where the pixel point is located by using the similar point clouds;
step 3.2, for each filtered video frame T_i, calculating with Gaussian similarity, for each pixel point p and each point q ∈ R_p in its spatial neighborhood, the chroma similarity, the spatial distance similarity and the normal similarity, and multiplying the three similarities to obtain the final feature similarity α_pq, wherein R_p is the spatial neighborhood of the pixel point p;
step 3.3, utilizing the similarity between pixel points, comparing each pixel point p with the weighted average image intensity of all neighbor pixel points q ∈ R_p in its spatial neighborhood, and estimating the shadow confidence and the luminance confidence of each pixel point;
wherein m(p) is the intensity-weighted average over the neighborhood of point p, I_p and I_q respectively represent the intensities of pixels p and q, σ is an adjustable parameter, and |R_p| denotes the number of pixel points in the neighborhood R_p;
step 3.4, in each video frame, utilizing a Laplacian operator, combined with the initial shadow confidence and luminance confidence computed in step 3.3, constructing an optimization equation to obtain the final shadow detection result S:
wherein the first two terms are data constraint terms and the third term is a smoothing term; N is the number of pixel points in the image, S_k is the shadow confidence optimization result of the k-th pixel point, ω_k is the local window containing the k-th pixel point, S_i and S_j are the shadow confidence optimization results of the two pixel points i and j inside window ω_k, used to smooth the pixel points in the window, and w_ij is the matting-Laplacian value of points i and j in the neighborhood.
Further, the confidence of the shadow boundary calculated in step 4 is formulated as,
wherein the numerator and denominator respectively represent the total variation and the inherent variation of the p-point confidence map, and ε is a constant;
wherein R(p) is a rectangular neighborhood with p as the center point, the weight function is defined by Gaussian filtering, the summed quantities are the q-point shadow confidence and luminance confidence respectively, ∂ is the partial-derivative sign, and the partial derivative of the shadow confidence or the luminance confidence is taken in the x or y direction.
Further, the de-shadow optimization equation in step 5 is,
E(F,β) = E_data(F,β) + λ1·E_smooth(F,β) + λ2·E_chromaticity(β) + λ3·E_const(β)
wherein the data term E_data = ω_iw · Σ_{c∈{R,G,B}} ω_c · |I_c − F_c·β_c|² constrains each data item of the current frame, applying the shadow model to the data I_c, F_c, β_c under the different color channels, where ω_R, ω_G, ω_B are the constraint weights of the RGB color channels; the pixel-intensity weight is ω_iw = 1 − ω_intensity·(1 − |I(x)|), where ω_intensity is an adjustable parameter and I(x) is the pixel intensity of pixel point x;
the smoothing term is E_smooth = E_SF + γ·E_SM, wherein γ is a balance factor and E_SF applies a smoothing constraint to the de-shadowed image F based on the following assumption: on the same spatial plane, pixel points with similar chrominance information, normal information and three-dimensional point-cloud position information should have similar pixel values (color values) after the shadow is eliminated,
wherein the first term is a smoothing constraint between adjacent pixel points of the current frame, the second term constrains shadow pixels using non-shadow pixel points with similar features in the video stream, R_s is the set of shadow pixel points obtained by shadow detection in the current frame, and the second sum runs over the set of all non-shadow pixel points in the T frames of the spatio-temporal local neighborhood, where T is the total number of frames in the current video stream;
E_SM = Σ_{p∈I} (1 − |C_bound(p)|) · Σ_{q∈R_p} ||β_p − β_q||² smoothly constrains the shadow factor β using the estimated shadow-boundary confidence C_bound;
E_chromaticity(β) = ||c(p) − c_F(p)||² applies a chroma-consistency constraint between the original video frame and the de-shadowed video frame, using the assumption that image chroma is not affected by illumination change, wherein c is the chroma of the current frame I and c_F is the chroma of the shadow elimination result F;
E_const assumes that the color of pixel points in the non-shadow region of the image remains unchanged after shadow elimination, i.e. the shadow factor approaches 1, and constrains the non-shadow region N_b; the non-shadow pixel region is the set of pixel points excluding all shadow points and their neighboring pixel points, wherein shadow pixel points are points whose confidence exceeds 0.1.
Further, step 6 is solved by an iterative optimization method: the initial value of F is I, the initial value of β is the shadow confidence S, and the final result is computed by iterative optimization with a maximum of 1000 iterations.
Further, in step 1, for video shot on site, a Kinect V2 collects the depth information of the video in real time; for existing video, the depth information of each frame is estimated with a deep learning method.
The invention has the following advantages:
1. Each frame is texture-filtered before the video-frame shadow is estimated, effectively reducing the interference of texture information with shadow detection.
2. The initial shadow confidence is optimized with the Laplacian operator, giving a more complete shadow detection result while preserving the relative strength of the shadow information and the gradient information of the shadow boundary, which facilitates eliminating complex shadows and shadow boundaries.
3. The shadow elimination optimization constrains the chroma information using the principle that image chroma remains unchanged before and after shadow elimination, guaranteeing chromaticity invariance of the result.
4. The shadow elimination algorithm fully exploits the correlation between adjacent frames, effectively guaranteeing inter-frame continuity of the result while eliminating shadows.
Drawings
FIG. 1 is a flow chart of video shadow removal of the present invention.
FIG. 2 is a flow chart of video shadow detection of the present invention.
FIG. 3 illustrates the effect of the invention on an example video: (a) the input video stream, (b) the depth information corresponding to the video stream, (c) the shadow confidence estimation result of the video stream, (d) the optimized shadow detection result (confidence) of the video stream, and (e) the video stream after shadow removal.
Detailed Description
The present invention will be described in further detail below with reference to examples of implementation and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Referring to fig. 1, the flow chart of the present invention, the video shadow elimination method includes the following steps:
step 1, for an input video stream V, acquiring depth information thereof: for a video shot on site, acquiring depth information of the video in real time by using Kinect V2; for the existing video, the depth information of each frame of the video is estimated by using a deep learning method. As shown in fig. 3(a), (b), which are an input video frame and a corresponding depth map in the example, respectively.
And 2, filtering each frame I in the video by using a texture filtering operator, reducing the influence of small-scale texture on shadow detection, and simultaneously keeping original shadow information.
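The patent does not name a specific texture-filtering operator for step 2. As an illustrative stand-in, a median filter suppresses small-scale texture while keeping the large intensity step of a shadow boundary; the function name and window size below are assumptions, and any edge-preserving texture filter could be substituted.

```python
import numpy as np
from scipy.ndimage import median_filter

def texture_filter(frame, size=5):
    """Crude texture suppression: a median filter removes small-scale
    texture specks while preserving large intensity steps such as
    shadow boundaries. frame is a (H, W) grayscale array."""
    return median_filter(frame, size=size)
```

On a frame containing a sharp shadow edge plus isolated texture specks, the specks are removed while the edge survives.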
Step 3, for each filtered video frame T_i, select its corresponding associated video stream {T_{i-2}, T_{i-1}, T_i, T_{i+1}, T_{i+2}}, find the initial shadow confidence and luminance confidence of each pixel point in the video frame, and optimize the shadow confidence of each frame to obtain the final video shadow detection result. Step 3 comprises the following sub-steps:
step 3.1, utilizing the depth information of each video frame, combined with the camera parameters, to carry out point cloud estimation; after the point cloud information of each pixel point is obtained, a k-d tree is constructed from the point cloud information, the 300 most similar point clouds are found for each pixel's point cloud, and the normal of the spatial tangent plane at the pixel point is calculated from these similar point clouds.
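Step 3.1 can be sketched as follows: back-project the depth map with the camera intrinsics, query a k-d tree for each point's nearest neighbours, and take the normal of the local tangent plane as the eigenvector of the neighbourhood covariance with the smallest eigenvalue. The intrinsics (fx, fy, cx, cy), the neighbour count k, and the function names are illustrative assumptions (the patent uses the 300 most similar point clouds).

```python
import numpy as np
from scipy.spatial import cKDTree

def depth_to_points(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Back-project an (H, W) depth map to an (H*W, 3) point cloud."""
    h, w = depth.shape
    cx = (w - 1) / 2 if cx is None else cx
    cy = (h - 1) / 2 if cy is None else cy
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def estimate_normals(points, k=30):
    """Normal per point = eigenvector of the k-nearest-neighbour
    covariance matrix with the smallest eigenvalue (PCA plane fit)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nb in enumerate(idx):
        nbrs = points[nb] - points[nb].mean(axis=0)
        cov = nbrs.T @ nbrs
        _, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        normals[i] = vecs[:, 0]         # smallest-eigenvalue direction
    return normals
```

For a constant-depth (planar) patch, every estimated normal points along the optical axis, as expected.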
Step 3.2, as shown in the flow chart of fig. 2 for shadow detection, for each filtered video frame TiFinding out the space-time local neighborhood pixel point q E R corresponding to each pixel point p in the related video streampAnd calculating the chroma similarity, the spatial distance similarity and the normal similarity of the pixel point p and all the neighbor points q, and multiplying the three similarities to obtain the final characteristic similarity alphapq. Wherein R ispWhich is a spatio-temporal neighborhood of pixel point p, typically a 50 x 5 spatio-temporal pixel block.
Step 3.3, comparing each pixel point p with all neighbor pixel points in the spatial neighborhood thereof by utilizing the similarity between the pixel pointsq∈RpThe weighted average value m (p) of the image intensity, and the shadow confidence coefficient of each pixel point is estimatedAnd confidence of brightness
Fig. 3(c) shows the initial shadow confidence estimation result corresponding to each frame in the example.
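A hedged sketch of step 3.3. The exact functional forms of the two confidences appear only as formula images in the original, so the clamped relative difference against the similarity-weighted neighbourhood mean m(p) below is an assumption that matches the description: a pixel darker than its weighted neighbourhood mean receives a high shadow confidence, a brighter one a high luminance confidence.

```python
import numpy as np

def confidences(I_p, I_q, alpha_pq, sigma=0.1):
    """I_q and alpha_pq are arrays over the neighbourhood R_p of pixel p;
    sigma is the adjustable scale parameter from the text (assumed role)."""
    m_p = np.sum(alpha_pq * I_q) / np.sum(alpha_pq)   # weighted mean intensity
    c_shadow = np.clip((m_p - I_p) / max(sigma, 1e-8), 0.0, 1.0)
    c_bright = np.clip((I_p - m_p) / max(sigma, 1e-8), 0.0, 1.0)
    return c_shadow, c_bright
```

A pixel at intensity 0.2 inside a neighbourhood averaging 0.5 gets a positive shadow confidence and zero luminance confidence.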
Step 3.4, optimization equation is utilizedAnd obtaining the final optimization result of the shadow confidence coefficient S. Wherein, the first two terms are data constraint terms, and the third term is a smoothing term; n is the number of pixel points in the image, SkIs the shadow confidence optimization result of the kth pixel point, omegakIs the local window, s, where the k-th pixel is locatediAnd sjIs window omegakThe shadow confidence degree optimization results corresponding to the two pixel points i and j in the window are used for the pixel points w in the smooth windowijIs the matching Laplacian value of the i and j points in the neighborhood. In each video frame, the result of the shadow detection is constrained by using a Laplacian operator in combination with the result of the initial shadow confidence and the brightness confidence calculated in step 3.3, and gradient correlation smoothing is performed by using a Laplacian matting operator (matting Laplacian) to obtain an optimized shadow detection result. Example results are shown in FIG. 3(d), which is the final shadow confidence optimization result.
Step 4, utilizing total variation of shadow confidence coefficient and brightness confidence coefficientAnd inherent variation ofThe shadow boundary confidence is further calculated:
wherein the two quantities are the total variation and the inherent variation, respectively, and ε, which prevents the denominator from being 0, is usually set to 0.001 in experiments:
where R(p) is a 7 × 7 rectangular neighborhood with p as the center point, the weight function is defined by Gaussian filtering, the summed quantities are the q-point shadow confidence and luminance confidence respectively, and the partial derivative of the shadow confidence or the luminance confidence is taken in the x or y direction; the total variation is obtained by taking the absolute value of the partial-derivative result and then Gaussian-filtering it, while the inherent variation is obtained by Gaussian-filtering the partial-derivative result and then taking the absolute value.
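The total/inherent-variation computation of step 4 can be sketched directly from the description: the total variation Gaussian-filters the absolute partial derivative, the inherent variation takes the absolute value of the Gaussian-filtered partial derivative, and their ratio (with ε = 0.001 guarding the denominator) is high on consistent, sign-coherent edges and low in flat or textured areas. How the x/y directions and the two confidence maps are combined is an assumption; a plain average over directions is used here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def boundary_confidence(conf_map, sigma=2.0, eps=1e-3):
    """Per-pixel ratio of inherent variation to total variation of a
    confidence map; near 1 on coherent boundaries, near 0 elsewhere."""
    dy, dx = np.gradient(conf_map.astype(float))
    c = np.zeros_like(conf_map, dtype=float)
    for d in (dx, dy):
        total = gaussian_filter(np.abs(d), sigma)      # |∂|, then Gaussian
        inherent = np.abs(gaussian_filter(d, sigma))   # Gaussian, then |·|
        c += inherent / (total + eps)
    return c / 2.0
```

On a clean step-edge confidence map the score peaks along the edge and falls off away from it.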
Step 5, after obtaining the shadow detection result of each frame, decompose the current frame image I into a shadow-free image F and a shadow factor β using the shadow image model β = I/F, where both are unknown quantities. The invention constructs the following optimization equation and constrains and optimizes each frame over the video stream to obtain the final video shadow elimination result F and shadow factor β.
E(F,β) = E_data(F,β) + λ1·E_smooth(F,β) + λ2·E_chromaticity(β) + λ3·E_const(β)
In experiments, the parameters λ1, λ2 and λ3 are typically set to 1, 0.5 and 1, respectively.
The data term E_data = ω_iw · Σ_{c∈{R,G,B}} ω_c · |I_c − F_c·β_c|² constrains each data item of the current frame, applying the shadow model to the data I_c, F_c, β_c under the different color channels. Here ω_c is the constraint weight of the three RGB color channels, with per-channel weights {ω_R, ω_G, ω_B} = {0.299, 0.587, 0.144}; ω_iw is a weight related to pixel intensity, ω_iw = 1 − ω_intensity·(1 − |I(x)|), where ω_intensity is an adjustable parameter and I(x) is the pixel intensity of pixel point x.
The smoothing term is E_smooth = E_SF + γ·E_SM, where γ is a balance factor, typically set to 1 in experiments. Based on the assumption that, on the same spatial plane, pixel points with similar chrominance information, normal information and three-dimensional point-cloud position information should have similar pixel values (color values) after the shadow is eliminated, a smoothing constraint is applied to the de-shadowed image F:
where the first term is a smoothing constraint between adjacent pixel points of the current frame, the second term constrains shadow pixels using non-shadow pixel points with similar features in the video stream, R_s is the set of shadow pixel points obtained by shadow detection in the current frame, and the second sum runs over the set of all non-shadow pixel points in the T frames of the spatio-temporal local neighborhood, where T is the total number of frames in the current video stream.
E_SM = Σ_{p∈I} (1 − |C_bound(p)|) · Σ_{q∈R_p} ||β_p − β_q||² smoothly constrains the shadow factor β using the estimated shadow-boundary confidence C_bound.
E_chromaticity(β) = ||c(p) − c_F(p)||² applies a chroma-consistency constraint between the original video frame and the de-shadowed video frame, using the assumption that image chroma is not affected by illumination change, where c is the chroma of the current frame I and c_F is the chroma of the shadow elimination result F.
E_const assumes that the color of pixel points in the non-shadow region of the image remains unchanged after shadow elimination, i.e. the shadow factor approaches 1, and constrains the non-shadow region N_b, where shadow pixel points are points whose confidence exceeds 0.1 and the non-shadow pixel region is the set of pixel points excluding all shadow points and their neighboring pixel points.
Step 6, because the equation contains two unknowns, the shadow-free image F and the shadow factor β, the algorithm solves it by iterative optimization: the initial value of F is I, the initial value of β is the shadow confidence S, and the final result is computed by iterative optimization with a maximum of 1000 iterations. The shadow elimination result of the example is shown in fig. 3(e).
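A toy illustration of the alternating solve of step 6 under the model I = F·β. The full energy also carries smoothness and chromaticity terms; this loop keeps only the data term and the β → 1 constraint on non-shadow pixels, so it is a sketch of the iteration scheme, not the patent's solver. The β initialisation below maps high shadow confidence to a darkening factor, which differs from the patent's β = S initialisation.

```python
import numpy as np

def remove_shadow(I, S, iters=50, lam=0.5):
    """Alternate closed-form updates of F (shadow-free image) and beta
    (shadow factor) under I = F * beta. Toy data term only."""
    # Assumption: map high shadow confidence to a darkening factor < 1.
    beta = np.clip(1.0 - 0.5 * S, 0.2, 1.0)
    nonshadow = (S <= 0.1).astype(float)   # beta -> 1 enforced here only
    F = I.copy()
    for _ in range(iters):
        F = I / np.clip(beta, 1e-3, None)  # argmin_F |I - F*beta|^2
        # Ridge update: argmin_beta |I - F*beta|^2 + lam*nonshadow*(beta-1)^2
        beta = (I * F + lam * nonshadow) / (F * F + lam * nonshadow + 1e-8)
        beta = np.clip(beta, 1e-3, 1.0)    # shadows only darken
    return F, beta
```

On a synthetic frame whose shadow patch is half as bright as its surroundings, the loop recovers a roughly uniform shadow-free image and a β ≈ 0.5 inside the patch.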
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (5)
1. A method for detecting and eliminating complex scene video shadow is characterized by comprising the following steps:
step 1, for an input video stream V, obtaining depth information of the input video stream V;
step 2, for each input video frame I, performing filtering with a texture filtering operator to reduce the influence of texture while keeping the shadow information in the video frame;
step 3, for each filtered video frame T_i, selecting its adjacent related video streams, finding the initial shadow confidence and luminance confidence of each pixel point in the video frame, and optimizing the shadow confidence of each frame to obtain the final video shadow detection result;
step 4, further calculating the shadow-boundary confidence from the total variation and the inherent variation of the shadow confidence and the luminance confidence;
step 5, after obtaining the shadow detection result of each frame, decomposing the current frame image I into a shadow-free image F and a shadow factor β using the shadow image model β = I/F, and constructing a shadow-removal optimization equation to constrain and optimize each frame;
the de-shadowing optimization equation in step 5 is,
E(F,β) = E_data(F,β) + λ1·E_smooth(F,β) + λ2·E_chromaticity(β) + λ3·E_const(β)
wherein the data term E_data = ω_iw · Σ_{c∈{R,G,B}} ω_c · |I_c − F_c·β_c|² constrains each data item of the current frame, applying the shadow model to the data I_c, F_c, β_c under the different color channels, where ω_R, ω_G, ω_B are the constraint weights of the RGB color channels; the pixel-intensity weight is ω_iw = 1 − ω_intensity·(1 − |I(x)|), where ω_intensity is an adjustable parameter and I(x) is the pixel intensity of pixel point x;
the smoothing term is E_smooth = E_SF + γ·E_SM, wherein γ is a balance factor and E_SF applies a smoothing constraint to the de-shadowed image F based on the following assumption: on the same spatial plane, pixel points with similar chrominance information, normal information and three-dimensional point-cloud position information should have similar pixel values after the shadow is eliminated,
wherein the first term is a smoothing constraint between adjacent pixel points of the current frame, the second term constrains shadow pixels using non-shadow pixel points with similar features in the video stream, R_s is the set of shadow pixel points obtained by shadow detection in the current frame, and the second sum runs over the set of all non-shadow pixel points in the T frames of the spatio-temporal local neighborhood, where T is the total number of frames in the current video stream;
E_SM = Σ_{p∈I} (1 − |C_bound(p)|) · Σ_{q∈R_p} ||β_p − β_q||² smoothly constrains the shadow factor β using the estimated shadow-boundary confidence C_bound;
E_chromaticity(β) = ||c(p) − c_F(p)||² applies a chroma-consistency constraint between the original video frame and the de-shadowed video frame, using the assumption that image chroma is not affected by illumination change, wherein c is the chroma of the current frame I and c_F is the chroma of the shadow elimination result F;
E_const assumes that the color of pixel points in the non-shadow region of the image remains unchanged after shadow elimination, i.e. the shadow factor approaches 1, and constrains the non-shadow region N_b, wherein the non-shadow pixel region is the set of pixel points excluding all shadow points and their neighboring pixel points, and shadow pixel points are points whose confidence exceeds 0.1;
step 6, solving the shadow-removal optimization equation by iterative optimization to obtain the final video shadow elimination result F and the shadow factor β.
2. The method as claimed in claim 1, wherein the specific implementation of step 3 comprises the following sub-steps:
step 3.1, performing point cloud estimation by using depth information of each video frame and combining camera parameters to obtain point cloud information of each pixel point, constructing a k-d tree by using the point cloud information, finding a plurality of point clouds which are most similar to the point cloud of each pixel point, and calculating normal information of a space tangent plane where the pixel point is located by using the similar point clouds;
step 3.2, for each filtered video frame T_i, calculating with Gaussian similarity, for each pixel point p and each point q ∈ R_p in its spatial neighborhood, the chroma similarity, the spatial distance similarity and the normal similarity, and multiplying the three similarities to obtain the final feature similarity α_pq, wherein R_p is the spatial neighborhood of the pixel point p;
step 3.3, utilizing the similarity between pixel points, comparing each pixel point p with the weighted average image intensity of all neighbor pixel points q ∈ R_p in its spatial neighborhood, and estimating the shadow confidence and the luminance confidence of each pixel point;
wherein m(p) is the intensity-weighted average over the neighborhood of point p, I_p and I_q respectively represent the intensities of pixels p and q, σ is an adjustable parameter, and |R_p| denotes the number of pixel points in the neighborhood R_p;
step 3.4, in each video frame, utilizing a Laplacian operator, combined with the initial shadow confidence and luminance confidence computed in step 3.3, constructing an optimization equation to obtain the final shadow detection result S:
wherein the first two terms are data constraint terms and the third term is a smoothing term; N is the number of pixel points in the image, S_k is the shadow confidence optimization result of the k-th pixel point, ω_k is the local window containing the k-th pixel point, S_i and S_j are the shadow confidence optimization results of the two pixel points i and j inside window ω_k, used to smooth the pixel points in the window, and w_ij is the matting-Laplacian value of points i and j in the neighborhood.
3. The method as claimed in claim 2, wherein the formula for calculating the shadow-boundary confidence in step 4 is,
wherein the numerator and denominator respectively represent the total variation and the inherent variation of the p-point confidence map, and ε is a constant;
wherein R(p) is a rectangular neighborhood with p as the center point, the weight function is defined by Gaussian filtering, the summed quantities are the q-point shadow confidence and luminance confidence respectively, ∂ is the partial-derivative sign, and the partial derivative of the shadow confidence or the luminance confidence is taken in the x or y direction.
4. The method as claimed in claim 3, wherein step 6 is solved by an iterative optimization method: the initial value of F is set to I and the initial value of β to the shadow confidence S, and the final result is calculated through iterative optimization with a maximum of 1000 iterations.
5. The method as claimed in claim 1, wherein in step 1, for a video shot on site, the depth information of the video is acquired in real time with a Kinect V2; for an existing video, the depth information of each frame is estimated with a deep-learning method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910523329.9A CN110349099B (en) | 2019-06-17 | 2019-06-17 | Complex scene video shadow detection and elimination method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110349099A CN110349099A (en) | 2019-10-18 |
CN110349099B true CN110349099B (en) | 2021-04-02 |
Family
ID=68182147
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910523329.9A Active CN110349099B (en) | 2019-06-17 | 2019-06-17 | Complex scene video shadow detection and elimination method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110349099B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112419196B (en) * | 2020-11-26 | 2022-04-26 | 武汉大学 | Unmanned aerial vehicle remote sensing image shadow removing method based on deep learning |
CN112686936B (en) * | 2020-12-18 | 2023-08-04 | 北京百度网讯科技有限公司 | Image depth completion method, apparatus, computer device, medium, and program product |
CN112598592A (en) * | 2020-12-24 | 2021-04-02 | 广东博智林机器人有限公司 | Image shadow removing method and device, electronic equipment and storage medium |
CN113361360B (en) * | 2021-05-31 | 2023-07-25 | 山东大学 | Multi-person tracking method and system based on deep learning |
CN113378775B (en) * | 2021-06-29 | 2023-04-07 | 武汉大学 | Video shadow detection and elimination method based on deep learning |
CN114782616B (en) * | 2022-06-20 | 2022-09-20 | 北京飞渡科技有限公司 | Model processing method and device, storage medium and electronic equipment |
CN116704316A (en) * | 2023-08-03 | 2023-09-05 | 四川金信石信息技术有限公司 | Substation oil leakage detection method, system and medium based on shadow image reconstruction |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9430715B1 (en) * | 2015-05-01 | 2016-08-30 | Adobe Systems Incorporated | Identifying and modifying cast shadows in an image |
CN107203975A (en) * | 2017-04-18 | 2017-09-26 | 南京航空航天大学 | Shadow removal method based on YCbCr color spaces and topology cutting |
CN107808366A (en) * | 2017-10-21 | 2018-03-16 | 天津大学 | A kind of adaptive optical transfer single width shadow removal method based on Block- matching |
CN109064411A (en) * | 2018-06-13 | 2018-12-21 | 长安大学 | A kind of pavement image based on illumination compensation removes shadow method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106339995A (en) * | 2016-08-30 | 2017-01-18 | 电子科技大学 | Space-time multiple feature based vehicle shadow eliminating method |
CN107038690B (en) * | 2017-03-27 | 2020-04-28 | 湘潭大学 | Moving shadow removing method based on multi-feature fusion |
Non-Patent Citations (1)
Title |
---|
Yao Xiao et al., "Shadow Removal from Single RGB-D Images," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Sep. 25, 2014, pp. 3011-3018. * |
Also Published As
Publication number | Publication date |
---|---|
CN110349099A (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110349099B (en) | Complex scene video shadow detection and elimination method | |
US10574905B2 (en) | System and methods for depth regularization and semiautomatic interactive matting using RGB-D images | |
CN109859171B (en) | Automatic floor defect detection method based on computer vision and deep learning | |
CN104794688B (en) | Single image to the fog method and device based on depth information separation sky areas | |
JP4074062B2 (en) | Semantic object tracking in vector image sequences | |
KR102138950B1 (en) | Depth map generation from a monoscopic image based on combined depth cues | |
US9042662B2 (en) | Method and system for segmenting an image | |
CN108377374B (en) | Method and system for generating depth information related to an image | |
JP2009500752A (en) | Cut and paste video objects | |
WO2018227882A1 (en) | A priori constraint and outlier suppression based image deblurring method | |
WO2018053952A1 (en) | Video image depth extraction method based on scene sample library | |
CN111310768B (en) | Saliency target detection method based on robustness background prior and global information | |
CN111161219B (en) | Robust monocular vision SLAM method suitable for shadow environment | |
CN105898111A (en) | Video defogging method based on spectral clustering | |
Li et al. | Optimal seamline detection in dynamic scenes via graph cuts for image mosaicking | |
He et al. | Single-image shadow removal using 3D intensity surface modeling | |
Shen et al. | Re-texturing by intrinsic video | |
Sooknanan et al. | Improving underwater visibility using vignetting correction | |
CN106296740B (en) | A kind of target fine definition tracking based on low-rank sparse expression | |
CN111932469A (en) | Significance weight quick exposure image fusion method, device, equipment and medium | |
CN113781329B (en) | Fog removing method for remote sensing image | |
Liu et al. | Temporal-consistency-aware video color transfer | |
CN109886901A (en) | A kind of nighttime image enhancing method decomposed based on multi-path | |
KR101631023B1 (en) | Neighbor-based intensity correction device, background acquisition device and method thereof | |
Kuse et al. | Graph modelling of 3D geometric information for color consistency of multiview images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||