CN112822479A - Depth map generation method and device for 2D-3D video conversion - Google Patents

Depth map generation method and device for 2D-3D video conversion

Info

Publication number
CN112822479A
CN112822479A (application CN202011628929.0A)
Authority
CN
China
Prior art keywords
depth
background
image
foreground
depth map
Prior art date
Legal status
Pending
Application number
CN202011628929.0A
Other languages
Chinese (zh)
Inventor
张现丰
刘海军
王璇章
庄庄
聂耳
钱炫羲
张雄飞
Current Assignee
Beijing Hualu Media Information Technology Co ltd
China Hualu Group Co Ltd
Original Assignee
Beijing Hualu Media Information Technology Co ltd
China Hualu Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Hualu Media Information Technology Co ltd, China Hualu Group Co Ltd filed Critical Beijing Hualu Media Information Technology Co ltd
Priority to CN202011628929.0A
Publication of CN112822479A
Status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20: Image signal generators
    • H04N13/271: Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20: Image signal generators
    • H04N13/286: Image signal generators having separate monoscopic and stereoscopic modes
    • H04N13/289: Switching between monoscopic and stereoscopic modes

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of video and multimedia signal processing, and particularly relates to a depth map generation method and device for 2D-3D video conversion, wherein the method comprises the following steps. Step 1: process the 2D video frame by frame to obtain an original video frame sequence. Step 2: process the video frame sequence of step 1 with a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image. Step 3: acquire the motion vectors in the foreground image of step 2 with an optical flow method, and obtain a foreground depth map from the relation between motion vector and depth. Step 4: perform depth assignment on the background image of step 2 by extracting the vanishing lines and vanishing point of the background image with a geometric perspective method, obtaining a background depth map. Step 5: fuse the foreground depth map of step 3 and the background depth map of step 4 with a depth fusion method to obtain a fused depth map. The method not only improves the quality of the depth map but also has a wider application range.

Description

Depth map generation method and device for 2D-3D video conversion
Technical Field
The invention belongs to the technical field of video and multimedia signal processing, and particularly relates to a depth map generation method and device for 2D-3D video conversion.
Background
With the rapid development of computer and communication technology in recent years, advances in science and technology have transformed every aspect of daily life. In communication technology, 3D stereoscopic video is characterized by clear, vivid pictures with a strong sense of space: objects appear to float out of the screen or to recede deep into it. This strong spatial impression is widely sought after, and against the visual fatigue caused by years of flat 2D television it brings viewers incomparable visual enjoyment and strong visual impact. Converting 2D video into 3D video is therefore of great significance for the development and distribution of video.
One of the most important links in 2D-3D video conversion is the generation of the depth map. Depth map generation methods fall into three categories: manual, fully automatic, and semi-automatic. 1) Manual depth map generation segments each frame of the two-dimensional video and then assigns a depth value to each segmented block; the resulting depth map is highly accurate, but the approach consumes a great deal of labor and time, making it unsuitable for large-scale stereoscopic video production. 2) Fully automatic algorithms that use multiple images convert motion vectors into a depth map, based on the assumption that fast-moving objects are close to the camera and slow-moving objects are far from it. The depth map obtained by such motion-based depth estimation loses some image information when median filtering is used to smooth it, so its overall quality is relatively poor; and although the method can recover rich depth information, its computational load is heavy and conversion is slow, which hinders real-time conversion. 3) Semi-automatic algorithms divide a video into segments according to scene content, set the boundary frames between segments as key frames, and treat the frames between two key frames as non-key frames.
In view of the above, the present invention provides a depth map generation method and apparatus for 2D-3D video conversion.
Disclosure of Invention
The invention aims to provide a depth map generation method and device for 2D-3D video conversion, so as to solve the problem that depth map generation methods in the prior art are severely limited and cannot be widely applied.
The invention provides a depth map generation method for 2D-3D video conversion, which comprises the following steps. Step 1: processing the 2D video frame by frame to obtain an original video frame sequence. Step 2: processing the video frame sequence with a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image. Step 3: acquiring the motion vectors in the foreground image with an optical flow method, and obtaining a foreground depth map from the relation between motion vector and depth. Step 4: performing depth assignment on the background image by extracting its vanishing lines and vanishing point with a geometric perspective method, to obtain a background depth map. Step 5: fusing the foreground depth map and the background depth map with a depth fusion method to obtain a fused depth map.
As described above, in the depth map generation method for 2D-3D video conversion, it is further preferable that step 2 specifically comprises. Step 2.1: calculating a background update rate based on the mean and variance of each pixel of the current frame in the video frame sequence, and preliminarily separating background pixels from foreground pixels based on the background update rate to obtain a foreground separation image and a background separation image. Step 2.2: performing Gaussian background reconstruction on the background separation image to obtain a background image. Step 2.3: acquiring the difference image between the background image and the original video frame; the binarized difference image is the foreground image.
In the depth map generating method for 2D-3D video conversion as described above, it is further preferable that in step 2.1, the calculation formula of the background update rate is:
p = (1/√(2πd)) · e^(−(x − u)^2/(2d))
wherein p is the background update rate, u is the mean of the pixel in the original video frame, d is the variance of the pixel in the original video frame, e is a natural constant, and x is the abscissa of the pixel in the original video frame.
As described above, in the depth map generating method for 2D-3D video conversion, it is further preferable that, in step 2.2, gaussian background reconstruction is performed based on the following formula, specifically, the reconstruction formula is:
G_{b,t}(x, y) = p·G_{b,t-1}(x, y) + (1 − p)·G_t(x, y)
where p is the background update rate, G_t(x, y) is the value of the original video frame at time t, G_{b,t}(x, y) is the value of the updated background image at time t, and G_{b,t-1}(x, y) is the value of the updated background image at time t-1.
As described above, in the depth map generating method for 2D-3D video conversion, it is further preferable that, in step 2.3, the difference image is obtained based on the following formula, specifically, the formula is:
D_k(x, y) = |I_k(x, y) − G_{b,t}(x, y)|,
where D_k(x, y) is the difference image, G_{b,t}(x, y) is the updated background image obtained in step 2.2, and I_k(x, y) is the original video frame, each evaluated at point (x, y).
As described above, in the depth map generating method for 2D-3D video conversion, it is further preferable that step 3 specifically includes: step 3.1: calculating an optical flow motion vector according to an optical flow motion vector calculation formula, wherein the optical flow motion vector calculation formula is as follows:
V = (A^T W^2 A)^(-1) A^T W^2 b,
where A = [∇I(X_1), ∇I(X_2), …, ∇I(X_n)]^T is the matrix whose rows are the spatial gradients [I_i, I_j] at the n pixels X_1 … X_n contained in the neighborhood of pixel (i, j); W = diag(w_1, w_2, …, w_n) is a diagonal weight matrix; b = [I_t1, I_t2, …, I_tn]^T is the vector of temporal gradients; A^T is the transpose of A; and V is the motion vector;
step 3.2: calculating a foreground depth value based on a depth information calculation formula; the foreground depth calculation formula is as follows:
G_f(i, j) = λ·√(V(i, j)_x^2 + V(i, j)_y^2)
where G_f is the foreground depth, λ is the depth adjustment coefficient, V(i, j)_x is the motion vector component of pixel (i, j) in the x-axis direction, and V(i, j)_y is the motion vector component of pixel (i, j) in the y-axis direction.
As described above, in the depth map generating method for 2D-3D video conversion, preferably, the step 4 specifically includes:
step 4.1: performing edge detection on the video frame sequence obtained in step 1 with a Sobel operator to obtain a horizontal edge gradient map, a vertical edge gradient map, and an edge gradient map obtained by fusing the two;
step 4.2: calculating a gradient level threshold of the edge gradient map in the step 4.1, and obtaining an edge image based on the edge gradient map and the gradient level threshold, wherein the gradient of each pixel point in the edge image is greater than the gradient level threshold;
step 4.3: drawing a straight line in the k-b space for each pixel point in the edge image obtained in the step 4.2 based on a geometric perspective method; assigning values to the pixel points based on the number of straight lines passing through each pixel point; traversing k-b space, defining a maximum value point as a vanishing point, and defining straight lines around the vanishing point as vanishing lines;
step 4.4: calculating to obtain the depth value of the background image based on the depth value calculation formula of the line and point vanishing obtained in the step 4.3, wherein the depth value calculation formula is as follows:
G_b = 255 − round(round(j·(|x_o − y_o|)/y_o)·255/(|x_o − y_o|))
where G_b is the background depth of the pixel points between two adjacent vanishing lines, x_o is the abscissa of the intersection of two adjacent vanishing lines, y_o is the ordinate of that intersection, j is a constant, and round(·) is the rounding operation.
In the depth map generating method for 2D-3D video conversion as described above, it is further preferable that in step 5, the fusion formula of the depth fusion method is:
G_d(x, y) = { G_f(x, y), if I(x, y) = 255;  G_b(x, y), if I(x, y) = 0 }
where G_d is the final depth map, I(x, y) is the pixel value at point (x, y), G_f(x, y) is the foreground depth of pixel (x, y), and G_b(x, y) is the background depth of pixel (x, y).
The invention also discloses a depth map generation device for 2D-3D video conversion, used to implement the above depth map generation method, which comprises: a video acquisition module, for processing the 2D video frame by frame to obtain an original video frame sequence; a pixel separation module, for processing the video frame sequence obtained by the video acquisition module with a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image; a foreground depth calculation module, for acquiring the motion vectors in the foreground image obtained by the pixel separation module with an optical flow method, and obtaining a foreground depth map from the relation between motion vector and depth; a background depth calculation module, for performing depth assignment on the background image obtained by the pixel separation module by extracting its vanishing lines and vanishing point with a geometric perspective method, to obtain a background depth map; and a depth fusion module, for fusing the foreground depth map obtained by the foreground depth calculation module and the background depth map obtained by the background depth calculation module with a depth fusion method to obtain a fused depth map.
Compared with the prior art, the invention has the following advantages:
the invention discloses a depth map generation method and a depth map generation device for 2D-3D video conversion, which mainly separate a moving foreground from a relatively static background by applying the technology of combining a Gaussian background modeling method and a background subtraction method to an input original 2D video image sequence; then, respectively generating a background depth map by the moving foreground and the background; and finally, fusing the foreground and background depth maps to obtain the depth map generation method based on fusion of the foreground and the background. The method for extracting the depth map not only improves the quality of the depth map, but also has wider application range; in addition, compared with a method for acquiring a depth map from a single depth cue, the method is more perfect, has wider application range, effectively utilizes important depth cues in a video scene, and improves the quality of the depth map.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a depth map generation method for 2D-3D video conversion in the present invention;
fig. 2 is a connection diagram of a frame of a depth map generating apparatus for 2D-3D video conversion according to the present invention.
Detailed Description
Fig. 1 is a flowchart of a depth map generating method for 2D-3D video conversion according to the present invention, and specifically, as shown in fig. 1, the present embodiment discloses a depth map generating method for 2D-3D video conversion, which includes the following steps:
step 1: Processing the 2D video frame by frame to obtain an original video frame sequence;
step 2: Processing the video frame sequence obtained in step 1 by adopting a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image;
step 3: Acquiring the motion vectors in the foreground image obtained in step 2 by adopting an optical flow method, and obtaining a foreground depth map according to the relation between motion vector and depth;
step 4: Performing depth assignment on the background image obtained in step 2 by extracting the vanishing lines and vanishing point of the background image by adopting a geometric perspective method, to obtain a background depth map;
step 5: Fusing the foreground depth map obtained in step 3 and the background depth map obtained in step 4 by adopting a depth fusion method to obtain a fused depth map.
In step 1, the obtained video frame is an image with subtitles removed, that is, the subtitles of the 2D video are removed first, and then the 2D video is processed frame by frame.
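By way of illustration only (not part of the patent text), step 1 can be sketched in Python with OpenCV; extract_frames is a hypothetical helper name, and the grayscale conversion is an assumption of this sketch:

import cv2

def extract_frames(video_path):
    """Step 1: read the (subtitle-free) 2D video frame by frame into an
    original video frame sequence of grayscale images."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames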
Further, step 2 specifically includes:
step 2.1: Calculating a background update rate based on the mean and variance of each pixel of the current frame in the video frame sequence obtained in step 1, and preliminarily separating background pixels from foreground pixels based on the background update rate to obtain a foreground separation image and a background separation image;
step 2.2: Performing Gaussian background reconstruction on the background separation image obtained in step 2.1 to obtain a background image;
step 2.3: Acquiring the difference image between the background image obtained in step 2.2 and the original video frame obtained in step 1; the binarized difference image is the foreground image.
In step 2.1, the brightness of the pixels of a background image follows a Gaussian distribution; that is, for the background image, the brightness of each pixel (x, y) satisfies G_b(x, y) ~ N(u, d), where the mean u and variance d are attributes specific to each point, and
p = (1/√(2πd)) · e^(−(x − u)^2/(2d))
where p is the background update rate, u is the mean of the pixel in the original video frame, and d is the variance of the pixel in the original video frame; e is the natural constant, an irrational number approximately equal to 2.72; x is the abscissa of the pixel in the original video frame. The mean u and variance d are attributes specific to each pixel, and they change when the coordinates of the point change.
The mean u and variance d of each point in the video frame sequence over a period of time are calculated to form the background model B. For an arbitrary image G of the sequence containing the foreground, check for each pixel (x, y) on the image whether:
|G(x, y) − G_b(x, y)| < T·d
the point is considered as a background point, otherwise, the point is considered as a foreground point;
where T is a constant threshold (T = 3 in this embodiment), G(x, y) is the value of the original video frame image G at point (x, y), and G_b(x, y) is the value of the background model B at that point.
In step 2.2, the background pixel points are updated based on the following formula, specifically, the updating formula is:
G_{b,t}(x, y) = p·G_{b,t-1}(x, y) + (1 − p)·G_t(x, y),
where p is the background update rate, a constant; the larger p is, the slower the background updates (p = 0.004 in this embodiment). G_t(x, y) is the value of the original video frame at time t, G_{b,t}(x, y) is the value of the updated background image at time t, and G_{b,t-1}(x, y) is the value of the updated background image at time t-1. Typically, d changes only slightly after the background update, so d is generally not updated. This yields the reconstructed video background.
In step 2.3, the difference image is obtained based on the following formula, specifically:
D_k(x, y) = |I_k(x, y) − G_{b,t}(x, y)|,
where D_k(x, y) is the difference image, G_{b,t}(x, y) is the updated background image obtained in step 2.2, and I_k(x, y) is the original video frame, each evaluated at point (x, y).
The difference image D_k(x, y) is then binarized to obtain G_f(x, y); specifically,
G_f(x, y) = { 255, if D_k(x, y) > T;  0, otherwise }
where G_f(x, y) is the foreground image, D_k(x, y) is the difference image, and T is the binarization threshold.
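For concreteness, a minimal NumPy sketch of steps 2.1-2.3 follows (illustrative only, not part of the patent text; the binarization threshold thresh is an assumption, since the patent does not fix one):

import numpy as np

def background_update_rate(x, u, d):
    """Step 2.1: p = (1/sqrt(2*pi*d)) * exp(-(x - u)^2 / (2d)) per pixel."""
    return np.exp(-((x - u) ** 2) / (2.0 * d)) / np.sqrt(2.0 * np.pi * d)

def is_background(G, G_b, d, T=3.0):
    """Per-pixel test |G(x,y) - G_b(x,y)| < T*d (T = 3 in the embodiment)."""
    return np.abs(G - G_b) < T * d

def reconstruct_background(bg_prev, frame, p=0.004):
    """Step 2.2: G_{b,t} = p*G_{b,t-1} + (1 - p)*G_t, with p = 0.004."""
    return p * bg_prev + (1.0 - p) * frame

def foreground_image(frame, bg, thresh=25.0):
    """Step 2.3: binarize the difference image D_k = |I_k - G_{b,t}|."""
    diff = np.abs(frame.astype(np.float32) - bg.astype(np.float32))
    return np.where(diff >= thresh, 255, 0).astype(np.uint8)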
Further, step 3 specifically includes:
step 3.1: calculating an optical flow motion vector according to an optical flow motion vector calculation formula, wherein the optical flow motion vector calculation formula is as follows:
V = (A^T W^2 A)^(-1) A^T W^2 b,
where A = [∇I(X_1), ∇I(X_2), …, ∇I(X_n)]^T is the matrix whose rows are the spatial gradients [I_i, I_j] at the n pixels X_1 … X_n contained in the neighborhood of pixel (i, j); W = diag(w_1, w_2, …, w_n) is a diagonal weight matrix; b = [I_t1, I_t2, …, I_tn]^T is the vector of temporal gradients; A^T is the transpose of A; and V is the motion vector.
Specifically, according to the basic constraint equation of the optical flow method, assuming that the brightness value of pixel (i, j) in the image at time t is I(i, j, t), and that u(i, j) and v(i, j) denote the motion components of the optical flow at this point in the i and j directions, it follows that:
I_i·u + I_j·v + I_t = 0,
where i is the abscissa of the pixel, j is the ordinate of the pixel, and I is the gray level of the pixel; I_i = ∂I/∂i is the rate of change of the image gray level with i, I_j = ∂I/∂j is the rate of change of the image gray level with j, and I_t = ∂I/∂t is the rate of change of the image gray level with time t; u = di/dt denotes the motion speed of the reference point in the i direction, and v = dj/dt denotes the motion speed of the reference point in the j direction.
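A minimal sketch of this weighted least-squares solve for one neighborhood follows (illustrative only; the sign of b is taken from the constraint equation I_i·u + I_j·v + I_t = 0, and the window weights w are assumed given):

import numpy as np

def lk_motion_vector(I_i, I_j, I_t, w=None):
    """Solve V = (A^T W^2 A)^(-1) A^T W^2 b for the n pixels of one
    neighborhood; I_i, I_j, I_t are length-n arrays of gradients."""
    A = np.stack([I_i, I_j], axis=1)       # n x 2 rows of spatial gradients
    b = -I_t                               # from I_i*u + I_j*v + I_t = 0
    W2 = np.diag(w ** 2) if w is not None else np.eye(len(I_t))
    return np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)  # V = (u, v)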
step 3.2: calculating a foreground depth value based on a depth information calculation formula; the foreground depth calculation formula is as follows:
G_f(i, j) = λ·√(V(i, j)_x^2 + V(i, j)_y^2)
where λ is the depth adjustment coefficient, V(i, j)_x and V(i, j)_y are the components of the motion vector of pixel (i, j), and V = (u, v)^T is the motion vector. The depth of the entire depth frame is rescaled by adjusting the size of λ. To obtain a three-dimensional video with a good parallax effect, λ is set to
λ = 255 / max(V),
where max(V) is the magnitude of the largest motion vector in the extracted motion vector field; the depth map is a grayscale map with range 0-255, and 255 is its maximum value. The foreground depth map is finally obtained through this depth information assignment.
Further, step 4 specifically includes:
step 4.1: performing edge detection on the video frame sequence obtained in step 1 with a Sobel operator to obtain a horizontal edge gradient map, a vertical edge gradient map, and an edge gradient map obtained by fusing the two;
step 4.2: calculating a gradient level threshold of the edge gradient map in the step 4.1, and obtaining an edge image based on the edge gradient map and the gradient level threshold, wherein the gradient of each pixel point in the edge image is greater than the gradient level threshold;
step 4.3: drawing a straight line in the k-b space for each pixel point in the edge image obtained in the step 4.2 based on a geometric perspective method; assigning values to the pixel points based on the number of straight lines passing through each pixel point; traversing k-b space, defining a maximum value point as a vanishing point, and defining straight lines around the vanishing point as vanishing lines;
step 4.4: calculating to obtain the depth value of the background image based on the depth value calculation formula of the line and point vanishing obtained in the step 4.3, wherein the depth value calculation formula is as follows:
G_b = 255 − round(round(j·(|x_o − y_o|)/y_o)·255/(|x_o − y_o|))
where G_b is the background depth of the pixel points between two adjacent vanishing lines, x_o is the abscissa of the intersection of two adjacent vanishing lines, y_o is the ordinate of that intersection, j is a constant, and round(·) is the rounding operation.
In step 4.2, the gradient threshold calculation formula is:
C_t = α·[S_max(x, y) − S_min(x, y)] + S_min(x, y),
where α is a weight coefficient with a value between 0 and 1, S_max(x, y) is the maximum value in the edge gradient map, and S_min(x, y) is the minimum value in the edge gradient map.
In step 4.3, the k-b space is the parameter plane of the slope-intercept line equation y = kx + b, i.e., the plane spanned by the slope k and the intercept b, in which each edge pixel of the image maps to a straight line.
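An illustrative OpenCV sketch of steps 4.1-4.3 follows; it approximates the k-b space vote with OpenCV's Hough transform, which votes in the equivalent ρ-θ parameterization, and α = 0.5 and the Hough threshold are assumptions:

import cv2
import numpy as np

def edge_image(gray, alpha=0.5):
    """Steps 4.1-4.2: Sobel gradient maps and the gradient-level threshold
    C_t = alpha*(S_max - S_min) + S_min."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)   # horizontal edge gradient map
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)   # vertical edge gradient map
    grad = np.abs(gx) + np.abs(gy)           # fused edge gradient map
    Ct = alpha * (grad.max() - grad.min()) + grad.min()
    return (grad > Ct).astype(np.uint8) * 255

def dominant_lines(edges):
    """Step 4.3, approximated: the strongest Hough lines serve as vanishing
    lines, and their common intersection approximates the vanishing point.
    Returns rho-theta line parameters (or None if no line is found)."""
    return cv2.HoughLines(edges, 1, np.pi / 180, threshold=150)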
Further, in step 5, the fusion formula of the depth fusion method is as follows:
G_d(x, y) = { G_f(x, y), if I(x, y) = 255;  G_b(x, y), if I(x, y) = 0 }
where G_d is the final depth map, I(x, y) is the pixel value of the binarized foreground image at point (x, y), G_f(x, y) is the foreground depth of pixel (x, y), and G_b(x, y) is the background depth of pixel (x, y).
The depth value G_d of the final depth map is assigned case by case: when the pixel value I(x, y) of a point is 255, the point is judged to belong to the moving foreground region, and the depth value G_f of the foreground depth map is assigned to the final depth map; when the pixel value of the point is 0, the depth value of the final depth map is the depth value G_b of the background depth map. The fused depth map is thus obtained.
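A one-line NumPy sketch of this fusion rule (illustrative only):

import numpy as np

def fuse_depth(fg_mask, G_f, G_b):
    """Step 5: G_d = G_f where the binarized foreground image is 255,
    and G_b where it is 0."""
    return np.where(fg_mask == 255, G_f, G_b).astype(np.uint8)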
Fig. 2 is a frame connection diagram of a depth map generating device for 2D-3D video conversion according to the present invention, and as shown in fig. 2, this embodiment further discloses a depth map generating device for 2D-3D video conversion, which is used to implement the depth map generating method for 2D-3D video conversion described in embodiment 1, and includes:
the video acquisition module is used for processing the 2D video frame by frame to obtain an original video frame sequence;
the pixel separation module is used for processing the video frame sequence obtained by the video acquisition module by adopting a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image;
the foreground depth calculation module is used for acquiring the motion vector in the foreground image acquired by the pixel separation module by adopting an optical flow method and acquiring a foreground depth map according to the relationship between the motion vector and the depth;
the background depth calculation module is used for carrying out depth assignment on the background image obtained by the pixel separation module by extracting the vanishing line and the vanishing point of the background image by adopting a geometric perspective method to obtain a background depth map;
and the depth fusion module is used for fusing the foreground depth map obtained by the foreground depth calculation module and the background depth map obtained by the background depth calculation module by adopting a depth fusion method to obtain a fusion depth map.
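For illustration, the five modules can be wired as below; the class name and the helpers estimate_flow and background_depth are hypothetical placeholders standing in for the foreground and background depth calculations sketched earlier:

class DepthMapGenerator:
    def generate(self, video_path):
        frames = extract_frames(video_path)            # video acquisition module
        bg = frames[0].astype('float32')
        depth_maps = []
        for prev, cur in zip(frames, frames[1:]):
            bg = reconstruct_background(bg, cur)       # pixel separation module
            mask = foreground_image(cur, bg)
            G_f = foreground_depth(estimate_flow(prev, cur))  # foreground depth module
            G_b = background_depth(bg)                 # background depth module
            depth_maps.append(fuse_depth(mask, G_f, G_b))     # depth fusion module
        return depth_maps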
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A depth map generation method for 2D-3D video conversion, comprising the steps of:
step 1: processing the 2D video frame by frame to obtain an original video frame sequence;
step 2: processing the video frame sequence obtained in step 1 by adopting a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image;
step 3: acquiring the motion vectors in the foreground image obtained in step 2 by adopting an optical flow method, and obtaining a foreground depth map according to the relation between motion vector and depth;
step 4: performing depth assignment on the background image obtained in step 2 by extracting the vanishing lines and vanishing point of the background image by adopting a geometric perspective method, to obtain a background depth map;
step 5: fusing the foreground depth map obtained in step 3 and the background depth map obtained in step 4 by adopting a depth fusion method to obtain a fused depth map.
2. The method according to claim 1, wherein the step 2 specifically comprises:
step 2.1: calculating a background update rate based on the mean and variance of each pixel of the current frame in the video frame sequence obtained in step 1, and preliminarily separating background pixels from foreground pixels based on the background update rate to obtain a foreground separation image and a background separation image;
step 2.2: performing Gaussian background reconstruction on the background separation image obtained in step 2.1 to obtain a background image;
step 2.3: acquiring the difference image between the background image obtained in step 2.2 and the original video frame obtained in step 1, wherein the binarized difference image is the foreground image.
3. The method of claim 2, wherein in step 2.1, the background update rate is calculated by the formula:
p = (1/√(2πd)) · e^(−(x − u)^2/(2d))
wherein p is the background update rate, u is the mean of the pixel in the original video frame, d is the variance of the pixel in the original video frame, e is a natural constant, and x is the abscissa of the pixel in the original video frame.
4. The method according to claim 3, wherein in step 2.2, the Gaussian background reconstruction is performed based on the following formula, specifically, the reconstruction formula is:
G_{b,t}(x, y) = p·G_{b,t-1}(x, y) + (1 − p)·G_t(x, y)
where p is the background update rate, G_t(x, y) is the value of the original video frame at time t, G_{b,t}(x, y) is the value of the updated background image at time t, and G_{b,t-1}(x, y) is the value of the updated background image at time t-1.
5. The method according to claim 4, wherein in step 2.3, the difference image is obtained based on the following formula, specifically:
D_k(x, y) = |I_k(x, y) − G_{b,t}(x, y)|,
where D_k(x, y) is the difference image, G_{b,t}(x, y) is the updated background image obtained in step 2.2, and I_k(x, y) is the original video frame, each evaluated at point (x, y).
6. The method according to claim 5, wherein step 3 specifically comprises:
step 3.1: calculating an optical flow motion vector according to an optical flow motion vector calculation formula, wherein the optical flow motion vector calculation formula is as follows:
V = (A^T W^2 A)^(-1) A^T W^2 b,
wherein A = [∇I(X_1), ∇I(X_2), …, ∇I(X_n)]^T is the matrix whose rows are the spatial gradients [I_i, I_j] at the n pixels X_1 … X_n contained in the neighborhood of pixel (i, j); W = diag(w_1, w_2, …, w_n) is a diagonal weight matrix; b = [I_t1, I_t2, …, I_tn]^T is the vector of temporal gradients; A^T is the transpose of A; and V is the motion vector;
step 3.2: calculating a foreground depth value based on a depth information calculation formula; the foreground depth calculation formula is as follows:
G_f(i, j) = λ·√(V(i, j)_x^2 + V(i, j)_y^2)
where G_f is the foreground depth, λ is the depth adjustment coefficient, V(i, j)_x is the motion vector component of pixel (i, j) in the x-axis direction, and V(i, j)_y is the motion vector component of pixel (i, j) in the y-axis direction.
7. The method according to claim 6, wherein step 4 specifically comprises:
step 4.1: performing edge detection on the video frame sequence obtained in step 1 with a Sobel operator to obtain a horizontal edge gradient map, a vertical edge gradient map, and an edge gradient map obtained by fusing the two;
step 4.2: calculating a gradient level threshold of the edge gradient map in the step 4.1, and obtaining an edge image based on the edge gradient map and the gradient level threshold, wherein the gradient of each pixel point in the edge image is greater than the gradient level threshold;
step 4.3: drawing a straight line in the k-b space for each pixel point in the edge image obtained in the step 4.2 based on a geometric perspective method; assigning values to the pixel points based on the number of straight lines passing through each pixel point; traversing k-b space, defining a maximum value point as a vanishing point, and defining straight lines around the vanishing point as vanishing lines;
step 4.4: calculating to obtain the depth value of the background image based on the depth value calculation formula of the line and point vanishing obtained in the step 4.3, wherein the depth value calculation formula is as follows:
G_b = 255 − round(round(j·(|x_o − y_o|)/y_o)·255/(|x_o − y_o|))
where G_b is the background depth of the pixel points between two adjacent vanishing lines, x_o is the abscissa of the intersection of two adjacent vanishing lines, y_o is the ordinate of that intersection, j is a constant, and round(·) is the rounding operation.
8. The method of claim 7, wherein in step 5, the fusion formula of the depth fusion method is:
G_d(x, y) = { G_f(x, y), if I(x, y) = 255;  G_b(x, y), if I(x, y) = 0 }
where G_d is the final depth map, I(x, y) is the pixel value at point (x, y), G_f(x, y) is the foreground depth of pixel (x, y), and G_b(x, y) is the background depth of pixel (x, y).
9. A depth map generation apparatus for 2D-3D video conversion, characterized in that it implements the depth map generation method for 2D-3D video conversion according to any one of claims 1-8, and comprises:
the video acquisition module is used for processing the 2D video frame by frame to obtain an original video frame sequence;
the pixel separation module is used for processing the video frame sequence obtained by the video acquisition module by adopting a Gaussian background modeling method and a background subtraction method to obtain a foreground image and a background image;
the foreground depth calculation module is used for acquiring the motion vector in the foreground image acquired by the pixel separation module by adopting an optical flow method and acquiring a foreground depth map according to the relationship between the motion vector and the depth;
the background depth calculation module is used for carrying out depth assignment on the background image obtained by the pixel separation module by extracting the vanishing line and the vanishing point of the background image by adopting a geometric perspective method to obtain a background depth map;
and the depth fusion module is used for fusing the foreground depth map obtained by the foreground depth calculation module and the background depth map obtained by the background depth calculation module by adopting a depth fusion method to obtain a fusion depth map.
CN202011628929.0A 2020-12-30 2020-12-30 Depth map generation method and device for 2D-3D video conversion Pending CN112822479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011628929.0A CN112822479A (en) 2020-12-30 2020-12-30 Depth map generation method and device for 2D-3D video conversion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011628929.0A CN112822479A (en) 2020-12-30 2020-12-30 Depth map generation method and device for 2D-3D video conversion

Publications (1)

Publication Number Publication Date
CN112822479A (en) 2021-05-18

Family

ID=75855061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011628929.0A Pending CN112822479A (en) 2020-12-30 2020-12-30 Depth map generation method and device for 2D-3D video conversion

Country Status (1)

Country Link
CN (1) CN112822479A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101640809A (en) * 2009-08-17 2010-02-03 浙江大学 Depth extraction method of merging motion information and geometric information
CN103366332A (en) * 2013-06-18 2013-10-23 河海大学 Depth information-based image watermarking method
CN111062263A (en) * 2019-11-27 2020-04-24 杭州易现先进科技有限公司 Method, device, computer device and storage medium for hand pose estimation
CN111476156A (en) * 2020-04-07 2020-07-31 上海龙晶科技有限公司 Real-time intelligent monitoring algorithm for mice and other small animals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU Yang et al.: "Moving Object Detection Based on Optical Flow and the Mean Shift Algorithm", Information Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284081A (en) * 2021-07-20 2021-08-20 杭州小影创新科技股份有限公司 Depth map super-resolution optimization method and device, processing equipment and storage medium
CN114581611A (en) * 2022-04-28 2022-06-03 阿里巴巴(中国)有限公司 Virtual scene construction method and device
CN115601233A (en) * 2022-12-14 2023-01-13 南京诺源医疗器械有限公司(Cn) Method for converting 2D (two-dimensional) image into 3D (three-dimensional) image of medical image

Similar Documents

Publication Publication Date Title
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN112822479A (en) Depth map generation method and device for 2D-3D video conversion
CN101640809B (en) Depth extraction method of merging motion information and geometric information
CN102741879B (en) Method for generating depth maps from monocular images and systems using the same
CN102663766B (en) Non-photorealistic based art illustration effect drawing method
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
US20040032488A1 (en) Image conversion and encoding techniques
CN101287142A (en) Method for converting flat video to tridimensional video based on bidirectional tracing and characteristic points correction
WO2018053952A1 (en) Video image depth extraction method based on scene sample library
CN103826032B (en) Depth map post-processing method
CN109712247B (en) Live-action training system based on mixed reality technology
WO2014121108A1 (en) Methods for converting two-dimensional images into three-dimensional images
Yan et al. Depth map generation for 2d-to-3d conversion by limited user inputs and depth propagation
CN106447718B (en) A kind of 2D turns 3D depth estimation method
CN107730472A (en) A kind of image defogging optimized algorithm based on dark primary priori
CN111899295A (en) Monocular scene depth prediction method based on deep learning
CN104159098B (en) The translucent edge extracting method of time domain consistence of a kind of video
Zhang et al. Interactive stereoscopic video conversion
KR101125061B1 (en) A Method For Transforming 2D Video To 3D Video By Using LDI Method
Wang et al. Example-based video stereolization with foreground segmentation and depth propagation
Mathai et al. Automatic 2D to 3D video and image conversion based on global depth map
Liu et al. Fog effect for photography using stereo vision
CN110149508A (en) A kind of array of figure generation and complementing method based on one-dimensional integrated imaging system
KR102648882B1 (en) Method for lighting 3D map medeling data
CN112598777B (en) Haze fusion method based on dark channel prior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518