WO2015188666A1 - Three-dimensional video filtering method and apparatus - Google Patents

Publication number: WO2015188666A1
Authority: WIPO (PCT)
Application number: PCT/CN2015/077707
Other languages: English (en), Chinese (zh)
Inventors: 朱策, 王昕, 郑建铧, 张玉花
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2015188666A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 — Stereoscopic video systems; Multi-view video systems; Details thereof

Definitions

  • Embodiments of the present invention relate to image processing technologies, and in particular, to a three-dimensional video filtering method and apparatus.
  • With its distinctive depth-of-field effect, 3D video is gradually entering people's lives and is applied in many fields such as education, the military, entertainment, and medicine.
  • Current 3D video is mainly divided into two categories according to the video content: pure color 3D video and depth-based 3D video.
  • Pure color 3D video directly presents multi-channel color video to users; its viewpoint positions and parallax are fixed, which places certain limitations on viewing.
  • Depth-based 3D video, by contrast, can synthesize virtual images of arbitrary viewpoints through depth-image-based rendering. Viewers can select viewpoints and adjust parallax according to personal preference, and thus better enjoy what 3D video offers.
  • This free and flexible feature makes depth-based 3D video the currently accepted 3D video format.
  • Depth-based 3D video content consists of a texture map sequence and a depth map sequence: the texture map represents the visual texture features of object surfaces, and the depth map reflects the distance between the object and the camera.
  • the specified virtual view texture image can be synthesized using the above video content and depth image based rendering techniques.
  • depth maps and texture maps introduce a lot of noise during acquisition, encoding, and transmission.
  • the noise in the depth map and texture map will cause geometric distortion and texture distortion of the composite image, which will seriously affect people's visual experience.
  • the filtering technology can effectively remove these noises and effectively improve the quality of 3D video.
  • The main denoising method for the texture map is the bilateral filter: the pixels around the pixel to be filtered are taken as references, and the filtering result is obtained by weighted averaging of them.
  • The weights mainly refer to the positional proximity of the pixels in the image and the similarity of their pixel values.
  • This filtering method considers that the closer two pixel points are in the image plane, the stronger their correlation; and the more similar their pixel values, the stronger their correlation.
  • FIG. 1 is a schematic diagram of the computational proximity of a prior art bilateral filter.
  • The problem with the prior art is that the pixel points in the image are reproductions, in the two-dimensional image plane, of points in the real three-dimensional space, yet the bilateral filter does not start from the real three-dimensional scene when considering pixel proximity.
  • The calculated proximity is therefore inaccurate. As shown in Figure 1, A', B', and C' are three points in the real scene.
  • Their positions in the image plane, as captured by the camera, are A, B, and C, and the distance between A and C in the plane equals the distance between B and C in the plane.
  • A and B are reference pixels.
  • From FIG. 1 it can be clearly seen that in three-dimensional space the proximity of B' and C' is stronger, whereas the bilateral filter considers the proximity of A to C and of B to C to be the same, so the accuracy of the filtering result is not high.
  • The embodiments of the invention provide a three-dimensional video filtering method and device to overcome the problem that the accuracy of the filtering result in the prior art is not high.
  • an embodiment of the present invention provides a three-dimensional video filtering method, including:
  • determining filtering weights according to the spatial proximity, the similarity of the texture pixel values corresponding to the depth pixels, and the consistency of the motion features, and performing a weighted average of the depth pixel values of the reference pixels in the reference pixel set to obtain a filtering result of the depth pixel value of the depth-image pixel to be filtered; or
  • performing a weighted average of the texture pixel values to obtain a filtering result of the texture pixel value of the texture-image pixel to be filtered.
  • the projecting the pixels in the image plane into the three-dimensional space includes:
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • the projecting of the pixel from the image plane into three-dimensional space by using the depth image information, the viewpoint position information, and the reference camera parameter information provided by the three-dimensional video includes:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space is thereby obtained; d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
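The projection step described above can be illustrated with a standard pinhole-camera back-projection. The patent's exact formula is not reproduced in this text (it was an image in the original), so the following form, the function name, and the sample intrinsic values are assumptions:

```python
import numpy as np

def backproject(u, v, d, A, R, t):
    """Project pixel (u, v) with depth pixel value d from the image plane
    into the three-dimensional world coordinate system, using the reference
    camera parameter matrix A, rotation matrix R, and translation vector t.
    Standard pinhole back-projection sketch, not the patent's exact formula."""
    pix = np.array([u, v, 1.0])           # homogeneous pixel coordinates
    cam = d * np.linalg.inv(A) @ pix      # point in camera coordinates
    return np.linalg.inv(R) @ (cam - t)   # point in world coordinates

# Illustrative intrinsics: f_x = f_y = 500, reference point (o_x, o_y) = (320, 240)
A = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
P = backproject(100, 80, 2.0, A, np.eye(3), np.zeros(3))
# With identity extrinsics: P = d * A^{-1} [u, v, 1]^T = [-0.88, -0.64, 2.0]
```

With identity R and zero t the world point reduces to d·A⁻¹[u, v, 1]ᵀ, which makes the role of the normalized focal lengths and the reference point easy to see.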
  • the spatial proximity is calculated by taking the distance between the pixel to be filtered and the reference pixel in the three-dimensional space as the input value of a function; the output value of the function increases as the input value decreases;
  • the texture pixel value similarity is calculated by taking the difference between the texture pixel values of the pixel to be filtered and the reference pixel as the input value of a function; the output value of the function increases as the input value decreases;
  • the motion feature consistency is obtained by calculating whether the motion characteristics of the pixel to be filtered and the reference pixel are consistent, including:
  • the spatial proximity, texture pixel value similarity, and motion feature are The consistency determines the weight of the filtering, and performs weighted averaging on the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered, including:
  • f_T(T_p, T_q) is used for calculating the texture pixel value similarity of the pixel to be filtered and the reference pixel;
  • p is the pixel to be filtered
  • q is the reference pixel
  • K is the reference pixel set
  • D_p' is the depth pixel value of p after filtering
  • D_q is the depth pixel value of q
  • P and Q are the coordinate values of p and q in three-dimensional space
  • T_p and T_q are the texture pixel values of p and q
  • T_p' and T_q' are the texture pixel values of the pixels at the same positions as p and q in the previous frame
  • T_p' is the filtered texture pixel value of p; th is the preset texture pixel difference threshold.
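The weighted averaging described in this claim can be sketched as follows. The Gaussian kernels, the parameter values, and the dictionary layout ('P' = 3-D coordinates, 'T' = texture pixel value, 'T_prev' = co-located texture pixel value in the previous frame, 'D' = depth pixel value) are illustrative assumptions, not the patent's exact formulas:

```python
import math

def filter_depth_pixel(p, K, sigma_s=1.0, sigma_t=10.0, th=10.0):
    """Sketch of the claimed depth filtering: each reference pixel q in the
    set K is weighted by spatial proximity in 3-D space, texture pixel value
    similarity, and motion feature consistency, and the depth pixel values
    are weighted-averaged to produce D_p'."""
    moving_p = abs(p['T'] - p['T_prev']) > th          # motion state of p
    num = den = 0.0
    for q in K:
        f_s = math.exp(-math.dist(p['P'], q['P'])**2 / (2 * sigma_s**2))
        f_T = math.exp(-(p['T'] - q['T'])**2 / (2 * sigma_t**2))
        moving_q = abs(q['T'] - q['T_prev']) > th      # motion state of q
        f_m = 1.0 if moving_p == moving_q else 0.0     # motion consistency
        w = f_s * f_T * f_m
        num += w * q['D']
        den += w
    return num / den if den > 0 else p['D']            # D_p'
```

A reference pixel whose motion state disagrees with that of the pixel to be filtered receives zero weight, so it cannot blur a moving-object boundary into the static background.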
  • an embodiment of the present invention provides a three-dimensional video filtering method, including:
  • the reference pixel set is in the same frame image as, and in multiple frame images adjacent to, the pixel to be filtered;
  • determining filtering weights according to the spatial proximity, the texture pixel value similarity corresponding to the depth pixel, and the time-domain proximity, and performing a weighted average of the depth pixel values of the reference pixels in the reference pixel set to obtain a filtering result of the depth pixel value of the depth-image pixel to be filtered;
  • the projecting the pixels in the image plane into the three-dimensional space includes:
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • the projecting of the pixel from the image plane into three-dimensional space by using the depth image information, the viewpoint position information, and the reference camera parameter information provided by the three-dimensional video includes:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space is thereby obtained; d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
  • the spatial proximity is calculated by taking the distance between the pixel to be filtered and the reference pixel in the three-dimensional space as the input value of a function; the output value of the function increases as the input value decreases;
  • the texture pixel value similarity is calculated by taking the difference between the texture pixel values of the pixel to be filtered and the reference pixel as the input value of a function; the output value of the function increases as the input value decreases;
  • the time-domain proximity is calculated by taking the time interval between the frames in which the pixel to be filtered and the reference pixel are located as the input value of a function; the output value of the function increases as the input value decreases.
  • the determining of the filtering weights according to the spatial proximity, the texture pixel value similarity, and the time-domain proximity, and the performing of a weighted average of the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered, includes:
  • f_tem(i, N) is used for calculating the time-domain proximity of the pixel to be filtered and the reference pixel;
  • N is the frame number of the frame in which the pixel to be filtered is located
  • i is the frame number of the frame in which the reference pixel is located
  • i is an integer in the interval [N−m, N+n]
  • m and n are, respectively, the numbers of frames before and after the frame in which the pixel to be filtered is located
  • p is the pixel to be filtered
  • q_i is the reference pixel in the i-th frame
  • K_i is the reference pixel set in the i-th frame
  • D_p' is the filtered depth pixel value of p
  • D_q_i is the depth pixel value of q_i in the i-th frame; P and Q_i are the coordinate values, in three-dimensional space, of p and of q_i in the i-th frame; T_p and T_q_i are the texture pixel values of p and of q_i in the i-th frame; and T_p' is the filtered texture pixel value of p.
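The time-domain proximity f_tem(i, N) can be sketched as a function of the frame interval |i − N|. The Gaussian form and the sigma_tem parameter are assumptions, since the patent's formula is not reproduced in this text:

```python
import math

def f_tem(i, N, sigma_tem=2.0):
    """Sketch of the time-domain proximity: the interval between frame i
    (where the reference pixel lies) and frame N (where the pixel to be
    filtered lies) is the input of a function whose output increases as
    the input decreases."""
    return math.exp(-(i - N)**2 / (2 * sigma_tem**2))
```

Reference pixels from the same frame get the maximum weight of 1, and the weight decays symmetrically for frames further away in time.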
  • an embodiment of the present invention provides a three-dimensional video filtering apparatus, including:
  • a projection module configured to project pixels in an image plane into a three-dimensional space; the pixels include a pixel to be filtered and a reference pixel set;
  • a calculation module configured to calculate the spatial proximity of the pixel to be filtered and the reference pixel in the three-dimensional space according to the coordinate values, in the three-dimensional space, of the pixel to be filtered and of the reference pixels in the reference pixel set; wherein the reference pixel set is in the same frame image as the pixel to be filtered;
  • the calculating module is further configured to calculate, according to the texel value of the pixel to be filtered and the reference pixel in the reference pixel set, a texture pixel value similarity between the pixel to be filtered and the reference pixel;
  • the calculating module is further configured to calculate, according to the texel value of the pixel to be filtered, the reference pixel in the reference pixel set, and the pixel of the same position in the previous frame image of the frame where the pixel to be filtered is located, Consistency of motion characteristics of the pixel to be filtered and the reference pixel;
  • a filtering module configured to determine a weight of the filtering according to the spatial proximity, the texel value similarity, and the motion feature consistency, and respectively perform pixel values of the reference pixels in the reference pixel set A weighted average obtains a filtering result of the pixel to be filtered.
  • the filtering module is specifically configured to:
  • performing a weighted average of the depth pixel values of the reference pixels in the reference pixel set to obtain a filtering result of the depth pixel value of the depth-image pixel to be filtered;
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • the projection module is specifically configured to:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space is thereby obtained; d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
  • the spatial proximity is calculated by taking the distance between the pixel to be filtered and the reference pixel in the three-dimensional space as the input value of a function; the output value of the function increases as the input value decreases;
  • the texture pixel value similarity is calculated by taking the difference between the texture pixel values of the pixel to be filtered and the reference pixel as the input value of a function; the output value of the function increases as the input value decreases;
  • the motion feature consistency is obtained by calculating whether the motion characteristics of the pixel to be filtered and the reference pixel are consistent, including:
  • when the difference between the texture pixel value of the pixel to be filtered and that of the pixel at the same position in the previous frame, and the corresponding difference for the reference pixel, are both greater than, or both less than, the threshold value, the motion states of the pixel to be filtered and the reference pixel are determined to be consistent; otherwise, the motion states of the pixel to be filtered and the reference pixel are determined to be inconsistent.
  • the filtering module is specifically configured to:
  • f_T(T_p, T_q) is used for calculating the texture pixel value similarity of the pixel to be filtered and the reference pixel;
  • p is the pixel to be filtered
  • q is the reference pixel
  • K is the reference pixel set
  • D_p' is the depth pixel value of p after filtering
  • D_q is the depth pixel value of q
  • P and Q are the coordinate values of p and q in three-dimensional space
  • T_p and T_q are the texture pixel values of p and q
  • T_p' and T_q' are the texture pixel values of the pixels at the same positions as p and q in the previous frame
  • T_p' is the filtered texture pixel value of p; th is the preset texture pixel difference threshold.
  • an embodiment of the present invention provides a three-dimensional video filtering apparatus, including:
  • a projection module configured to project pixels in an image plane into a three-dimensional space; the pixels include a pixel to be filtered and a reference pixel set;
  • a calculation module configured to calculate the spatial proximity of the pixel to be filtered and the reference pixel in the three-dimensional space according to the coordinate values, in the three-dimensional space, of the pixel to be filtered and of the reference pixels in the reference pixel set; wherein the reference pixel set is in the same frame image as, and in multiple frame images adjacent to, the pixel to be filtered;
  • the calculating module is further configured to calculate, according to the texel value of the pixel to be filtered and the reference pixel in the reference pixel set, a texture pixel value similarity between the pixel to be filtered and the reference pixel;
  • the calculating module is further configured to calculate a time domain proximity of the pixel to be filtered and the reference pixel according to a time interval of a frame in which the reference pixel in the pixel to be filtered and the reference pixel in the reference pixel set are located;
  • a filtering module configured to determine a weight of the filtering according to the spatial proximity, the texel value similarity, and the time domain proximity, and perform weighted averaging on the pixel values of the reference pixels in the reference pixel set respectively to obtain the to-be-filtered The filtering result of the pixel.
  • the filtering module is specifically configured to:
  • performing a weighted average of the depth pixel values of the reference pixels in the reference pixel set to obtain a filtering result of the depth pixel value of the depth-image pixel to be filtered;
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space is thereby obtained; d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
  • the spatial proximity is calculated by taking the distance between the pixel to be filtered and the reference pixel in the three-dimensional space as the input value of a function; the output value of the function increases as the input value decreases;
  • the texture pixel value similarity is calculated by taking the difference between the texture pixel values of the pixel to be filtered and the reference pixel as the input value of a function; the output value of the function increases as the input value decreases;
  • the time-domain proximity is calculated by taking the time interval between the frames in which the pixel to be filtered and the reference pixel are located as the input value of a function; the output value of the function increases as the input value decreases.
  • the filtering module is specifically configured to:
  • f_tem(i, N) is used for calculating the time-domain proximity of the pixel to be filtered and the reference pixel;
  • N is the frame number of the frame in which the pixel to be filtered is located
  • i is the frame number of the frame in which the reference pixel is located
  • i is an integer in the interval [N−m, N+n]
  • m and n are, respectively, the numbers of frames before and after the frame in which the pixel to be filtered is located
  • p is the pixel to be filtered
  • q_i is the reference pixel in the i-th frame
  • K_i is the reference pixel set in the i-th frame
  • D_p' is the filtered depth pixel value of p
  • D_q_i is the depth pixel value of q_i in the i-th frame; P and Q_i are the coordinate values, in three-dimensional space, of p and of q_i in the i-th frame; T_p and T_q_i are the texture pixel values of p and of q_i in the i-th frame; and T_p' is the filtered texture pixel value of p.
  • The three-dimensional video filtering method and apparatus of the embodiments of the present invention calculate the spatial proximity and texture pixel value similarity of the pixel to be filtered and the reference pixel in the three-dimensional space by using the relationship between the pixel to be filtered and the reference pixel in the real three-dimensional space.
  • FIG. 1 is a schematic diagram of the computational proximity of a prior-art bilateral filter;
  • FIG. 2 is a flowchart of Embodiment 1 of a three-dimensional video filtering method according to the present invention;
  • FIG. 3 is a schematic diagram of pixel projection according to Embodiment 1 of the method of the present invention;
  • FIG. 4 is a flowchart of Embodiment 2 of a three-dimensional video filtering method according to the present invention;
  • FIG. 5 is a schematic diagram of reference pixel selection according to Embodiment 2 of the method of the present invention;
  • FIG. 6 is a schematic structural diagram of an embodiment of a three-dimensional video filtering device according to the present invention;
  • FIG. 7 is a schematic structural diagram of another embodiment of a three-dimensional video filtering device according to the present invention.
  • FIG. 2 is a flowchart of Embodiment 1 of a method for filtering a three-dimensional video according to the present invention
  • FIG. 3 is a schematic diagram of pixel projection according to Embodiment 1 of the method of the present invention.
  • the method in this embodiment may include:
  • Step 201 Project a pixel in an image plane to a three-dimensional space; the pixel includes a pixel to be filtered and a reference pixel set.
  • projecting pixels in the image plane into the three-dimensional space includes:
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • the projecting of the pixels from the image plane into three-dimensional space by using the depth image information, the viewpoint position information, and the reference camera parameter information provided by the three-dimensional video includes:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space is thereby obtained; d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
  • the plane in which the uv coordinates lie is the image plane
  • the pixel positions in three-dimensional space are represented by coordinates in the world coordinate system
  • p is a pixel in the image plane
  • given the coordinates of pixel p in the image plane, 3D projection technology is applied to project it to point P in the world coordinate system
  • A is the reference camera parameter matrix; f_x and f_y are the normalized focal lengths in the horizontal and vertical directions, respectively; r is a radial distortion coefficient; and (o_x, o_y) is the coordinate value of the reference point on the image plane, the reference point being the intersection of the optical axis of the reference camera and the image plane.
  • Step 202: Calculate the spatial proximity of the pixel to be filtered and the reference pixel in the three-dimensional space according to the coordinate values of the pixel to be filtered and of the reference pixels in the reference pixel set; wherein the reference pixel set and the pixel to be filtered are in the same frame image.
  • the spatial proximity is calculated by taking the distance between the pixel to be filtered and the reference pixel in the three-dimensional space as the input value of a function; the output value of the function increases as the input value decreases.
  • the spatial distance between two points can reflect their spatial proximity.
  • the spatial distance is calculated from the coordinate values, and the spatial distance is used as the input value; for example, the spatial proximity is calculated by a Gaussian function.
  • the function for calculating the spatial proximity may also be another function, but it must be ensured that the output value of the function increases as the input value decreases; the reference pixel set in this embodiment is in the same frame as the pixel to be filtered.
  • Step 203: Calculate the texture pixel value similarity of the pixel to be filtered and the reference pixel according to the texture pixel values of the pixel to be filtered and of the reference pixels in the reference pixel set.
  • the texture pixel value similarity is calculated by taking the difference between the texture pixel values of the pixel to be filtered and the reference pixel as the input value of a function; the output value of the function increases as the input value decreases.
  • the degree of difference between the texture features of the two points reflects their degree of similarity.
  • the difference between the texture pixel values is calculated, and the difference is used as the input value to calculate the texture pixel value similarity, for example, by a Gaussian function.
  • the function for calculating the texture pixel value similarity may also be another function, but it must be ensured that the output value of the function increases as the input value decreases.
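Steps 202 and 203 can be sketched with the Gaussian functions the text offers as examples. The sigma parameters are assumed values; any function whose output increases as its input decreases would satisfy the text:

```python
import math

def spatial_proximity(P, Q, sigma_s=1.0):
    """Step 202 sketch: the 3-D distance between projected points P and Q is
    the input; the output increases as the distance decreases."""
    return math.exp(-math.dist(P, Q)**2 / (2 * sigma_s**2))

def texel_similarity(T_p, T_q, sigma_t=10.0):
    """Step 203 sketch: the texture pixel value difference is the input; the
    output increases as the difference decreases."""
    return math.exp(-(T_p - T_q)**2 / (2 * sigma_t**2))
```

Both functions peak at 1 when the input is zero, which is what makes them usable directly as filter weights.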
  • Step 204: Calculate the motion feature consistency of the pixel to be filtered and the reference pixel according to the texture pixel values of the pixel to be filtered, of the reference pixels in the reference pixel set, and of the pixels at the same positions in the previous frame image of the frame in which the pixel to be filtered is located.
  • the motion feature consistency is obtained by calculating whether the motion features of the pixel to be filtered and the reference pixel are consistent, as follows:
  • the relative motion between two points also reflects their similarity: the more similar the motion, the stronger the correlation. Since it is difficult to obtain the motion information of a pixel from a three-dimensional video sequence, the embodiment of the present invention determines whether a pixel is moving by the difference between its texture pixel value and that of the pixel at the same position in the image plane in the previous frame: when the difference is greater than
  • a preset threshold, the motion feature of the pixel is considered to be "moving"; conversely, the motion feature of the pixel is considered to be "not moving". Further, the differences computed in this way for the pixel to be filtered and for the reference pixel, over the two consecutive frames,
  • are used to determine whether the motion features of the pixel to be filtered and the reference pixel are consistent: when both differences are greater than, or both are less than, the threshold, the motion features of the pixel to be filtered and the reference pixel are considered consistent; otherwise, inconsistent. If the motion features of the pixels are consistent, they are considered correlated, and vice versa.
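The motion-state test of step 204 can be sketched directly from the description; the argument names are illustrative:

```python
def motion_consistent(T_p, T_p_prev, T_q, T_q_prev, th=10):
    """Step 204 sketch: a pixel is judged 'moving' when its texture pixel
    value differs from that of the co-located pixel in the previous frame by
    more than the threshold th; p and q are motion-consistent when their
    motion states agree."""
    moving_p = abs(T_p - T_p_prev) > th
    moving_q = abs(T_q - T_q_prev) > th
    return moving_p == moving_q
```

Two static pixels are consistent, a moving pixel paired with a static one is not, and two moving pixels are again consistent.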
  • Step 205 Determine a weight of the filter according to spatial proximity, texture pixel value similarity, and motion feature consistency, and perform weighted average on the pixel values of the reference pixels in the reference pixel set to obtain a filtering result of the pixel to be filtered.
  • determining filtering weights according to the spatial proximity, the texture pixel value similarity corresponding to the depth pixel, and the motion feature consistency, and performing a weighted average of the depth pixel values of the reference pixels in the reference pixel set to obtain a filtering result of the depth pixel values of the depth-image pixels to be filtered;
  • the determining of the filtering weights according to the spatial proximity, the texture pixel value similarity, and the motion feature consistency, and the performing of a weighted average of the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered, includes:
  • f_T(T_p, T_q) is used for calculating the texture pixel value similarity of the pixel to be filtered and the reference pixel;
  • p is the pixel to be filtered
  • q is the reference pixel
  • K is the reference pixel set
  • D_p' is the depth pixel value of p after filtering
  • D_q is the depth pixel value of q
  • P and Q are the coordinate values of p and q in three-dimensional space
  • T_p and T_q are the texture pixel values of p and q
  • T_p' and T_q' are the texture pixel values of the pixels at the same positions as p and q in the previous frame
  • T_p' is the filtered texture pixel value of p; th is the preset texture pixel difference threshold.
  • the function for calculating the spatial proximity of the pixel to be filtered and the reference pixel takes as its input value the spatial distance between the pixel to be filtered and the reference pixel; the output value of the function increases as the input value decreases;
  • f_T(T_p, T_q) is used for calculating the texture pixel value similarity of the pixel to be filtered and the reference pixel; the input value of the function is the difference between the texture pixel values of the pixel to be filtered and the reference pixel; the output value of the function increases as the input value decreases;
  • the motion feature consistency of the pixel to be filtered and the reference pixel: when the difference between the texture pixel value of the pixel to be filtered and that of the pixel at the corresponding position in the previous frame, and the difference between the texture pixel value of the reference pixel and that of the pixel at the corresponding position in the previous frame, are both greater than, or both less than, the preset threshold, the motion features of the pixel to be filtered and the reference pixel are determined to be consistent; otherwise, the motion features of the pixel to be filtered and the reference pixel are determined to be inconsistent.
  • p is the pixel to be filtered
  • q is the reference pixel
  • K is the reference pixel set; this set is usually taken as a square region centered on the pixel to be filtered, of size 5×5 or 7×7
  • D_p' is the filtered depth pixel value of p
  • D_q is the depth pixel value of q
  • P and Q are the coordinate values of p and q in three-dimensional space
  • T_p and T_q are the texture pixel values of p and q; T_p' and T_q' are the texture pixel values of the pixels at the same positions in the previous frame
  • T_p' is the filtered texture pixel value of p
  • th is the preset texture pixel difference threshold.
  • th is a threshold for judging whether the motion features of the pixel points are consistent, and may be selected according to the content of the three-dimensional video sequence, generally 6 to 20. When the threshold is chosen appropriately, the boundary of a moving object can be better distinguished, so that the boundary of the object after filtering is more distinct.
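The reference pixel set K described above, a 5×5 or 7×7 square region centered on the pixel to be filtered, can be built as follows; border clamping and the (column, row) return format are illustrative choices:

```python
def reference_window(width, height, x, y, size=5):
    """Build the reference pixel set K as a size-by-size square region
    centered on the pixel to be filtered at (x, y), clamped so that no
    coordinate falls outside a width-by-height image."""
    half = size // 2
    return [(i, j)
            for j in range(max(0, y - half), min(height, y + half + 1))
            for i in range(max(0, x - half), min(width, x + half + 1))]
```

An interior pixel yields the full 25 (or 49) references, while a corner pixel yields only the portion of the window that lies inside the image.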
  • Step 202, step 203, and step 204 may be performed in any order.
  • In this embodiment, the spatial proximity, the texel value similarity, and the motion feature consistency of the pixel to be filtered and the reference pixel are calculated by using the relationship between the pixel to be filtered and the reference pixel in real three-dimensional space, which improves the accuracy of the filtering result.
  • This solves the problem in the prior art that the accuracy of the filtering result is not high.
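For illustration only, the weighted averaging of this first embodiment can be sketched as follows. The Gaussian weighting functions, the sigma parameters, and all variable names are assumptions; the patent only requires weighting functions whose output increases as the input decreases, and uses a motion-consistency term derived from the threshold test above:

```python
import math

def filter_pixel(p_val, refs, sigma_s=3.0, sigma_t=10.0):
    """Weighted average over the reference pixel set K (first embodiment).

    p_val: (P, T_p, motion_p) for the pixel to be filtered: 3D coordinate,
    texel value, and motion state. refs: list of (Q, T_q, motion_q, D_q)
    tuples for each reference pixel: 3D coordinate, texel value, motion
    state, and the value to be averaged (a depth pixel value when filtering
    the depth image; a texel value when filtering the texture image).
    """
    P, T_p, motion_p = p_val
    num = den = 0.0
    for Q, T_q, motion_q, D_q in refs:
        dist = math.dist(P, Q)                               # 3D spatial distance
        f_s = math.exp(-dist ** 2 / (2 * sigma_s ** 2))      # spatial proximity
        f_T = math.exp(-(T_p - T_q) ** 2 / (2 * sigma_t ** 2))  # texel similarity
        f_m = 1.0 if motion_p == motion_q else 0.0           # motion consistency
        w = f_s * f_T * f_m
        num += w * D_q
        den += w
    # Degenerate fallback (no consistent reference) is illustrative only.
    return num / den if den else T_p

# A pixel with two nearby references: one motion-consistent, one not.
p = ((0.0, 0.0, 1.0), 100.0, True)
refs = [((0.1, 0.0, 1.0), 102.0, True, 50.0),
        ((0.0, 0.1, 1.0), 101.0, False, 90.0)]
print(filter_pixel(p, refs))  # only the consistent reference contributes
```

Because the inconsistent reference receives zero weight, the output equals the consistent reference's value here; in general the result is a normalized weighted average over the 5*5 or 7*7 neighborhood.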
  • FIG. 4 is a flowchart of a second embodiment of a three-dimensional video filtering method according to the present invention.
  • FIG. 5 is a schematic diagram of reference pixel selection according to a second embodiment of the present invention. As shown in FIG. 4, the method in this embodiment may include:
  • Step 401 Project a pixel in an image plane to a three-dimensional space; the pixel includes a pixel to be filtered and a reference pixel set.
  • projecting pixels in the image plane into the three-dimensional space includes:
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • using the depth image information, the viewpoint location information, and the reference camera parameter information provided by the three-dimensional video to project the pixel from the image plane to the three-dimensional space including:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space
  • d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
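The projection from the image plane to three-dimensional space can be sketched with the standard pinhole camera model. This sketch ignores the radial distortion coefficient r, assumes one particular extrinsic convention (camera point = R * world point + t), and is not the patent's exact formula:

```python
def project_to_3d(u, v, d, fx, fy, ox, oy, R, t):
    """Back-project image pixel (u, v) with depth pixel value d into 3D.

    fx, fy: normalized focal lengths in the horizontal and vertical
    directions; (ox, oy): the reference point, i.e. the intersection of
    the reference camera's optical axis with the image plane. R: 3x3
    rotation matrix (list of rows), t: translation vector.
    """
    # Camera-frame coordinates from the pinhole model.
    cam = [(u - ox) * d / fx, (v - oy) * d / fy, d]
    # World point: R^T * (cam - t), assuming cam = R * world + t.
    diff = [cam[i] - t[i] for i in range(3)]
    return [sum(R[k][i] * diff[k] for k in range(3)) for i in range(3)]

# Identity extrinsics: world coordinates equal camera coordinates.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 0]
print(project_to_3d(320, 240, 2.0, fx=500, fy=500, ox=320, oy=240, R=R, t=t))
```

A pixel at the principal point maps onto the optical axis at distance d, which is a quick sanity check for the matrix convention chosen.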
  • Step 402 Calculate the spatial proximity of the pixel to be filtered and the reference pixel in the three-dimensional space according to the coordinate values of the pixel to be filtered and of the reference pixels in the reference pixel set in the three-dimensional space;
  • the reference pixel set is located in the same frame image as the pixel to be filtered and in the adjacent multi-frame images.
  • the spatial proximity is calculated by using the distance between the pixel to be filtered and the reference pixel in three-dimensional space as the input value of a function; the output value of the function increases as the input value decreases.
  • Step 403 Calculate texture pixel value similarity between the pixel to be filtered and the reference pixel according to the texel value of the pixel to be filtered and the reference pixel in the reference pixel set.
  • the texture pixel value similarity is calculated by using the difference between the texel value of the pixel to be filtered and that of the reference pixel as the input value of a function; the output value of the function increases as the input value decreases.
  • Step 404 Calculate time domain proximity of the pixel to be filtered and the reference pixel according to a time interval of a frame in which the pixel to be filtered and a reference pixel in the reference pixel set are located.
  • the time domain proximity is calculated by using the time interval between the frames where the pixel to be filtered and the reference pixel are located as the input value of a function; the output value of the function increases as the input value decreases.
  • Based on the weight calculation method of the first embodiment, the present invention extends the selection range of the reference pixels from the frame where the pixel to be filtered is located to its adjacent frames (filtering reference frames), so as to increase the continuity between filtered frames. As shown in FIG. 5, in each filtering reference frame, the selection range of the reference pixels is consistent with the selection range in the frame to be filtered, where the Nth frame is the frame where the pixel to be filtered is currently located, and the previous m frames and the subsequent n frames are selected as filtering reference frames.
  • The distance between two points in the time domain reflects their degree of temporal proximity: the closer the time-domain distance, the stronger the correlation and the larger the time domain proximity. That is, the time interval between the frames where the pixel to be filtered and the reference pixel are located can be calculated, and this time interval is used as the input value to compute the time domain proximity, for example by a Gaussian function.
  • The function for calculating the time domain proximity may also be another function, provided that its output value increases as the input value decreases.
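A Gaussian function of the frame interval, as suggested above, could look like the following (the sigma parameter is an illustrative assumption; any function that increases as the interval shrinks would satisfy the requirement):

```python
import math

def f_tem(i, N, sigma=1.5):
    """Time domain proximity between the frame of the reference pixel (i)
    and the frame of the pixel to be filtered (N): the smaller the frame
    interval |i - N|, the larger the output, with a maximum of 1.0."""
    return math.exp(-((i - N) ** 2) / (2 * sigma ** 2))

# Proximity decays monotonically with the frame interval.
print([round(f_tem(i, N=10), 3) for i in (10, 9, 8, 7)])
```

With m and n of 1 to 3, the weight of the farthest reference frames is already small, matching the observation below that correlation across larger intervals is negligible.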
  • Step 405 Determine the weight of the filtering according to the spatial proximity, the texel value similarity, and the time domain proximity, and perform weighted averaging on the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered.
  • When filtering the depth image, the weight of the filtering is determined according to the spatial proximity, the texel value similarity, and the time domain proximity, and weighted averaging is performed on the depth pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the depth pixel value of the pixel to be filtered of the depth image; or, when filtering the texture image, weighted averaging is performed likewise on the texel values of the reference pixels to obtain the filtering result of the texel value of the pixel to be filtered. Obtaining the filtering results includes:
  • f_tem(i, N) is used for calculating the time domain proximity of the pixel to be filtered and the reference pixel
  • N is the frame number of the frame in which the pixel to be filtered is located
  • i is the frame number of the frame in which the reference pixel is located
  • i is an integer in the interval [N-m, N+n]
  • m and n are respectively the numbers of reference frames before and after the frame in which the pixel to be filtered is located
  • p is the pixel to be filtered
  • q i is the reference pixel in the ith frame
  • K i is the reference pixel set in the ith frame
  • D_p' is the filtered depth pixel value of p
  • D_qi is the depth pixel value of q_i in the i-th frame; P, Q_i are the coordinate values of p and q_i in the three-dimensional space; T_p, T_qi are the texel values of p and q_i; T_p' is the filtered texel value of p.
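Combining the three weights of this second embodiment, a hedged sketch of the spatio-temporal weighted average follows. The Gaussian forms, the sigma parameters, and the data layout are illustrative assumptions, not the patent's exact formula:

```python
import math

def filter_pixel_st(P, T_p, frames, N,
                    sigma_s=3.0, sigma_t=10.0, sigma_f=1.5):
    """Weighted average over reference pixels drawn from the current
    frame N and its neighbouring filtering reference frames.

    P, T_p: 3D coordinate and texel value of the pixel to be filtered.
    frames: dict mapping frame number i -> list of (Q_i, T_qi, D_qi)
    tuples: 3D coordinate, texel value, and value to be averaged.
    """
    num = den = 0.0
    for i, refs in frames.items():
        # Time domain proximity from the frame interval |i - N|.
        f_t = math.exp(-((i - N) ** 2) / (2 * sigma_f ** 2))
        for Q, T_q, D_q in refs:
            f_s = math.exp(-math.dist(P, Q) ** 2 / (2 * sigma_s ** 2))
            f_T = math.exp(-(T_p - T_q) ** 2 / (2 * sigma_t ** 2))
            w = f_s * f_T * f_t
            num += w * D_q
            den += w
    return num / den if den else T_p  # fallback is illustrative only

# One reference in the current frame N=10 and one in the previous frame.
frames = {10: [((0.0, 0.0, 1.0), 100.0, 60.0)],
          9:  [((0.0, 0.0, 1.0), 100.0, 20.0)]}
out = filter_pixel_st((0.0, 0.0, 1.0), 100.0, frames, N=10)
print(20.0 < out < 60.0)  # result lies between the two reference values
```

Because the current-frame reference carries the larger temporal weight, the result is pulled toward its value, which is exactly the frame-to-frame continuity the temporal term is meant to provide.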
  • A function is used for calculating the spatial proximity of the pixel to be filtered and the reference pixel; the input value of the function is the spatial distance between the pixel to be filtered and the reference pixel; the output value of the function increases as the input value decreases;
  • f_T(T_p, T_q) is used for calculating the texel value similarity of the pixel to be filtered and the reference pixel; the input value of the function is the difference between the texel value of the pixel to be filtered and that of the reference pixel; the output value of the function increases as the input value decreases;
  • f_tem(i, N) is used for calculating the time domain proximity of the pixel to be filtered and the reference pixel
  • N is the frame number of the frame in which the pixel to be filtered is located
  • i is the frame number of the frame in which the reference pixel is located
  • m and n are the number of reference frames before and after the frame in which the pixel to be filtered is located
  • m and n may each be 1 to 3, because as the time interval increases, the correlation between frames becomes very small and can be ignored.
  • p is the pixel to be filtered
  • q i is the reference pixel in the ith frame
  • K_i is the set of reference pixels in the i-th frame, which is usually taken as a square region centered on the pixel to be filtered, with a size of 5*5 or 7*7
  • D_p' is the filtered depth pixel value of p.
  • D_qi is the depth pixel value of q_i in the i-th frame; P, Q_i are the coordinate values of p and q_i in the three-dimensional space; T_p, T_qi are the texel values of p and q_i; T_p' is the filtered texel value of p.
  • Step 402, step 403, and step 404 may be performed in any order.
  • In this embodiment, the spatial proximity, the texel value similarity, and the time domain proximity of the pixel to be filtered and the reference pixel are calculated by using the relationship between the pixel to be filtered and the reference pixel in real three-dimensional space; the weight of the filtering is determined according to the spatial proximity, the texel value similarity, and the time domain proximity, and weighted averaging is performed on the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered.
  • The weight thus takes the spatial proximity, the texel value similarity, and the time domain proximity into account.
  • Since pixels in different frames are also correlated, considering the time domain proximity in the weight preserves strong continuity between frames after filtering, which improves the accuracy of the filtering result and solves the problem in the prior art
  • that the accuracy of the filtering result is not high.
  • FIG. 6 is a schematic structural diagram of an embodiment of a three-dimensional video filtering apparatus according to the present invention.
  • the three-dimensional video filtering apparatus 60 of the present embodiment may include: a projection module 601, a calculation module 602, and a filtering module 603;
  • the projection module 601 is configured to project pixels in the image plane into the three-dimensional space; the pixels include a pixel to be filtered and a reference pixel set;
  • the calculating module 602 is configured to calculate the spatial proximity of the pixel to be filtered and the reference pixel in the three-dimensional space according to the coordinate values of the pixel to be filtered and of the reference pixels in the reference pixel set in the three-dimensional space; wherein the reference pixel set is in the same frame image as the pixel to be filtered;
  • the calculating module 602 is further configured to calculate, according to the texel value of the pixel to be filtered and the reference pixel in the reference pixel set, a texture pixel value similarity between the pixel to be filtered and the reference pixel;
  • the calculating module 602 is further configured to calculate the motion feature consistency of the pixel to be filtered and the reference pixel according to the texel values of the pixel to be filtered, of the reference pixels in the reference pixel set, and of the pixels at the same positions in the previous frame image of the frame where the pixel to be filtered is located;
  • a filtering module 603, configured to determine the weight of the filtering according to the spatial proximity, the texel value similarity, and the motion feature consistency, and perform weighted averaging on the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered.
  • the filtering module 603 is specifically configured to:
  • when filtering the depth image, perform weighted averaging on the depth pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the depth pixel value of the pixel to be filtered of the depth image;
  • the projection module 601 is specifically configured to:
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • the projection module 601 is specifically configured to:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space
  • d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
  • the spatial proximity is calculated by using an input value of the distance between the pixel to be filtered and the reference pixel in a three-dimensional space as a function; an output value of the function increases as the input value decreases;
  • the texture pixel value similarity is calculated by using a difference value of the texel value of the pixel to be filtered and the reference pixel as a function of an input value; an output value of the function increases as the input value decreases;
  • the motion feature consistency is obtained by judging whether the motion characteristics of the pixel to be filtered and the reference pixel are consistent, including: if the difference between the texel value of the pixel to be filtered and that of the pixel at the corresponding position in the previous frame, and the difference between the texel value of the reference pixel and that of the pixel at the corresponding position in the previous frame, are both greater than or both less than the preset threshold, the motion states of the pixel to be filtered and the reference pixel are determined to be consistent; otherwise, they are determined to be inconsistent.
  • the filtering module 603 is specifically configured to:
  • f_T(T_p, T_q) is used for calculating the texture pixel value similarity of the pixel to be filtered and the reference pixel;
  • p is the pixel to be filtered
  • q is the reference pixel
  • K is the reference pixel set
  • D p ' is the depth pixel value after p filtering
  • D q is the depth pixel value of q
  • P, Q are the coordinate values of p and q in three-dimensional space
  • T p , T q are the texel values of p and q
  • T_p', T_q' are the texel values of the pixels at the same positions as p and q in the previous frame
  • T_p' is the filtered texel value of p; th is the preset texture pixel difference threshold.
  • the device in this embodiment may be used to implement the technical solution of the method embodiment shown in FIG. 2, and the implementation principle and technical effects are similar, and details are not described herein again.
  • The device of the present embodiment is based on the device structure shown in FIG. 6. Further, the projection module 601 in the three-dimensional video filtering device 60 of this embodiment is configured to project pixels in the image plane into the three-dimensional space; the pixels include a pixel to be filtered and a reference pixel set;
  • the calculating module 602 is configured to calculate the spatial proximity of the pixel to be filtered and the reference pixel in the three-dimensional space according to the coordinate values of the pixel to be filtered and of the reference pixels in the reference pixel set in the three-dimensional space; wherein the reference pixel set is in the same frame image as the pixel to be filtered and in the adjacent multi-frame images;
  • the calculating module 602 is further configured to calculate, according to the texel value of the pixel to be filtered and the reference pixel in the reference pixel set, a texture pixel value similarity between the pixel to be filtered and the reference pixel;
  • the calculating module 602 is further configured to calculate a time domain proximity of the to-be-filtered pixel and the reference pixel according to a time interval between the to-be-filtered pixel and a frame in which the reference pixel in the reference pixel set is located;
  • a filtering module 603, configured to determine the weight of the filtering according to the spatial proximity, the texel value similarity, and the time domain proximity, and perform weighted averaging on the pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the pixel to be filtered.
  • the filtering module 603 is specifically configured to:
  • when filtering the depth image, perform weighted averaging on the depth pixel values of the reference pixels in the reference pixel set to obtain the filtering result of the depth pixel value of the pixel to be filtered of the depth image;
  • when filtering the texture image, determine the weight of the filtering according to the spatial proximity, the texel value similarity, and the time domain proximity, and perform weighted averaging on the texel values of the reference pixels in the reference pixel set to obtain the filtering result of the texel value of the pixel to be filtered of the texture image.
  • the projection module 601 is specifically configured to:
  • the pixels are projected from the image plane to the three-dimensional space using depth image information, viewpoint position information, and reference camera parameter information provided by the three-dimensional video; the depth image information includes depth pixel values of the pixels.
  • the projection module 601 is specifically configured to:
  • R and t are the rotation matrix and translation vector of the reference camera
  • A is the reference camera parameter matrix.
  • the coordinate value of the pixel in the three-dimensional space
  • d is the depth pixel value of the pixel
  • f x and f y are normalized focal lengths in horizontal and vertical directions, respectively
  • r is a radial distortion coefficient
  • ( o x , o y ) is a coordinate value of a reference point on the image plane
  • the reference point is an intersection of an optical axis of the reference camera and the image plane.
  • the spatial proximity is calculated by using an input value of the distance between the pixel to be filtered and the reference pixel in a three-dimensional space as a function; an output value of the function increases as the input value decreases;
  • the texture pixel value similarity is calculated by using a difference value of the texel value of the pixel to be filtered and the reference pixel as a function of an input value; an output value of the function increases as the input value decreases;
  • the time domain proximity is calculated by using a time interval of the pixel to be filtered and a frame in which the reference pixel is located as a function of an input value; an output value of the function increases as the input value decreases.
  • the filtering module 603 is specifically configured to:
  • f_tem(i, N) is used for calculating the time domain proximity of the pixel to be filtered and the reference pixel
  • N is the frame number of the frame in which the pixel to be filtered is located
  • i is the frame number of the frame in which the reference pixel is located
  • i is an integer in the interval [N-m, N+n]
  • m and n are respectively the numbers of reference frames before and after the frame in which the pixel to be filtered is located
  • p is the pixel to be filtered
  • q i is the reference pixel in the ith frame
  • K i is the reference pixel set in the ith frame
  • D_p' is the filtered depth pixel value of p
  • D_qi is the depth pixel value of q_i in the i-th frame; P, Q_i are the coordinate values of p and q_i in the three-dimensional space; T_p, T_qi are the texel values of p and q_i; T_p' is the filtered texel value of p.
  • the device in this embodiment may be used to implement the technical solution of the method embodiment shown in FIG. 4, and the implementation principle and technical effects are similar, and details are not described herein again.
  • FIG. 7 is a schematic structural diagram of an embodiment of a three-dimensional video filtering device according to the present invention.
  • the three-dimensional video filtering device 70 provided in this embodiment includes a processor 701 and a memory 702.
  • the memory 702 is configured to store execution instructions.
  • the processor 701 communicates with the memory 702, and the processor 701 calls an execution instruction in the memory 702 for executing the method described in any of the method embodiments.
  • The implementation principles and technical effects of the technical solution are similar and will not be described here again.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division of units or modules is only a logical function division; in actual implementation there may be other ways of dividing. For example, multiple units or modules may be combined or integrated into another system, or some features may be omitted or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or module, and may be electrical, mechanical or otherwise.
  • the modules described as separate components may or may not be physically separated.
  • the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the aforementioned program can be stored in a computer readable storage medium.
  • When executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Generation (AREA)
  • Image Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention relates to a three-dimensional video filtering method and device, the three-dimensional video filtering method comprising: projecting the pixels in an image plane into a three-dimensional space; according to the coordinate values of the pixels to be filtered and of the reference pixels in the three-dimensional space, calculating the spatial proximity of the pixels to be filtered and the reference pixels in the three-dimensional space; according to the texture pixel values of the pixels to be filtered and of the reference pixels, calculating the texture pixel value similarity of the pixels to be filtered and the reference pixels; according to the texture pixel values of the pixels to be filtered, of the reference pixels, and of the pixels located at the same positions in the previous frame of the image, calculating the motion feature consistency of the pixels to be filtered and the reference pixels; determining the filtering weight according to the spatial proximity, the texture pixel value similarity, and the motion feature consistency, and performing a weighted average on the pixel values of the reference pixels to obtain a filtering result for the pixels to be filtered, thereby improving the accuracy of three-dimensional video filtering.
PCT/CN2015/077707 2014-06-13 2015-04-28 Procédé et dispositif de filtrage vidéo en trois dimensions WO2015188666A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410265360.4A CN104010180B (zh) 2014-06-13 2014-06-13 三维视频滤波方法和装置
CN201410265360.4 2014-06-13

Publications (1)

Publication Number Publication Date
WO2015188666A1 true WO2015188666A1 (fr) 2015-12-17

Family

ID=51370655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/077707 WO2015188666A1 (fr) 2014-06-13 2015-04-28 Procédé et dispositif de filtrage vidéo en trois dimensions

Country Status (2)

Country Link
CN (1) CN104010180B (fr)
WO (1) WO2015188666A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187491A (zh) * 2022-09-08 2022-10-14 阿里巴巴(中国)有限公司 图像降噪处理方法、图像滤波处理方法及装置

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104010180B (zh) * 2014-06-13 2017-01-25 华为技术有限公司 三维视频滤波方法和装置
CN104683783B (zh) * 2015-01-08 2017-03-15 电子科技大学 一种自适应深度图滤波方法
CN105959663B (zh) * 2016-05-24 2018-09-21 厦门美图之家科技有限公司 视频帧间信号连续性的优化处理方法、系统及拍摄终端
CN107959855B (zh) * 2016-10-16 2020-02-14 华为技术有限公司 运动补偿预测方法和设备
CN108111851B (zh) * 2016-11-25 2020-12-22 华为技术有限公司 一种去块滤波方法及终端
CN108833879A (zh) * 2018-06-29 2018-11-16 东南大学 具有时空连续性的虚拟视点合成方法
CN109191506B (zh) * 2018-08-06 2021-01-29 深圳看到科技有限公司 深度图的处理方法、系统及计算机可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101651772A (zh) * 2009-09-11 2010-02-17 宁波大学 一种基于视觉注意的视频感兴趣区域的提取方法
CN102271262A (zh) * 2010-06-04 2011-12-07 三星电子株式会社 用于3d显示的基于多线索的视频处理方法
TW201215092A (en) * 2010-09-20 2012-04-01 Nat Univ Chung Cheng A method depth information processing and its application device
JP2013059016A (ja) * 2011-08-12 2013-03-28 Sony Corp 画像処理装置および方法、並びにプログラム
CN104010180A (zh) * 2014-06-13 2014-08-27 华为技术有限公司 三维视频滤波方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004070793A (ja) * 2002-08-08 2004-03-04 Ge Medical Systems Global Technology Co Llc 3次元空間フィルタ装置および方法
JP2006185038A (ja) * 2004-12-27 2006-07-13 Ge Medical Systems Global Technology Co Llc 4次元ラベリング装置、n次元ラベリング装置、4次元空間フィルタ装置およびn次元空間フィルタ装置
CN102238316A (zh) * 2010-04-29 2011-11-09 北京科迪讯通科技有限公司 一种3d数字视频图像的自适应实时降噪方案
CN103369209B (zh) * 2013-07-31 2016-08-17 上海通途半导体科技有限公司 视频降噪装置及方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101651772A (zh) * 2009-09-11 2010-02-17 宁波大学 一种基于视觉注意的视频感兴趣区域的提取方法
CN102271262A (zh) * 2010-06-04 2011-12-07 三星电子株式会社 用于3d显示的基于多线索的视频处理方法
TW201215092A (en) * 2010-09-20 2012-04-01 Nat Univ Chung Cheng A method depth information processing and its application device
JP2013059016A (ja) * 2011-08-12 2013-03-28 Sony Corp 画像処理装置および方法、並びにプログラム
CN104010180A (zh) * 2014-06-13 2014-08-27 华为技术有限公司 三维视频滤波方法和装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187491A (zh) * 2022-09-08 2022-10-14 阿里巴巴(中国)有限公司 图像降噪处理方法、图像滤波处理方法及装置
CN115187491B (zh) * 2022-09-08 2023-02-17 阿里巴巴(中国)有限公司 图像降噪处理方法、图像滤波处理方法及装置

Also Published As

Publication number Publication date
CN104010180B (zh) 2017-01-25
CN104010180A (zh) 2014-08-27

Similar Documents

Publication Publication Date Title
WO2015188666A1 (fr) Procédé et dispositif de filtrage vidéo en trois dimensions
US10474227B2 (en) Generation of virtual reality with 6 degrees of freedom from limited viewer data
CN111656407B (zh) 对动态三维模型的视图进行融合、纹理化和绘制
CN109003325B (zh) 一种三维重建的方法、介质、装置和计算设备
CN115699114B (zh) 用于分析的图像增广的方法和装置
CN109660783B (zh) 虚拟现实视差校正
EP2992508B1 (fr) Effets de réalité diminuée et médiatisée à partir de reconstruction
US9041819B2 (en) Method for stabilizing a digital video
US9237330B2 (en) Forming a stereoscopic video
US20130127988A1 (en) Modifying the viewpoint of a digital image
JP5011168B2 (ja) 仮想視点画像生成方法、仮想視点画像生成装置、仮想視点画像生成プログラムおよびそのプログラムを記録したコンピュータ読み取り可能な記録媒体
US8611642B2 (en) Forming a steroscopic image using range map
US20130129192A1 (en) Range map determination for a video frame
US20110148868A1 (en) Apparatus and method for reconstructing three-dimensional face avatar through stereo vision and face detection
CN107798704B (zh) 一种用于增强现实的实时图像叠加方法及装置
CN106651853B (zh) 基于先验知识和深度权重的3d显著性模型的建立方法
CN102436671B (zh) 一种基于深度值非线性变换的虚拟视点绘制方法
WO2022126674A1 (fr) Procédé et système d'évaluation de la qualité d'une image panoramique stéréoscopique
Luo et al. Foreground removal approach for hole filling in 3D video and FVV synthesis
Bleyer et al. Temporally consistent disparity maps from uncalibrated stereo videos
CN117730530A (zh) 图像处理方法及装置、设备、存储介质
CN114049442B (zh) 三维人脸视线计算方法
JP2013012045A (ja) 画像処理方法、画像処理装置及びコンピュータプログラム
TWI786107B (zh) 用於處理深度圖之設備及方法
JP6799468B2 (ja) 画像処理装置、画像処理方法及びコンピュータプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15806973

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15806973

Country of ref document: EP

Kind code of ref document: A1