CN116385577A - Virtual viewpoint image generation method and device - Google Patents

Virtual viewpoint image generation method and device

Info

Publication number
CN116385577A
Authority
CN
China
Prior art keywords
viewpoint image
virtual viewpoint
color
original
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310198659.1A
Other languages
Chinese (zh)
Inventor
于迅博
汲鲁育
邢树军
高鑫
沈圣
张泷
陈硕
桑新柱
颜玢玢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202310198659.1A
Publication of CN116385577A
Legal status: Pending

Classifications

    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 7/40: Image analysis, analysis of texture
    • G06T 7/50: Image analysis, depth or shape recovery
    • G06T 7/70: Image analysis, determining position or orientation of objects or cameras
    • G06T 2207/10024: Image acquisition modality, color image
    • G06T 2207/10052: Image acquisition modality, images from lightfield camera
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to the technical field of image processing and provides a virtual viewpoint image generation method and device. The method comprises the following steps: determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image; determining a search stepping range for the ray from the virtual viewpoint to each pixel point on the imaging plane based on depth information of the original viewpoint image; searching along each ray within its search stepping range and determining a reconstruction point on each ray; and coloring the reconstruction points on the rays based on color information of the original viewpoint image to generate a virtual viewpoint image. The invention can generate virtual viewpoint images quickly and with high quality.

Description

Virtual viewpoint image generation method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a virtual viewpoint image generating method and apparatus.
Background
With the continuous development of computer science and display technology, three-dimensional display has become one of the most active frontier technologies in the display field, with a wide range of application scenarios. Viewpoint rendering is a key link in three-dimensional display. In a virtual viewpoint rendering task, the huge data volume of the depth and color video streams causes the computational load to increase sharply, and conventional algorithms have difficulty generating virtual viewpoint images in real time.
Traditional viewpoint image generation algorithms process large amounts of data and place high demands on the computing power and storage capacity of the hardware. In addition, they require a dense array of acquisition devices to capture multiple color or RGBD video streams, which is difficult to reconcile with scenes that call for sparse capture devices. For virtual viewpoint rendering tasks based on sparse video streams, traditional algorithms are affected by hardware and environmental factors: the depth maps acquired in the early stage contain noise and holes, the quality of the generated virtual viewpoint image is poor, and missing information and artifacts easily occur. Depth-map noise and errors in the surface reconstructed by the algorithm at the new viewpoint lead to holes and blurred edges in the generated new viewpoint image, which can hardly meet the viewing requirements of three-dimensional display. Therefore, a high-quality, real-time virtual viewpoint image generation technique is critical to the development of holographic video communication and real-time three-dimensional display.
Disclosure of Invention
The invention provides a virtual viewpoint image generation method and device, which are intended to overcome the slow generation speed and poor image quality of the prior art and to achieve fast, high-quality virtual viewpoint image generation.
The invention provides a virtual viewpoint image generation method, which comprises the following steps:
Determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image;
determining a search stepping range of light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image;
searching each ray based on the searching stepping range, and determining a reconstruction point on each ray;
coloring the reconstruction points on the light rays based on the color information of the original viewpoint image to generate a virtual viewpoint image.
According to the method for generating a virtual viewpoint image provided by the invention, the method for determining the search stepping range of the light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image comprises the following steps:
performing coordinate transformation on each pixel point on the imaging plane based on a pixel coordinate system corresponding to the depth map of the original viewpoint image;
and carrying out rasterization remapping on each pixel point subjected to coordinate transformation on the imaging plane to obtain a depth value of each pixel point subjected to coordinate transformation in the original viewpoint image, and determining the range of the depth value as a searching stepping range of light rays from the virtual viewpoint to each pixel point on the imaging plane.
According to the virtual viewpoint image generating method provided by the invention, each ray is searched based on the searching stepping range, and the reconstruction point on each ray is determined, which comprises the following steps:
searching on each ray according to a searching step length based on the searching step range, and determining a target distance function value of a point on each ray under each searching step length;
and determining reconstruction points on the light rays based on points on the light rays, at which the target distance function values change in sign.
According to the virtual viewpoint image generating method provided by the invention, the determining the objective distance function value of the point on each ray under each searching step length comprises the following steps:
determining fusion weights of the light rays corresponding to pixel points on the imaging plane based on the depth information of the original viewpoint image;
and weighting the truncated signed distance function value based on the point under each searching step length on each ray based on the fusion weight corresponding to each ray to obtain the target distance function value of the point on each ray.
According to the virtual viewpoint image generating method provided by the invention, the method for generating the virtual viewpoint image by coloring the reconstruction points on each ray based on the color information of the original viewpoint image comprises the following steps:
Determining a corresponding basic color value of the reconstruction point on the color map of the original viewpoint image, a visibility weight of the reconstruction point under the color map of the original viewpoint image and a mixing weight of the reconstruction point under the color map of the original viewpoint image based on the color information of the original viewpoint image;
weighting the basic color value of the reconstruction point based on the visibility weight and the mixed weight to obtain a target color value of the reconstruction point;
coloring the reconstruction points on the light rays based on the target color values to obtain the virtual viewpoint image.
According to the virtual viewpoint image generation method provided by the invention, the determining of the original viewpoint image comprises the following steps:
acquiring an original color video stream and an original depth video stream;
extracting key images from the original color video stream and removing noise from the original depth video stream to obtain a preprocessed color video stream and a preprocessed depth video stream;
and determining a color image corresponding to the same time in the preprocessing color video stream and a depth image corresponding to the same time in the preprocessing depth video stream as the original viewpoint image.
The present invention also provides a virtual viewpoint image generating apparatus, comprising:
the input module is used for determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image;
the analysis module is used for determining a search stepping range of the light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image;
the searching module is used for searching each ray based on the searching stepping range and determining a reconstruction point on each ray;
and the generation module is used for coloring the reconstruction points on the light rays based on the color information of the original viewpoint image to generate a virtual viewpoint image.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the virtual viewpoint image generation method according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a virtual viewpoint image generation method as described in any of the above.
The present invention also provides a computer program product comprising a computer program which when executed by a processor implements a virtual viewpoint image generation method as described in any one of the above.
According to the virtual viewpoint image generation method and device, the pose information, depth information and color information of the original viewpoint image are analyzed together: rays between the virtual viewpoint and the imaging plane are constructed, the coordinates and colors of the reconstruction points are solved along those rays, and a virtual viewpoint image is finally generated. The invention can generate virtual viewpoint images quickly and with high quality, and after light field encoding the generated virtual viewpoint images can be used to generate and display a light field on a three-dimensional display in real time, providing a new solution for real-time viewpoint generation and meeting the requirements of real-time holographic communication.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a virtual viewpoint image generating method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a light field generation method provided by an embodiment of the present invention;
FIG. 3 is a schematic view of light projection and depth fusion according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a virtual viewpoint image generating apparatus provided by an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Among existing methods, a few virtual viewpoint generation algorithms can barely reach 15-30 FPS, but the virtual viewpoint image quality is poor: depth-map noise and errors in the surface reconstructed at the new viewpoint lead to holes and blurred edges in the generated new viewpoint image, which can hardly meet the viewing requirements of three-dimensional display. Based on this, an embodiment of the present invention proposes a virtual viewpoint image generation method, described below with reference to figs. 1 to 3. As shown in fig. 1, the method comprises at least the following steps:
Step 101, determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image;
step 102, determining a search stepping range of light rays between a virtual viewpoint and each pixel point on an imaging plane based on depth information of an original viewpoint image;
step 103, searching each ray based on the searching stepping range, and determining a reconstruction point on each ray;
Step 104, coloring the reconstruction points on each ray based on the color information of the original viewpoint image to generate a virtual viewpoint image.
It should be noted that, in step 101, the original viewpoint image is the original image information captured by the cameras, such as multi-channel color and depth video streams; it contains both the image content itself and information about the capturing device, i.e. the camera. In this embodiment, an original viewpoint image is an image captured from a camera viewpoint. Original viewpoint images usually come in groups: typically at least two images captured by cameras at different viewpoints form a group. The virtual viewpoint is determined from the positions of the original viewpoints or from the capture requirements, and it does not coincide with any original viewpoint. For each pixel on the imaging plane, its coordinates in the image coordinate system and the distance from the imaging plane to the virtual viewpoint can be calculated from the camera field of view (FOV) and the resolution of the output image, i.e. the coordinates of every pixel on the imaging plane can be determined. The FOV and the output image resolution are known parameters chosen according to the display parameters and the desired display effect.
It should be noted that, in step 102, the ray in this embodiment is defined in the camera coordinate system of the virtual viewpoint: its starting point is the position of the virtual viewpoint (output viewpoint) camera, i.e. the point (0, 0, 0), and its direction points from the starting point toward a pixel point on the imaging plane. The mathematical expression of the ray is given in formula (1), where o is the starting point of the ray, d is the direction vector of the ray, and r(t) is the position reached by the ray at parameter t.
r(t) = o + t·d, 0 < t < ∞    (1)
For each pixel on the imaging plane of each virtual viewpoint, a ray is generated in the direction from the virtual viewpoint to that pixel, and each ray corresponds to one point on the target three-dimensional fused surface to be obtained. In this embodiment, the depth information of the original viewpoint image comprises a plurality of depth maps. For each pixel on the imaging plane, the corresponding depth value of each depth map in its pixel coordinate system is found through coordinate transformation, a union operation is performed on these depth values, and the result is expanded by a certain margin to obtain the effective search stepping range of the ray corresponding to that pixel. Because the search stepping range of each ray is determined from the depth maps, the range greatly reduces the invalid search area, so that not every point on the ray needs to be examined, which improves efficiency.
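For illustration only, the per-pixel ray construction described above can be sketched as follows in Python/NumPy. The pinhole model derived from the horizontal field of view, and the names generate_rays, fov_deg, width and height, are assumptions made for this sketch and are not part of the original disclosure.

```python
import numpy as np

def generate_rays(fov_deg, width, height):
    """One ray per output pixel, in the virtual-viewpoint camera frame.

    The origin o is the virtual camera centre (0, 0, 0); the direction d points
    from the origin through the pixel on the imaging plane (see formula (1))."""
    # Focal length in pixels, assumed from the horizontal field of view.
    f = 0.5 * width / np.tan(0.5 * np.radians(fov_deg))
    cx, cy = 0.5 * width, 0.5 * height

    u, v = np.meshgrid(np.arange(width), np.arange(height))
    dirs = np.stack([(u - cx) / f, (v - cy) / f,
                     np.ones_like(u, dtype=float)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)   # unit directions d
    origins = np.zeros_like(dirs)                           # o = (0, 0, 0)
    return origins, dirs                                    # r(t) = o + t * d
```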
For step 103, it should be noted that searching for each ray refers to iterating forward according to a certain step until the position of the reconstruction point on each ray is found. Reconstruction points are points where each ray impinges on the three-dimensional fused surface of the object.
Specifically, the search may use binary search, i.e. several bisection steps are performed to locate an accurate root, and the three-dimensional coordinates of that point, which is the sought reconstruction point, are saved. The basic idea of root finding by bisection is as follows: given a closed interval on which f(x) = 0 is to be solved and on which the function f(x) is assumed to have exactly one zero, the interval is repeatedly halved; based on the root criterion f(x₁) × f(x₂) < 0, a root of the given accuracy is always found by continued halving, and the resulting x can reasonably be regarded as the root to within that accuracy.
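A minimal bisection sketch is given below; the callable f, standing for the fused distance value along one ray, and the fixed iteration count are illustrative assumptions rather than the patented implementation.

```python
def bisect_root(f, t_lo, t_hi, iters=8):
    """Refine the zero crossing of f on [t_lo, t_hi], assuming f(t_lo) and
    f(t_hi) have opposite signs, by repeatedly halving the interval."""
    for _ in range(iters):
        t_mid = 0.5 * (t_lo + t_hi)
        if f(t_lo) * f(t_mid) <= 0.0:
            t_hi = t_mid          # the root lies in the lower half
        else:
            t_lo = t_mid          # the root lies in the upper half
    return 0.5 * (t_lo + t_hi)
```

Eight halvings shrink the bracket by a factor of 256, which is usually enough relative to the depth quantization of the input maps.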
For step 104, it should be noted that the above three steps determine, for every pixel of the virtual viewpoint image, the position of its corresponding point on the target three-dimensional fused surface; the color value of each reconstruction point is then calculated from the color information of the original viewpoint image to obtain the final virtual viewpoint image. The color information of the original viewpoint image consists of the corresponding pixel values, i.e. the RGB values, in the color maps of the different original viewpoints.
In addition, after a plurality of virtual viewpoint images are obtained, the original viewpoint images can be combined, and the real-time light field generation and display on the three-dimensional display can be realized through light field coding.
The virtual viewpoint image generation method provided by the embodiment of the invention enables fast, high-quality virtual viewpoint image generation. After light field encoding, the generated virtual viewpoint images can be used to generate and display a light field on a three-dimensional display in real time, providing a new solution for real-time viewpoint generation and meeting the requirements of real-time holographic communication.
It can be understood that determining a search stepping range of light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image includes:
performing coordinate transformation on each pixel point on an imaging plane based on a pixel coordinate system corresponding to a depth map of an original viewpoint image;
and carrying out rasterization remapping on each pixel point subjected to coordinate transformation on the imaging plane to obtain a depth value of each pixel point subjected to coordinate transformation in an original viewpoint image, and determining the range of the depth value as a searching stepping range of light rays between a virtual viewpoint and each pixel point on the imaging plane.
It should be noted that, in the embodiment of the present invention, a ray casting method combined with the splatting ("throwing snowballs") algorithm may be used to obtain the search stepping range. For each pixel of the new viewpoint image, a coordinate transformation maps the pixel from the pixel coordinate system of the virtual viewpoint's imaging plane to the pixel coordinate system of each depth map. Each pixel is then remapped by rasterization to obtain the depth values stored for it in the depth maps of the different viewpoints; the minimum and maximum of these depth values form the search stepping range along the ray.
The search stepping range is the effective travel interval along the ray, and it is solved as follows. Let j ∈ {1, 2, 3, …, d} index the depth images captured at the different viewpoints in each frame. Using the calibrated intrinsic and extrinsic parameter matrices and the coordinate system conversions, the corresponding depth value in the pixel coordinate system of each depth image can be found for every pixel of the new viewpoint image. A union operation over the d groups of depth values then gives the bounds of the depth value corresponding to each pixel.
In addition, the boundary value may be extended to some extent, so as to eliminate the influence caused by the error, and then the effective search stepping range of the light corresponding to each pixel on the new viewpoint image is obtained through rasterization.
Specifically, unlike ray casting, the "throwing snowballs" algorithm, i.e. the splatting algorithm, computes the projection and superposition effect of voxels: the influence range of each voxel's projection is computed with a footprint function, and a Gaussian function defines the intensity distribution of the affected pixels in a point or small region, so that the overall contribution of that point or region to the image can be computed; these contributions are composited to form the final image. Ray casting is a kind of volume rendering whose basic principle is as follows: starting from each pixel on the screen, a ray is emitted along the viewing direction; as the ray passes through the volume data, samples are taken at equal intervals along the ray, the color value and opacity at each sample are computed by interpolation, the samples along the ray are composited from front to back or from back to front, and the color value of the screen pixel corresponding to the ray is obtained.
Compared with performing a stepping search over the entire ray from its starting point, the virtual viewpoint image generation method of the embodiment of the invention incorporates the idea of the splatting algorithm into the ray casting algorithm and solves for the effective search stepping range of each ray. This range greatly reduces the invalid region and thus accelerates the solution, and it is expanded by a certain margin to cope with errors, yielding the effective search stepping range of the ray corresponding to each pixel. Stepping forward only within this search range greatly improves the solving speed.
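One plausible form of the range estimation described above is sketched below, under the assumption that every valid depth sample is forward-projected (splatted) into the virtual view and the per-pixel minimum and maximum depths, slightly expanded, bound the ray search. The inputs K_d (per-view depth-camera intrinsics), T_d2v (per-view transforms into the virtual camera frame) and K_v (virtual-camera intrinsics) are assumed to come from calibration; none of these names appear in the original text.

```python
import numpy as np

def search_range(depth_maps, K_d, T_d2v, K_v, out_w, out_h, expand=0.05):
    """Per-pixel [min, max] depth bounds for the virtual view (illustrative)."""
    t_min = np.full((out_h, out_w), np.inf)   # pixels with no sample keep inf
    t_max = np.full((out_h, out_w), 0.0)

    for depth, K, T in zip(depth_maps, K_d, T_d2v):      # one entry per view
        v, u = np.nonzero(depth > 0)                     # valid (non-hole) pixels
        z = depth[v, u]
        # Back-project into the depth camera frame, then into the virtual frame.
        pts = np.stack([(u - K[0, 2]) * z / K[0, 0],
                        (v - K[1, 2]) * z / K[1, 1],
                        z, np.ones_like(z)])
        pv = (T @ pts)[:3]
        zv = pv[2]
        uu = np.round(K_v[0, 0] * pv[0] / zv + K_v[0, 2]).astype(int)
        vv = np.round(K_v[1, 1] * pv[1] / zv + K_v[1, 2]).astype(int)
        ok = (zv > 0) & (uu >= 0) & (uu < out_w) & (vv >= 0) & (vv < out_h)
        np.minimum.at(t_min, (vv[ok], uu[ok]), zv[ok])
        np.maximum.at(t_max, (vv[ok], uu[ok]), zv[ok])

    # Expand the bounds by a small margin to absorb depth noise.
    return t_min * (1.0 - expand), t_max * (1.0 + expand)
```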
It can be appreciated that searching each ray based on the search step range, determining a reconstruction point on each ray includes:
searching on each ray according to the searching step length based on the searching step range, and determining the objective distance function value of the point on each ray under each searching step length;
the reconstruction points on each ray are determined based on the points on each ray where the sign of the target distance function value changes.
In the searching process, the target distance function value must be computed for the point reached on the ray after each search step; this value fuses the depth information of multiple depth maps. The search iterates forward within the effective search stepping range of the ray with a step length proportional to the weighted truncated signed distance function value, gradually approaching and refining the position of the final solution until the sign of the weighted truncated signed distance function value changes; the position of the final solution is then obtained by several binary search steps. The sign change refers to the change of sign of the target distance function value obtained after fusing the multiple depth maps.
Specifically, the reconstruction points in the virtual viewpoint direction are the points on the object surface determined after depth fusion, i.e. the points that are finally colored. As shown in fig. 3, the specific solving steps are: compute the target distance function value s starting from the start of the ray's search stepping range; iterate forward along the ray with a step length proportional to s until the sign of s changes; find the accurate root by several binary search steps; and save the three-dimensional coordinates of that point, which lies on the depth-fused surface.
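A hedged sketch of this marching scheme follows, reusing the bisect_root helper sketched earlier. The callable fused_tsdf, which returns the weighted truncated signed distance at ray parameter t, and the constants gain and eps are illustrative assumptions.

```python
def find_surface_point(fused_tsdf, t_start, t_end, gain=0.9, eps=1e-4):
    """March along one ray inside its search range; the step is proportional
    to the fused distance value s, and a sign change brackets the surface."""
    t = t_start
    s_prev = fused_tsdf(t)
    while t < t_end:
        step = max(gain * abs(s_prev), eps)     # step length proportional to s
        t_next = min(t + step, t_end)
        s_next = fused_tsdf(t_next)
        if s_prev * s_next < 0.0:               # sign change: surface crossed
            return bisect_root(fused_tsdf, t, t_next)
        t, s_prev = t_next, s_next
    return None                                 # no surface found on this ray
```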
It will be appreciated that determining the objective distance function value for each point on each ray for each search step includes:
determining fusion weights of pixel points on an imaging plane corresponding to light rays based on depth information of an original viewpoint image;
and weighting the truncated signed distance function value based on the point under each searching step length on each ray based on the fusion weight corresponding to each ray to obtain the target distance function value of the point on each ray.
It should be noted that the fusion weights are used for depth fusion across the multiple depth maps. In this embodiment, a block of pixels around the pixel is incorporated into the weight calculation, which further reduces the influence of high-variance noise on the quality of the new viewpoint image.
The fusion weight is computed as follows: for each pixel on the imaging plane, starting from the start of the effective search stepping range of its ray, a point p on the ray is transformed to its corresponding pixel coordinates on depth map j, and the fusion weight w_j of depth map j is computed from a pixel region surrounding that pixel, as shown in formula (2). If, after the coordinate conversion, point p lies outside the boundary of a depth image in that image's pixel coordinate system, w_j is set to 0, so that this depth image has no influence on the final fusion result.
(Formula (2), which defines the fusion weight w_j, is rendered as an image in the original publication and is not reproduced here; d_i denotes a depth value, N_i denotes a block of neighborhood pixels, and T is the truncation distance.)
In addition, if point p lies at the edge of a depth image, boundary expansion is performed first and w_j is then computed as above using the expanded values. Expanding the boundary at the image edges makes the edge regions of the final virtual viewpoint image smoother: the pixel values at the image edge are taken as the expansion values and replicated outward to enlarge the edge region, so that the region is large enough for the weight calculation.
The truncated signed distance function (TSDF) is a common way of representing an implicit surface in 3D reconstruction. In this embodiment, the depth image pixel value is subtracted from the corresponding depth of the point in the camera coordinate system and the result is truncated; the truncation avoids computation on part of the invalid data.
The signed distance function value in the embodiment of the invention is defined and computed as follows: the pixel value corresponding to point p in the pixel coordinate system of each depth image is a depth value d_j; subtracting d_j from the depth of p in the camera coordinate system gives the signed distance function value s_j.
The truncated signed distance function value in the embodiment of the invention is obtained by truncating the signed distance function value. Specifically, the truncated value tsdf_j is obtained from s_j by a clamp(s_j, -T, T) operation, where T is the truncation distance, whose value may be chosen according to the final generation effect. The clamp(s_j, -T, T) operation is defined in formula (3):
clamp(s_j, -T, T) = -T if s_j < -T;  s_j if -T ≤ s_j ≤ T;  T if s_j > T    (3)
The fused truncated signed distance function value is obtained by weighting the tsdf_j values with the fusion weights w_j, as shown in formula (4):
s = Σ_j (w_j · tsdf_j) / Σ_j w_j    (4)
In particular, when s_j < -T, w_j is set to 0 to avoid unnecessary computation.
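Assuming the reconstructions of formulas (3) and (4) given above, a compact per-point fusion sketch is shown below. Because formula (2) is only available as an image in the original publication, the fusion weights w_j are taken as an external input here rather than recomputed.

```python
import numpy as np

def fused_distance(p_cam_depths, depth_values, weights, T):
    """Weighted truncated signed distance at one point p.

    p_cam_depths[j] : depth of p in the j-th depth camera's coordinate system
    depth_values[j] : depth-map value d_j at p's projection in view j
    weights[j]      : fusion weight w_j of view j (0 outside the image)"""
    s = np.asarray(p_cam_depths, float) - np.asarray(depth_values, float)  # s_j
    w = np.asarray(weights, float).copy()
    w[s < -T] = 0.0                            # drop views with s_j < -T, as above
    tsdf = np.clip(s, -T, T)                   # clamp(s_j, -T, T), formula (3)
    if w.sum() == 0.0:
        return T                               # no valid observation (assumed choice)
    return float((w * tsdf).sum() / w.sum())   # weighted fusion, formula (4)
```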
It can be understood that coloring the reconstructed point on each ray based on the color information of the original viewpoint image generates a virtual viewpoint image, including:
based on the color information of the original viewpoint image, determining a corresponding basic color value of a reconstruction point on the color image of the original viewpoint image, a visibility weight of the reconstruction point under the color image of the original viewpoint image and a mixed weight of the reconstruction point under the color image of the original viewpoint image;
Weighting the basic color value of the reconstruction point based on the visibility weight and the mixed weight to obtain a target color value of the reconstruction point;
coloring the reconstruction points on each ray based on the target color value to obtain a virtual viewpoint image.
It should be noted that obtaining the final color requires the original color information, the visibility weights and the blending weights, which are determined from the pose information of the original viewpoint images and of the virtual viewpoint image. The coloring also depends on the intrinsic matrix of each camera, which is an inherent hardware parameter of the camera. Coloring the solved point on each ray comprises the following sub-steps:
obtaining corresponding basic color values on different viewpoint color maps through coordinate transformation;
carrying out Shadow Map calculation under the view point of the color Map, and carrying out sampling and averaging through a PCF algorithm, so as to obtain PCF visibility weights of corresponding pixels on the color Map of different view points;
solving an included angle between the normal line of the point on the light ray and the viewpoint of the color image, and further obtaining a mixed weight;
multiplying the RGB value, the visibility weight value and the mixed weight value under each viewpoint color map, and then adding multiple paths of color values to obtain weighted RGB values, namely corresponding pixel colors on a final virtual viewpoint image;
The above operation is performed for each pixel, and a high-quality virtual viewpoint image is obtained.
As an example, the basic color value is calculated by: and obtaining the values of the pixels corresponding to the points obtained by solving on each ray on different viewpoint color charts, namely the initial RGB color values through coordinate transformation.
As an example, the method for calculating the visibility weight is: and for each path of color Map, carrying out Shadow Map calculation under the color Map viewpoint on the point obtained by solving the light, and carrying out sampling and averaging through a PCF algorithm, thereby obtaining PCF visibility weights of the corresponding points on the color maps of different view points.
Specifically, Shadow Map is a technique for generating real-time shadows. Its basic principle is: a camera is first placed at the light source and a depth map is generated from it; when rendering an object that should receive shadows, the depth of the currently rendered point is compared with the depth stored in that map (the depth of the current point must first be converted into the light-source viewpoint coordinate system). If the current point is farther from the light source than the depth value in the depth map, the position cannot be illuminated by the light source, i.e. the point is in shadow; otherwise the point can be illuminated. In general, points in shadow can be given weight 0 and illuminated points weight 1, which yields a visibility weighting for coloring. However, a Shadow Map with only the weights 0 and 1 produces jagged image edges, so this embodiment smooths the jaggies by sampling with the PCF algorithm. The basic idea of the PCF algorithm in this embodiment is to sample the pixels near a given Shadow Map pixel and average them, which gives a number between 0 and 1 that participates in the calculation as a weight and makes the edges smoother. In the PCF algorithm, not only the depth of the point itself is looked up; the depths of the surrounding points are also looked up in the depth map. A region around the point is taken, and the depth of every point in the region is compared with the distance of the point in the light-source viewpoint coordinate system (each comparison yields 1 or 0, where 1 means visible and 0 means invisible); the values obtained over the region are averaged, giving a final value between 0 and 1, which is the visibility weight.
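A simple PCF sketch under these assumptions (square sampling window of radius radius, small depth bias to reduce self-comparison artifacts) is given below; the function and parameter names are illustrative, not from the original text.

```python
import numpy as np

def pcf_visibility(shadow_map, u, v, point_depth, radius=1, bias=1e-3):
    """Average the 0/1 depth comparisons over a neighbourhood of the shadow
    map, giving a visibility weight between 0 and 1."""
    h, w = shadow_map.shape
    hits = []
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            uu = int(np.clip(u + du, 0, w - 1))
            vv = int(np.clip(v + dv, 0, h - 1))
            # 1 if the point is not farther than the recorded depth (visible).
            hits.append(1.0 if point_depth <= shadow_map[vv, uu] + bias else 0.0)
    return sum(hits) / len(hits)
```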
As an example, the blending weight is computed as follows: for the point solved on each ray, the angle between its normal and the direction to each color camera viewpoint is computed, which yields the blending weight. Point cloud normals can be obtained either by constructing a mesh and then computing the normal, or by computing the normal directly from the point cloud coordinates.
Preferably, the normal is computed directly from the point cloud coordinates; suitable methods include local polynomial surface fitting and least-squares plane fitting estimation. This embodiment adopts the least-squares plane fitting estimation algorithm, whose computation time is short and meets the real-time requirement. The computation proceeds as follows: determining the normal of a point on the surface is approximated by estimating the normal of a plane tangent to the surface, so the problem becomes a least-squares plane fitting estimation problem; the normal direction of each point and the direction from the point to the color camera viewpoint are computed, and the cosine of the angle between them is taken as the blending weight of that point under that color camera viewpoint.
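An illustrative sketch of the least-squares plane fit (via SVD of the centred neighbourhood) and of the cosine blending weight follows; how the neighbourhood of the point is selected is assumed to be given and is outside this sketch.

```python
import numpy as np

def blend_weight(neighbors, point, cam_center):
    """Normal from a least-squares plane fit over `neighbors` (an N x 3 array),
    then the cosine of the angle to the colour camera as blending weight."""
    q = neighbors - neighbors.mean(axis=0)
    # The normal is the direction of least variance of the neighbourhood.
    _, _, vt = np.linalg.svd(q, full_matrices=False)
    normal = vt[-1]
    view_dir = cam_center - point
    view_dir = view_dir / np.linalg.norm(view_dir)
    if np.dot(normal, view_dir) < 0.0:        # orient the normal toward the camera
        normal = -normal
    return float(np.dot(normal, view_dir))    # cosine of the included angle
```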
By way of example, the weighting process flows as follows: multiplying the RGB value, the visibility weight value and the mixing weight value under each view color map to obtain a color mixing value corresponding to each view color map; and then adding and fusing the color mixed values corresponding to each viewpoint color map to obtain the color value weighted by the final multipath color map, namely the target color value.
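A sketch of this weighting for one reconstruction point is shown below. Normalizing by the sum of the combined weights is an added assumption, made here only to keep the result within a valid colour range; the original text simply sums the weighted colour values.

```python
import numpy as np

def shade_point(base_colors, vis_weights, blend_weights):
    """Combine per-view RGB values with visibility and blending weights."""
    c = np.asarray(base_colors, float)                  # shape (num_views, 3)
    w = np.asarray(vis_weights, float) * np.asarray(blend_weights, float)
    if w.sum() == 0.0:
        return np.zeros(3)                              # point seen by no view
    return (c * w[:, None]).sum(axis=0) / w.sum()       # target colour value
```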
According to the virtual viewpoint image generation method, for the reconstruction points obtained by solving on each ray, RGB values corresponding to different viewpoint color charts are obtained through coordinate transformation, shadow Map calculation under the viewpoint of the color charts is performed, sampling and averaging are performed through a PCF algorithm, and PCF visibility weights of pixels corresponding to the different viewpoint color charts are obtained. Because the Shadow Map value can be used for representing the shielding information, and the PCF algorithm reduces the variance of the weight by fusing the values around the target pixels, so that the information transition on the new viewpoint image is smoother, the visibility weight can make up for the hole under the specific view angle, and the final new viewpoint image contains more effective information. And the view information is introduced by considering the mixing weight, so that the color value is more accurate. The color value of the generated virtual viewpoint image is weighted by the visibility weight and the mixing weight, and the color value is integrated with shielding information and view angle information, so that the color information is richer and more real.
It will be appreciated that determining the original viewpoint image includes:
acquiring an original color video stream and an original depth video stream;
extracting key images from an original color video stream and removing noise from the original depth video stream to obtain a preprocessed color video stream and a preprocessed depth video stream;
And determining a color image corresponding to the same time in the preprocessing color video stream and a depth image corresponding to the same time in the preprocessing depth video stream as an original viewpoint image.
Extracting key images of the collected multipath color video streams, removing noise of the multipath depth video streams, and optimizing the overall running time through frame buffering comprises the following steps:
The color video stream is composed of multiple color images. In a color image the pixel is the basic unit; each pixel contains several color components, each of which is called a channel, and all pixels in an image have the same number of channels. The pixels of an RGB color image contain three color components: red, green and blue. The depth video stream is composed of multiple frames of depth images; each pixel of a depth image stores a depth value that reflects the distance between an object in the scene and the camera. The same moment in time may correspond to several color images and several depth images, and all image frames at the same moment correspond to the original viewpoint images processed subsequently.
Key image extraction serves to extract the effective part of each frame, which reduces the amount of data processed later and thus improves efficiency and speed. For example, in a video communication scene the effective part is the portrait region; in other scenes it is the corresponding region of interest. Key image extraction uses a real-time matting algorithm.
Optionally, the real-time matting algorithm may either rely on prior information or require no prior information.
Preferably, the real-time matting algorithm is based on prior information. Optionally, real-time matting algorithms based on prior information include the Deep Image Matting algorithm, the Background Matting algorithm, the Background Matting V2 algorithm, and so on. The Background Matting V2 algorithm consists of a base network and a refinement network, and the processing flow in this embodiment is as follows: the base network processes a low-resolution image, and the refinement network selects specific image patches of the original high-resolution image to process according to the result of the base network. The base network takes the image and the background downsampled by a factor of c as input and outputs a coarse result through encoding and decoding. The refinement network is a two-stage network: the input first passes through some CBR operations to produce the first-stage output, which is combined with the n×n patches extracted from the original input and fed to the second stage; finally, the refined patches are swapped back into the result produced by the base network to obtain the final result.
The noise removal is used for removing the dark holes of the depth map caused by the capturing principle of the acquisition equipment and the limitation of hardware and the influence of the acquisition environment, removing noise points existing in the depth image and retaining the edge details of objects so as to generate the depth map with relatively high quality.
Optionally, the noise removal may utilize denoising filtering algorithms including median filtering, gaussian filtering, joint bilateral filtering and bilateral filtering, or a combination of filtering algorithms.
Preferably, the filtering algorithms are median filtering and spatial neighborhood filtering. The processing flow of the denoising algorithm based on median filtering and spatial neighborhood filtering in this embodiment is as follows: if the gray value of a pixel in the current frame is not 0, the pixel is not processed; if it is 0, the pixels at the corresponding position in every frame of the frame group are taken out and put into an array, the array is sorted by insertion sort, and the median is taken to replace the original pixel value, which fills most of the black holes after median filtering. Because consecutive frames are captured close together in time, some positions are black holes in every frame of the group; in that case spatial neighborhood filtering is used, searching around the current pixel position for points whose pixel value is not 0 to fill the black hole.
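A rough sketch of the two-step hole filling is given below, assuming a small group of aligned depth frames. Taking the median over all frames of the group and accepting any nearby non-zero pixel in the spatial step are simplifying assumptions of this sketch, not details from the original text.

```python
import numpy as np

def fill_depth_holes(frame_group, current_idx, radius=2):
    """Temporal median fill over the frame group, then spatial neighbourhood
    search for pixels that remain zero in every frame."""
    cur = frame_group[current_idx].astype(float).copy()
    stack = np.stack(frame_group).astype(float)          # (num_frames, H, W)
    holes = cur == 0
    med = np.median(stack, axis=0)
    cur[holes] = med[holes]                              # temporal median fill

    for v, u in zip(*np.nonzero(cur == 0)):              # still-empty pixels
        patch = cur[max(0, v - radius):v + radius + 1,
                    max(0, u - radius):u + radius + 1]
        valid = patch[patch > 0]
        if valid.size:
            cur[v, u] = valid[0]                         # any non-zero neighbour
    return cur
```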
According to the virtual viewpoint image generation method, for the color video stream the key image part is extracted from each color image by a real-time matting algorithm, reducing redundant information and providing further acceleration; the matting algorithm first operates on a low-resolution image to achieve real-time performance. For the depth video stream, a two-step real-time denoising algorithm removes noise from the depth images while preserving object edge details. In addition, the 2D points of the depth map are converted to 3D points in the world coordinate system and then projected onto the color image to align the depth map with the color map, which reduces the influence of the spatial offset between the depth camera and the color camera.
It can be understood that, as shown in fig. 2, the embodiment of the invention also discloses a light field generating method, which comprises at least the following steps:
step 201, after calibrating the acquisition equipment, acquiring multiple paths of color video streams and depth video streams;
step 202, extracting key images of the collected multi-channel color video streams, removing noise of the multi-channel depth video streams, and aligning the color video streams and the depth video streams to generate an original viewpoint image;
step 203, determining a virtual viewpoint and an imaging plane of the virtual viewpoint, and generating a light ray for each pixel on the imaging plane along the direction from the viewpoint to the pixel;
step 204, obtaining an effective searching stepping range of each ray through pixel remapping combined with a spatting algorithm;
step 205, performing boundary expansion on the edge part of the imaging plane, and calculating the fusion weight corresponding to the depth image of the point under different view angles from the starting point of the effective searching stepping range;
step 206, iterating forward according to a certain step length until the truncated signed distance function value changes sign, and obtaining the position of the reconstruction point in the virtual viewpoint direction through multiple binary search;
Step 207, calculating the visibility weight and the mixing weight of the reconstruction point under each color image viewpoint, weighting with the basic color value, and drawing a virtual viewpoint image;
step 208, performing light field coding on the multiple virtual viewpoint images generated by the high-speed parallel GPU architecture to realize real-time light field generation and display on the three-dimensional display.
It should be noted that the high-speed parallel GPU architecture accelerates the computation through GPU parallel programming with CUDA. CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing architecture introduced by NVIDIA that enables the GPU (graphics processing unit) to solve complex computational problems; it comprises the CUDA instruction set architecture (ISA) and the parallel computing engine inside the GPU. CUDA programs are written in the C language and run with very high performance on processors that support CUDA; since CUDA 3.0, C++ and FORTRAN are also supported. Compared with a CPU, a GPU has larger memory bandwidth and more execution units and is suited to highly parallel work; because the same operation is applied to every data unit, the demands on complex flow control are lower, and because many data units are processed with high arithmetic density, memory read latency can be hidden by computation.
The light field coding is used to generate a light field coded image suitable for three-dimensional display. The specific flow is as follows: a number of virtual viewpoint images corresponding to the number of viewpoints displayed by the final three-dimensional display is generated, and specific sub-pixels are extracted from each generated viewpoint image sequence and arranged according to a certain rule; the new image thus generated is called a light field coded image. For a lenticular (grating) 3D display, the light field coded image is shown on the 2D display panel inside the display; under the light-steering effect of the grating, the rays emitted by the sub-pixels form different viewpoint display regions in space, and when the viewer's left and right eyes are in different viewpoint regions, an image with a stereoscopic effect is seen; this process is called stereoscopic image reproduction. For integral imaging 3D display, the light field coding mode may be selected from the two-shot method, the multi-layer synthesis method, the viewpoint synthesis method, and the back-tracking synthesis method.
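As a heavily simplified illustration, and not the patented encoding, sub-pixel interleaving for a lenticular display can be sketched as follows. The per-sub-pixel view_index_map is assumed to be precomputed from the display and grating parameters (slant, pitch, number of viewpoints), which are outside the scope of this sketch.

```python
import numpy as np

def encode_light_field(view_images, view_index_map):
    """Build a light field coded image by picking, for every sub-pixel of the
    panel, the corresponding sub-pixel from the assigned viewpoint image."""
    views = np.stack(view_images)                 # (num_views, H, W, 3)
    h, w, c = view_index_map.shape                # viewpoint index per sub-pixel
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.empty((h, w, c), dtype=views.dtype)
    for ch in range(c):
        out[:, :, ch] = views[view_index_map[:, :, ch], rows, cols, ch]
    return out
```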
A light field is a four-dimensional description of light rays propagating in space: it is a parameterized representation of the four-dimensional optical radiation field containing position and direction information, i.e. the totality of the radiance functions of all light rays in space.
The three-dimensional display can reconstruct a light field in space, so that a user can obtain a stereoscopic visual effect and feel more real visual experience.
The light field generation method provided by the embodiment of the invention can preprocess the depth video stream and the color video stream which are acquired in real time and then generate the virtual viewpoint image with high speed and relatively high quality. After the virtual viewpoint image is coded by the light field, the light field can be generated and displayed on the three-dimensional display in real time. A new solution is provided for real-time viewpoint generation, and the requirements of real-time holographic communication can be met.
The virtual-viewpoint-image generating apparatus provided by the present invention will be described below, and the virtual-viewpoint-image generating apparatus described below and the virtual-viewpoint-image generating method described above may be referred to in correspondence with each other. As shown in fig. 4, the virtual viewpoint image generating apparatus includes:
an input module 401 for determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image;
an analysis module 402, configured to determine a search stepping range of light rays from the virtual viewpoint to each pixel point on the imaging plane based on depth information of the original viewpoint image;
the searching module 403 is configured to search each ray based on the search stepping range, and determine a reconstruction point on each ray;
the generating module 404 is configured to color the reconstructed points on each ray based on the color information of the original viewpoint image, and generate a virtual viewpoint image.
The virtual viewpoint image generating device provided by the embodiment of the invention can generate the virtual viewpoint image with high speed and high quality, and the generated virtual viewpoint image can generate and display the light field on the three-dimensional display in real time after the light field is encoded, so that a new solution is provided for generating the real-time viewpoint, and the requirement of real-time holographic communication can be met.
It can be understood that the device also comprises an acquisition module, and the acquisition module is of an array topological structure. And after calibrating the acquisition equipment, acquiring multiple paths of color and depth video streams.
It should be noted that, the array topology refers to arrangement of the acquisition devices in the scene, and by setting the array topology, the sparse acquisition device array can capture target scenes with different view angles in a larger range, so as to capture enough scene information with fewer acquisition devices as much as possible. The calibration is used for obtaining an internal reference matrix and an external reference matrix of the acquisition equipment. The internal reference matrix and the external reference matrix are used for coordinate system conversion in the subsequent process, namely mutual conversion among a world coordinate system, a camera coordinate system, an image coordinate system and a pixel coordinate system.
Specifically, the acquisition module is composed of at least one set of acquisition equipment and is connected with a computer for carrying out subsequent data processing through a connecting wire. Optionally, the collecting device includes: an integrated RGBD camera, such as Microsoft's Kinect series camera, intel's RealSense series camera; a combined acquisition module combining a color camera and a depth camera, etc.
In addition, the world coordinate system is used to describe the position of any object in the environment; any reference coordinate system in the environment can be chosen as the world coordinate system. The camera coordinate system, i.e. the viewpoint coordinate system, takes the viewpoint (optical center) as its origin and the viewing direction as the positive Z axis; the transformation from the world coordinate system to the camera coordinate system involves only rotation and translation, is a rigid transformation without deformation, and is realized through the extrinsic parameter matrix. The image coordinate system and the camera coordinate system are related by perspective projection, a conversion from 3D to 2D. The pixel coordinate system is a rectangular coordinate system u-v with the upper-left corner of the image as origin and the pixel as unit; the conversion between the camera coordinate system and the pixel coordinate system is realized through the intrinsic parameter matrix.
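An illustrative sketch of this chain of conversions, assuming a 4x4 extrinsic matrix and a 3x3 intrinsic matrix in the usual pinhole form (the function name and argument layout are not from the original text):

```python
import numpy as np

def world_to_pixel(p_world, extrinsic, intrinsic):
    """World -> camera (rigid, extrinsic matrix) -> image -> pixel
    (perspective projection, intrinsic matrix)."""
    p_cam = extrinsic @ np.append(p_world, 1.0)       # homogeneous 3D point
    x, y, z = p_cam[:3]
    u = intrinsic[0, 0] * x / z + intrinsic[0, 2]     # pixel column
    v = intrinsic[1, 1] * y / z + intrinsic[1, 2]     # pixel row
    return u, v, z                                    # pixel coordinates and depth
```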
As an example, the algorithm for calibrating the acquisition device includes a linear calibration method, a nonlinear calibration method, and a two-step calibration method.
Preferably, the calibration algorithm is a two-step calibration method. Optionally, the two-step calibration algorithm includes: tsai algorithm, a two-step planar template method, a calibration algorithm based on PnP and BA algorithms, and the like. The calibration algorithm flow based on PnP and BA algorithms is as follows: the internal parameters of the acquisition equipment are obtained through calibration of a Camera Calibrator tool kit of MATLAB; the external parameters of the acquisition equipment are approximately solved through a PnP algorithm, and then an accurate solution is obtained through a BA algorithm.
The main functions of the acquisition module in this embodiment include: collecting a plurality of frames of color images containing calibration plates required by camera calibration, wherein the images are sent to a calibration unit in an image preprocessing module through a transmission line; collecting color video streams under the viewpoint, wherein the color video streams comprise color information required by the subsequent drawing of new viewpoints, and the color video streams are sent to a portrait extraction unit in an image preprocessing module through a transmission line; and collecting a depth video stream under the viewpoint, wherein the depth video stream contains depth information required by the subsequent new viewpoint drawing, and the depth video stream is sent to a noise removal unit in the image preprocessing module through a transmission line. The placement position of each path of acquisition equipment in the acquisition module can be determined through calculation and experiment of the coverage range of the visual angle of the camera. The array topology structure of the acquisition equipment is determined through calculation and experiments, so that the sparse acquisition equipment array can capture target scenes with different visual angles in a larger range.
It will be appreciated that the apparatus may also include an image preprocessing module, which is implemented in code and runs on the computer connected to the acquisition module. The image preprocessing module comprises four processing units: a calibration unit for camera calibration; a portrait extraction unit for real-time matting; a noise removal unit for real-time denoising and filtering; and a frame buffer unit for shortening the overall running time. The calibration unit only needs to run once after the placement of the acquisition equipment is fixed, generating the camera intrinsic and extrinsic parameters required for processing the subsequent video streams; it does not participate in the computation while the video streams are processed.
The calibration unit is used for processing the multiple frames of color images containing the calibration plate obtained by the acquisition module, and finally generates the intrinsic and extrinsic parameter matrices of the cameras by running the intrinsic and extrinsic calibration algorithms.
The portrait extraction unit is used for processing the color video stream acquired by the acquisition module; a real-time matting algorithm removes invalid background information and retains only the effective area containing the portrait, thereby reducing the data volume of subsequent operations and improving the overall efficiency and speed of the method.
The noise removal unit is used for processing the depth video stream acquired by the acquisition module; a real-time denoising and filtering algorithm removes noise in the depth video stream, yielding a high-quality depth map with its holes filled.
The frame buffer unit is used for storing the acquired color image frames and depth image frames into a memory area in advance, so that subsequent modules can fetch them directly, thereby shortening the overall running time.
It can be understood that the frame buffer unit is connected to the input module; by storing the original viewpoint images in the frame buffer, the reading time of the color video stream and the depth video stream is hidden, improving the overall operating speed.
It should be noted that the frame buffer optimizes the overall running time: the reading time of the acquired color image frames and depth image frames is hidden by the frame buffer technique. The premise of this optimization is that the time for viewpoint drawing is longer than the time for reading the images from the acquisition memory, and experimental results show that this condition is met.
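A minimal sketch of this frame-buffer idea is given below, assuming a single machine and placeholder callables read_frames and render_viewpoints; a background thread keeps pre-reading synchronized color and depth frames into a bounded queue so that rendering never waits on I/O.

```python
import threading
import queue

frame_buffer = queue.Queue(maxsize=4)   # bounded buffer of (color_frames, depth_frames) tuples

def capture_loop(read_frames):
    """Producer: keep pre-reading synchronized color/depth frame groups."""
    while True:
        frame_buffer.put(read_frames())  # blocks when the buffer is full

def render_loop(render_viewpoints):
    """Consumer: viewpoint drawing; its runtime hides the read time of the next frames."""
    while True:
        color_frames, depth_frames = frame_buffer.get()
        render_viewpoints(color_frames, depth_frames)

# Usage (read_frames and render_viewpoints are hypothetical callables):
# threading.Thread(target=capture_loop, args=(read_frames,), daemon=True).start()
# render_loop(render_viewpoints)
```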
Besides the above approach, some additional acceleration strategies can be used to further meet the real-time requirements when optimizing the overall running time, for example: reducing branch statements as much as possible; using register memory, local memory and texture memory to accelerate data reading; and, for large-scale scenes, adopting a GPU-CUDA cluster scheme, etc.
It will be appreciated that the analysis module is implemented in code, is connected to the image preprocessing module described above, and belongs to the next stage of data processing. The portrait-extracted color video stream and the noise-removed depth video stream generated by the image preprocessing module are handed to the analysis module directly in computer memory. Specifically, the analysis module includes two units: a light generating unit and a stepping range solving unit.
The light generating unit is used for generating, for each pixel point on the imaging plane and according to the parameters of the required output virtual viewpoint image, a ray that takes the virtual viewpoint (output viewpoint) camera as its starting point and the direction from the starting point to the pixel point as its direction.
The stepping range solving unit is used for finding, for each pixel on the new viewpoint image and through coordinate conversion, the corresponding depth value in the pixel coordinate system of each depth image, taking the union of these depth values, and expanding the result by a certain margin, thereby obtaining the effective search stepping range of the ray corresponding to that pixel.
It will be appreciated that the search module is also implemented in code; it is connected to the analysis module and performs a step-wise search along the rays based on the result output by the analysis module. Specifically, the search module includes two units: a fusion weight calculation unit and a dynamic iteration unit.
The fusion weight calculation unit is used for, for each pixel on the imaging plane and starting from the starting point of the effective search stepping range of the corresponding ray, finding through coordinate conversion the pixel coordinate of a point on the ray in each viewpoint depth map, and calculating the fusion weight under the corresponding depth map using a pixel area around that pixel. If the point is located at an edge of the depth image, boundary expansion is performed first, and the fusion weight is then calculated.
The dynamic iteration unit is used for iterating forward along the ray corresponding to each pixel on the new viewpoint image with a certain step length until the calculated weighted truncated signed distance function value changes sign, and then obtaining a point on the fusion surface as the reconstruction point through multiple rounds of binary search.
It is understood that the generation module includes three processing units: a visibility weight calculation unit, a mixing weight calculation unit, and a color weighting unit. The three processing units operate in series.
The visibility weight calculation unit is used for performing, for the point obtained by solving each ray and for each color map, a Shadow Map calculation under the viewpoint of that color map, and sampling and averaging through the PCF algorithm, thereby obtaining the visibility weight of the point under the color maps of the different viewpoints.
The mixing weight calculation unit is used for computing, for the reconstruction point obtained by solving each ray, the angle between the normal at that point and the direction toward each color camera viewpoint, thereby obtaining the mixing weight.
The color weighting unit is used for obtaining, through coordinate transformation and for the reconstruction point obtained by solving each ray, the value of the corresponding pixel on each viewpoint's color map, namely the basic color value. The basic color value, the visibility weight and the mixing weight under each viewpoint's color map are multiplied to obtain the color mixing value corresponding to that viewpoint's color map. The color mixing values corresponding to all viewpoint color maps are then summed and fused to obtain the target color value weighted across the multiple color maps, namely the pixel value at the corresponding position of the final virtual viewpoint image.
The above operation is performed for each pixel on the virtual viewpoint image, and a complete virtual viewpoint image is generated.
As an example, the present embodiment discloses a light field generating device, which, in addition to the virtual viewpoint generating device, may further include a light field generation and display module. The light field generation and display module is implemented in code, is connected to the generation module, and runs on the computer. Its input is the plurality of viewpoint images generated by the generation module. The computer is connected to a three-dimensional display, so that the light field coding image is input to the three-dimensional display in real time, realizing real-time light field generation and display on the three-dimensional display.
The configuration of the light field generating device is as follows. The acquisition module comprises five color cameras and four depth cameras; one of the color cameras is used for magnifying facial details, so that the face region of the finally generated virtual viewpoint image is clearer. In addition, the light field generating device requires a high-performance computer and a three-dimensional display: the high-performance computer runs the algorithm program to generate virtual viewpoint images and perform light field coding, and the three-dimensional display generates the light field in real time based on the light field coding image input in real time, achieving glasses-free true three-dimensional display. The display viewing angle of the three-dimensional display is set to 60 degrees. The cameras are placed within a 60-degree arc centered on the portrait and are aimed annularly at the subject in the scene, namely the person, so that the viewing angle of each camera covers the person's torso. The camera used for magnifying the face is given a longer focal length, so that its viewing angle covers the person's face and magnifies facial details as much as possible. After the five color cameras and the depth cameras are positioned and aligned, the intrinsic and extrinsic parameters are obtained through the camera calibration algorithm: the intrinsic parameters are obtained through the Camera Calibrator toolbox of MATLAB, and the extrinsic parameters are first approximated by the PnP algorithm and then refined to an accurate solution by the BA algorithm.
For the five color video streams captured by the color cameras, the preprocessing module extracts the portrait part through a real-time matting algorithm based on neural networks. The algorithm uses two networks: a base network computes a low-resolution result, and a second network refines that result at high resolution on selected patches. Subsequent operations are carried out only on the portrait part.
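Purely as an illustrative sketch (not the patent's actual model), the two-stage matting could be organized as follows, with base_net and refine_net standing in for the two pre-trained networks and the patch size and error threshold chosen arbitrarily.

```python
import cv2
import numpy as np

def extract_portrait(frame, base_net, refine_net, scale=0.25, patch=256):
    """base_net and refine_net are placeholder callables for the two networks."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, None, fx=scale, fy=scale)
    coarse_alpha, error_map = base_net(small)              # low-resolution matte + error estimate
    alpha = cv2.resize(coarse_alpha, (w, h))
    err = cv2.resize(error_map, (w, h))
    # Refine only high-error patches at full resolution
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            if err[y:y + patch, x:x + patch].mean() > 0.1:  # assumed threshold
                alpha[y:y + patch, x:x + patch] = refine_net(frame[y:y + patch, x:x + patch])
    return frame * (alpha[..., None] > 0.5)                 # keep only the portrait region
```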
For the four depth video streams captured by the depth cameras, the preprocessing module removes noise in the depth images while preserving object edge details through a real-time denoising algorithm based on median filtering and spatial neighborhood filtering. The specific operation is as follows: if the gray level of a pixel of the current frame is not 0, the pixel is left unchanged; if the gray level is 0, the pixels at the corresponding position in every frame of the image frame group are placed into an array, insertion-sorted, and the median is taken to replace the original pixel value. Most black holes are filled after this median filtering. Because consecutive frames differ little when the data set is acquired, some locations remain black holes in every frame of the group; in that case spatial neighborhood filtering is applied, searching around the current pixel position for pixels with non-zero values to fill the black hole.
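A hedged sketch of this hole-filling is shown below; the neighborhood radius is an assumption, and the frame group is taken to be a list of aligned depth frames ending with the current one.

```python
import numpy as np

def fill_depth_holes(frame_group, radius=3):
    """frame_group: list of aligned depth frames (same shape); returns a filled copy of the last frame."""
    current = frame_group[-1].copy()
    stack = np.stack(frame_group, axis=0)
    for y, x in np.argwhere(current == 0):
        # Temporal median: the same pixel across the frame group, keeping non-zero values only
        candidates = stack[:, y, x]
        nonzero = candidates[candidates > 0]
        if nonzero.size:
            current[y, x] = np.median(nonzero)
        else:
            # Hole in every frame of the group: fall back to spatial neighborhood filtering
            y0, y1 = max(0, y - radius), y + radius + 1
            x0, x1 = max(0, x - radius), x + radius + 1
            window = current[y0:y1, x0:x1]
            valid = window[window > 0]
            if valid.size:
                current[y, x] = np.median(valid)
    return current
```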
The input module can calculate the position of each pixel point in the pixel coordinate system (imaging plane) of the output viewpoint according to the field of view (fov) of the required output view camera and the resolution of the output image. Therefore, for each pixel, a ray may be defined that takes the output viewpoint (new viewpoint) camera as its starting point and the direction from the starting point to the pixel as its direction.
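One possible way to generate these per-pixel rays, assuming a pinhole model with a vertical field of view and a camera-to-world rotation, is sketched below; the conventions are illustrative, not prescribed by the text.

```python
import numpy as np

def make_rays(width, height, fov_y_deg, cam_to_world):
    """Return per-pixel unit ray directions in world coordinates for the output (virtual) camera.

    cam_to_world: 3x3 rotation taking camera-frame directions into the world frame.
    """
    f = 0.5 * height / np.tan(np.radians(fov_y_deg) / 2.0)       # focal length in pixels from the fov
    cx, cy = width / 2.0, height / 2.0
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    dirs_cam = np.stack([(u - cx) / f, (v - cy) / f, np.ones_like(u)], axis=-1)
    dirs_world = dirs_cam @ cam_to_world.T
    return dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)
```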
For the depth images corresponding to the different viewpoints captured in each frame, the analysis module can find, for each pixel on the new viewpoint image and through coordinate system conversion using the calibrated camera intrinsic and extrinsic matrices, the corresponding depth value in each depth image's pixel coordinate system. The four groups of depth values are combined by union to obtain the bounds of the depth values corresponding to each pixel; the bounds are expanded by a certain margin to cope with errors, with the boundary expansion value set to plus or minus 5 cm, and the effective search stepping range of the ray corresponding to each pixel is obtained through rasterization.
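For one output pixel, the union-and-expand step could be sketched as follows, with the reprojected candidate depths assumed to be gathered beforehand (for example with a projection like the world_to_pixel sketch earlier); the ±5 cm expansion follows the text.

```python
import numpy as np

def search_range(candidate_depths, expand=0.05):
    """candidate_depths: depth values (meters) gathered for one output pixel from the depth maps.
    expand: boundary expansion of ±5 cm to cope with errors."""
    d = np.asarray(candidate_depths, dtype=np.float64)
    d = d[d > 0]                          # ignore holes
    if d.size == 0:
        return None                       # no valid depth: this ray can be skipped
    near = max(d.min() - expand, 0.0)
    far = d.max() + expand
    return near, far
```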
For a point p on each ray, starting from the beginning of that ray's effective search stepping range, the search module calculates a fusion weight w_j, j ∈ {1, 2, 3, 4}, from the 8 × 8 pixel region surrounding the pixel to which p projects in each depth image, to further reduce the influence of high-variance noise on the algorithm. If the point p projects outside a depth image, then w_j = 0, and that depth image's influence on the final TSDF fusion is ignored. If the point p projects onto an edge of a depth image, boundary expansion is performed first, and w_j is then calculated on the expanded region following the same idea; this step makes the edge portions of the final viewpoint image smoother.
The value corresponding to the point p under each depth image's pixel coordinate system, computed in the search module, is the depth value d_j. Subtracting d_j from the z-coordinate in the camera coordinate system gives the signed distance function value s_j. The fused truncation-based signed distance function value is obtained by weighting, as shown in equation 3, where T is the truncation distance of the TSDF, which may be chosen according to the final generated effect and is typically set to 2 cm. In particular, when s_j < -T, w_j is set to 0 to avoid unnecessary computation. The iteration step length along the ray is set to 0.7s; the ray is iterated forward by this step length within its effective search stepping range until the sign of the s value changes, then an accurate root is found through multiple rounds of binary search, and the three-dimensional coordinates of that point are recorded.
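A minimal sketch of the weighted TSDF evaluation (equation 3) and the forward-march-then-bisect search is given below; the sign convention s_j = z − d_j follows the text, while the helper names, the per-view weight handling and the hole check are assumptions.

```python
import numpy as np

def weighted_tsdf(p, depth_maps, cams, weights, trunc=0.02):
    """Weighted truncated signed distance at 3D point p.

    depth_maps : list of denoised depth images (meters)
    cams       : list of (R, t, K) for each depth camera
    weights    : per-view fusion weights w_j from the 8x8 neighborhoods (assumed precomputed)
    trunc      : truncation distance T of the TSDF, e.g. 2 cm
    """
    num, den = 0.0, 0.0
    for depth, (R, t, K), w in zip(depth_maps, cams, weights):
        p_cam = R @ p + t
        uvw = K @ p_cam
        u, v = int(uvw[0] / uvw[2]), int(uvw[1] / uvw[2])
        if not (0 <= v < depth.shape[0] and 0 <= u < depth.shape[1]) or depth[v, u] <= 0:
            continue                           # outside this depth image (or a hole): w_j = 0
        s = p_cam[2] - depth[v, u]             # s_j = z - d_j, as described above
        if s < -trunc:
            continue                           # s_j < -T: set w_j = 0 to skip needless work
        num += w * min(s, trunc)
        den += w
    return num / den if den > 0 else None

def march_ray(origin, direction, near, far, step, tsdf, bisect_iters=8):
    """Step forward along the ray until the weighted TSDF changes sign,
    then refine the root by repeated bisection to obtain the reconstruction point."""
    t_prev = near
    s_prev = tsdf(origin + t_prev * direction)
    t = near + step
    while t <= far:
        s = tsdf(origin + t * direction)
        if s_prev is not None and s is not None and s_prev * s < 0:
            lo, hi, s_lo = t_prev, t, s_prev
            for _ in range(bisect_iters):       # multiple rounds of binary search
                mid = 0.5 * (lo + hi)
                s_mid = tsdf(origin + mid * direction)
                if s_mid is None:
                    break
                if s_lo * s_mid < 0:
                    hi = mid
                else:
                    lo, s_lo = mid, s_mid
            return origin + 0.5 * (lo + hi) * direction
        t_prev, s_prev = t, s
        t += step
    return None                                 # no surface crossing within the search range
```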
For the points obtained, the generation module obtains through coordinate transformation the RGB values of the pixels corresponding to each ray on the color maps of the different viewpoints, performs a Shadow Map calculation under the viewpoint of each color map, and selects a 7 × 7 area around each pixel through the PCF algorithm for sampling and averaging, thereby obtaining the PCF visibility weight of the point under the color maps of the different viewpoints. The angle between the normal at the point and the direction toward each color camera viewpoint is computed, and the cosine of this angle is taken as the mixing weight of that color map. The RGB value, visibility weight and mixing weight under each viewpoint's color map are multiplied, and the values corresponding to the color maps of the multiple viewpoints are summed to obtain the weighted RGB value, namely the color of the corresponding pixel on the final new viewpoint image. The above operation is performed for each pixel on the new viewpoint image, and a new viewpoint image at the specified viewing angle is generated. The number of viewpoints of the three-dimensional display within its 60-degree viewing range is set to 30, that is, 30 new viewpoints are arranged within the 60-degree range; the above viewpoint generation algorithm is executed for the poses of these 30 viewpoints, and 30 new viewpoint images are finally generated.
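The shading step could be sketched as follows, with shadow-map construction omitted and the per-view data layout assumed; normalizing by the summed weight at the end is an added assumption beyond the weighted sum described in the text.

```python
import numpy as np

def pcf_visibility(shadow_map, u, v, depth_in_view, bias=0.005, k=3):
    """Percentage-closer filtering over a (2k+1) x (2k+1) = 7 x 7 window."""
    window = shadow_map[max(0, v - k):v + k + 1, max(0, u - k):u + k + 1]
    return float(np.mean(window + bias > depth_in_view))   # fraction of samples where the point is lit/visible

def shade_point(p, normal, color_views):
    """color_views: per color camera, a dict with keys K, R, t, cam_pos, image, shadow_map (assumed layout)."""
    accum, total = np.zeros(3), 0.0
    for view in color_views:
        p_cam = view["R"] @ p + view["t"]
        uvw = view["K"] @ p_cam
        u, v = int(uvw[0] / uvw[2]), int(uvw[1] / uvw[2])
        h, w = view["image"].shape[:2]
        if not (0 <= v < h and 0 <= u < w):
            continue
        base_color = view["image"][v, u].astype(np.float64)                # basic color value (RGB)
        visibility = pcf_visibility(view["shadow_map"], u, v, p_cam[2])    # PCF visibility weight
        to_cam = view["cam_pos"] - p
        blend = max(float(np.dot(normal, to_cam / np.linalg.norm(to_cam))), 0.0)  # cosine mixing weight
        accum += base_color * visibility * blend
        total += visibility * blend
    # Normalization by the total weight is an assumption; the text describes a direct weighted sum
    return accum / total if total > 0 else accum
```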
Because the time for generating the new viewpoint images is longer than the time for acquiring the depth and color video streams and reading them into memory, the device adopts the frame buffer technique to reduce the total running time. The algorithm is implemented on a highly parallel GPU architecture, which further reduces the time it consumes. After the 30 new viewpoint images are obtained, they are light-field encoded according to the hardware design and arrangement of the three-dimensional display; the resulting light-field-encoded multi-viewpoint image is imported into the three-dimensional display, realizing real-time generation and display of the light field. Testing shows that the real-time generation frame rate of the light field content exceeds 25 FPS, meeting the requirement for real-time generation.
Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communication interface (Communications Interface) 520, a memory 530 and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a virtual viewpoint image generation method comprising:
determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on the view angle information of the original viewpoint image;
determining a search stepping range of light rays from the virtual viewpoint to each pixel point on an imaging plane based on the depth information of the original viewpoint image;
searching each ray based on the searching stepping range, and determining a reconstruction point on each ray;
coloring the reconstruction points on each ray based on the color information of the original viewpoint image to generate a virtual viewpoint image.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer is able to perform the virtual viewpoint image generation method provided above, the method comprising:
determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on the view angle information of the original viewpoint image;
determining a search stepping range of light rays from the virtual viewpoint to each pixel point on an imaging plane based on the depth information of the original viewpoint image;
searching each ray based on the searching stepping range, and determining a reconstruction point on each ray;
coloring the reconstruction points on each ray based on the color information of the original viewpoint image to generate a virtual viewpoint image.
In still another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the virtual viewpoint image generation method provided above, the method comprising:
determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on the view angle information of the original viewpoint image;
determining a search stepping range of light rays from the virtual viewpoint to each pixel point on an imaging plane based on the depth information of the original viewpoint image;
searching each ray based on the searching stepping range, and determining a reconstruction point on each ray;
coloring the reconstruction points on each ray based on the color information of the original viewpoint image to generate a virtual viewpoint image.
The apparatus embodiments described above are merely illustrative; units illustrated as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on such understanding, the foregoing technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, an optical disk, etc., and which includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A virtual viewpoint image generation method, characterized by comprising:
determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image;
determining a search stepping range of light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image;
searching each ray based on the searching stepping range, and determining a reconstruction point on each ray;
coloring the reconstruction points on the light rays based on the color information of the original viewpoint image to generate a virtual viewpoint image.
2. The method according to claim 1, wherein determining a search step range of light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image comprises:
Performing coordinate transformation on each pixel point on the imaging plane based on a pixel coordinate system corresponding to the depth map of the original viewpoint image;
and carrying out rasterization remapping on each pixel point subjected to coordinate transformation on the imaging plane to obtain a depth value of each pixel point subjected to coordinate transformation in the original viewpoint image, and determining the range of the depth value as a searching stepping range of light rays from the virtual viewpoint to each pixel point on the imaging plane.
3. The virtual viewpoint image generation method according to claim 1 or 2, wherein the searching each ray based on the search stepping range, determining a reconstruction point on each ray, comprises:
searching on each ray according to a searching step length based on the searching step range, and determining a target distance function value of a point on each ray under each searching step length;
and determining reconstruction points on the light rays based on points on the light rays, at which the target distance function values change in sign.
4. A virtual viewpoint image generating method according to claim 3, wherein the determining the target distance function value of the point on each ray under each search step length comprises:
Determining fusion weights of the light rays corresponding to pixel points on the imaging plane based on the depth information of the original viewpoint image;
and weighting the truncated signed distance function value based on the point under each searching step length on each ray based on the fusion weight corresponding to each ray to obtain the target distance function value of the point on each ray.
5. The virtual-viewpoint-image generating method according to claim 1, wherein the coloring the reconstructed points on the respective rays based on the color information of the original viewpoint image, generates a virtual viewpoint image, comprises:
determining a corresponding basic color value of the reconstruction point on the color map of the original viewpoint image, a visibility weight of the reconstruction point under the color map of the original viewpoint image and a mixing weight of the reconstruction point under the color map of the original viewpoint image based on the color information of the original viewpoint image;
weighting the basic color value of the reconstruction point based on the visibility weight and the mixed weight to obtain a target color value of the reconstruction point;
coloring the reconstruction points on the light rays based on the target color values to obtain the virtual viewpoint image.
6. The virtual-viewpoint-image generating method of claim 1, wherein determining the original viewpoint image comprises:
acquiring an original color video stream and an original depth video stream;
extracting key images from the original color video stream and removing noise from the original depth video stream to obtain a preprocessed color video stream and a preprocessed depth video stream;
and determining a color image corresponding to the same moment in the preprocessing color video stream and a depth image corresponding to the same moment in the preprocessing depth video stream as the original viewpoint image.
7. A virtual viewpoint image generating apparatus, comprising:
the input module is used for determining a virtual viewpoint and an imaging plane of the virtual viewpoint based on pose information of an original viewpoint image;
the analysis module is used for determining a search stepping range of the light rays from the virtual viewpoint to each pixel point on the imaging plane based on the depth information of the original viewpoint image;
the searching module is used for searching each ray based on the searching stepping range and determining a reconstruction point on each ray;
and the generation module is used for coloring the reconstruction points on the light rays based on the color information of the original viewpoint image to generate a virtual viewpoint image.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the virtual viewpoint image generation method of any of claims 1 to 6 when the program is executed by the processor.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the virtual viewpoint image generation method according to any of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the virtual viewpoint image generation method of any one of claims 1 to 6.
CN202310198659.1A 2023-02-24 2023-02-24 Virtual viewpoint image generation method and device Pending CN116385577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310198659.1A CN116385577A (en) 2023-02-24 2023-02-24 Virtual viewpoint image generation method and device


Publications (1)

Publication Number Publication Date
CN116385577A true CN116385577A (en) 2023-07-04

Family

ID=86964667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310198659.1A Pending CN116385577A (en) 2023-02-24 2023-02-24 Virtual viewpoint image generation method and device

Country Status (1)

Country Link
CN (1) CN116385577A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117974867A (en) * 2024-04-01 2024-05-03 哈尔滨工业大学(威海) Monocular face avatar generation method based on Gaussian point rendering


Similar Documents

Publication Publication Date Title
US10818029B2 (en) Multi-directional structured image array capture on a 2D graph
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN111325693B (en) Large-scale panoramic viewpoint synthesis method based on single viewpoint RGB-D image
CN110060329B (en) Mobile terminal human body model reconstruction method based on color depth video stream data
EP4036863A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
CN111047709A (en) Binocular vision naked eye 3D image generation method
CN115298708A (en) Multi-view neural human body rendering
CN114863037A (en) Single-mobile-phone-based human body three-dimensional modeling data acquisition and reconstruction method and system
CN116310046B (en) Image processing method, device, computer and storage medium
CN109461197B (en) Cloud real-time drawing optimization method based on spherical UV and re-projection
CN113781621A (en) Three-dimensional reconstruction processing method, device, equipment and storage medium
CN111612878A (en) Method and device for making static photo into three-dimensional effect video
CN116385577A (en) Virtual viewpoint image generation method and device
CN112749611A (en) Face point cloud model generation method and device, storage medium and electronic equipment
CN109218706B (en) Method for generating stereoscopic vision image from single image
Waizenegger et al. Real-time patch sweeping for high-quality depth estimation in 3D video conferencing applications
CN113989434A (en) Human body three-dimensional reconstruction method and device
CN117501313A (en) Hair rendering system based on deep neural network
CN109712230B (en) Three-dimensional model supplementing method and device, storage medium and processor
CN116801115A (en) Sparse array camera deployment method
CN115482339A (en) Face facial feature map generating method
CN112002019B (en) Method for simulating character shadow based on MR mixed reality
Bai et al. Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction
CN111010558B (en) Stumpage depth map generation method based on short video image
US20240119671A1 (en) Systems and methods for face asset creation and models from one or more images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination