WO2022022548A1 - Free viewpoint video reconstruction and playing processing method, device, and storage medium - Google Patents

Free viewpoint video reconstruction and playing processing method, device, and storage medium

Info

Publication number
WO2022022548A1
Authority
WO
WIPO (PCT)
Prior art keywords: viewpoint, virtual, background, texture map, video frame
Prior art date
Application number
PCT/CN2021/108827
Other languages
French (fr)
Chinese (zh)
Inventor
王荣刚
蔡砚刚
顾嵩
盛骁杰
Original Assignee
阿里巴巴集团控股有限公司
北京大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 and 北京大学深圳研究生院
Publication of WO2022022548A1 publication Critical patent/WO2022022548A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 Processing image signals
    • H04N 13/111 Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
    • H04N 13/117 Transformation of image signals corresponding to virtual viewpoints, the virtual viewpoint locations being selected by the viewers or determined by viewer tracking
    • H04N 13/20 Image signal generators
    • H04N 13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N 13/282 Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems

Definitions

  • the embodiments of this specification relate to the technical field of video processing, and in particular, to a method, device, and storage medium for free-view video reconstruction and playback processing.
  • Free viewpoint video is a technology that can provide a viewing experience with a high degree of freedom. During viewing, users can adjust the viewing angle through interactive operations and watch from whichever free viewpoint they choose, which can greatly improve the viewing experience.
  • Depth Image Based Rendering (DIBR) technology is mainly divided into the steps of viewpoint selection, preprocessing, mapping, view fusion, and post-processing.
  • When synthesizing virtual viewpoints, part of the background texture that is occluded by a foreground object may be invisible in the reference viewpoint but visible in the virtual viewpoint. Therefore, after view fusion, some unfilled hole areas still remain in the image of the virtual viewpoint.
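As a purely illustrative sketch (not the method claimed in this application), the following Python snippet forward-maps a single scanline of a reference view toward a virtual view using a depth-derived disparity with a depth test; the `None` entries left behind are exactly the disocclusion holes described above. The pinhole-style disparity formula and all names are simplifications.

```python
def warp_row_to_virtual(texture, depth, baseline, focal):
    """Forward-map one scanline of a reference view toward a virtual view.

    Each pixel shifts horizontally by a disparity proportional to 1/depth,
    with a depth test so nearer pixels win.  Target positions that no
    reference pixel maps to stay None -- the disocclusion holes that the
    later post-processing steps must fill.
    """
    out = [None] * len(texture)
    zbuf = [None] * len(texture)
    for x, (t, d) in enumerate(zip(texture, depth)):
        xv = x + round(baseline * focal / d)  # nearer pixels shift more
        if 0 <= xv < len(out) and (zbuf[xv] is None or d < zbuf[xv]):
            out[xv], zbuf[xv] = t, d
    return out

# A near foreground object (depth 2) in front of a far background (depth 100):
tex   = ["bg", "bg", "fg", "fg", "bg", "bg"]
depth = [100,  100,  2,    2,    100,  100]
virtual = warp_row_to_virtual(tex, depth, baseline=1.0, focal=2.0)
# The foreground shifts one pixel to the right, uncovering background that
# was occluded in the reference view: virtual[2] is now a hole (None).
```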
  • the embodiments of this specification provide a free-view video reconstruction and playback processing method, device, and storage medium, which can improve the quality of hole filling, thereby improving the image quality of the free-view video.
  • the embodiments of this specification provide a free-view video reconstruction method, including:
  • the video frame includes the original texture maps of multiple original viewpoints and the original depth maps of the corresponding viewpoints;
  • hole-filling post-processing is performed on the hole area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • the acquiring the background texture map and the background depth map of the viewpoint corresponding to the target video frame includes:
  • Temporal filtering is performed on the reference texture map sequence and the reference depth map sequence, respectively, to obtain a background texture map and a background depth map of the viewpoint corresponding to the target video frame.
  • the temporal filtering is performed on the reference texture map sequence and the reference depth map sequence, respectively, to obtain the background texture map and the background depth map of the viewpoint corresponding to the target video frame, including:
  • Temporal median filtering is performed on the pixels in the reference texture map sequence and the reference depth map sequence, respectively, to obtain a background texture map and a background depth map of the viewpoint corresponding to the target video frame.
  • the texture map of the virtual viewpoint is synthesized.
  • the acquiring the background texture map and the background depth map of the viewpoint corresponding to the target video frame includes:
  • Temporal filtering is performed on the reference texture map sequence and the reference depth map sequence, respectively, to obtain a background texture map and a background depth map of the selected corresponding original viewpoint.
  • the acquiring the background texture map and the background depth map of the viewpoint corresponding to the target video frame includes:
  • the background depth map of the corresponding viewpoint is acquired according to a background texture map of the corresponding viewpoint captured when no foreground object is present in the field of view targeted by the target video frame.
  • using the background texture map of the virtual viewpoint to perform hole-filling post-processing on the hole area in the texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint includes:
  • the background texture map of the virtual viewpoint is used, and a joint bilateral filtering method is applied to perform interpolation processing on the hole area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • the method further includes:
  • Filtering is performed on the foreground edge in the texture map of the virtual viewpoint obtained after the post-processing of hole filling, so as to obtain the reconstructed image of the virtual viewpoint.
  • the embodiments of this specification also provide a free-view video playback processing method, including:
  • hole-filling post-processing is performed on the hole area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • the determining a virtual viewpoint includes at least one of the following:
  • the virtual viewpoint is determined based on the virtual viewpoint position information contained in the video stream.
  • the method further includes:
  • the virtual information image and the reconstructed image of the virtual viewpoint are synthesized and displayed.
  • the acquiring the virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object includes:
  • the embodiments of this specification also provide a free-viewpoint video reconstruction device, including:
  • a video frame acquisition unit adapted to acquire a free-view video frame, the video frame including the original texture maps of multiple original viewpoints and the original depth maps of the corresponding viewpoints;
  • a target video frame determination unit adapted to obtain the target video frame corresponding to the virtual viewpoint
  • a virtual viewpoint texture map synthesis unit adapted to use the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints;
  • a virtual viewpoint background texture map synthesis unit adapted to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame, and obtain the background texture of the virtual viewpoint according to the background texture map and background depth map of the corresponding viewpoint picture;
  • the post-processing unit is adapted to use the background texture map of the virtual viewpoint to perform hole-filling post-processing on the hole area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • the embodiments of this specification also provide a free-viewpoint video playback processing device, including:
  • a virtual viewpoint determination unit adapted to determine a virtual viewpoint
  • a target video frame determination unit adapted to determine a target video frame according to the virtual viewpoint
  • a virtual viewpoint texture map synthesis unit adapted to use the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints;
  • a virtual viewpoint background texture map synthesis unit adapted to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame, and obtain the background texture of the virtual viewpoint according to the background texture map and background depth map of the corresponding viewpoint picture;
  • the post-processing unit is adapted to use the background texture map of the virtual viewpoint to perform hole-filling post-processing on the hole area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • Embodiments of the present specification further provide an electronic device, including a memory and a processor, wherein the memory stores computer instructions that can be executed on the processor, and the processor implements the steps of any of the foregoing methods when executing the computer instructions.
  • the embodiments of this specification also provide an electronic device, including: a communication component, a processor, and a display component, wherein:
  • the communication component adapted to obtain free-view video
  • the display component is adapted to display the reconstructed image of the virtual viewpoint obtained after processing by the processor.
  • The embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments are performed.
  • In the solution of the embodiments of the present specification, the complete background texture map of the virtual viewpoint is obtained by reconstruction, and the texture map of the virtual viewpoint corresponding to the synthesized target video frame is subjected to hole-filling post-processing.
  • Compared with the scheme that only uses the texture around the hole for filtering, this can avoid the artifacts and blurring caused by incomplete hole filling, improve the quality of hole filling, and thus improve the image quality of free-viewpoint video.
  • The original texture maps and corresponding original depth maps of some of the original viewpoints in the target video frame are selected according to preset rules as reference texture maps and reference depth maps for synthesizing the texture map of the virtual viewpoint, which can reduce the amount of data processing in the video reconstruction process and improve video reconstruction efficiency.
  • The background texture map and the background depth map of the viewpoint corresponding to the target video frame are obtained from the reference texture map sequence and reference depth map sequence of that viewpoint, that is, from the texture information and depth information in the temporal domain of that viewpoint, not only from the texture in the spatial domain of the target video frame. Therefore, the integrity and authenticity of the obtained background texture map and background depth map can be improved, and the artifacts and blurring caused by foreground occlusion can be avoided, thereby improving the quality of image hole filling.
  • FIG. 1 is a schematic diagram of a specific application system of a free-view video display in an embodiment of this specification
  • FIG. 2 is a schematic diagram of an interactive interface of a terminal device in an embodiment of this specification
  • FIG. 3 is a schematic diagram of a setting mode of a collection device in an embodiment of the present specification
  • FIG. 4 is a schematic diagram of another terminal device interaction interface in the embodiment of this specification.
  • FIG. 5 is a schematic diagram of a free-viewpoint video data generation process in an embodiment of the present specification
  • FIG. 6 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification;
  • FIG. 7 is a schematic structural diagram of a data header file in an embodiment of the present specification.
  • FIG. 8 is a schematic diagram of a user side processing 6DoF video data in an embodiment of the present specification
  • FIG. 9 is a flowchart of a free-viewpoint video reconstruction method in an embodiment of the present specification.
  • FIG. 10 is a schematic diagram of a free-viewpoint video reconstruction method for a specific application scenario in an embodiment of the present specification
  • FIG. 11 is a flowchart of a free-viewpoint video playback processing method in an embodiment of the present specification
  • FIG. 13 to FIG. 17 are schematic diagrams of display interfaces of an interactive terminal in the embodiments of this specification;
  • FIG. 18 is a schematic structural diagram of a device for free-view video reconstruction in an embodiment of the present specification.
  • FIG. 19 is a schematic structural diagram of a free-viewpoint video playback processing device in an embodiment of the present specification;
  • FIG. 21 is a schematic structural diagram of another electronic device in the embodiment of this specification;
  • FIG. 22 is a schematic structural diagram of a video processing system in an embodiment of the present specification.
  • a specific application system for free-view video display in an embodiment of the present invention may include a collection system 11 of multiple collection devices, a server 12 , and a display device 13 , wherein the collection system 11 can collect images of the area to be viewed.
  • the acquisition system 11 or the server 12 can process the acquired synchronized multiple texture maps to generate multi-angle free viewing angle data that can support the display device 13 to perform virtual viewpoint switching.
  • the display device 13 can display reconstructed images generated based on multi-angle free viewing angle data, the reconstructed images correspond to virtual viewpoints, and can display reconstructed images corresponding to different virtual viewpoints according to user instructions, and switch the viewing position and viewing angle.
  • the process of performing image reconstruction to obtain a reconstructed image may be implemented by the display device 13, or may be implemented by a device located in a content delivery network (Content Delivery Network, CDN) by means of edge computing.
  • the user can view the area to be viewed through the display device 13 , and in this embodiment, the area to be viewed is a basketball court. As mentioned earlier, the viewing position and viewing angle can be switched.
  • the user can swipe across the screen to switch virtual viewpoints.
  • the virtual viewpoint for viewing can be switched.
  • the position of the virtual viewpoint before sliding may be VP1, and after sliding, the position of the virtual viewpoint may be VP2.
  • the reconstructed image displayed on the screen may be as shown in FIG. 4 .
  • the reconstructed image may be obtained by performing image reconstruction based on multi-angle free viewing angle data generated from images collected by multiple collection devices in an actual collection situation.
  • the image viewed before switching may also be a reconstructed image.
  • the reconstructed images may be frame images in the video stream.
  • the manner of switching the virtual viewpoint according to the user's instruction may be various, which is not limited here.
  • the viewpoint can be represented by coordinates of 6 degrees of freedom (DoF), wherein the spatial position of the viewpoint can be represented as (x, y, z), and the viewing angle can be represented as three rotation directions. Accordingly, based on the 6-degree-of-freedom coordinates, a virtual viewpoint, including position and viewing angle, can be determined.
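As a purely illustrative sketch of such a 6-DoF coordinate, the following Python dataclass groups a spatial position with three rotation directions; the angle names (yaw, pitch, roll) are an assumption for the example, not terminology defined by this application.

```python
from dataclasses import dataclass

@dataclass
class Viewpoint6DoF:
    # spatial position (x, y, z)
    x: float
    y: float
    z: float
    # three rotation directions describing the viewing angle;
    # the yaw/pitch/roll naming is illustrative only
    yaw: float
    pitch: float
    roll: float

# A virtual viewpoint 1.5 m up, 3 m back, looking slightly right and down:
vp = Viewpoint6DoF(x=0.0, y=1.5, z=-3.0, yaw=10.0, pitch=-5.0, roll=0.0)
```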
  • the multi-angle free viewing angle data may include depth map data, which is used to provide third-dimensional information outside the plane image. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of the depth map data is smaller.
  • the switching of the virtual viewpoints may be performed within a certain range, which is a multi-angle free viewing angle range. That is, within the multi-angle free viewing angle range, the position and viewing angle of the virtual viewpoint can be switched arbitrarily.
  • the multi-angle free viewing angle range is related to the arrangement of the acquisition device.
  • the wider the shooting coverage of the acquisition devices, the larger the multi-angle free viewing angle range.
  • the quality of the picture displayed by the terminal device is related to the number of collection devices. Generally, the more collection devices are set, the fewer empty areas in the displayed picture.
  • the range of multi-angle free viewing angles is related to the spatial distribution of the acquisition devices.
  • the range of multi-angle free viewing angles and the interaction mode with the display device on the terminal side can be set based on the spatial distribution relationship of the collection devices.
  • texture map acquisition and depth map calculation are required, including three main steps, namely Multi-camera Video Capturing, camera internal and external parameter calculation (Camera Parameter Estimation), and Depth Map Calculation.
  • For multi-camera video capturing, it is required that the video captured by each camera can be aligned at the frame level.
  • the texture image (Texture Image) can be obtained through the video acquisition of multiple cameras;
  • the camera parameters (Camera Parameter) can be obtained through the calculation of the internal and external parameters of the camera, and the camera parameters can include the internal parameter data of the camera and the external parameter data;
  • Through depth map calculation, the depth map of each viewing angle can be obtained; the multiple synchronized texture maps, the depth maps, and the camera parameters corresponding to the viewing angles can then be combined to form the 6DoF video data.
  • the texture map collected from multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained.
  • These three parts of data can be referred to as data files in the multi-angle free-view video data, and can also be referred to as 6DoF video data. With these data, the client can render virtual viewpoints according to a virtual 6-degrees-of-freedom (DoF) position, thereby providing a 6DoF video experience.
  • 6DoF video data and indicative data can reach the user side through compression and transmission, and the user side can obtain the user side 6DoF expression according to the received data, that is, the aforementioned 6DoF video data and metadata.
  • The indicative data may also be called metadata. The video data includes the texture map and depth map data of each viewpoint corresponding to the multiple cameras, and the texture maps and depth maps can be spliced according to certain splicing rules or splicing modes to form a stitched image.
  • Metadata can be used to describe the data pattern of the 6DoF video data, and may specifically include: stitching pattern metadata (Stitching Pattern metadata), used to indicate the storage rules of the pixel data of the multiple texture maps and depth maps in the stitched image; edge protection metadata (Padding pattern metadata), used to indicate the way of edge protection in the stitched image; and other metadata (Other metadata).
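To make the stitching-pattern idea concrete, here is a hedged Python sketch of locating one camera's texture or depth tile inside a stitched image under a simple grid layout; all field names (`tile_width`, `cols`, `texture_rows`) are hypothetical examples, not the metadata fields defined by this application.

```python
def subimage_rect(meta, camera_index, kind):
    """Return (x, y, w, h) of one camera's texture or depth tile
    in the stitched image, given a simple grid stitching pattern.

    The `meta` layout is a hypothetical example: texture tiles occupy the
    top rows, depth tiles sit below them, with `cols` tiles per row.
    """
    w, h, cols = meta["tile_width"], meta["tile_height"], meta["cols"]
    row, col = divmod(camera_index, cols)
    y = row * h
    if kind == "depth":
        y += meta["texture_rows"] * h  # depth tiles start below all texture tiles
    return (col * w, y, w, h)

# 8 cameras, 960x540 tiles, 4 tiles per row, 2 rows of texture tiles:
meta = {"tile_width": 960, "tile_height": 540, "cols": 4, "texture_rows": 2}
# camera 5 -> second row, second column of the texture region
assert subimage_rect(meta, 5, "texture") == (960, 540, 960, 540)
assert subimage_rect(meta, 5, "depth") == (960, 1620, 960, 540)
```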
  • The user side obtains the 6DoF video data, which includes camera parameters, stitched images (texture maps and depth maps), and description metadata, in addition to user-side interactive behavior data.
  • The user side can use Depth Image-Based Rendering (DIBR) for 6DoF rendering, so as to generate a virtual viewpoint image at the specific 6DoF position determined according to the user's instruction.
  • During post-processing, the effective texture information around the hole is usually used for filtering. However, the actual hole repair effect is often not ideal, and artifacts and blurring are easily produced, resulting in poor image quality of the reconstructed free-viewpoint video.
  • To this end, the embodiments of this specification provide a free-view video reconstruction scheme: the complete background texture map of the virtual viewpoint is reconstructed and used to perform hole-filling post-processing on the texture map of the virtual viewpoint corresponding to the synthesized target video frame. Compared with the scheme that only uses the texture around the hole for filtering, this can avoid the artifacts and blurring caused by incomplete hole filling, improve the quality of hole filling, and thus improve the image quality of free-view video.
  • Free viewpoint video reconstruction is performed as follows:
  • S91 Acquire a free-view video frame, where the video frame includes original texture maps of multiple original viewpoints and original depth maps of corresponding viewpoints that are synchronized.
  • a free-view video frame may include synchronized original texture maps of multiple original viewpoints and original depth maps of corresponding viewpoints.
  • a free-view video frame may be obtained based on the aforementioned 6DoF video data, where the corresponding viewing angle is also the corresponding viewpoint.
  • a free-view video stream can be downloaded through a network, or a free-view video frame can be obtained from a locally stored free-view video file.
  • The virtual viewpoint may be determined according to user interaction behavior, or may be preset. If determined based on user interaction behavior, the virtual viewpoint position at the corresponding interaction moment can be obtained from the trajectory data of the user's interactive operation.
  • the location information of the virtual viewpoint corresponding to the corresponding video frame may also be preset on the server (such as the server or the cloud), and the set virtual viewpoint is transmitted in the header file of the free viewpoint video.
  • the corresponding video frame in the free viewpoint video corresponding to the virtual viewpoint may be determined as the target video frame.
  • the original texture maps and corresponding original depth maps of all viewpoints included in the target video frame may be used to synthesize the texture map of the virtual viewpoint.
  • Alternatively, the original texture maps and corresponding original depth maps of some of the viewpoints in the target video frame can be selected based on the position information of the virtual viewpoint, for synthesizing the texture map of the virtual viewpoint.
  • The original texture maps and corresponding original depth maps of corresponding original viewpoints in the target video frame may be selected according to preset rules, and the selected original texture maps and original depth maps are then used to synthesize the texture map of the virtual viewpoint.
  • an original texture map and a corresponding original depth map of a corresponding original viewpoint satisfying a preset distance condition from the virtual viewpoint may be selected based on the spatial positional relationship between the virtual viewpoint and the positions of each original viewpoint.
  • an original texture map and a corresponding original depth map of a corresponding original viewpoint that satisfy a preset spatial position relationship with the virtual viewpoint and satisfy a preset number threshold may be selected.
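A minimal Python sketch of such a selection rule, assuming a plain Euclidean distance as the "preset distance condition" and a fixed count as the "preset number threshold" (both concrete choices are illustrative, not mandated by this application):

```python
import math

def select_reference_viewpoints(virtual_pos, original_positions, count):
    """Pick the `count` original viewpoints closest to the virtual
    viewpoint position; their texture and depth maps would then serve
    as the reference maps for synthesizing the virtual view."""
    ranked = sorted(range(len(original_positions)),
                    key=lambda i: math.dist(virtual_pos, original_positions[i]))
    return ranked[:count]

# Four cameras along the x axis; virtual viewpoint sits at x = 1.2:
cams = [(0.0, 0, 0), (1.0, 0, 0), (2.0, 0, 0), (5.0, 0, 0)]
assert select_reference_viewpoints((1.2, 0, 0), cams, 2) == [1, 2]
```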
  • S94 Acquire a background texture map and a background depth map of a viewpoint corresponding to the target video frame, and acquire a background texture map of the virtual viewpoint according to the background texture map and background depth map of the corresponding viewpoint.
  • There are various ways to obtain the background texture map and the background depth map of the viewpoint corresponding to the target video frame.
  • the time domain filtering method can be used, or the pre-collection method can be used.
  • the specific implementation will be described in detail later in combination with specific application scenarios.
  • virtual viewpoint synthesis may be performed in the same manner as in step S93 to obtain the background texture map of the virtual viewpoint.
  • post-processing of hole filling may be performed on the background texture map of the virtual viewpoint, so as to enhance the image quality of the background texture map of the virtual viewpoint.
  • the method of joint bilateral filtering can be used to perform post-processing of hole filling on the background texture map of the virtual viewpoint.
  • Background texture maps and background depth maps of multiple viewpoints may be used, wherein the density of the selected viewpoints may be greater than the density of the viewpoints corresponding to the target video frame.
  • Using the background texture map of the virtual viewpoint, there may be various ways to perform hole-filling post-processing on the hole area in the texture map of the virtual viewpoint.
  • The texture map of the virtual viewpoint and the background texture map of the virtual viewpoint may be compared pixel by pixel, to determine the pixels in the background area of the texture map of the virtual viewpoint that are inconsistent with the background texture map of the virtual viewpoint, or the pixels whose pixel value difference is greater than a preset threshold, and the values of the corresponding pixels in the texture map of the virtual viewpoint are modified to the values of the corresponding pixels in the background texture map of the virtual viewpoint.
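A simplified Python illustration of this per-pixel pass, reduced to copying background values into marked hole pixels (single-channel images as 2-D lists; the thresholded pixel-difference variant described above would additionally compare values against the background, which is omitted here for brevity):

```python
def fill_holes_from_background(texture, background, hole=None):
    """Per-pixel pass: wherever the synthesized virtual-view texture has a
    hole marker, copy the value from the virtual-view background texture."""
    return [[b if t is hole else t for t, b in zip(rt, rb)]
            for rt, rb in zip(texture, background)]

# Two hole pixels (None) in a 2x3 virtual-view texture:
tex = [[50, None, 52], [None, 60, 61]]
bg  = [[10, 11,   12], [13,   14, 15]]
assert fill_holes_from_background(tex, bg) == [[50, 11, 52], [13, 60, 61]]
```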
  • A joint bilateral filtering method may be used to perform interpolation processing on the hole area in the texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint.
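The joint bilateral interpolation can be sketched as follows in Python, assuming single-channel images stored as 2-D lists and using the virtual-view background texture as the guide image; the kernel radius and sigma values are illustrative defaults, not values from this application.

```python
import math

def joint_bilateral_fill(texture, guide, hole=None,
                         radius=2, sigma_s=1.5, sigma_r=12.0):
    """Interpolate hole pixels of `texture` from their valid neighbours,
    weighting each neighbour by spatial closeness and by how similar the
    guide image (here: the virtual-view background texture) is there."""
    h, w = len(texture), len(texture[0])
    out = [row[:] for row in texture]
    for y in range(h):
        for x in range(w):
            if texture[y][x] is not hole:
                continue
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    t = texture[ny][nx]
                    if t is hole:
                        continue  # only valid neighbours contribute
                    ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    dr = guide[ny][nx] - guide[y][x]
                    wr = math.exp(-(dr * dr) / (2 * sigma_r ** 2))
                    num += ws * wr * t
                    den += ws * wr
            if den > 0:
                out[y][x] = num / den
    return out

# One hole surrounded by uniform texture; a flat guide weights all
# neighbours equally, so the hole is interpolated to (about) 100:
tex = [[100.0, 100.0, 100.0],
       [100.0, None,  100.0],
       [100.0, 100.0, 100.0]]
guide = [[50.0] * 3 for _ in range(3)]
filled = joint_bilateral_fill(tex, guide)
```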
  • a special joint bilateral filter may be used for implementation, or a corresponding software execution logic may be invoked for implementation.
  • The background texture map of the virtual viewpoint may also be used as a guide image, and a guided filtering method is used to fill the hole area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • Other filtering methods, such as bilateral filtering or median smoothing filtering, may also be used, as long as the texture map of the virtual viewpoint is subjected to hole-filling post-processing based on the input background texture map of the virtual viewpoint; the examples are not enumerated here one by one.
  • In step S95, after hole-filling post-processing is performed on the hole region in the texture map of the virtual viewpoint, in order to further improve the quality of the reconstructed image, filtering may also be performed on the foreground edges in the texture map of the virtual viewpoint obtained after the hole-filling post-processing, so as to obtain the reconstructed image of the virtual viewpoint.
  • The texture map of the virtual viewpoint corresponding to the synthesized target video frame is subjected to hole-filling post-processing. This solution can avoid the artifacts and blurring caused by incomplete hole filling, improve the quality of hole filling, and thus improve the image quality of free-view video.
  • Example 1 Select a reference texture map sequence and a reference depth map sequence of a viewpoint corresponding to the target video frame, and then obtain a background texture map and a background depth map of the viewpoint corresponding to the target video frame.
  • Manner 1 For any original viewpoint among all original viewpoints corresponding to the original texture map and the original depth map included in the target video frame, obtain the corresponding reference texture map sequence and reference depth map sequence.
  • the target video frame includes the original texture maps and corresponding original depth maps of 30 viewpoints, for these 30 viewpoints, corresponding reference texture map sequences and reference depth map sequences are obtained respectively.
  • a reference texture map sequence and a corresponding reference depth map sequence of the original viewpoint selected for synthesizing the texture map of the virtual viewpoint are used.
  • a reference texture map sequence and a reference depth map sequence corresponding to the selected original viewpoint can be obtained, and temporal filtering is performed on the reference texture map sequence and the reference depth map sequence respectively to obtain the selected corresponding original viewpoint background texture map and background depth map.
  • using only the reference texture map sequences and reference depth map sequences of the selected original viewpoints can reduce the amount of data computation and improve the generation efficiency of the background texture map of the virtual viewpoint.
  • the selected reference texture map sequence and reference depth map sequence may be selected from a video clip independent of the target video frame, or may be selected from a video clip including the target video frame.
  • temporal filtering may be performed on the reference texture map sequence and the reference depth map sequence, respectively, to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame.
  • the average filtering method can be used, and more specifically, arithmetic average filtering, median value average filtering, moving average filtering and other methods can be used.
  • a median filter method can be used. Specifically, temporal median filtering may be performed on the pixels in the reference texture map sequence and the pixels in the corresponding reference depth map sequence, respectively, to obtain the background texture map and the background depth map of the viewpoint corresponding to the target video frame.
  • a video frame sequence from time t1 to t2 can be selected from a video X of the same viewpoint as the reference texture map sequence for this time period, together with the corresponding reference depth map sequence; the sampled values at each pixel position in the reference texture map sequence and the reference depth map sequence can then be sorted by magnitude, and the median taken as the value of the corresponding pixel position of the background texture map and the background depth map, respectively.
  • the number of images in the reference texture map sequence and the corresponding reference depth map sequence sampled from time t1 to t2 should be odd, for example, 3, 5, or 7 consecutive frames.
  • the formula can be expressed as: P(x_t) = med{ I_{x,i}, i = t1, ..., t2 }, where P(x_t) represents any pixel in the background texture map or background depth map, I_{x,i} represents the sequence of pixel values at the same position as P(x_t) in the reference texture map sequence or reference depth map sequence from t1 to t2, and med denotes taking the median value of I_{x,i}.
  • other time-domain filtering methods can also be used, for example, amplitude-limiting filtering, first-order lag filtering, and the like.
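The per-pixel temporal median filtering described above can be sketched as follows. This is a minimal illustration assuming NumPy and a frame stack already loaded into memory; the function name is illustrative and not part of this specification.

```python
import numpy as np

def temporal_median_background(frames):
    """Per-pixel temporal median over a reference frame sequence.

    frames: array-like of shape (T, H, W) or (T, H, W, C); T should be odd
    so the median coincides with an actually sampled value, as the text
    suggests (e.g. 3, 5, or 7 consecutive frames).
    Returns the estimated background image of shape (H, W[, C]).
    """
    stack = np.asarray(frames)
    if stack.shape[0] % 2 == 0:
        raise ValueError("use an odd number of reference frames (e.g. 3, 5, 7)")
    # P(x_t) = med{ I_{x,i} } along the time axis, independently per pixel
    return np.median(stack, axis=0)
```

Applying the same operation to the reference depth map sequence yields the corresponding background depth map.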
  • Example 2: Pre-collect a background texture map of the viewpoint corresponding to the target video frame in which no foreground object is present, and then obtain the background depth map of the corresponding viewpoint accordingly.
  • specifically, a background texture map in which no foreground object is present at the corresponding viewpoint in the field of view targeted by the target video frame may be pre-collected, and the background depth map of the corresponding viewpoint may be obtained from this background texture map.
  • the background in the image is fixed relative to the acquisition viewpoint. Based on this, in some embodiments of this specification, if a texture map is pre-collected at the corresponding viewpoint when no foreground object is present in the field of view targeted by the target video frame, the texture map contains only background texture information. Therefore, it can be used directly as the background texture map of the corresponding viewpoint, from which the background depth map of the corresponding viewpoint can then be obtained.
  • one or more images without foreground objects can be collected at the corresponding viewpoint before the game starts. If one image is collected, it can be used directly as the background texture map; if multiple images are collected, they can be used as a reference texture map sequence and temporally filtered to obtain the background texture map of the corresponding viewpoint. Correspondingly, the reference depth map of each collected reference texture map can be estimated through depth calculation. The reference depth map corresponding to a single reference texture map can be used directly as the background depth map; for a sequence of multiple reference texture maps, the corresponding reference depth map sequence is obtained and then temporally filtered to obtain the background depth map.
  • a plurality of free-viewpoint video frames I can be obtained first, wherein any free-viewpoint video frame I includes synchronized original texture maps of multiple original viewpoints and original depth maps of the corresponding viewpoints
  • the background texture map Tb and background depth map Db of the viewpoint corresponding to the target video frame can be obtained and, based on these, the background texture map Tb0 of the virtual viewpoint can be obtained through virtual viewpoint reconstruction; the texture map T0 of the virtual viewpoint is then subjected to hole-filling post-processing using the background texture map Tb0 of the virtual viewpoint, to obtain the final free-viewpoint video reconstructed image Te.
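The hole-filling step above (using the virtual-viewpoint background texture Tb0 to fill the holes left in the synthesized texture T0) can be sketched as below. The hole-mask representation is an assumption made for illustration; the specification does not prescribe how holes are marked.

```python
import numpy as np

def fill_holes_with_background(virtual_texture, hole_mask, background_texture):
    """Fill hole pixels of the synthesized virtual-viewpoint texture (T0)
    with the co-located pixels of the background texture reconstructed at
    the same virtual viewpoint (Tb0).

    virtual_texture:    (H, W, C) synthesized texture containing holes
    hole_mask:          (H, W) boolean array, True where T0 has no valid data
    background_texture: (H, W, C) virtual-viewpoint background texture
    """
    filled = virtual_texture.copy()
    filled[hole_mask] = background_texture[hole_mask]
    return filled
```

In the full pipeline this would be followed by the edge filtering described in step S95.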
  • S111 Determine a virtual viewpoint, and determine a target video frame according to the virtual viewpoint.
  • the virtual viewpoint may be generated in real time during the playback of the free viewpoint video, or may be preset. More specifically, the virtual viewpoint may be determined in response to a user's gesture interaction operation. For example, the virtual viewpoint at the corresponding interaction moment is determined by acquiring the trajectory data corresponding to the user interaction operation.
  • the position information of the virtual viewpoint corresponding to each video frame can be preset on the server side (such as a server or the cloud), and the set virtual viewpoint position information can be transmitted in the header file of the free-view video stream; the virtual viewpoint is then determined based on the virtual viewpoint position information contained in the video stream.
  • the corresponding frame moment and the video frame at the corresponding frame moment may be determined as the target video frame according to the virtual viewpoint.
  • the original texture maps of some original viewpoints in the target video frame and the original depth maps of the corresponding viewpoints can be selected according to preset rules for combined rendering, to synthesize the texture map of the virtual viewpoint.
  • original texture maps and original depth maps corresponding to 2 to N viewpoints closest to the virtual viewpoint position in the target video frame may be selected.
  • N is the number of original texture images in the target video frame, that is, the number of acquisition devices corresponding to the original texture images.
  • the quantitative relationship value may be fixed or variable.
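The selection of the 2 to N original viewpoints closest to the virtual viewpoint can be sketched as a nearest-neighbor query over camera positions. Representing viewpoints by 3-D position vectors and using Euclidean distance are illustrative assumptions; the specification only requires "closest".

```python
import numpy as np

def select_nearest_viewpoints(virtual_pos, camera_positions, k):
    """Return the indices of the k original viewpoints whose camera
    positions are closest (Euclidean distance) to the virtual viewpoint.

    virtual_pos:      3-vector, position of the virtual viewpoint
    camera_positions: (N, 3) array of original-viewpoint camera positions
    k:                number of viewpoints to select (2 <= k <= N)
    """
    cams = np.asarray(camera_positions, dtype=float)
    dists = np.linalg.norm(cams - np.asarray(virtual_pos, dtype=float), axis=1)
    return np.argsort(dists)[:k].tolist()
```

The selected indices determine which original texture maps and depth maps take part in the combined rendering.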
  • S113 Acquire a background texture map and a background depth map of a viewpoint corresponding to the target video frame, and acquire a background texture map of the virtual viewpoint according to the background texture map and background depth map of the corresponding viewpoint.
  • for the method of acquiring the background texture map and background depth map of the viewpoint corresponding to the target video frame, and for the specific implementation of acquiring the background texture map of the virtual viewpoint from them, refer to the introduction of the corresponding step and its specific implementation manners in the foregoing embodiments, which will not be repeated here.
  • any one or more filtering methods, such as bilateral filtering, joint bilateral filtering, and guided filtering, may be used to perform hole-filling post-processing on the hole region in the texture map of the virtual viewpoint, to obtain the reconstructed image of the virtual viewpoint.
  • since the background texture map of the virtual viewpoint is obtained by temporally filtering the reference texture maps and reference depth maps of the original viewpoints used to synthesize the texture map of the virtual viewpoint, it contains stable and complete background texture information; using it for hole-filling post-processing can therefore improve the reconstructed image quality of the virtual viewpoint.
  • the position of the camera can be configured through a specific viewpoint configuration algorithm or system.
  • the three-dimensional space information of the field of view, the number of selectable viewpoints, and the internal and external parameters of the cameras (including the horizontal field of view, vertical field of view, and other camera parameters) can be obtained; by matching against a preset configuration model, the operation can output a suggested camera arrangement and the corresponding camera positions.
  • the implantation of AR special effects can be implemented in the following manner:
  • certain objects in the image of the free-view video may be determined as virtual rendering target objects based on certain indication information; the indication information may be generated based on user interaction, or obtained based on certain preset trigger conditions or third-party instructions.
  • the virtual rendering target object in the reconstructed image of the virtual viewpoint may be acquired in response to the special effect generating the interactive control instruction.
  • S122 Acquire a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object.
  • the implanted AR special effects are presented in the form of virtual information images.
  • the virtual information image may be generated based on augmented reality special effect input data of the target object.
  • a virtual information image generated based on the augmented reality special effect input data of the virtual rendering target object may be acquired.
  • the virtual information image corresponding to the virtual rendering target object may be generated in advance, or may be generated immediately in response to the special effect generation instruction.
  • a virtual information image matching the position of the virtual rendering target object can be obtained based on the position of the virtual rendering target object in the reconstructed image obtained through three-dimensional calibration, so that the obtained virtual information image better matches the position of the virtual rendering target object in three-dimensional space; the displayed virtual information image thus better conforms to its real state in three-dimensional space, making the displayed composite image more realistic and vivid and enhancing the user's visual experience.
  • a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
  • for example, the augmented reality special effect input data of the virtual rendering target object may be input into a preset three-dimensional model, which outputs a virtual information image matching the position of the virtual rendering target object in the image obtained through three-dimensional calibration; alternatively, the augmented reality special effect input data of the virtual rendering target object may be input into a preset machine learning model, which likewise outputs a virtual information image matching that position.
  • the virtual information image and the reconstructed image of the virtual viewpoint can be synthesized and displayed in various ways, and two specific implementation examples are given below:
  • Example 1 The virtual information image and the corresponding reconstructed image are fused to obtain a fused image, and the fused image is displayed;
  • Example 2 The virtual information image is superimposed on the corresponding reconstructed image to obtain a superimposed composite image, and the superimposed composite image is displayed.
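The two composition modes above can be sketched as follows. The per-pixel alpha weight for fusion and the boolean mask for superimposition are representation assumptions made for illustration.

```python
import numpy as np

def fuse(reconstructed, virtual_info, alpha):
    """Example 1: fuse (alpha-blend) the virtual information image with the
    reconstructed image; alpha is a per-pixel (H, W, 1) weight in [0, 1]."""
    return alpha * virtual_info + (1.0 - alpha) * reconstructed

def superimpose(reconstructed, virtual_info, mask):
    """Example 2: superimpose (hard overlay) virtual-information pixels
    wherever the boolean (H, W) mask is True, leaving the rest of the
    reconstructed frame untouched."""
    out = reconstructed.copy()
    out[mask] = virtual_info[mask]
    return out
```

Fusion gives soft transitions (e.g. semi-transparent effects), while superimposition keeps the effect fully opaque within its mask.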
  • the obtained composite image may be displayed directly, or the obtained composite image may be inserted into a video stream to be played for playback and display.
  • the fused image may be inserted into the video stream to be played for display.
  • the free viewpoint video may include a special effect display identifier.
  • the superimposed position of the virtual information image in the image of the virtual viewpoint may be determined based on the special effect display identifier, and the virtual information image may then be superimposed and displayed at the determined position in the image of the virtual viewpoint.
  • the interactive terminal T1 plays the video in real time.
  • the video frame P1 is displayed.
  • the video frame P2 displayed by the interactive terminal includes a plurality of special effects display identifiers such as the special effect display identifier I1.
  • the special effect display identifier in the video frame P2 is represented by an inverted triangle symbol pointing to the target object, as shown in Figure 14. It can be understood that the special effect display identifier may also be displayed in other manners.
  • if the terminal user touches and clicks the special effect display identifier I1, the system automatically acquires the virtual information image corresponding to the special effect display identifier I1 and superimposes it for display in the video frame P3, as shown in FIG.: centered on the position of the site where Q1 stands, a three-dimensional ring R1 is rendered.
  • the end user touches and clicks the special effect display identifier I2 in the video frame P3, and the system automatically acquires the virtual information image corresponding to the special effect display identifier I2, and displays the virtual information image in a superimposed manner.
  • the hit rate information display board M0 displays the number, position, name, and hit rate information of the target object, namely the athlete Q2.
  • the end user can continue to click other special effect display signs displayed in the video frame to watch the video showing the AR special effect corresponding to each special effect display sign.
  • the embodiments of this specification also provide a corresponding free-viewpoint video reconstruction apparatus.
  • the free-viewpoint video reconstruction apparatus 180 may include: a video frame obtaining unit 181, a target video frame determining unit 182, a virtual viewpoint texture map synthesis unit 183, a virtual viewpoint background texture map synthesis unit 184, and a post-processing unit 185, wherein:
  • the video frame obtaining unit 181 is adapted to obtain a free-view video frame, the video frame including the original texture maps of multiple original viewpoints and the original depth maps of the corresponding viewpoints;
  • the target video frame determining unit 182 is adapted to obtain the target video frame corresponding to the virtual viewpoint;
  • the virtual viewpoint texture map synthesis unit 183 is adapted to use the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints;
  • the virtual viewpoint background texture map synthesis unit 184 is adapted to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame, and obtain the virtual viewpoint according to the background texture map and background depth map of the corresponding viewpoint The background texture map of ;
  • the post-processing unit 185 is adapted to use the background texture map of the virtual viewpoint to perform post-processing for filling voids in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • the complete background texture map of the virtual viewpoint obtained by reconstruction is used to perform hole-filling post-processing on the synthesized texture map of the virtual viewpoint corresponding to the target video frame; compared with schemes that filter based only on the texture surrounding the holes, this can avoid the artifacts and blurring caused by incomplete hole filling, improve the quality of hole filling, and thus improve the image quality of the free-viewpoint video.
  • each unit in the virtual viewpoint video reconstruction apparatus may be implemented by using the specific method examples and specific manners of the corresponding steps in the aforementioned free viewpoint video reconstruction method.
  • the embodiments of this specification also provide a corresponding free-viewpoint video playback processing apparatus.
  • the free-viewpoint video playback processing apparatus 190 may include: a virtual viewpoint determination unit 191, a target video frame determination unit 192, a virtual viewpoint texture map synthesis unit 193, a virtual viewpoint background texture map synthesis unit 194, and a post-processing unit 195, wherein:
  • the virtual viewpoint determination unit 191 is adapted to determine a virtual viewpoint; the target video frame determination unit 192 is adapted to determine a target video frame according to the virtual viewpoint;
  • the virtual viewpoint texture map synthesizing unit 193 is adapted to use the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints;
  • the virtual viewpoint background texture map synthesis unit 194 is adapted to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame, and obtain the background texture map of the virtual viewpoint according to the background texture map and background depth map of the corresponding viewpoint;
  • the post-processing unit 195 is adapted to use the background texture map of the virtual viewpoint to perform post-processing for filling voids in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  • the synthesized texture map of the virtual viewpoint corresponding to the target video frame is subjected to hole-filling post-processing; compared with schemes that filter based only on the texture surrounding the holes, this can avoid the artifacts and blurring caused by incomplete hole filling, improve the quality of hole filling, and thus improve the image quality of the free-viewpoint video.
  • each unit in the virtual viewpoint video playback processing device can be implemented by using the specific method examples and specific manners of the corresponding steps in the aforementioned free viewpoint video reconstruction method, and details can be found in the foregoing embodiments.
  • each specific unit such as the virtual viewpoint video reconstruction apparatus, the virtual viewpoint video playback processing apparatus, etc. may be implemented by software, hardware, or a combination of software and hardware.
  • the electronic device 200 may include a memory 201 and a processor 202; the memory 201 stores computer instructions capable of running on the processor 202, wherein, when the processor executes the computer instructions, the steps of the method described in any of the foregoing embodiments can be performed.
  • the electronic device may also include other electronic components or assemblies.
  • the electronic device 210 may include a communication component 211, a processor 212, and a display component 213, wherein:
  • the communication component 211 is adapted to obtain free-view video
  • the processor 212 is adapted to execute the steps of the method in any of the foregoing embodiments;
  • the display component 213 is adapted to display the reconstructed image of the virtual viewpoint obtained after processing by the processor.
  • the display component 213 may specifically be one or more of a display, a touch screen, a projector, and the like.
  • the communication component 211 and the display component 213 may be components disposed inside the electronic device 210, or may be external devices connected through expansion components such as expansion interfaces, docking stations, and expansion cables.
  • the processor 212 may be implemented by any one or more of the following working collaboratively: a central processing unit (CPU) (such as a single-core or multi-core processor), a CPU group, a graphics processing unit (GPU), an artificial intelligence (AI) chip, a field programmable gate array (FPGA) chip, and the like.
  • the memory, the processor, the communication component and the display component in the electronic device may communicate through a bus network.
  • the video processing system A0 includes a collection array A1 composed of multiple collection devices, a data processing device A2, a server cluster A3 in the cloud, a playback control device A4, a playback terminal A5 and an interactive terminal A6.
  • each acquisition device in the acquisition array A1 can be placed in a fan shape at different positions in the on-site acquisition area, and can synchronously acquire video data streams from corresponding angles in real time.
  • the collection device may also be arranged in the ceiling area of the basketball stadium, on the basketball hoop, and the like.
  • the collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle or an irregular shape.
  • the specific arrangement can be set according to one or more factors such as the specific on-site environment, the number of acquisition devices, the characteristics of the acquisition devices, and imaging effect requirements.
  • the collection device may be any device with a camera function, such as a common camera, a mobile phone, a professional camera, and the like.
  • the data processing device A2 can be placed in a non-acquisition area on site, and can be regarded as an on-site server.
  • the data processing device A2 may send a stream pull instruction to each acquisition device in the acquisition array A1 through a wireless local area network, and each acquisition device in the acquisition array A1 transmits the video data stream obtained based on the stream pull instruction to the data processing device A2 in real time.
  • each acquisition device in the acquisition array A1 can transmit the obtained video data stream to the data processing device A2 in real time through the switch A7.
  • the acquisition array A1 and the switch A7 together form an acquisition system.
  • when the data processing device A2 receives a video frame interception instruction, it intercepts the video frames at the specified frame moment from the received multi-channel video data streams to obtain frame images of multiple synchronized video frames, and uploads the obtained multiple synchronized video frames at the specified frame moment to the server cluster A3 in the cloud.
  • the server cluster A3 in the cloud uses the received original texture maps of the multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the original depth map corresponding to each original texture map in the image combination, and, based on the parameter data of the image combination, the pixel data of the texture maps and the depth data of the corresponding depth maps in the image combination, performs image stitching based on the acquired virtual viewpoint to obtain the corresponding multi-angle free-view video data.
  • the server can be placed in the cloud, and in order to process data in parallel more quickly, a server cluster A3 in the cloud can be composed of multiple different servers or server groups according to different data processed.
  • the cloud server cluster A3 may include: a first cloud server A31, a second cloud server A32, a third cloud server A33, and a fourth cloud server A34.
  • the first cloud server A31 can be used to determine the corresponding parameter data of the image combination;
  • the second cloud server A32 can be used to determine the estimated depth map of the original texture map of each viewpoint in the image combination and perform depth map correction processing
  • the third cloud server A33 can be based on the position information of the virtual viewpoint, based on the corresponding parameter data of the image combination, the texture map and the depth map of the image combination, use the virtual viewpoint reconstruction based on the depth map (Depth Image Based Rendering, DIBR ) algorithm to reconstruct frame images to obtain images of virtual viewpoints;
  • the fourth cloud server A34 can be used to generate free viewpoint videos (multi-angle free viewpoint videos).
  • the first cloud server A31, the second cloud server A32, the third cloud server A33, and the fourth cloud server A34 may also be server groups composed of server arrays or server sub-clusters, which is not limited in this embodiment of the present invention.
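The depth-image-based rendering (DIBR) reconstruction performed by the third cloud server A33 can be sketched as a per-pixel back-projection and re-projection. This is a minimal pinhole-camera sketch under assumed conventions (3x3 intrinsics, source-to-destination rotation R and translation t); it computes only the warped pixel coordinates, leaving resampling, fusion, and hole handling to the later steps described in this specification.

```python
import numpy as np

def dibr_warp_points(depth, K_src, K_dst, R, t):
    """Back-project every source pixel with its depth to a 3-D point,
    transform it into the destination (virtual) camera frame, and
    re-project it with the destination intrinsics.

    depth: (H, W) positive depth per source pixel
    K_src, K_dst: 3x3 intrinsic matrices of source and virtual cameras
    R: 3x3 rotation, t: 3-vector (source-to-destination extrinsics)
    Returns an (H, W, 2) array of destination pixel coordinates (u, v).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    rays = np.linalg.inv(K_src) @ pix          # back-project pixels to rays
    pts = rays * depth.reshape(1, -1)          # 3-D points in source frame
    pts_dst = R @ pts + np.asarray(t, dtype=float).reshape(3, 1)
    proj = K_dst @ pts_dst
    uv = (proj[:2] / proj[2]).T.reshape(H, W, 2)  # perspective divide
    return uv
```

With identical cameras and an identity transform, every pixel maps to itself regardless of depth, which is a useful sanity check for the geometry.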
  • the playback control device A4 can insert the received free-view video frame into the to-be-played video stream, and the playback terminal A5 receives the to-be-played video stream from the playback control device A4 and plays it in real time.
  • the playback control device A4 may be a manual playback control device or a virtual playback control device.
  • a dedicated server that can automatically switch video streams can be set up as a virtual playback control device to control the data source.
  • a broadcast director control device such as a broadcast director station, can be used as a playback control device A4 in the embodiment of the present invention.
  • the interaction device A6 can play free-view video based on user interaction.
  • each acquisition device in the acquisition array A1 and the data processing device A2 can be connected through a switch A7 and/or a local area network; the number of playback terminals A5 and interactive terminals A6 can be one or more, and the playback terminal A5 and the interactive terminal A6 may be the same terminal device.
  • the data processing device A2 may be placed in a non-collection area on site or in the cloud according to the specific scenario, and the server cluster A3 and the playback control device A4 may, depending on the specific scenario, be placed in a non-collection area on site, in the cloud, or on the terminal access side; this embodiment is not intended to limit the specific implementation and protection scope of the present invention.
  • the embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments may be performed.
  • the computer-readable storage medium may be any suitable readable storage medium such as an optical disc, a mechanical hard disk, or a solid-state drive.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Free viewpoint video reconstruction and playing processing methods, a device, and a storage medium. The video reconstruction method comprises: acquiring a free viewpoint video frame, wherein the video frame comprises synchronous original texture images of a plurality of original viewpoints, and original depth images of the corresponding viewpoints; acquiring a target video frame corresponding to a virtual viewpoint; using original texture images of the plurality of original viewpoints and corresponding original depth images in the target video frame to synthesize a texture image of the virtual viewpoint; acquiring background texture images and background depth images of the corresponding viewpoints in the target video frame, and acquiring a background texture image of the virtual viewpoint according to the background texture images and the background depth images of the corresponding viewpoints; and using the background texture image of the virtual viewpoint to perform hole filling on a hole region in the texture image of the virtual viewpoint, and then performing processing to obtain a reconstructed image of the virtual viewpoint. By means of the scheme, the hole filling quality can be improved, thereby improving the image quality of a free viewpoint video.

Description

Free viewpoint video reconstruction and playback processing method, device and storage medium

This disclosure claims the priority of the Chinese patent application No. 202010759861.3, filed on July 31, 2020 and titled "Free-viewpoint video reconstruction and playback processing method, device and storage medium", the entire contents of which are incorporated into this application by reference.
Technical Field

The embodiments of this specification relate to the technical field of video processing, and in particular, to a free-viewpoint video reconstruction and playback processing method, device, and storage medium.
背景技术Background technique
自由视点视频是一种能够提供高自由度观看体验的技术,用户可以在观看过程中通过交互操作,调整观看视角,从想观看的自由视点角度进行观看,从而可以大幅提升观看体验。Free viewpoint video is a technology that can provide a high degree of freedom viewing experience. Users can adjust the viewing angle through interactive operations during the viewing process, and watch from the free viewpoint they want to watch, which can greatly improve the viewing experience.
为实现自由视点观看,可以采用虚拟视点合成技术。在虚拟视点合成技术中,基于深度图的图像绘制(Depth Image Based Rendering,DIBR)技术成为虚拟视点合成的一种重要方法,其仅需要参考视点的纹理图及对应的深度图,经过三维坐标变换即可得到原本不存在相机的视点的视图。In order to realize free viewpoint viewing, virtual viewpoint synthesis technology can be used. In the virtual viewpoint synthesis technology, the Depth Image Based Rendering (DIBR) technology has become an important method of virtual viewpoint synthesis, which only needs to refer to the texture map of the viewpoint and the corresponding depth map, after three-dimensional coordinate transformation You can get a view of the viewpoint where the camera does not originally exist.
DIBR mainly comprises the steps of viewpoint selection, preprocessing, mapping, view fusion, and post-processing. During the mapping step, parts of the background texture that are occluded by foreground objects may be invisible in the reference viewpoint yet visible in the virtual viewpoint. Consequently, even after view fusion, some unfilled hole regions remain in the virtual viewpoint.
For filling holes in regions occluded by foreground objects, there currently exist methods that filter using the valid texture information around the hole. However, these methods are not effective: they are prone to artifacts and blurring, resulting in poor image quality in the reconstructed free viewpoint video.
Summary of the Invention
In view of this, the embodiments of this specification provide a free viewpoint video reconstruction and playback processing method, device, and storage medium, which can improve hole-filling quality and thereby improve the image quality of free viewpoint video.
First, the embodiments of this specification provide a free viewpoint video reconstruction method, comprising:
acquiring a free viewpoint video frame, the video frame comprising synchronized original texture maps of a plurality of original viewpoints and original depth maps of the corresponding viewpoints;
acquiring a target video frame corresponding to a virtual viewpoint;
synthesizing a texture map of the virtual viewpoint using the original texture maps of the plurality of original viewpoints and the corresponding original depth maps in the target video frame;
acquiring a background texture map and a background depth map of a viewpoint corresponding to the target video frame, and obtaining a background texture map of the virtual viewpoint from the background texture map and background depth map of the corresponding viewpoint; and
performing hole-filling post-processing on hole regions in the texture map of the virtual viewpoint using the background texture map of the virtual viewpoint, to obtain a reconstructed image of the virtual viewpoint.
Optionally, acquiring the background texture map and background depth map of the viewpoint corresponding to the target video frame comprises:
selecting a reference texture map sequence and a reference depth map sequence of the viewpoint corresponding to the target video frame; and
performing temporal filtering on the reference texture map sequence and the reference depth map sequence, respectively, to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame.
Optionally, performing temporal filtering on the reference texture map sequence and the reference depth map sequence, respectively, to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame comprises:
performing temporal median filtering on the pixels of the reference texture map sequence and the reference depth map sequence, respectively, to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame.
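Per-pixel temporal median filtering of this kind can be sketched in a few lines. This is a minimal illustration, not the disclosed implementation; it assumes the frames of the reference sequence are already aligned (a fixed camera position):

```python
import numpy as np

def temporal_median_background(frames):
    """Estimate a background image by per-pixel temporal median filtering.

    frames: (T, H, W) or (T, H, W, C) stack of aligned frames from one
    viewpoint; the same call works for a texture-map sequence or a
    depth-map sequence. A pixel covered by moving foreground in only a
    minority of the frames converges to its static background value.
    """
    return np.median(np.asarray(frames), axis=0)
```

Because the median is taken along the time axis, a foreground object must occupy a pixel in more than half of the reference frames before it leaks into the estimated background.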
Optionally, synthesizing the texture map of the virtual viewpoint using the original texture maps of the plurality of original viewpoints and the corresponding original depth maps in the target video frame comprises:
based on the virtual viewpoint, selecting the original texture maps and corresponding original depth maps of corresponding original viewpoints in the target video frame according to a preset rule; and
synthesizing the texture map of the virtual viewpoint using the selected original texture maps and corresponding original depth maps of the corresponding original viewpoints.
Optionally, acquiring the background texture map and background depth map of the viewpoint corresponding to the target video frame comprises:
acquiring a reference texture map sequence and a reference depth map sequence corresponding to the selected original viewpoints; and
performing temporal filtering on the reference texture map sequence and the reference depth map sequence, respectively, to obtain the background texture maps and background depth maps of the selected corresponding original viewpoints.
Optionally, acquiring the background texture map and background depth map of the viewpoint corresponding to the target video frame comprises:
pre-capturing, for the corresponding viewpoint, a background texture map of the field of view targeted by the target video frame in which no foreground object is present; and
obtaining a background depth map of the corresponding viewpoint from that foreground-free background texture map.
Optionally, performing hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint using the background texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint comprises:
using the background texture map of the virtual viewpoint, interpolating the hole regions in the texture map of the virtual viewpoint with a joint bilateral filtering method, to obtain the reconstructed image of the virtual viewpoint.
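One way such guided interpolation could work is sketched below: a naive joint bilateral filter in which the range weight is computed on the virtual-viewpoint background texture (the guide), so that edges present in the background steer the interpolation. This is a slow reference sketch under our own assumptions, not the patented implementation, and all names are illustrative:

```python
import numpy as np

def fill_holes_joint_bilateral(synth, hole_mask, background, radius=3,
                               sigma_s=2.0, sigma_r=0.1):
    """Fill hole pixels of a synthesized view by joint bilateral interpolation.

    synth:      (H, W, 3) synthesized virtual-view texture containing holes
    hole_mask:  (H, W) bool, True where the synthesized view has no data
    background: (H, W, 3) virtual-view background texture used as the guide
    Only non-hole neighbours contribute to each interpolated value.
    """
    H, W, _ = synth.shape
    out = synth.copy()
    for y, x in zip(*np.nonzero(hole_mask)):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        patch = synth[y0:y1, x0:x1]
        valid = ~hole_mask[y0:y1, x0:x1]
        gy, gx = np.mgrid[y0:y1, x0:x1]
        spatial = np.exp(-((gy - y) ** 2 + (gx - x) ** 2) / (2 * sigma_s ** 2))
        diff = background[y0:y1, x0:x1] - background[y, x]
        range_w = np.exp(-(diff ** 2).sum(axis=-1) / (2 * sigma_r ** 2))
        w = spatial * range_w * valid
        if w.sum() == 0:
            out[y, x] = background[y, x]  # no valid neighbour: use background
        else:
            out[y, x] = (patch * w[..., None]).sum(axis=(0, 1)) / w.sum()
    return out
```

When a hole pixel has no valid neighbour inside the window, the sketch falls back to copying the background texture value directly, which matches the intuition of the claim: the reconstructed background supplies the missing occluded content.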
Optionally, after performing the hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint and before obtaining the reconstructed image of the virtual viewpoint, the method further comprises:
filtering the foreground edges in the hole-filled texture map of the virtual viewpoint, to obtain the reconstructed image of the virtual viewpoint.
The embodiments of this specification further provide a free viewpoint video playback processing method, comprising:
determining a virtual viewpoint, and determining a target video frame according to the virtual viewpoint;
synthesizing a texture map of the virtual viewpoint using the original texture maps of a plurality of original viewpoints and the corresponding original depth maps in the target video frame;
acquiring a background texture map and a background depth map of a viewpoint corresponding to the target video frame, and obtaining a background texture map of the virtual viewpoint from the background texture map and background depth map of the corresponding viewpoint; and
performing hole-filling post-processing on hole regions in the texture map of the virtual viewpoint using the background texture map of the virtual viewpoint, to obtain a reconstructed image of the virtual viewpoint.
Optionally, determining the virtual viewpoint comprises at least one of the following:
determining the virtual viewpoint in response to a user interaction behavior; and
determining the virtual viewpoint based on virtual viewpoint position information contained in the video stream.
Optionally, the method further comprises:
acquiring a virtual rendering target object in the reconstructed image of the virtual viewpoint;
acquiring a virtual information image generated from augmented reality special-effect input data of the virtual rendering target object; and
compositing the virtual information image with the reconstructed image of the virtual viewpoint and displaying the result.
Optionally, acquiring the virtual information image generated from the augmented reality special-effect input data of the virtual rendering target object comprises:
obtaining a virtual information image matching the position of the virtual rendering target object, according to the position of the virtual rendering target object in the reconstructed image of the virtual viewpoint obtained by three-dimensional calibration.
The embodiments of this specification further provide a free viewpoint video reconstruction apparatus, comprising:
a video frame acquisition unit, adapted to acquire a free viewpoint video frame, the video frame comprising synchronized original texture maps of a plurality of original viewpoints and original depth maps of the corresponding viewpoints;
a target video frame determination unit, adapted to acquire a target video frame corresponding to a virtual viewpoint;
a virtual viewpoint texture map synthesis unit, adapted to synthesize a texture map of the virtual viewpoint using the original texture maps of the plurality of original viewpoints and the corresponding original depth maps in the target video frame;
a virtual viewpoint background texture map synthesis unit, adapted to acquire a background texture map and a background depth map of a viewpoint corresponding to the target video frame, and to obtain a background texture map of the virtual viewpoint from the background texture map and background depth map of the corresponding viewpoint; and
a post-processing unit, adapted to perform hole-filling post-processing on hole regions in the texture map of the virtual viewpoint using the background texture map of the virtual viewpoint, to obtain a reconstructed image of the virtual viewpoint.
The embodiments of this specification further provide a free viewpoint video playback processing apparatus, comprising:
a virtual viewpoint determination unit, adapted to determine a virtual viewpoint;
a target video frame determination unit, adapted to determine a target video frame according to the virtual viewpoint;
a virtual viewpoint texture map synthesis unit, adapted to synthesize a texture map of the virtual viewpoint using the original texture maps of a plurality of original viewpoints and the corresponding original depth maps in the target video frame;
a virtual viewpoint background texture map synthesis unit, adapted to acquire a background texture map and a background depth map of a viewpoint corresponding to the target video frame, and to obtain a background texture map of the virtual viewpoint from the background texture map and background depth map of the corresponding viewpoint; and
a post-processing unit, adapted to perform hole-filling post-processing on hole regions in the texture map of the virtual viewpoint using the background texture map of the virtual viewpoint, to obtain a reconstructed image of the virtual viewpoint.
The embodiments of this specification further provide an electronic device, comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein the processor, when running the computer instructions, performs the steps of the method of any of the foregoing embodiments.
The embodiments of this specification further provide an electronic device, comprising a communication component, a processor, and a display component, wherein:
the communication component is adapted to acquire a free viewpoint video;
the processor is adapted to perform the steps of the method of any of the foregoing embodiments; and
the display component is adapted to display the reconstructed image of the virtual viewpoint obtained after processing by the processor.
The embodiments of this specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are run, the steps of the method of any of the foregoing embodiments are performed.
Compared with the prior art, the technical solutions of the embodiments of this specification have the following beneficial effects:
In the solutions of the embodiments of this specification, a complete background texture map of the virtual viewpoint is obtained by reconstruction and used for hole-filling post-processing of the texture map of the virtual viewpoint corresponding to the synthesized target video frame. Compared with schemes that filter using only the texture around a hole, this avoids the artifacts and blurring caused by incomplete hole filling, improves hole-filling quality, and thereby improves the image quality of free viewpoint video.
Further, based on the virtual viewpoint, the original texture maps and corresponding original depth maps of several original viewpoints in the target video frame are selected according to a preset rule and used as reference texture maps and reference depth maps for synthesizing the texture map of the virtual viewpoint, which reduces the amount of data processed during video reconstruction and improves reconstruction efficiency.
Further, by performing temporal filtering on the selected reference texture map sequence and reference depth map sequence of the viewpoint corresponding to the target video frame, the background texture map and background depth map of that viewpoint are obtained. Because this takes into account the texture and depth information of the corresponding viewpoint in the temporal domain, rather than only the spatial-domain texture and depth information of the target video frame, the completeness and fidelity of the resulting background texture map and background depth map are improved, avoiding the artifacts and blurring caused by foreground occlusion and thereby improving the quality of image hole filling.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a specific application system for free viewpoint video presentation in an embodiment of this specification;
Fig. 2 is a schematic diagram of an interactive interface of a terminal device in an embodiment of this specification;
Fig. 3 is a schematic diagram of an arrangement of capture devices in an embodiment of this specification;
Fig. 4 is a schematic diagram of another terminal device interactive interface in an embodiment of this specification;
Fig. 5 is a schematic diagram of a free viewpoint video data generation process in an embodiment of this specification;
Fig. 6 is a schematic diagram of the generation and processing of 6DoF video data in an embodiment of this specification;
Fig. 7 is a schematic structural diagram of a data header file in an embodiment of this specification;
Fig. 8 is a schematic diagram of user-side processing of 6DoF video data in an embodiment of this specification;
Fig. 9 is a flowchart of a free viewpoint video reconstruction method in an embodiment of this specification;
Fig. 10 is a schematic diagram of a free viewpoint video reconstruction method in a specific application scenario in an embodiment of this specification;
Fig. 11 is a flowchart of a free viewpoint video playback processing method in an embodiment of this specification;
Fig. 12 is a flowchart of another free viewpoint video playback processing method in an embodiment of this specification;
Figs. 13 to 17 are schematic diagrams of display interfaces of an interactive terminal in embodiments of this specification;
Fig. 18 is a schematic structural diagram of a free viewpoint video reconstruction apparatus in an embodiment of this specification;
Fig. 19 is a schematic structural diagram of a free viewpoint video playback processing apparatus in an embodiment of this specification;
Fig. 20 is a schematic structural diagram of an electronic device in an embodiment of this specification;
Fig. 21 is a schematic structural diagram of another electronic device in an embodiment of this specification;
Fig. 22 is a schematic structural diagram of a video processing system in an embodiment of this specification.
Detailed Description
To enable those skilled in the art to better understand and implement the embodiments of this specification, the following first gives an exemplary introduction to the implementation of free viewpoint video with reference to the accompanying drawings and specific application scenarios.
Referring to Fig. 1, a specific application system for free viewpoint video presentation in an embodiment of the present invention may include a capture system 11 comprising multiple capture devices, a server 12, and a display device 13. The capture system 11 can capture images of the area to be viewed, and the capture system 11 or the server 12 can process the acquired synchronized texture maps to generate multi-angle free-view data capable of supporting virtual viewpoint switching on the display device 13. The display device 13 can present reconstructed images generated from the multi-angle free-view data; each reconstructed image corresponds to a virtual viewpoint, and reconstructed images corresponding to different virtual viewpoints can be presented according to user instructions, switching the viewing position and viewing angle.
In a specific implementation, the image reconstruction process that produces the reconstructed image may be performed by the display device 13, or by a device located in a content delivery network (CDN) by means of edge computing. It should be understood that Fig. 1 is only an example and does not limit the capture system, the server, the terminal device, or the specific implementation.
Continuing to refer to Fig. 1, the user can view the area to be viewed through the display device 13; in this embodiment, the area to be viewed is a basketball court. As described above, the viewing position and viewing angle can be switched.
For example, the user can swipe across the screen to switch the virtual viewpoint. In an embodiment of the present invention, with reference to Fig. 2, when the user's finger slides across the screen in direction D22, the virtual viewpoint used for viewing can be switched. Continuing to refer to Fig. 3, the position of the virtual viewpoint before sliding may be VP1; after sliding the screen to switch the virtual viewpoint, the position may be VP2. With reference to Fig. 4, after the screen is slid, the reconstructed image displayed on the screen may be as shown in Fig. 4. The reconstructed image may be obtained by image reconstruction based on multi-angle free-view data generated from images captured by multiple capture devices in an actual capture scenario.
It should be understood that the image viewed before switching may also be a reconstructed image, and reconstructed images may be frame images in a video stream. In addition, the virtual viewpoint may be switched according to user instructions in various ways, which are not limited here.
In a specific implementation, a viewpoint can be represented by coordinates with six degrees of freedom (DoF): the spatial position of the viewpoint can be expressed as (x, y, z), and the viewing angle can be expressed as three rotation directions. Accordingly, a virtual viewpoint, including its position and viewing angle, can be determined from the 6DoF coordinates.
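A minimal container for such a 6DoF viewpoint might look as follows. The rotation-angle field names (yaw, pitch, roll) are our own illustrative choice; the text only specifies "three rotation directions":

```python
from dataclasses import dataclass

@dataclass
class Viewpoint6DoF:
    # Spatial position of the viewpoint.
    x: float
    y: float
    z: float
    # Three rotation directions of the viewing angle (names are assumptions).
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
```

Such a structure is enough to parameterize both the user-selected viewing position and its orientation when requesting a reconstructed image.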
A virtual viewpoint is a three-dimensional concept, and three-dimensional information is required to generate a reconstructed image. In one specific implementation, the multi-angle free-view data may include depth map data, which provides the third dimension beyond the planar image. Compared with other implementations, such as providing three-dimensional information through point cloud data, the data volume of depth map data is small.
In the embodiments of the present invention, the virtual viewpoint can be switched within a certain range, namely the multi-angle free-view range. That is, within the multi-angle free-view range, the position and angle of the virtual viewpoint can be switched arbitrarily.
The multi-angle free-view range is related to the arrangement of the capture devices: the wider the shooting coverage of the capture devices, the larger the multi-angle free-view range. The picture quality presented by the terminal device is related to the number of capture devices; generally, the more capture devices are deployed, the fewer hole regions appear in the presented picture.
In addition, the multi-angle free-view range is related to the spatial distribution of the capture devices. The multi-angle free-view range, and the manner of interaction between the terminal side and the display device, can be configured based on the spatial distribution of the capture devices.
Those skilled in the art will understand that the above embodiments and the corresponding figures are merely illustrative examples; they limit neither the relationship between the capture device arrangement and the multi-angle free-view range, nor the interaction modes or presentation effects of the display device.
With reference to Fig. 5, free viewpoint video reconstruction requires texture map capture and depth map calculation, comprising three main steps: multi-camera video capturing, camera parameter estimation, and depth map calculation. For multi-camera video capturing, the videos captured by the individual cameras must be frame-level aligned. Texture images are obtained through multi-camera video capture; camera parameters, which may include intrinsic and extrinsic parameter data, are obtained through camera parameter estimation; and depth maps are obtained through depth map calculation. The multiple synchronized texture maps, the depth maps of the corresponding views, and the camera parameters together form the 6DoF video data.
In the solutions of the embodiments of this specification, no special camera, such as a light field camera, is required for video capture, and no complex camera calibration work is needed before capture. The positions of the multiple cameras can be laid out and arranged to better photograph the objects or scenes of interest.
After the above three steps, the texture maps captured by the multiple cameras, the camera parameters of all cameras, and the depth map of each camera are obtained. These three parts of data may be called the data files of the multi-angle free-view video data, or 6DoF video data. With these data, the user side can generate a virtual viewpoint according to a virtual six-degree-of-freedom (DoF) position, thereby providing a 6DoF video experience.
With reference to Fig. 6, the 6DoF video data and indicative data can be compressed and transmitted to the user side, which can obtain the user-side 6DoF representation, namely the aforementioned 6DoF video data and metadata, from the received data. The indicative data may also be called metadata. The video data includes the texture map and depth map data of each view corresponding to the multiple cameras; the texture maps and depth maps can be stitched according to certain stitching rules or stitching modes to form a stitched image.
With reference to Fig. 7, the metadata can describe the data pattern of the 6DoF video data and may specifically include: stitching pattern metadata, indicating the storage rules for the pixel data of the multiple texture maps and the depth map data in the stitched image; padding pattern metadata, indicating how edge protection is applied in the stitched image; and other metadata. The metadata may be stored in a data header file; the specific storage order may be as shown in Fig. 7, or another order may be used.
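The actual header layout is defined by Fig. 7 and is not reproduced here. Purely as an assumed illustration of the three metadata categories just listed (all field names are hypothetical), one could serialize them as:

```python
import json

def make_metadata_header(rows, cols, padding_px, other=None):
    """Build a header describing a stitched 6DoF image (hypothetical layout).

    rows, cols: grid arrangement of the per-camera texture/depth sub-images
    padding_px: width of the edge-protection band around each sub-image
    The real header is a binary data header file ordered as in Fig. 7;
    JSON is used here only to make the three categories concrete.
    """
    return json.dumps({
        "stitching_pattern": {"rows": rows, "cols": cols},
        "padding_pattern": {"padding_px": padding_px},
        "other_metadata": other or {},
    })
```

A decoder reading such a header would know, before touching any pixel data, how to slice the stitched image back into per-viewpoint texture and depth maps.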
结合参考图8,用户侧得到了6DoF视频数据,其中包括了摄像机参数,拼接图像(纹理图以及深度图),以及描述元数据(元数据),除此之外,还有用户端的交互行为数据。通过这些数据,用户侧可以采用基于深度图的渲染(DIBR,Depth Image-Based Rendering)方式进行的6DoF渲染,从而在一个特定的根据用户行为产生的6DoF位置产生虚拟视点的图像,也即根据用户指示,确定与该指示对应的6DoF位置的虚拟视点。Referring to Figure 8, the user side obtains 6DoF video data, which includes camera parameters, stitched images (texture map and depth map), and description metadata (metadata), in addition to user-end interactive behavior data . Through these data, the user side can use Depth Image-Based Rendering (DIBR, Depth Image-Based Rendering) for 6DoF rendering, so as to generate a virtual viewpoint image at a specific 6DoF position generated according to user behavior, that is, according to the user's behavior. Indicate, determine the virtual viewpoint of the 6DoF position corresponding to the indication.
其中,在通常的深度图计算中,是在每一个帧时刻单独计算。发明人经研究发现,如此会在固定不动的背景中,求出不一致的深度值,从而导致时域上看到的画面的抖动。Among them, in the usual depth map calculation, it is calculated separately at each frame moment. The inventor has found through research that in this way, in a fixed background, inconsistent depth values are obtained, resulting in jitter of the picture seen in the time domain.
As mentioned above, hole filling of regions occluded by foreground objects is currently performed by filtering with the valid texture information around the hole. In practice, however, the hole-repair effect is unsatisfactory: artifacts and blurring are easily produced, resulting in poor image quality of the reconstructed free-viewpoint video.
To this end, embodiments of this specification provide a free-viewpoint video reconstruction scheme in which a complete background texture map of the virtual viewpoint is reconstructed and used for hole-filling post-processing of the texture map of the virtual viewpoint corresponding to the synthesized target video frame. Compared with schemes that filter only with the texture around the hole, this avoids the artifacts and blurring caused by incomplete hole filling, improves hole-filling quality, and thereby improves the image quality of the free-viewpoint video.
The scheme, principle, and advantages of the hole-filling post-processing in the free-viewpoint video reconstruction process according to embodiments of this specification are described in detail below with reference to the accompanying drawings and specific application scenarios.
Referring to the flowchart of the free-viewpoint video reconstruction method shown in FIG. 9, in a specific implementation applied to the free-viewpoint video display system shown in FIG. 1, the method may be performed by the server 12 or the display device 13, and free-viewpoint video reconstruction may be carried out with the following steps:
S91: Acquire a free-viewpoint video frame, where the video frame includes synchronized original texture maps of multiple original viewpoints and original depth maps of the corresponding viewpoints.
In a specific implementation, a free-viewpoint video frame may include synchronized original texture maps of multiple original viewpoints and original depth maps of the corresponding viewpoints. As an optional example, a free-viewpoint video frame may be obtained based on the aforementioned 6DoF video data, where a corresponding viewing angle is equivalent to a corresponding viewpoint.
In a specific implementation, a free-viewpoint video stream may be downloaded over a network, or a free-viewpoint video frame may be obtained from a locally stored free-viewpoint video file.
S92: Acquire a target video frame corresponding to the virtual viewpoint.
In a specific implementation, the virtual viewpoint may be determined according to user interaction behavior or according to a preset. If it is determined based on user interaction behavior, the virtual viewpoint position at the corresponding interaction moment can be determined from the trajectory data of the user's interactive operation, thereby determining the virtual viewpoint.
In some embodiments of this specification, the position information of the virtual viewpoint corresponding to each video frame may also be preset on the server side (e.g., a server or the cloud), and the preset virtual viewpoint position information may be transmitted in the header file of the free-viewpoint video.
After the virtual viewpoint is determined, the video frame in the free-viewpoint video corresponding to the virtual viewpoint may be determined as the target video frame.
S93: Synthesize the texture map of the virtual viewpoint using the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame.
In a specific implementation, according to the position information of the virtual viewpoint, the original texture maps and corresponding original depth maps of all viewpoints contained in the target video frame may be used to synthesize the texture map of the virtual viewpoint.
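The DIBR synthesis described here can be sketched as a forward warp: each source pixel is back-projected using its depth value and re-projected into the virtual camera, with a z-buffer resolving occlusions. The following is a minimal illustrative sketch, not the implementation of this specification; the 3x3 intrinsics / 4x4 world-from-camera pose convention and the name `dibr_warp` are assumptions. Pixels onto which no source pixel lands remain as holes, which is precisely what the later hole-filling post-processing addresses.

```python
import numpy as np

def dibr_warp(texture, depth, K_src, RT_src, K_dst, RT_dst):
    """Forward-warp one source view into the virtual view (point splatting).

    texture: (H, W, 3) color image; depth: (H, W) metric depth along the
    source camera's z-axis. K_*: 3x3 intrinsics; RT_*: 4x4 world-from-camera
    poses (assumed convention). Returns the warped texture and a hole mask
    (True where no source pixel landed).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    # Back-project to source-camera coordinates, then to world coordinates.
    cam = np.linalg.inv(K_src) @ pix * depth.reshape(1, -1)
    world = RT_src @ np.vstack([cam, np.ones((1, cam.shape[1]))])
    # Project the world points into the virtual camera.
    cam_dst = np.linalg.inv(RT_dst) @ world
    z = cam_dst[2]
    proj = K_dst @ cam_dst[:3]
    uu = np.round(proj[0] / z).astype(int)
    vv = np.round(proj[1] / z).astype(int)
    out = np.zeros_like(texture)
    zbuf = np.full((H, W), np.inf)
    colors = texture.reshape(-1, 3)
    ok = (uu >= 0) & (uu < W) & (vv >= 0) & (vv < H) & (z > 0)
    for i in np.flatnonzero(ok):  # z-buffered splat: nearest surface wins
        if z[i] < zbuf[vv[i], uu[i]]:
            zbuf[vv[i], uu[i]] = z[i]
            out[vv[i], uu[i]] = colors[i]
    hole_mask = np.isinf(zbuf)
    return out, hole_mask
```

When warping a view into itself (identical cameras), every pixel maps back to its own position and no holes appear; holes arise only when the virtual camera sees surfaces occluded in the source view.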
To reduce the amount of data processing and speed up image reconstruction while maintaining reconstruction quality, the original texture maps and corresponding original depth maps of only some of the viewpoints in the target video frame may instead be selected, based on the position information of the virtual viewpoint, for synthesizing the texture map of the virtual viewpoint.
Specifically, based on the virtual viewpoint, the original texture maps and corresponding original depth maps of corresponding original viewpoints in the target video frame may be selected according to a preset rule, and the texture map of the virtual viewpoint may then be synthesized from the selected maps. For example, based on the spatial positional relationship between the virtual viewpoint and each original viewpoint, the original texture maps and corresponding original depth maps of the original viewpoints satisfying a preset distance condition with respect to the virtual viewpoint may be selected. As another example, the original texture maps and corresponding original depth maps of original viewpoints that satisfy a preset spatial positional relationship with the virtual viewpoint and a preset number threshold may be selected.
It can be understood that the above are merely examples of some optional implementations for selecting the original texture maps and corresponding original depth maps of some of the original viewpoints, and are not mandatory selection conditions.
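Assuming the original viewpoints and the virtual viewpoint are expressed as positions in a common world coordinate system, the distance-based selection rule described above can be sketched as:

```python
import numpy as np

def select_nearest_viewpoints(virtual_pos, camera_positions, k=2):
    """Pick the k original viewpoints closest to the virtual viewpoint.

    virtual_pos: (3,) position of the virtual viewpoint;
    camera_positions: (N, 3) positions of the N original viewpoints.
    Returns the indices of the k nearest original viewpoints, nearest first.
    """
    d = np.linalg.norm(np.asarray(camera_positions) - np.asarray(virtual_pos), axis=1)
    return np.argsort(d)[:k].tolist()
```

The count `k` corresponds to the preset number threshold mentioned above and may be fixed or varied per frame.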
S94: Acquire the background texture maps and background depth maps of the viewpoints corresponding to the target video frame, and obtain the background texture map of the virtual viewpoint from the background texture maps and background depth maps of the corresponding viewpoints.
In a specific implementation, the background texture map and background depth map of a viewpoint corresponding to the target video frame can be obtained in various ways, for example by temporal filtering or by pre-capture. Specific implementations are described in detail later in combination with specific application scenarios.
After the background texture maps and background depth maps of the viewpoints corresponding to the target video frame are obtained, virtual viewpoint synthesis may be performed in the same manner as in step S93 to obtain the background texture map of the virtual viewpoint.
In a specific implementation, hole-filling post-processing may be performed on the background texture map of the virtual viewpoint to enhance its image quality. As a specific example, joint bilateral filtering may be used for this hole-filling post-processing.
In a specific implementation, to obtain a more complete background texture map of the virtual viewpoint, the background texture maps and background depth maps of multiple viewpoints may be used, where the selected viewpoints may be denser than the viewpoints corresponding to the target video frame.
S95: Using the background texture map of the virtual viewpoint, perform hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
With the background texture map of the virtual viewpoint, hole-filling post-processing of the hole regions in the texture map of the virtual viewpoint can be performed in various ways.
In a specific implementation, the texture map of the virtual viewpoint may be compared pixel by pixel with the background texture map of the virtual viewpoint, to find where the background region of the virtual viewpoint's texture map is inconsistent with the background texture map, or to find pixels whose value difference exceeds a preset threshold; the values of those pixels in the texture map of the virtual viewpoint are then replaced with the values of the corresponding pixels in the background texture map of the virtual viewpoint.
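The pixel-by-pixel comparison just described can be sketched as follows. This is an illustrative sketch, not the exact implementation of this specification: the threshold value and the optional `hole_mask` parameter (marking pixels that the warping step left empty) are assumptions, and in practice the comparison would be restricted to background regions so that legitimate foreground pixels are not overwritten.

```python
import numpy as np

def fill_holes_from_background(virtual_tex, virtual_bg_tex, hole_mask=None, threshold=30):
    """Replace suspect pixels with values from the background texture map.

    Pixels flagged in hole_mask, or whose L1 color difference from the virtual
    viewpoint's background texture map exceeds `threshold` (an assumed tuning
    parameter), are overwritten with the background value.
    """
    tex = virtual_tex.astype(np.int32)
    bg = virtual_bg_tex.astype(np.int32)
    diff = np.abs(tex - bg).sum(axis=-1)  # per-pixel L1 color difference
    replace = diff > threshold
    if hole_mask is not None:
        replace |= hole_mask
    out = virtual_tex.copy()
    out[replace] = virtual_bg_tex[replace]
    return out
```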
As another example, a joint bilateral filtering method may be used to interpolate the hole regions in the texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint. In a specific implementation, this may be performed by a dedicated joint bilateral filter, or by invoking corresponding software execution logic. Joint bilateral filtering preserves the foreground edges in the texture map of the virtual viewpoint while removing background noise.
In other embodiments of this specification, the background texture map of the virtual viewpoint is used as a guidance image, and a guided filtering method is used to fill the hole regions in the texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint.
In a specific implementation, other filtering methods, such as bilateral filtering or median smoothing filtering, may also be used to perform hole-filling post-processing on the texture map of the virtual viewpoint based on the input background texture map of the virtual viewpoint; these are not enumerated one by one here.
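As a minimal sketch of the joint bilateral idea applied to hole filling: each hole pixel is interpolated from its valid neighbors, weighted both by spatial closeness and by similarity in a guide image (here, the virtual viewpoint's background texture map), so that edges present in the guide are respected. Grayscale images and the sigma defaults are simplifying assumptions for illustration; this is not the specification's exact filter.

```python
import numpy as np

def joint_bilateral_fill(texture, hole_mask, guide, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Fill hole pixels by joint bilateral interpolation (grayscale sketch).

    texture: (H, W) image with holes; hole_mask: (H, W) bool, True at holes;
    guide: (H, W) guide image, e.g. the virtual viewpoint's background texture.
    sigma_s / sigma_r control the spatial and range Gaussian weights.
    """
    H, W = texture.shape
    out = texture.astype(float).copy()
    ys, xs = np.nonzero(hole_mask)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        x0, x1 = max(0, x - radius), min(W, x + radius + 1)
        patch = texture[y0:y1, x0:x1].astype(float)
        valid = ~hole_mask[y0:y1, x0:x1]
        if not valid.any():
            continue  # no valid neighbor in range; leave for a second pass
        yy, xx = np.mgrid[y0:y1, x0:x1]
        w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
        g = guide[y0:y1, x0:x1].astype(float)
        w_r = np.exp(-((g - float(guide[y, x])) ** 2) / (2 * sigma_r ** 2))
        w = w_s * w_r * valid  # zero weight for other hole pixels
        if w.sum() > 0:
            out[y, x] = (w * patch).sum() / w.sum()
    return out
```

Because the range weight is computed on the guide rather than on the hole-ridden texture itself, a filled pixel is pulled toward neighbors that belong to the same side of a background edge.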
After the hole-filling post-processing of the hole regions in the texture map of the virtual viewpoint in step S95, to further improve the quality of the reconstructed image, the foreground edges in the texture map of the virtual viewpoint obtained after hole-filling post-processing may additionally be filtered to obtain the reconstructed image of the virtual viewpoint.
With the above embodiments, by reconstructing the complete background texture map of the virtual viewpoint and using it for hole-filling post-processing of the texture map of the virtual viewpoint corresponding to the synthesized target video frame, compared with schemes that filter only with the texture around the hole, artifacts and blurring caused by incomplete hole filling are avoided, hole-filling quality is improved, and the image quality of the free-viewpoint video can thereby be improved.
For better understanding and implementation by those skilled in the art, some examples of specific implementations for obtaining the background texture maps and background depth maps of the viewpoints corresponding to the target video frame are given below.
Example 1: Select a reference texture map sequence and a reference depth map sequence of the viewpoint corresponding to the target video frame, and then obtain the background texture map and background depth map of that viewpoint.
For the selection of the reference texture map sequence and reference depth map sequence, in some embodiments the following manners are adopted:
Manner 1: For every original viewpoint corresponding to the original texture maps and original depth maps contained in the target video frame, obtain the corresponding reference texture map sequence and reference depth map sequence.
For example, if the target video frame contains the original texture maps and corresponding original depth maps of 30 viewpoints, the corresponding reference texture map sequences and reference depth map sequences are obtained for each of these 30 viewpoints.
Manner 2: Use the reference texture map sequences and corresponding reference depth map sequences of only the original viewpoints selected for synthesizing the texture map of the virtual viewpoint.
Specifically, the reference texture map sequences and reference depth map sequences corresponding to the selected original viewpoints may be obtained, and temporal filtering may be performed on each reference texture map sequence and reference depth map sequence to obtain the background texture maps and background depth maps of the selected original viewpoints.
For example, if only the original texture maps and corresponding original depth maps of the two original viewpoints closest to the virtual viewpoint were selected when synthesizing the texture map of the virtual viewpoint, then only the reference texture map sequences and reference depth map sequences of those two original viewpoints need to be obtained, which reduces the amount of data computation and improves the generation efficiency of the background texture map of the virtual viewpoint.
In addition, the selected reference texture map sequence and reference depth map sequence may be taken either from a video clip independent of the one containing the target video frame, or from the video clip containing the target video frame.
After the reference texture map sequence and reference depth map sequence of the viewpoint corresponding to the target video frame are selected, temporal filtering may be performed on each of them to obtain the background texture map and background depth map of that viewpoint.
In a specific implementation, temporal filtering can be realized in various ways.
For example, an averaging filter may be used; more specifically, arithmetic mean filtering, median-mean filtering, moving-average filtering, and similar methods may be used.
As another example, a median filtering method may be used. Specifically, temporal median filtering may be performed separately on the pixels of the reference texture map sequence and the pixels of the corresponding reference depth map sequence to obtain the background texture map and background depth map of the viewpoint corresponding to the target video frame.
As an optional example, a sequence of video frames from time t1 to t2 may be selected from a video X taken from the same viewpoint as the target depth map, as the reference texture map sequence for this time period, together with the corresponding reference depth map sequence. The sampled values at each pixel position in the reference texture map sequence and in the reference depth map sequence are sorted by magnitude, and the median is taken as the valid value at the corresponding pixel position of the background texture map and of the background depth map, respectively. For convenience of taking the median, the number of images in the reference texture map sequence and the corresponding reference depth map sequence sampled from t1 to t2 should be odd, for example 3, 5, or 7 consecutive frames. This can be expressed by the following formula:
P(x_t) = med({I_{x,i} | i ∈ [t1, t2]})
where P(x_t) denotes any pixel of the background texture map or background depth map, I_{x,i} denotes the sequence of pixel values at the same pixel position as P(x_t) in the reference texture map sequence or reference depth map sequence from t1 to t2, and med denotes taking the median of I_{x,i}.
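The per-pixel temporal median above can be sketched directly as a stacked median (a sketch assuming equally sized frames; NumPy is used purely for illustration):

```python
import numpy as np

def temporal_median_background(frames):
    """Estimate a background map as the per-pixel temporal median of a
    frame sequence, implementing P(x_t) = med({I_{x,i} | i in [t1, t2]}).

    frames: list or array of equally sized texture (or depth) maps sampled
    from t1 to t2. An odd count (3, 5, 7, ...) makes each median an actual
    sampled value rather than an average of two samples.
    """
    stack = np.stack(frames, axis=0)  # (num_frames, H, W[, C])
    return np.median(stack, axis=0)
```

Because a moving foreground object covers any given pixel in only a minority of the frames, the median recovers the stationary background value at that pixel.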
It can be understood that, in a specific implementation, depending on the characteristics of the environment involved in the specific video and on specific requirements, other temporal filtering methods may also be used; for example, amplitude-limiting filtering or first-order lag filtering.
Example 2: Pre-capture a background texture map of the viewpoint corresponding to the target video frame in which no foreground object is present, and then obtain the background depth map of that viewpoint.
In a specific implementation, a background texture map in which no foreground object is present at the corresponding viewpoint in the field of view covered by the target video frame may be captured in advance, and the background depth map of the corresponding viewpoint may be obtained from that background texture map.
Since the background in an image is a fixed object relative to the capture viewpoint, in some embodiments of this specification a texture image may be captured in advance at the corresponding viewpoint when no foreground object is present in the field of view covered by the target video frame. The captured texture image then contains only background texture information, so it can be used as the background texture map of the corresponding viewpoint, from which the background depth map of that viewpoint can in turn be obtained.
For example, for a live broadcast of a basketball game, one or more images without foreground objects may be captured at each relevant viewpoint before the game starts. If a single image is captured, it can be used directly as the background texture map; if multiple images are captured, they can be used as a reference texture map sequence and temporally filtered to obtain the background texture map of the corresponding viewpoint. Correspondingly, a reference depth map can be estimated from each captured reference texture map by depth computation: the reference depth map corresponding to a single reference texture map can be used directly as the background depth map, while the sequence of reference depth maps obtained from multiple reference texture maps can be temporally filtered to obtain the background depth map.
Referring to the schematic diagram of a free-viewpoint video reconstruction method in a specific application scenario shown in FIG. 10, in an embodiment of this specification, multiple free-viewpoint video frames I may first be obtained, where any free-viewpoint video frame I includes synchronized original texture maps of multiple original viewpoints and original depth maps of the corresponding viewpoints. Based on the virtual viewpoint, a target video frame I0 is acquired, and the texture map T0 of the virtual viewpoint can be synthesized by virtual viewpoint reconstruction from the original texture maps and corresponding original depth maps of the multiple original viewpoints contained in I0. In addition, based on the target video frame, the background texture map Tb and background depth map Db of the corresponding viewpoints can be obtained; from Tb and Db, the background texture map Tb0 of the virtual viewpoint can be obtained by virtual viewpoint reconstruction. Using Tb0 to perform hole-filling post-processing on the texture map T0 of the virtual viewpoint yields the final free-viewpoint video reconstructed image Te.
The free-viewpoint video reconstruction method has been described in detail above through some specific examples. Embodiments of this specification also provide a corresponding free-viewpoint video playback processing method. Referring to the flowchart of the free-viewpoint video playback processing method shown in FIG. 11, it may specifically include the following steps:
S111: Determine a virtual viewpoint, and determine a target video frame according to the virtual viewpoint.
In a specific implementation, the virtual viewpoint may be generated in real time during playback of the free-viewpoint video, or may be preset. More specifically, the virtual viewpoint may be determined in response to a user's gesture interaction, for example by obtaining the trajectory data of the user's interactive operation to determine the virtual viewpoint at the corresponding interaction moment. Alternatively, the position information of the virtual viewpoint corresponding to each video frame may be preset on the server side (e.g., a server or the cloud) and transmitted in the header file of the free-viewpoint video stream, so that the virtual viewpoint can be determined based on the virtual viewpoint position information contained in the video stream.
After the virtual viewpoint is determined, the corresponding frame moment may be determined according to the virtual viewpoint, and the video frame at that frame moment is taken as the target video frame.
S112: Synthesize the texture map of the virtual viewpoint using the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame.
After the virtual viewpoint is determined, to save data-processing resources, the original texture maps and original depth maps of some of the original viewpoints in the target video frame may be selected according to preset rules, based on the virtual viewpoint position and the parameter data corresponding to the target video frame, and combined and rendered to synthesize the texture map of the virtual viewpoint. For example, the original texture maps and original depth maps corresponding to the 2 to N viewpoints closest to the virtual viewpoint position in the target video frame may be selected, where N is the number of original texture maps in the target video frame, i.e., the number of capture devices corresponding to the original texture maps. In a specific implementation, this number may be fixed or variable.
S113: Acquire the background texture maps and background depth maps of the viewpoints corresponding to the target video frame, and obtain the background texture map of the virtual viewpoint from the background texture maps and background depth maps of the corresponding viewpoints.
For the manner of acquiring the background texture maps and background depth maps of the viewpoints corresponding to the target video frame, and the specific implementation of obtaining the background texture map of the virtual viewpoint from them, reference may be made to the description of step S94 and its specific implementations in the foregoing embodiments, which is not repeated here.
S114: Using the background texture map of the virtual viewpoint, perform hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
In a specific implementation, any one or more filtering methods, such as bilateral filtering, joint bilateral filtering, or guided filtering, may be used to perform hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint.
With the above free-viewpoint video playback processing method, all displayed video is hole-filled based on the background texture map of the corresponding virtual viewpoint, and each such background texture map is obtained by temporal filtering of the reference texture maps and reference depth maps of the original viewpoints used for synthesizing the virtual viewpoint's texture map. The background texture map of the virtual viewpoint therefore contains stable and complete background texture information, so using it for hole-filling post-processing of the texture map of the virtual viewpoint improves the quality of the reconstructed image of the virtual viewpoint.
To reduce image holes, the positions of the cameras (capture devices) can be configured by a specific viewpoint configuration algorithm or system. In a specific implementation, the three-dimensional spatial information of the field of view, the number of selectable viewpoints, and the intrinsic and extrinsic parameters of the cameras (including the cameras' horizontal and vertical field-of-view angles and other parameters) can be obtained and matched and computed according to a preset configuration model, which can output a recommended camera arrangement and the corresponding camera positions.
In a specific implementation, the playback mode of the free-viewpoint video can be further optimized and extended on the basis of the above embodiments. An exemplary extension is given below.
To enrich the user's visual experience, Augmented Reality (AR) special effects can be implanted in the reconstructed free-viewpoint images. In some embodiments of this specification, referring to the flowchart of the free-viewpoint video playback processing method shown in FIG. 12, the implantation of AR special effects can be realized as follows:
S121: Acquire a virtual rendering target object in the reconstructed image of the virtual viewpoint.
In a specific implementation, certain objects in the images of the free-viewpoint video may be determined as virtual rendering target objects based on certain indication information, which may be generated by user interaction, by certain preset trigger conditions, or by third-party instructions. In an optional embodiment of this specification, in response to a special-effect-generation interactive control instruction, the virtual rendering target object in the reconstructed image of the virtual viewpoint may be acquired.
S122: Acquire a virtual information image generated based on the augmented reality special-effect input data of the virtual rendering target object.
In embodiments of this specification, the implanted AR special effect is presented in the form of a virtual information image, which may be generated based on the augmented reality special-effect input data of the target object. After the virtual rendering target object is determined, the virtual information image generated based on its augmented reality special-effect input data may be acquired.
In embodiments of this specification, the virtual information image corresponding to the virtual rendering target object may be generated in advance, or generated on the fly in response to a special-effect generation instruction.
In a specific implementation, a virtual information image matching the position of the virtual rendering target object may be obtained based on the position of the virtual rendering target object in the reconstructed image obtained by three-dimensional calibration. The obtained virtual information image thus better matches the position of the virtual rendering target object in three-dimensional space, and the displayed virtual information image better conforms to the real state in three-dimensional space, so the displayed composite image is more realistic and vivid, enhancing the user's visual experience.
在具体实施中,可以基于虚拟渲染目标对象的增强现实特效输入数据,按照预设的特效生成方式,生成所述目标对象对应的虚拟信息图像。In a specific implementation, a virtual information image corresponding to the target object may be generated according to a preset special effect generation method based on the augmented reality special effect input data of the virtual rendering target object.
在具体实施中,可以采用多种特效生成方式。In a specific implementation, a variety of special effect generation methods can be adopted.
例如，可以将所述目标对象的增强现实特效输入数据输入至预设的三维模型，基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置，输出与所述虚拟渲染目标对象匹配的虚拟信息图像；For example, the augmented reality special effect input data of the target object may be fed into a preset three-dimensional model, which, based on the position of the virtual rendering target object in the image obtained by three-dimensional calibration, outputs a virtual information image matching the virtual rendering target object;
又如，可以将所述虚拟渲染目标对象的增强现实特效输入数据，输入至预设的机器学习模型，基于三维标定得到的所述虚拟渲染目标对象在所述图像中的位置，输出与所述虚拟渲染目标对象匹配的虚拟信息图像。For another example, the augmented reality special effect input data of the virtual rendering target object may be fed into a preset machine learning model, which, based on the position of the virtual rendering target object in the image obtained by three-dimensional calibration, outputs a virtual information image matching the virtual rendering target object.
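Both generation paths above rely on the 3D-calibrated position of the target object. As an illustrative sketch only (not the patent's specific calibration procedure), a world-space anchor point can be projected into the virtual viewpoint's image through a hypothetical pinhole model with assumed parameters `K`, `R`, `t`, giving the pixel position at which the virtual information image is placed:

```python
import numpy as np

def project_anchor(world_point, K, R, t):
    """Project a 3D world-space anchor point (e.g. the spot where the
    target object stands) into pixel coordinates of the virtual viewpoint,
    so a rendered effect can be placed to match the 3D position."""
    p_cam = R @ world_point + t      # world coordinates -> camera coordinates
    u, v, w = K @ p_cam              # camera coordinates -> homogeneous pixels
    return np.array([u / w, v / w])  # perspective divide

# Hypothetical pinhole parameters for the virtual viewpoint.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])        # camera 5 m in front of the world origin

# The world origin projects onto the principal point (960, 540).
anchor_px = project_anchor(np.array([0.0, 0.0, 0.0]), K, R, t)
```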
S123,将所述虚拟信息图像与所述虚拟视点的图像进行合成处理并展示。S123 , synthesizing and displaying the virtual information image and the image of the virtual viewpoint.
在具体实施中,可以有多种方式将所述虚拟信息图像与所述虚拟视点的重建图像进行合成处理并展示,以下给出两种具体可实现示例:In a specific implementation, the virtual information image and the reconstructed image of the virtual viewpoint can be synthesized and displayed in various ways, and two specific implementation examples are given below:
示例一：将所述虚拟信息图像与对应的重建图像进行融合处理，得到融合图像，对所述融合图像进行展示；Example 1: The virtual information image and the corresponding reconstructed image are fused to obtain a fused image, and the fused image is displayed;
示例二:将所述虚拟信息图像叠加在对应的重建图像之上,得到叠加合成图像,对所述叠加合成图像进行展示。Example 2: The virtual information image is superimposed on the corresponding reconstructed image to obtain a superimposed composite image, and the superimposed composite image is displayed.
在具体实施中,可以将得到的合成图像直接展示,也可以将得到的合成图像插入待播放的视频流进行播放展示。例如,可以将所述融合图像插入待播放视频流进行播放展示。In a specific implementation, the obtained composite image may be displayed directly, or the obtained composite image may be inserted into a video stream to be played for playback and display. For example, the fused image may be inserted into the video stream to be played for display.
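The two compositing examples above (fusion and superposition) can both be expressed as alpha blending. A minimal sketch, assuming NumPy and a hypothetical `alpha_mask` that is 1 where the AR effect is opaque and 0 where the reconstruction shows through:

```python
import numpy as np

def overlay(reconstructed, virtual_info, alpha_mask):
    """Composite a virtual information image (RGB) onto the reconstructed
    image of the virtual viewpoint using per-pixel alpha blending."""
    a = alpha_mask[..., None]  # broadcast the mask over the color channels
    return (a * virtual_info + (1.0 - a) * reconstructed).astype(reconstructed.dtype)

# Toy 2x2 frames: a gray reconstruction and a red effect covering one pixel.
recon = np.full((2, 2, 3), 128, dtype=np.uint8)
effect = np.zeros((2, 2, 3), dtype=np.uint8)
effect[0, 0] = (255, 0, 0)
mask = np.zeros((2, 2))
mask[0, 0] = 1.0

out = overlay(recon, effect, mask)  # red at (0, 0), reconstruction elsewhere
```

A fused image (Example 1) corresponds to a fractional mask; a hard superposition (Example 2) corresponds to a binary mask.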
自由视点视频中可以包括特效展示标识，在具体实施中，可以基于特效展示标识，确定所述虚拟信息图像在所述虚拟视点的图像中的叠加位置，之后，可以将所述虚拟信息图像在所确定的叠加位置进行叠加展示。The free viewpoint video may include special effect display identifiers. In a specific implementation, the superposition position of the virtual information image within the image of the virtual viewpoint may be determined based on a special effect display identifier, and the virtual information image may then be superimposed and displayed at the determined position.
为使本领域技术人员更好地理解和实施，以下通过一交互终端的图像展示过程进行详细说明。参照图13至图17所示的交互终端的视频播放画面示意图，交互终端T1实时地进行视频的播放。其中，参照图13，展示视频帧P1，接下来，交互终端所展示的视频帧P2中包含特效展示标识I1等多个特效展示标识，视频帧P2中通过指向目标对象的倒三角符号表示，如图14所示。可以理解的是，也可以采用其他的方式展示所述特效展示标识。终端用户触摸点击所述特效展示标识I1，则系统自动获取对应于所述特效展示标识I1的虚拟信息图像，将所述虚拟信息图像叠加展示在视频帧P3中，如图15所示，以运动员Q1站立的场地位置为中心，渲染出一个立体圆环R1。接下来，如图16及图17所示，终端用户触摸点击视频帧P3中的特效展示标识I2，系统自动获取对应于所述特效展示标识I2的虚拟信息图像，将所述虚拟信息图像叠加展示在视频帧P3上，得到叠加图像，即视频帧P4，其中展示了命中率信息展示板M0。命中率信息展示板M0上展示了目标对象即运动员Q2的号位、姓名及命中率信息。To help those skilled in the art better understand and implement this, a detailed description follows through the image display process of an interactive terminal. Referring to the schematic diagrams of the video playback screens of the interactive terminal shown in FIG. 13 to FIG. 17, the interactive terminal T1 plays the video in real time. As shown in FIG. 13, video frame P1 is displayed. Next, video frame P2 displayed by the interactive terminal contains multiple special effect display identifiers, such as identifier I1, represented in frame P2 by inverted triangle symbols pointing at the target objects, as shown in FIG. 14. It can be understood that the special effect display identifiers may also be presented in other ways. When the end user taps the special effect display identifier I1, the system automatically acquires the virtual information image corresponding to I1 and superimposes it on video frame P3; as shown in FIG. 15, a three-dimensional ring R1 is rendered centered on the spot where athlete Q1 stands. Next, as shown in FIG. 16 and FIG. 17, the end user taps the special effect display identifier I2 in video frame P3, and the system automatically acquires the corresponding virtual information image and superimposes it on frame P3, obtaining a superimposed image, namely video frame P4, which shows the hit-rate information board M0. The board M0 displays the jersey number, name, and hit-rate information of the target object, athlete Q2.
如图13至图17所示,终端用户可以继续点击视频帧中展示的其他特效展示标识,观看展示各特效展示标识相应的AR特效的视频。As shown in FIG. 13 to FIG. 17 , the end user can continue to click other special effect display signs displayed in the video frame to watch the video showing the AR special effect corresponding to each special effect display sign.
可以理解的是,可以通过不同类型的特效展示标识区分不同类型的植入特效。It can be understood that different types of implanted special effects can be distinguished by different types of special effect display signs.
本说明书实施例还提供了相应的自由视点视频重建装置，参见图18所示的自由视点视频重建装置的结构示意图，其中，自由视点视频重建装置180可以包括：视频帧获取单元181、目标视频帧确定单元182、虚拟视点纹理图合成单元183、虚拟视点背景纹理图合成单元184和后处理单元185，其中：The embodiments of this specification further provide a corresponding free-viewpoint video reconstruction apparatus. Referring to the schematic structural diagram shown in FIG. 18, the free-viewpoint video reconstruction apparatus 180 may include: a video frame acquisition unit 181, a target video frame determination unit 182, a virtual viewpoint texture map synthesis unit 183, a virtual viewpoint background texture map synthesis unit 184, and a post-processing unit 185, wherein:
所述视频帧获取单元181,适于获取自由视点视频帧,所述视频帧包括同步的多个原始视点的原始纹理图和对应视点的原始深度图;The video frame obtaining unit 181 is adapted to obtain a free-view video frame, the video frame including the original texture maps of multiple original viewpoints and the original depth maps of the corresponding viewpoints;
所述目标视频帧确定单元182,适于获取虚拟视点对应的目标视频帧;The target video frame determining unit 182 is adapted to obtain the target video frame corresponding to the virtual viewpoint;
所述虚拟视点纹理图合成单元183,适于采用所述目标视频帧中多个原始视点的原始纹理图和对应的原始深度图,合成所述虚拟视点的纹理图;The virtual viewpoint texture map synthesis unit 183 is adapted to use the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints;
所述虚拟视点背景纹理图合成单元184，适于获取所述目标视频帧对应视点的背景纹理图和背景深度图，并根据所述对应视点的背景纹理图和背景深度图，获取所述虚拟视点的背景纹理图；The virtual viewpoint background texture map synthesis unit 184 is adapted to acquire the background texture map and background depth map of the viewpoints corresponding to the target video frame, and to obtain the background texture map of the virtual viewpoint from the background texture maps and background depth maps of those corresponding viewpoints;
所述后处理单元185,适于采用所述虚拟视点的背景纹理图,对所述虚拟视点的纹理图中的空洞区域进行空洞填补后处理,得到所述虚拟视点的重建图像。The post-processing unit 185 is adapted to use the background texture map of the virtual viewpoint to perform post-processing for filling voids in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
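As described in the corresponding method claims, the background texture map and background depth map consumed by unit 184 can be obtained by temporal median filtering of a reference sequence of the corresponding viewpoint. A minimal per-pixel sketch, assuming NumPy and toy single-pixel frames:

```python
import numpy as np

def temporal_median_background(frames):
    """Per-pixel temporal median over a reference sequence of frames
    (texture or depth). Pixels covered by moving foreground in only a
    minority of frames fall back to the static background value."""
    return np.median(np.stack(frames, axis=0), axis=0)

# Toy 1-pixel sequence: background value 10, briefly occluded by foreground 200.
seq = [np.array([[10.0]]), np.array([[200.0]]), np.array([[10.0]]),
       np.array([[10.0]]), np.array([[200.0]])]
bg = temporal_median_background(seq)  # the transient foreground is filtered out
```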
采用上述自由视点视频重建装置180，采用重建得到所述虚拟视点的完整的背景纹理图，来对合成得到的目标视频帧对应的虚拟视点的纹理图进行空洞填补后处理，相对于仅利用空洞周围的纹理进行滤波的方案，可以避免空洞填补不完整而导致的伪影和模糊的现象，提高空洞填补质量，进而可以提高自由视点视频的图像质量。With the above free-viewpoint video reconstruction apparatus 180, a complete background texture map of the virtual viewpoint is reconstructed and used for hole-filling post-processing of the synthesized virtual-viewpoint texture map corresponding to the target video frame. Compared with schemes that filter using only the texture around a hole, this avoids the artifacts and blurring caused by incomplete hole filling, improves hole-filling quality, and in turn improves the image quality of the free-viewpoint video.
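At its simplest, background-based hole filling amounts to copying co-located pixels of the reconstructed background texture map into the disocclusion mask. This is an illustration only; the specification additionally describes joint bilateral filtering interpolation and foreground-edge filtering as refinements:

```python
import numpy as np

def fill_holes_from_background(synth, hole_mask, background):
    """Fill disocclusion holes in the synthesized virtual-viewpoint texture
    with the co-located pixels of the reconstructed background texture map,
    rather than filtering with only the texture around each hole."""
    out = synth.copy()
    out[hole_mask] = background[hole_mask]
    return out

# Toy example: value 0 marks a disocclusion hole in the synthesized texture.
synth = np.array([[50, 0, 50],
                  [50, 0, 50]], dtype=np.uint8)
holes = synth == 0
background = np.full_like(synth, 80)  # background texture of the virtual view

filled = fill_holes_from_background(synth, holes, background)
```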
在具体实施中，所述虚拟视点视频重建装置中各单元可以采用前述自由视点视频重建方法中相应步骤的具体方法示例、具体方式等进行实施，具体可以参见前述实施例详述。In a specific implementation, each unit in the virtual viewpoint video reconstruction apparatus may be implemented using the specific method examples and manners of the corresponding steps in the aforementioned free-viewpoint video reconstruction method; for details, refer to the foregoing embodiments.
本说明书实施例还提供了相应的自由视点视频播放处理装置，参照图19所示的自由视点视频播放处理装置的结构示意图，在本说明书一些实施例中，如图19所示，自由视点视频播放处理装置190可以包括：虚拟视点确定单元191、目标视频帧确定单元192、虚拟视点纹理图合成单元193、虚拟视点背景纹理图合成单元194和后处理单元195，其中：The embodiments of this specification further provide a corresponding free-viewpoint video playback processing apparatus. Referring to the schematic structural diagram shown in FIG. 19, in some embodiments of this specification, the free-viewpoint video playback processing apparatus 190 may include: a virtual viewpoint determination unit 191, a target video frame determination unit 192, a virtual viewpoint texture map synthesis unit 193, a virtual viewpoint background texture map synthesis unit 194, and a post-processing unit 195, wherein:
虚拟视点确定单元191,适于确定虚拟视点;a virtual viewpoint determination unit 191, adapted to determine a virtual viewpoint;
目标视频帧确定单元192,适于根据所述虚拟视点,确定目标视频帧;a target video frame determining unit 192, adapted to determine a target video frame according to the virtual viewpoint;
虚拟视点纹理图合成单元193,适于采用所述目标视频帧中多个原始视点的原始纹理图和对应的原始深度图,合成所述虚拟视点的纹理图;The virtual viewpoint texture map synthesizing unit 193 is adapted to use the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints;
虚拟视点背景纹理图合成单元194，适于获取所述目标视频帧对应视点的背景纹理图和背景深度图，并根据所述对应视点的背景纹理图和背景深度图，获取所述虚拟视点的背景纹理图；The virtual viewpoint background texture map synthesis unit 194 is adapted to acquire the background texture map and background depth map of the viewpoints corresponding to the target video frame, and to obtain the background texture map of the virtual viewpoint from the background texture maps and background depth maps of those corresponding viewpoints;
后处理单元195,适于采用所述虚拟视点的背景纹理图,对所述虚拟视点的纹理图中的空洞区域进行空洞填补后处理,得到所述虚拟视点的重建图像。The post-processing unit 195 is adapted to use the background texture map of the virtual viewpoint to perform post-processing for filling voids in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
采用上述自由视点视频播放处理装置，采用重建得到所述虚拟视点的完整的背景纹理图，来对合成得到的目标视频帧对应的虚拟视点的纹理图进行空洞填补后处理，相对于仅利用空洞周围的纹理进行滤波的方案，可以避免空洞填补不完整而导致的伪影和模糊的现象，提高空洞填补质量，进而可以提高自由视点视频的图像质量。With the above free-viewpoint video playback processing apparatus, a complete background texture map of the virtual viewpoint is reconstructed and used for hole-filling post-processing of the synthesized virtual-viewpoint texture map corresponding to the target video frame. Compared with schemes that filter using only the texture around a hole, this avoids the artifacts and blurring caused by incomplete hole filling, improves hole-filling quality, and in turn improves the image quality of the free-viewpoint video.
在具体实施中,所述虚拟视点视频播放处理装置中各单元可以采用前述自由视点视频重建方法中相应步骤的具体方法示例、具体方式等进行实施,具体可以参见前述实施例详述。In a specific implementation, each unit in the virtual viewpoint video playback processing device can be implemented by using the specific method examples and specific manners of the corresponding steps in the aforementioned free viewpoint video reconstruction method, and details can be found in the foregoing embodiments.
在本说明书实施例中,所述虚拟视点视频重建装置、所述虚拟视点视频播放处理装置等各具体单元可以由软件、硬件或者软硬件结合的方式实施。In the embodiments of this specification, each specific unit such as the virtual viewpoint video reconstruction apparatus, the virtual viewpoint video playback processing apparatus, etc. may be implemented by software, hardware, or a combination of software and hardware.
参照图20所示的电子设备的结构示意图，在本说明书一些实施例中，如图20所示，电子设备200可以包括存储器201和处理器202，所述存储器201上存储有可在所述处理器202上运行的计算机指令，其中，所述处理器运行所述计算机指令时可以执行前述任一实施例所述方法的步骤。Referring to the schematic structural diagram of the electronic device shown in FIG. 20, in some embodiments of this specification, the electronic device 200 may include a memory 201 and a processor 202, where the memory 201 stores computer instructions executable on the processor 202, and the processor, when running the computer instructions, may perform the steps of the method described in any of the foregoing embodiments.
基于所述电子设备在整个视频处理系统所处位置,所述电子设备还可以包括其他的电子部件或组件。Based on the location of the electronic device in the entire video processing system, the electronic device may also include other electronic components or assemblies.
参照图21所示的另一种电子设备的结构示意图,在本说明书另一些实施例中,如图21所示,电子设备210可以包括通信组件211、处理器212和显示组件213,其中:Referring to the schematic structural diagram of another electronic device shown in FIG. 21, in other embodiments of this specification, as shown in FIG. 21, the electronic device 210 may include a communication component 211, a processor 212, and a display component 213, wherein:
所述通信组件211,适于获取自由视点视频;The communication component 211 is adapted to obtain free-view video;
所述处理器212,适于执行前述任一实施例所述方法的步骤;The processor 212 is adapted to execute the steps of the method in any of the foregoing embodiments;
所述显示组件213,适于显示所述处理器处理后得到的虚拟视点的重建图像。The display component 213 is adapted to display the reconstructed image of the virtual viewpoint obtained after processing by the processor.
在具体实施中,显示组件213具体可以是显示器、触摸屏、投影仪等其中一种或多种。In a specific implementation, the display component 213 may specifically be one or more of a display, a touch screen, a projector, and the like.
在具体实施中,通信组件211和显示组件213等可以为设置在所述电子设备210内部的组件,也可以为通过扩展接口、扩展坞、扩展线等扩展组件连接的外接设备。In a specific implementation, the communication component 211 and the display component 213 may be components disposed inside the electronic device 210, or may be external devices connected through expansion components such as expansion interfaces, docking stations, and expansion cables.
在具体实施中，所述处理器212可以通过中央处理器（Central Processing Unit，CPU）（例如单核处理器、多核处理器）、CPU组、图形处理器（Graphics Processing Unit，GPU）、人工智能（Artificial Intelligence，AI）芯片、现场可编程门阵列（Field Programmable Gate Array，FPGA）芯片等其中任意一种或多种协同实施。In a specific implementation, the processor 212 may be implemented by any one of, or a cooperative combination of, a central processing unit (CPU) (e.g., a single-core or multi-core processor), a CPU group, a graphics processing unit (GPU), an artificial intelligence (AI) chip, a field programmable gate array (FPGA) chip, and the like.
在本说明书一些实施例中,电子设备中的存储器、处理器、通信组件和显示组件之间可以通过总线网络进行通信。In some embodiments of this specification, the memory, the processor, the communication component and the display component in the electronic device may communicate through a bus network.
为使本领域技术人员更好地理解和实施，以下以一个具体的应用场景进行说明。参照图22所示的视频处理系统的结构示意图，如图22所示，为一种应用场景中视频处理系统的结构示意图，其中，示出了一场篮球赛的数据处理系统的布置场景，所述视频处理系统A0包括由多个采集设备组成的采集阵列A1、数据处理设备A2、云端的服务器集群A3、播放控制设备A4，播放终端A5和交互终端A6。For better understanding and implementation by those skilled in the art, a specific application scenario is described below. FIG. 22 is a schematic structural diagram of a video processing system in one application scenario, showing the deployment of a data processing system for a basketball game. The video processing system A0 includes a collection array A1 composed of multiple collection devices, a data processing device A2, a cloud server cluster A3, a playback control device A4, a playback terminal A5, and an interactive terminal A6.
参照图21,以左侧的篮球框作为核心看点,以核心看点为圆心,与核心看点位于同一平面的扇形区域作为预设的多角度自由视角范围。所述采集阵列A1中各采集设备可以根据所述预设的多角度自由视角范围,成扇形置于现场采集区域不同位置,可以分别从相应角度实时同步采集视频数据流。Referring to FIG. 21 , the basketball hoop on the left is used as the core point of view, the core point of view is the center of the circle, and the fan-shaped area on the same plane as the core point of view is used as the preset multi-angle free viewing angle range. According to the preset multi-angle free viewing angle range, each acquisition device in the acquisition array A1 can be placed in a fan shape at different positions in the on-site acquisition area, and can synchronously acquire video data streams from corresponding angles in real time.
在具体实施中,采集设备还可以设置在篮球场馆的顶棚区域、篮球架上等。各采集设备可以沿直线、扇形、弧线、圆形或者不规则形状排列分布。具体排列方式可以根据具体的现场环境、采集设备数量、采集设备的特点、成像效果需求等一种或多种因素进行设置。所述采集设备可以是任何具有摄像功能的设备,例如,普通的摄像机、手机、专业摄像机等。In a specific implementation, the collection device may also be arranged in the ceiling area of the basketball stadium, on the basketball hoop, and the like. The collection devices can be arranged and distributed along a straight line, a fan shape, an arc line, a circle or an irregular shape. The specific arrangement can be set according to one or more factors such as the specific on-site environment, the number of acquisition devices, the characteristics of the acquisition devices, and imaging effect requirements. The collection device may be any device with a camera function, such as a common camera, a mobile phone, a professional camera, and the like.
而为了不影响采集设备工作，所述数据处理设备A2可以置于现场非采集区域，可视为现场服务器。所述数据处理设备A2可以通过无线局域网向所述采集阵列A1中各采集设备分别发送拉流指令，所述采集阵列A1中各采集设备基于所述数据处理设备A2发送的拉流指令，将获得的视频数据流实时传输至所述数据处理设备A2。其中，所述采集阵列A1中各采集设备可以通过交换机A7将获得的视频数据流实时传输至所述数据处理设备A2。采集阵列A1和交换机A7一起形成采集系统。In order not to affect the operation of the acquisition devices, the data processing device A2 may be placed in a non-acquisition area on site and can be regarded as an on-site server. The data processing device A2 may send stream-pull instructions to the acquisition devices in the acquisition array A1 through a wireless local area network, and each acquisition device in the array, based on the stream-pull instruction sent by the data processing device A2, transmits its acquired video data stream to the data processing device A2 in real time, for example through the switch A7. The acquisition array A1 and the switch A7 together form the acquisition system.
当所述数据处理设备A2接收到视频帧截取指令时，从接收到的多路视频数据流中对指定帧时刻的视频帧截取得到多个同步视频帧的帧图像，并将获得的所述指定帧时刻的多个同步视频帧上传至云端的服务器集群A3。When the data processing device A2 receives a video frame interception instruction, it intercepts the video frames at the specified frame moment from the received multiple video data streams to obtain frame images of multiple synchronized video frames, and uploads the obtained synchronized video frames at the specified frame moment to the cloud server cluster A3.
相应地，云端的服务器集群A3将接收的多个同步视频帧的原始纹理图作为图像组合，确定所述图像组合相应的参数数据及所述图像组合中各原始纹理图对应的原始深度图，并基于所述图像组合相应的参数数据、所述图像组合中纹理图的像素数据和对应深度图的深度数据，基于获取到的虚拟视点进行图像拼接，获得相应的多角度自由视角视频数据。Correspondingly, the cloud server cluster A3 takes the received original texture maps of the multiple synchronized video frames as an image combination, determines the parameter data corresponding to the image combination and the original depth map corresponding to each original texture map in the combination, and then, based on the parameter data of the image combination, the pixel data of its texture maps, and the depth data of the corresponding depth maps, performs image stitching for the acquired virtual viewpoint to obtain the corresponding multi-angle free-view video data.
服务器可以置于云端,并且为了能够更快速地并行处理数据,可以按照处理数据的不同,由多个不同的服务器或服务器组组成云端的服务器集群A3。The server can be placed in the cloud, and in order to process data in parallel more quickly, a server cluster A3 in the cloud can be composed of multiple different servers or server groups according to different data processed.
例如，所述云端的服务器集群A3可以包括：第一云端服务器A31，第二云端服务器A32，第三云端服务器A33，第四云端服务器A34。其中，第一云端服务器A31可以用于确定所述图像组合相应的参数数据；第二云端服务器A32可以用于确定所述图像组合中各视点的原始纹理图的估计深度图以及进行深度图校正处理；第三云端服务器A33可以根据虚拟视点的位置信息，基于所述图像组合相应的参数数据、所述图像组合的纹理图和深度图，使用基于深度图的虚拟视点重建（Depth Image Based Rendering，DIBR）算法，进行帧图像重建，得到虚拟视点的图像；所述第四云端服务器A34可以用于生成自由视点视频（多角度自由视角视频）。For example, the cloud server cluster A3 may include a first cloud server A31, a second cloud server A32, a third cloud server A33, and a fourth cloud server A34. The first cloud server A31 may determine the parameter data corresponding to the image combination; the second cloud server A32 may determine the estimated depth map of the original texture map of each viewpoint in the image combination and perform depth map correction; the third cloud server A33 may, according to the position information of the virtual viewpoint and based on the parameter data, texture maps, and depth maps of the image combination, reconstruct frame images using a Depth Image Based Rendering (DIBR) algorithm to obtain the image of the virtual viewpoint; and the fourth cloud server A34 may generate the free viewpoint video (multi-angle free-view video).
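The DIBR step performed by the third cloud server A33 can be illustrated with a deliberately minimal 1-D forward-warping sketch for rectified views. Real implementations also handle occlusion ordering, sub-pixel rounding, and blending of multiple reference views; `baseline_focal` here is a hypothetical combined baseline-times-focal-length parameter:

```python
import numpy as np

def dibr_shift(texture, depth, baseline_focal):
    """Minimal 1-D DIBR sketch for rectified views: each reference pixel is
    shifted by a disparity proportional to baseline * focal / depth to form
    the virtual-view texture. Unassigned pixels stay -1, marking the
    disocclusion holes later filled by background-based post-processing."""
    h, w = texture.shape
    virt = np.full((h, w), -1, dtype=texture.dtype)
    for y in range(h):
        for x in range(w):
            d = int(round(baseline_focal / depth[y, x]))  # disparity in pixels
            xv = x + d
            if 0 <= xv < w:
                virt[y, xv] = texture[y, x]
    return virt

tex = np.array([[1, 2, 3, 4]], dtype=np.int32)
dep = np.array([[4.0, 4.0, 2.0, 2.0]])  # nearer pixels (depth 2) shift more
virt = dibr_shift(tex, dep, 4.0)
# virt == [[-1, 1, 2, -1]]; positions 0 and 3 are disocclusion holes
```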
可以理解的是，所述第一云端服务器A31、第二云端服务器A32、第三云端服务器A33、第四云端服务器A34也可以为服务器阵列或服务器子集群组成的服务器组，本发明实施例不做限制。It can be understood that the first cloud server A31, the second cloud server A32, the third cloud server A33, and the fourth cloud server A34 may each also be a server group composed of a server array or server sub-cluster, which is not limited in the embodiments of the present invention.
然后,播放控制设备A4可以将接收到的自由视点视频帧插入待播放视频流中,播放终端A5接收来自所述播放控制设备A4的待播放视频流并进行实时播放。其中,播放控制设备A4可以为人工播放控制设备,也可以为虚拟播放控制设备。在具体实施中,可以设置专门的可以自动切换视频流的服务器作为虚拟播放控制设备进行数据源的控制。导播控制设备如导播台可以作为本发明实施例中的一种播放控制设备A4。Then, the playback control device A4 can insert the received free-view video frame into the to-be-played video stream, and the playback terminal A5 receives the to-be-played video stream from the playback control device A4 and plays it in real time. The playback control device A4 may be a manual playback control device or a virtual playback control device. In a specific implementation, a dedicated server that can automatically switch video streams can be set up as a virtual playback control device to control the data source. A broadcast director control device, such as a broadcast director station, can be used as a playback control device A4 in the embodiment of the present invention.
交互设备A6可以基于用户交互,进行自由视点视频的播放。The interaction device A6 can play free-view video based on user interaction.
可以理解的是，所述采集阵列A1中各采集设备与所述数据处理设备A2之间可以通过交换机A7和/或局域网进行连接，播放终端A5、交互终端A6数量均可以是一个或多个，所述播放终端A5与所述交互终端A6可以为同一终端设备，所述数据处理设备A2可以根据具体情景置于现场非采集区域或云端，所述服务器集群A3和播放控制设备A4可以根据具体情景置于现场非采集区域、云端或者终端接入侧，本实施例并不用于限制本发明的具体实现和保护范围。It can be understood that each acquisition device in the acquisition array A1 may be connected to the data processing device A2 through the switch A7 and/or a local area network; the number of playback terminals A5 and of interactive terminals A6 may each be one or more; the playback terminal A5 and the interactive terminal A6 may be the same terminal device; the data processing device A2 may be placed in the on-site non-acquisition area or in the cloud according to the specific scenario; and the server cluster A3 and the playback control device A4 may be placed in the on-site non-acquisition area, in the cloud, or at the terminal access side according to the specific scenario. This embodiment is not intended to limit the specific implementation and protection scope of the present invention.
本说明书实施例还提供了一种计算机可读存储介质,其上存储有计算机指令,其中,所述计算机指令运行时执行前述任一实施例所述方法的步骤,具体可以参见前述实施例介绍,此处不再赘述。The embodiments of the present specification further provide a computer-readable storage medium on which computer instructions are stored, wherein, when the computer instructions are executed, the steps of the methods described in any of the foregoing embodiments may be performed. For details, reference may be made to the introduction of the foregoing embodiments. It will not be repeated here.
在具体实施中,所述计算机可读存储介质可以是光盘、机械硬盘、固态硬盘等各种适当的可读存储介质。In a specific implementation, the computer-readable storage medium may be various suitable readable storage mediums such as an optical disc, a mechanical hard disk, and a solid-state hard disk.
虽然本说明书实施例披露如上,但本发明并非限定于此。任何本领域技术人员,在不脱离本说明书实施例的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。Although the embodiments of the present specification are disclosed as above, the present invention is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of this specification. Therefore, the protection scope of the present invention should be based on the scope defined by the claims.

Claims (17)

  1. 一种自由视点视频重建方法,其中,包括:A free-viewpoint video reconstruction method, comprising:
    获取自由视点视频帧,所述视频帧包括同步的多个原始视点的原始纹理图和对应视点的原始深度图;Obtaining a free-view video frame, the video frame includes the original texture maps of multiple original viewpoints and the original depth maps of the corresponding viewpoints;
    获取虚拟视点对应的目标视频帧;Obtain the target video frame corresponding to the virtual viewpoint;
    采用所述目标视频帧中多个原始视点的原始纹理图和对应的原始深度图,合成所述虚拟视点的纹理图;Using the original texture maps of multiple original viewpoints and the corresponding original depth maps in the target video frame to synthesize the texture maps of the virtual viewpoints;
    获取所述目标视频帧对应视点的背景纹理图和背景深度图,并根据所述对应视点的背景纹理图和背景深度图,获取所述虚拟视点的背景纹理图;Obtain the background texture map and the background depth map of the viewpoint corresponding to the target video frame, and obtain the background texture map of the virtual viewpoint according to the background texture map and the background depth map of the corresponding viewpoint;
    采用所述虚拟视点的背景纹理图,对所述虚拟视点的纹理图中的空洞区域进行空洞填补后处理,得到所述虚拟视点的重建图像。Using the background texture map of the virtual viewpoint, post-processing is performed on the void area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  2. 根据权利要求1所述的方法,其中,所述获取所述目标视频帧对应视点的背景纹理图和背景深度图,包括:The method according to claim 1, wherein the acquiring the background texture map and the background depth map of the viewpoint corresponding to the target video frame comprises:
    选择所述目标视频帧对应视点的参考纹理图序列和参考深度图序列;selecting the reference texture map sequence and the reference depth map sequence of the viewpoint corresponding to the target video frame;
    对所述参考纹理图序列和参考深度图序列分别进行时域滤波,得到所述目标视频帧对应视点的背景纹理图和背景深度图。Temporal filtering is performed on the reference texture map sequence and the reference depth map sequence, respectively, to obtain a background texture map and a background depth map of the viewpoint corresponding to the target video frame.
  3. 根据权利要求2所述的方法,其中,所述对所述参考纹理图序列和参考深度图序列分别进行时域滤波,得到所述目标视频帧对应视点的背景纹理图和背景深度图,包括:The method according to claim 2, wherein the performing temporal filtering on the reference texture map sequence and the reference depth map sequence respectively to obtain a background texture map and a background depth map of the viewpoint corresponding to the target video frame, comprising:
    对所述参考纹理图序列和参考深度图序列中的像素分别进行时域中值滤波,得到所述目标视频帧对应视点的背景纹理图和背景深度图。Temporal median filtering is performed on the pixels in the reference texture map sequence and the reference depth map sequence, respectively, to obtain a background texture map and a background depth map of the viewpoint corresponding to the target video frame.
  4. 根据权利要求1所述的方法,其中,所述采用所述目标视频帧中多个原始视点的原始纹理图和对应的原始深度图,合成所述虚拟视点的纹理图,包括:The method according to claim 1, wherein, using the original texture maps and corresponding original depth maps of multiple original viewpoints in the target video frame to synthesize the texture maps of the virtual viewpoints, comprising:
    基于所述虚拟视点,按照预设规则选择所述目标视频帧中相应原始视点的原始纹理图和对应的原始深度图;Based on the virtual viewpoint, select the original texture map and the corresponding original depth map of the corresponding original viewpoint in the target video frame according to a preset rule;
    采用所选择的相应原始视点的原始纹理图和对应的原始深度图,合成所述虚拟视点的纹理图。Using the selected original texture map of the corresponding original viewpoint and the corresponding original depth map, the texture map of the virtual viewpoint is synthesized.
  5. 根据权利要求4所述的方法,其中,所述获取所述目标视频帧对应视点的背景纹理图和背景深度图,包括:The method according to claim 4, wherein the acquiring a background texture map and a background depth map of the viewpoint corresponding to the target video frame comprises:
    获取与所选择的原始视点相应的参考纹理图序列和参考深度图序列;Obtain a reference texture map sequence and a reference depth map sequence corresponding to the selected original viewpoint;
    对所述参考纹理图序列和参考深度图序列分别进行时域滤波,得到所选择的相应原始视点的背景纹理图和背景深度图。Temporal filtering is performed on the reference texture map sequence and the reference depth map sequence, respectively, to obtain a background texture map and a background depth map of the selected corresponding original viewpoint.
  6. 根据权利要求1所述的方法,其中,所述获取所述目标视频帧对应视点的背景纹理图和背景深度图,包括:The method according to claim 1, wherein the acquiring the background texture map and the background depth map of the viewpoint corresponding to the target video frame comprises:
    预先采集所述目标视频帧所针对的视场中对应视点不存在前景对象的背景纹理图;Pre-collecting a background texture map in which there is no foreground object at the corresponding viewpoint in the field of view targeted by the target video frame;
    根据所述目标视频帧所针对的视场中对应视点不存在前景对象的背景纹理图,获取对应视点的背景深度图。The background depth map of the corresponding viewpoint is acquired according to the background texture map in which the corresponding viewpoint does not have a foreground object in the field of view targeted by the target video frame.
  7. 根据权利要求1至6任一项所述的方法，其中，所述采用所述虚拟视点的背景纹理图，对所述虚拟视点的纹理图中的空洞区域进行空洞填补后处理，得到所述虚拟视点的重建图像，包括：The method according to any one of claims 1 to 6, wherein using the background texture map of the virtual viewpoint to perform hole-filling post-processing on the hole areas in the texture map of the virtual viewpoint to obtain the reconstructed image of the virtual viewpoint comprises:
    采用所述虚拟视点的背景纹理图,使用联合双边滤波方法对所述虚拟视点的纹理图中的空洞区域进行插值处理,得到所述虚拟视点的重建图像。The background texture map of the virtual viewpoint is used, and a joint bilateral filtering method is used to perform interpolation processing on the hollow area in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
  8. 根据权利要求1至6任一项所述的方法,其中,在对所述虚拟视点的纹理图中的空洞区域进行空洞填补后处理之后,得到所述虚拟视点的重建图像之前,还包括:The method according to any one of claims 1 to 6, wherein after the post-processing of hole filling is performed on the hole region in the texture map of the virtual viewpoint, and before the reconstructed image of the virtual viewpoint is obtained, the method further comprises:
    对空洞填补后处理后得到的虚拟视点的纹理图中的前景边缘进行滤波处理,以得到所述虚拟视点的重建图像。Filtering is performed on the foreground edge in the texture map of the virtual viewpoint obtained after the post-processing of hole filling, so as to obtain the reconstructed image of the virtual viewpoint.
  9. A free-viewpoint video playback processing method, comprising:
    determining a virtual viewpoint, and determining a target video frame according to the virtual viewpoint;
    synthesizing a texture map of the virtual viewpoint from the original texture maps of multiple original viewpoints in the target video frame and the corresponding original depth maps;
    obtaining a background texture map and a background depth map of each viewpoint corresponding to the target video frame, and obtaining a background texture map of the virtual viewpoint according to the background texture maps and background depth maps of the corresponding viewpoints;
    using the background texture map of the virtual viewpoint, performing hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint to obtain a reconstructed image of the virtual viewpoint.
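The texture-synthesis step of claim 9 is an instance of depth-image-based rendering. As a deliberately simplified sketch (a 1-D horizontally shifted pinhole model, not the patent's actual warping), each source pixel is forward-warped by a disparity proportional to baseline*focal/depth, with a z-test resolving collisions; uncovered pixels remain holes, which the background-based post-processing of the final step must repair. All names and the camera model are assumptions.

```python
import numpy as np

def warp_to_virtual(tex, depth, baseline, focal):
    """Forward-warp one original view (tex, depth; both H x W) into a
    horizontally shifted virtual view. Returns the warped texture and a
    hole mask (True where no source pixel landed)."""
    H, W = tex.shape
    out = np.zeros_like(tex)
    hole = np.ones((H, W), dtype=bool)
    zbuf = np.full((H, W), np.inf)
    for y in range(H):
        for x in range(W):
            d = depth[y, x]
            nx = x + int(round(baseline * focal / d))  # disparity shift
            if 0 <= nx < W and d < zbuf[y, nx]:  # z-test: keep nearest
                zbuf[y, nx] = d
                out[y, nx] = tex[y, x]
                hole[y, nx] = False
    return out, hole
```

In practice one such warp is computed per original viewpoint and the results blended before hole filling.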
  10. The method according to claim 9, wherein determining the virtual viewpoint comprises at least one of:
    determining the virtual viewpoint in response to a user interaction;
    determining the virtual viewpoint based on virtual viewpoint position information contained in the video stream.
  11. The method according to claim 9, further comprising:
    obtaining a virtual rendering target object in the reconstructed image of the virtual viewpoint;
    obtaining a virtual information image generated from augmented-reality special-effect input data of the virtual rendering target object;
    compositing the virtual information image with the reconstructed image of the virtual viewpoint and displaying the result.
  12. The method according to claim 11, wherein obtaining the virtual information image generated from the augmented-reality special-effect input data of the virtual rendering target object comprises:
    obtaining a virtual information image matching the position of the virtual rendering target object according to that object's position in the reconstructed image of the virtual viewpoint, the position being obtained by three-dimensional calibration.
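Anchoring the virtual information image at the calibrated object position, as in claim 12, amounts to projecting the object's 3-D point through the virtual camera. A minimal pinhole-projection sketch (the camera model and function name are illustrative assumptions, not the patent's stated procedure):

```python
import numpy as np

def project_point(K, R, t, X):
    """Project a 3-D point X (world coordinates) into a camera with
    intrinsics K and pose (R, t); the returned pixel (u, v) is where the
    virtual information image would be anchored over the target object."""
    Xc = R @ X + t          # world -> camera coordinates
    uvw = K @ Xc            # camera -> homogeneous image coordinates
    return uvw[:2] / uvw[2]  # perspective divide
```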
  13. A free-viewpoint video reconstruction apparatus, comprising:
    a video frame acquisition unit, adapted to acquire free-viewpoint video frames, each video frame including synchronized original texture maps of multiple original viewpoints and original depth maps of the corresponding viewpoints;
    a target video frame determination unit, adapted to obtain a target video frame corresponding to a virtual viewpoint;
    a virtual viewpoint texture map synthesis unit, adapted to synthesize a texture map of the virtual viewpoint from the original texture maps of multiple original viewpoints in the target video frame and the corresponding original depth maps;
    a virtual viewpoint background texture map synthesis unit, adapted to obtain the background texture maps and background depth maps of the viewpoints corresponding to the target video frame, and to obtain a background texture map of the virtual viewpoint according to the background texture maps and background depth maps of the corresponding viewpoints;
    a post-processing unit, adapted to use the background texture map of the virtual viewpoint to perform hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint, obtaining a reconstructed image of the virtual viewpoint.
  14. A free-viewpoint video playback processing apparatus, comprising:
    a virtual viewpoint determination unit, adapted to determine a virtual viewpoint;
    a target video frame determination unit, adapted to determine a target video frame according to the virtual viewpoint;
    a virtual viewpoint texture map synthesis unit, adapted to synthesize a texture map of the virtual viewpoint from the original texture maps of multiple original viewpoints in the target video frame and the corresponding original depth maps;
    a virtual viewpoint background texture map synthesis unit, adapted to obtain the background texture maps and background depth maps of the viewpoints corresponding to the target video frame, and to obtain a background texture map of the virtual viewpoint according to the background texture maps and background depth maps of the corresponding viewpoints;
    a post-processing unit, adapted to use the background texture map of the virtual viewpoint to perform hole-filling post-processing on the hole regions in the texture map of the virtual viewpoint, obtaining a reconstructed image of the virtual viewpoint.
  15. An electronic device comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 8 or claims 9 to 12.
  16. An electronic device, comprising a communication component, a processor and a display component, wherein:
    the communication component is adapted to acquire free-viewpoint video;
    the processor is adapted to perform the steps of the method of any one of claims 1 to 8 or claims 9 to 12;
    the display component is adapted to display the reconstructed image of the virtual viewpoint obtained after processing by the processor.
  17. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the method of any one of claims 1 to 8 or claims 9 to 12.
PCT/CN2021/108827 2020-07-31 2021-07-28 Free viewpoint video reconstruction and playing processing method, device, and storage medium WO2022022548A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010759861.3A CN114071115A (en) 2020-07-31 2020-07-31 Free viewpoint video reconstruction and playing processing method, device and storage medium
CN202010759861.3 2020-07-31

Publications (1)

Publication Number Publication Date
WO2022022548A1 true WO2022022548A1 (en) 2022-02-03

Family

ID=80037141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108827 WO2022022548A1 (en) 2020-07-31 2021-07-28 Free viewpoint video reconstruction and playing processing method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114071115A (en)
WO (1) WO2022022548A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101771893A (en) * 2010-01-05 2010-07-07 浙江大学 Video frequency sequence background modeling based virtual viewpoint rendering method
CN104618797A (en) * 2015-02-06 2015-05-13 腾讯科技(北京)有限公司 Information processing method and device and client
WO2017201751A1 (en) * 2016-05-27 2017-11-30 北京大学深圳研究生院 Hole filling method and device for virtual viewpoint video or image, and terminal
US20180214777A1 (en) * 2015-07-24 2018-08-02 Silver Curve Games, Inc. Augmented reality rhythm game
CN109361913A (en) * 2015-05-18 2019-02-19 韩国电子通信研究院 For providing the method and apparatus of 3-D image for head-mounted display
CN110602476A (en) * 2019-08-08 2019-12-20 南京航空航天大学 Hole filling method of Gaussian mixture model based on depth information assistance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660131B (en) * 2019-09-24 2022-12-27 宁波大学 Virtual viewpoint hole filling method based on deep background modeling


Also Published As

Publication number Publication date
CN114071115A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
US10650574B2 (en) Generating stereoscopic pairs of images from a single lens camera
US11197038B2 (en) Systems and methods for synchronizing surface data management operations for virtual reality
US7573475B2 (en) 2D to 3D image conversion
CN113099204B (en) Remote live-action augmented reality method based on VR head-mounted display equipment
WO2022002181A1 (en) Free viewpoint video reconstruction method and playing processing method, and device and storage medium
US11501118B2 (en) Digital model repair system and method
CN107274469A (en) The coordinative render method of Virtual reality
JP6778163B2 (en) Video synthesizer, program and method for synthesizing viewpoint video by projecting object information onto multiple surfaces
US10699749B2 (en) Methods and systems for customizing virtual reality data
WO2017128887A1 (en) Method and system for corrected 3d display of panoramic image and device
Lee et al. Free viewpoint video (FVV) survey and future research direction
US11417060B2 (en) Stereoscopic rendering of virtual 3D objects
WO2022022501A1 (en) Video processing method, apparatus, electronic device, and storage medium
US20220174257A1 (en) Videotelephony with parallax effect
TW202029742A (en) Image synthesis
CN106780759A (en) Method, device and the VR systems of scene stereoscopic full views figure are built based on picture
WO2022001865A1 (en) Depth map and video processing and reconstruction methods and apparatuses, device, and storage medium
JP2004326179A (en) Image processing device, image processing method, image processing program, and recording medium storing it
WO2022022548A1 (en) Free viewpoint video reconstruction and playing processing method, device, and storage medium
CN114788287A (en) Encoding and decoding views on volumetric image data
BR112021014724A2 (en) APPARATUS FOR RENDERING IMAGES, APPARATUS FOR GENERATING AN IMAGE SIGNAL, METHOD FOR RENDERING IMAGES, METHOD FOR GENERATING AN IMAGE SIGNAL AND COMPUTER PROGRAM PRODUCT
Budd et al. Web delivery of free-viewpoint video of sport events
Li et al. Texture Blending for Photorealistic Composition on Mobile AR Platform
CN114007058A (en) Depth map correction method, video processing method, video reconstruction method and related devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21850630

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21850630

Country of ref document: EP

Kind code of ref document: A1